(edited by damaxxed 2015-09-12 20:20)

Topic: [solved] Duplicate content problem & SOLUTION

Hey

I used to love the simplicity and style of the old PunBB, and now I love it even more :) Thanks for the new version. I would like to contribute a little bit to the success of PunBB.

We all know that search engines dislike duplicate content, and we all know that PunBB produces a lot of it. Just take a look at viewtopic.php?id=1. The topic view contains permalinks to each post, like viewtopic.php?pid=1#p1, viewtopic.php?pid=2#p2 and so on. On a full page, this means the same content 26 times over (the topic URL plus every post permalink), which is not really healthy for the PageRank.

My idea to solve it: no, please don't remove the permalinks, they're extremely handy. My solution is based on the tools search engines provide us with: simply add rel="nofollow" to every post permalink and a <meta name="robots" content="noindex" /> tag to the header of each viewtopic.php?pid=X page. Search engines then won't scan the permalinked post pages and will see the content only once, which means a better rank.
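
A minimal sketch of the two proposed changes, assuming a plain PHP page. The helper names post_permalink() and robots_meta_tag() are hypothetical and are not PunBB's actual functions:

```php
<?php
// Hypothetical helpers sketching the proposal; these are not PunBB's real functions.

// Build a post permalink such as viewtopic.php?pid=2#p2, marked rel="nofollow"
// so crawlers are asked not to follow it.
function post_permalink(int $pid): string
{
    $url = 'viewtopic.php?pid=' . $pid . '#p' . $pid;
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" rel="nofollow">Permalink</a>';
}

// Return a robots meta tag for viewtopic.php?pid=X requests, or '' for other
// requests (e.g. viewtopic.php?id=1), which should stay indexable.
function robots_meta_tag(array $query): string
{
    return isset($query['pid']) ? '<meta name="robots" content="noindex" />' : '';
}

// Usage: on viewtopic.php you would pass $_GET; here the query is hard-coded.
echo robots_meta_tag(['pid' => '2']), "\n";
echo post_permalink(2), "\n";
```

Only pid-based requests get the noindex tag, so the topic view itself remains the one indexed copy of the content.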

This is maybe 15 minutes of work, but definitely a boost in Google's liking :)

I'd be very happy if a hard-working dev would implement my idea.

Thanks

//edit: Damn, sorry, the idea struck me while I was crawling through the wrong board :-/
//edit2: Thanks to Neal for implementing it :)

Re: [solved] Duplicate content problem & SOLUTION

Moved to 1.3 discussion

Re: [solved] Duplicate content problem & SOLUTION

I was actually already considering an idea like this, to add a NOINDEX, FOLLOW tag when pid is used. I don't think it's necessary to add rel=nofollow.

Re: [solved] Duplicate content problem & SOLUTION

No, it's not necessary to use rel="nofollow", I just wanted to save Googlebot's time ;-)

Yeah, NOINDEX is enough, but FOLLOW is really unnecessary, because
  1) afaik FOLLOW is the default, so you don't need to state it explicitly
  2) I can't imagine a single case where the permalinked post page links to a page that isn't also linked from the topic view

Re: [solved] Duplicate content problem & SOLUTION

Googlebot follows "nofollow" links. It just doesn't index them publicly.

Re: [solved] Duplicate content problem & SOLUTION

damaxxed wrote:

No, it's not necessary to use rel="nofollow", I just wanted to save Googlebot's time ;-)

Yeah, NOINDEX is enough, but FOLLOW is really unnecessary, because
  1) afaik FOLLOW is the default, so you don't need to state it explicitly
  2) I can't imagine a single case where the permalinked post page links to a page that isn't also linked from the topic view

We already have NOINDEX, FOLLOW built into PunBB (we use it on certain pages), which is why I was going to use it.
As for #1, you might be right (I've read that the default is INDEX, FOLLOW, but I don't know whether specifying only one of them affects the other), but it's better to state it explicitly. Otherwise, a search engine could choose to assume "If I'm not supposed to be indexing this page, I shouldn't be following links from it either."

(edited by damaxxed 2008-03-17 00:43)

Re: [solved] Duplicate content problem & SOLUTION

Yeah, you're right. Strangely, there is no real specification for the robots meta tag out there, and each search engine seems to interpret the <meta name="robots"> tag differently. Your idea is better: it's safer to include FOLLOW, and even if some search engine doesn't understand it, it should still understand the rest of the tag.

//edit: would it be possible to add a redirect (HTTP status code 301 Moved Permanently) when PunBB detects that it is being called from the wrong URL? For example, with

//config.php
$base_url = 'http://example.net/punbb';

but the requested URL being http://www.example.net/punbb
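
A rough sketch of such a check, assuming the canonical host is taken from $base_url and the request host from $_SERVER['HTTP_HOST']. canonical_redirect_target() is a hypothetical name, not part of PunBB:

```php
<?php
// Sketch: find where to 301-redirect a request whose host differs from the host in
// $base_url (e.g. www.example.net vs example.net). The function name is hypothetical.

// Returns the canonical URL for the request, or null if the host already matches.
function canonical_redirect_target(string $base_url, string $request_host, string $request_uri): ?string
{
    $parts = parse_url($base_url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // Can't tell what the canonical host is, so don't redirect.
    }

    // HTTP_HOST may carry a port ("example.net:8080"); compare host names only.
    $host = strtolower((string) preg_replace('/:\d+$/', '', $request_host));
    if ($host === strtolower($parts['host'])) {
        return null;
    }

    $origin = ($parts['scheme'] ?? 'http') . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    return $origin . $request_uri;
}

$base_url = 'http://example.net/punbb'; // as in the config.php example

// On a real page (before any output), the redirect would look like:
// $target = canonical_redirect_target($base_url, $_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }

echo canonical_redirect_target($base_url, 'www.example.net', '/punbb/viewtopic.php?id=1'), "\n";
```

The actual header() call is left commented out so the sketch runs standalone. On a real page it would have to run before any output is sent.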

Re: [solved] Duplicate content problem & SOLUTION

I would leave that up to an extension/mod_rewrite to deal with, since there can be legitimate reasons to have a site accessible from two URLs.

Re: [solved] Duplicate content problem & SOLUTION

Yes, you're right. I created an extension for it :)

Big thanks to Neal, who implemented my idea, but there's a tiny cosmetic flaw:

This is what's in the code:

<meta name="ROBOTS" content="NOINDEX, FOLLOW" />

but afaik the convention is to write it in lowercase, like this:

<meta name="robots" content="noindex, follow" />

Re: [solved] Duplicate content problem & SOLUTION

What about when fancy URLs are on and yet stuff is still accessible through links like viewtopic.php?id=1?

I think that if a page works but wasn't accessed through the current URL scheme, it should 301-redirect to the same page under the correct URL. This would also keep people from losing rank if they ever turn fancy URLs on or off.

Re: [solved] Duplicate content problem & SOLUTION

damaxxed: The case doesn't make any difference ;)
Kyle: People have proposed that before, but it's much easier said than done. I can't even think of a decent, efficient way to detect which scheme is being used.

Re: [solved] Duplicate content problem & SOLUTION

Check the config? hmm

Re: [solved] Duplicate content problem & SOLUTION

Sorry, I should have been more specific. I meant detecting it from a given request: for example, telling which of the folder-based URL schemes a link like user/2/identity/ comes from.

Re: [solved] Duplicate content problem & SOLUTION

Can't you use pun_link (or whatever it's called) to generate the link for the current page from the id, title, etc., and then compare it to the REQUEST_URI?
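
A minimal sketch of that comparison. It assumes each page knows which key in a $pun_url-style array describes it. The $schemes array, its keys and patterns, and the build_link() and redirect_target() helpers are all hypothetical:

```php
<?php
// Hypothetical URL schemes, loosely modelled on a $pun_url-style array:
// each maps a link key to a pattern where $1, $2, ... are placeholders.
$schemes = [
    'default' => ['topic' => 'viewtopic.php?id=$1'],
    'fancy'   => ['topic' => 'topic/$1/'],
];

// Fill the $N placeholders in a pattern with URL-encoded arguments.
function build_link(string $pattern, array $args): string
{
    return preg_replace_callback('/\$(\d+)/', function (array $m) use ($args): string {
        $i = (int) $m[1] - 1;
        return isset($args[$i]) ? rawurlencode((string) $args[$i]) : $m[0];
    }, $pattern);
}

// Compare the generated link with the requested path; return the URL to
// 301-redirect to, or null if the request already uses the current scheme.
function redirect_target(string $base_url, string $canonical_path, string $request_uri): ?string
{
    $base_path = rtrim((string) parse_url($base_url, PHP_URL_PATH), '/');
    if ($request_uri === $base_path . '/' . $canonical_path) {
        return null;
    }
    return rtrim($base_url, '/') . '/' . $canonical_path;
}

// Usage: the topic page must know its own key ('topic') and arguments (id 1).
$base_url = 'http://example.net/punbb';
$current_scheme = 'fancy';
$path = build_link($schemes[$current_scheme]['topic'], [1]);
echo redirect_target($base_url, $path, '/punbb/viewtopic.php?id=1') ?? 'no redirect', "\n";
```

A real check would also have to cope with extra query parameters (pagination, for example) that the generated link does not contain.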

Re: [solved] Duplicate content problem & SOLUTION

Nice idea, but using pun_link would require knowledge of the right key in the $pun_url array, which we don't have.