1

Topic: punbb getting spidered and indexed

I did the following search:

http://www.google.com/search?hl=en& … gle+Search

to find other forums that Google has found. I went through a lot of the results and dug around to find out which pages were being indexed successfully. I want to see how spiderable the current URL format is, or isn't, and whether the whole mod_rewrite thing is really necessary.

My conclusions were quite clear: user profile pages get indexed in greater numbers and far earlier than any other type of page, including the all-important viewtopic pages. Why should this be?

Examples:

http://www.google.com/search?hl=en& … dersson%22

glop.org: 129 pages in the index, half of which are user profiles; 144 actual posts on the forum.


This site, punbb: 9000 pages in the index, 34000 posts on the forum.


http://www.google.com/search?hl=en& … dersson%22

Worrying: 2800 posts on this forum, only 73 pages on Google, and almost all of them user profiles.

If you go through the results of the initial query, you will see the same thing repeated again and again. Any reasons why this should be?
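
For anyone who wants to repeat the count, the tallying can be sketched roughly like this. It is only a Python sketch: the URLs are made-up examples on example.com, and the helper names `page_type` and `tally` are hypothetical. It groups result URLs by script name, so profile.php hits can be compared with viewtopic.php hits.

```python
from collections import Counter
from urllib.parse import urlparse
import posixpath


def page_type(url):
    """Return the script name of a URL, e.g. 'profile.php' ('index' for a bare folder)."""
    return posixpath.basename(urlparse(url).path) or "index"


def tally(urls):
    """Count how many URLs fall under each script name."""
    return Counter(page_type(u) for u in urls)


# Made-up sample; in practice this would be the list of result URLs copied from the search.
sample = [
    "http://example.com/forum/profile.php?id=2",
    "http://example.com/forum/profile.php?id=7",
    "http://example.com/forum/viewtopic.php?id=15",
    "http://example.com/forum/",
]

for name, count in tally(sample).most_common():
    print(name, count)
```

As a later reply points out, result counts are only a rough indication of what is actually indexed, so a tally like this is a guide and nothing more.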

Re: punbb getting spidered and indexed

Every thread has links to the profiles. Doesn't Google rank 'linked to' pages higher than those not linked to?

3

Re: punbb getting spidered and indexed

Yeah, but every thread also has links to the forum the post is in, and to the forum index.

I would like to add rel="nofollow" (the recent anti-blog-spam addition by the major search engines) to the profile links, I think. Is there an easy way to do this? Which file would I have to edit?
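
To make the idea concrete: the attribute goes on each link, so every profile link would need a `rel="nofollow"` added. Below is a minimal Python sketch of that transformation, not PunBB code. It assumes profile links look like `profile.php?id=N`, and the function name `add_nofollow` is hypothetical. In the forum itself the change would be made wherever the PHP prints the link.

```python
import re

# Matches an <a ...> tag that links to profile.php?id=N and has no rel attribute yet.
PROFILE_LINK = re.compile(
    r'<a\s+(?![^>]*\brel=)([^>]*href="profile\.php\?id=\d+"[^>]*)>'
)


def add_nofollow(html):
    """Add rel="nofollow" to profile links; leave every other link untouched."""
    return PROFILE_LINK.sub(r'<a \1 rel="nofollow">', html)


page = (
    '<a href="profile.php?id=2">Bob</a> posted in '
    '<a href="viewtopic.php?id=5">some topic</a>'
)
print(add_nofollow(page))
```

A regular expression is good enough for a sketch; markup with unusual attribute order or quoting would call for a real HTML parser.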

4

Re: punbb getting spidered and indexed

I see that the profile of each logged-in user has robots noindex nofollow tags, but the pages under the "User List" link do not. How could I adjust that?

Re: punbb getting spidered and indexed

Interesting. Actually, it's "noindex, follow", not nofollow. The plan behind the meta tag was to prevent spiders from indexing e.g. post.php and the like. The profile view (when you're looking at someone else's profile and don't have rights to edit it) does not have the noindex meta tag because I thought indexing that view made sense. Searching for someone's username would return a hit pointing to his/her profile. Maybe it doesn't make sense. Maybe adding the meta tag to the userlist also makes more sense. I'm not an expert on search engine indexing.

Adding the tag to the user list is very easy: just remove the line define('PUN_ALLOW_INDEX', 1); from the script.
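
The behaviour described here can be modeled in a few lines. This is a Python sketch of the logic, not PunBB's actual PHP. It assumes that a script which defines PUN_ALLOW_INDEX gets no robots tag and that every other script gets "noindex, follow". The per-script table and the name `robots_meta` are hypothetical.

```python
NOINDEX_TAG = '<meta name="ROBOTS" content="NOINDEX, FOLLOW" />'

# Hypothetical table: True stands for a script that contains
# define('PUN_ALLOW_INDEX', 1); and False for one that does not.
ALLOW_INDEX = {
    "viewtopic.php": True,
    "profile.php": True,   # the view of someone else's profile
    "userlist.php": True,
    "post.php": False,
}


def robots_meta(script, allow_index=ALLOW_INDEX):
    """Return the robots meta tag a page would carry, or '' if it may be indexed."""
    if allow_index.get(script, False):
        return ""
    return NOINDEX_TAG


# Removing the define from the user list corresponds to flipping its flag.
after_edit = {**ALLOW_INDEX, "userlist.php": False}

print(repr(robots_meta("userlist.php")))              # before the edit
print(repr(robots_meta("userlist.php", after_edit)))  # after the edit
```

Because the tag says "follow", spiders can still follow links on such a page; the tag only asks them not to index the page itself.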

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: punbb getting spidered and indexed

I think the profiles should be indexed. I do search for people occasionally, and forum profiles are usually the best way to find them.

7

Re: punbb getting spidered and indexed

Right, I have done that, and I think it should be considered for future changes to the script too. There has to be some reason why user profile pages are being indexed so much more easily than other pages, all other things being equal.

8

Re: punbb getting spidered and indexed

Perhaps consider GUI's mod_rewrite mod?

http://www.google.com/search?q=site%3Aw … com+forums

Re: punbb getting spidered and indexed

The number of results is not an accurate way of determining which pages Google indexes more than others.

Re: punbb getting spidered and indexed

You could put a robots.txt file in the folder where punBB is, and set it so that only the home page and the topic/post view pages get crawled. Indexing the user profiles is really a privacy issue. Who wants their MSN Messenger address in Google? Not me. Also, people are getting concerned about their phone number being in Google's database.
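
A rough sketch of what such a robots.txt could contain, with a check of its effect, is below. It is written in Python using the standard `urllib.robotparser` module. The `/forum/` path, the example.com URLs, the list of blocked scripts and the helper names are all assumptions for illustration. One thing to bear in mind: crawlers request robots.txt from the root of the site, so if the forum lives in a subfolder the rules need the folder name in their paths, as they do here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of scripts to keep crawlers away from; adjust to the forum's actual files.
BLOCKED_SCRIPTS = ["profile.php", "userlist.php", "post.php", "search.php"]


def build_robots_txt(forum_path="/forum/"):
    """Build robots.txt text that disallows the blocked scripts for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {forum_path}{script}" for script in BLOCKED_SCRIPTS]
    return "\n".join(lines) + "\n"


def is_crawlable(robots_txt, url, agent="*"):
    """Check whether the given rules allow a crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots = build_robots_txt()
print(robots)
print(is_crawlable(robots, "http://example.com/forum/viewtopic.php?id=1"))
print(is_crawlable(robots, "http://example.com/forum/profile.php?id=2"))
```

Listing what to block, and leaving everything else open, sticks to the plain Disallow rule that crawlers generally understand, so the index and viewtopic pages stay reachable without needing any Allow lines.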