1 (edited by pedrotuga 2007-01-01 12:05)

Topic: Punmap - html sitemap with fancy urls UPDATED - final release

Hey mates. This module generates an HTML sitemap with fancy URLs for SEO purposes. There is no need to go beyond the basic style included, as the fancy URLs will be used as landing pages from the search engines. It is not likely that a user would browse the archives.

You can see a demo here:
http://shareminer.com/forum/archive

This is very simple to install. Simply upload these two files to your PunBB root directory.

EDIT: final release ready! :D
All the problems are fixed.
I get three warnings on validation, but they simply don't make sense. I should check them somewhere else.
The last code I added is a bit of spaghetti code; I was not expecting the default permissions not to be listed in the database, so I had to improvise. Still, in general I think it's simple code.

WARNING: If you are updating this script, make sure you update both punmap.php and .htaccess

Smartys, feel free to list this module on punres.

punmap.php

<?php

define('PUN_QUIET_VISIT', 1);
define('PUN_ROOT', './');
require PUN_ROOT.'include/common.php';
$num_links = 100;

function ouptut_html_head($maintitle){
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
    <head>
    <title><?php echo $maintitle; ?></title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <style type="text/css">
        body,h1,h2,h3,h4,h5,h6    {background-color:#fff;color:black;font-family:Verdana;font-size:11px}
        h1,h2,h3,h4,h5,h6 {display:inline}
        h1{font-size:160%}
        h2{font-size:125%}
        h3{font-size:120%}
        h4{font-size:115%}
        h5{font-size:110%}
        h6{font-size:105%}
        a{text-decoration:none}
        a:hover {text-decoration:underline}
        a.description{color:black}
    
    </style>
    </head>
    <body>
    <?php
}



function clean_url($text){
    
    //keep only letters and spaces (preg_replace, since ereg_replace is gone from current PHP)
    $text = preg_replace("/[^A-Za-z ]/", "", $text);
    $text = str_replace (" ","-", $text );
    return $text;
}


if (!empty($_GET["forum"])){
    $forum_id = intval($_GET["forum"]);
    
    $result13  = $db->query("select read_forum from ".$db->prefix."forum_perms where forum_id=$forum_id and group_id=3 and read_forum=0");
    if($db->num_rows($result13)!=0){
        echo "permission denied";
        exit;
    }
    
    $result3  = $db->query("select forum_name from ".$db->prefix."forums where id=$forum_id");
    
    
    if($db->num_rows($result3)==0){ 
        echo "inexistent or empty forum";
    }
    else{
        //output the html head
        $forum_name_array     = $db->fetch_assoc($result3);
        $displaytitle       = $pun_config["o_board_title"]." // ".$forum_name_array["forum_name"];
        ouptut_html_head($displaytitle);
        
        //count the posts and paginate if necessary
        $result5            = $db->query("select count(*) num_posts from ".$db->prefix."topics where forum_id=$forum_id");
        $num_posts_array    = $db->fetch_assoc($result5);
        $num_posts            = $num_posts_array["num_posts"];
        
        
        
        
        //defaults, so the topic query and the output below work even for an empty forum
        $query_suffix       = "";
        $page_navigation    = "";
        
        if($num_posts > 0){
            $num_pages       = ceil($num_posts/$num_links);
            //fall back to page 1 when the page parameter is missing or out of range
            $current_page    = isset($_GET["page"]) ? intval($_GET["page"]) : 1;
            if($current_page < 1 or $current_page > $num_pages){
                $current_page = 1;
            }
            
            $lower_limit     = ($current_page - 1) * $num_links;
            
            $query_suffix    = "limit $lower_limit, $num_links";
            $page_navigation = "page <br />";
            $url_stuffing    = clean_url($forum_name_array["forum_name"]);
            for ($i = 1 ; $i <= $num_pages ; $i++){
                if( $i != $current_page){
                    $page_navigation .=" <a href=\"".$pun_config['o_base_url']."/$url_stuffing-".$forum_id."-".$i."/\">$i</a>\n";
                }
                else{
                    $page_navigation .=" $i";
                }
            }
            
        }
        
        $result4 = $db->query("select t.id topic_id, t.subject from ".$db->prefix."topics t where t.forum_id=$forum_id order by last_post desc $query_suffix");
        
        //output the topic list (ouptut_html_head has already opened <body>)
        echo "<h1>".pun_htmlspecialchars($forum_name_array["forum_name"])."</h1>";
        echo "<ul>";
        while( $topic = $db->fetch_assoc($result4) ){
            echo "<li>";
            $url_stuffing = clean_url($topic["subject"]);
            $topic["subject"]= pun_htmlspecialchars($topic["subject"]);
            echo "<a href=\"".$pun_config['o_base_url']."/$url_stuffing-".$topic["topic_id"].".htm\">".$topic["subject"]."</a>";
            echo "</li>\n";
        }
        echo "</ul>";
        //the closing </body> is printed once, at the end of the script
        echo $page_navigation;
    }
}
else{    

    //get the categories and forums and output them
    $result = $db->query("SELECT id cat_id, cat_name from ".$db->prefix."categories order by disp_position");
    
    //check which forums are not guest readable
    $hidden_forums = array();
    $hidden_forums_query_suffix = "";
    $result12 = $db->query("select p.forum_id id from ".$db->prefix."forum_perms p where p.group_id=3 and p.read_forum=0");
    while ($row = $db->fetch_assoc($result12)){
        $hidden_forums[] = intval($row["id"]);
    }
    //only add the filter when there is something to hide, otherwise the query would be invalid
    if (!empty($hidden_forums)){
        $hidden_forums_query_suffix = "and f.id NOT IN (".implode(", ",$hidden_forums).")";
    }
    
    $displaytitle = $pun_config["o_board_title"];
    ouptut_html_head($displaytitle);
    
    echo "<h1>".$displaytitle."</h1>\n";
    echo "<ul>";
    while ($category = $db->fetch_assoc($result)){
        
        $result2 = $db->query("select id forum_id, forum_name, forum_desc from ".$db->prefix."forums f where f.cat_id=".$category["cat_id"]." $hidden_forums_query_suffix order by f.disp_position");
        
        if ($db->num_rows($result2)>0){
            //nest the forum list inside the category item so the markup stays valid
            echo "<li><h2>".pun_htmlspecialchars($category["cat_name"])."</h2>\n";
            echo "<ul>\n";
            while ($forum = $db->fetch_assoc($result2)){
                $url_stuffing = clean_url($forum["forum_name"]);
                echo "<li><h2><a href=\"".$pun_config['o_base_url']."/".$url_stuffing."-".$forum["forum_id"]."/\">".pun_htmlspecialchars($forum["forum_name"])."</a></h2><br /><a href=\"".$pun_config['o_base_url']."/".$url_stuffing."-".$forum["forum_id"].".html\" class=\"description\">".$forum["forum_desc"]."</a></li>\n";
            }
            echo "</ul>\n</li>\n";
        }
    }
    echo "</ul>\n";
}

//close the document opened by ouptut_html_head
echo "</body>\n</html>";
?>

.htaccess (if you already have an .htaccess in your PunBB root directory, you probably don't need this module at all)

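Options +FollowSymLinks
RewriteEngine on

# forum pages: Some-forum-3.html -> viewforum.php?id=3
RewriteRule ^(.*)-([0-9]+)\.html$ viewforum.php?id=$2 [L]
# sitemap pages for one forum, with and without a page number
RewriteRule ^(.*)-([0-9]+)-([0-9]+)/$ punmap.php?forum=$2&page=$3 [L]
RewriteRule ^(.*)-([0-9]+)/$ punmap.php?forum=$2 [L]
# topic pages: Some-topic-42.htm -> viewtopic.php?id=42
RewriteRule ^(.*)-([0-9]+)\.htm$ viewtopic.php?id=$2 [L]
# sitemap index
RewriteRule ^archive/?$ punmap.php [L]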
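
To see which script each fancy URL ends up at without touching Apache, the rules can be mimicked in plain PHP. This is only a rough sketch: punmap_route() is a made-up helper, not part of the mod, and the patterns are written with escaped dots and end anchors, which is an assumption that tightens the rules rather than a copy of them.

```php
<?php
// Hypothetical helper: mimics the rewrite rules with preg_match so the
// URL-to-script mapping can be checked without Apache.
function punmap_route($path)
{
    // Order matters: ".html" is tested before ".htm", as in the .htaccess.
    $rules = array(
        '#^(.*)-([0-9]+)\.html$#'     => 'viewforum.php?id=$2',
        '#^(.*)-([0-9]+)-([0-9]+)/$#' => 'punmap.php?forum=$2&page=$3',
        '#^(.*)-([0-9]+)/$#'          => 'punmap.php?forum=$2',
        '#^(.*)-([0-9]+)\.htm$#'      => 'viewtopic.php?id=$2',
        '#^archive/?$#'               => 'punmap.php',
    );
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $path)) {
            // The first matching rule wins, like a rule flagged with [L].
            return preg_replace($pattern, $target, $path);
        }
    }
    return null; // no rule applies, the request is served as-is
}

echo punmap_route('General-talk-3/'), "\n";
echo punmap_route('Some-topic-42.htm'), "\n";
```

The paths are relative to the forum root, the same way mod_rewrite sees them inside a per-directory .htaccess.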

2 (edited by Dr.Jeckyl 2006-12-22 20:12)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

The only problem is that it lists restricted forums, i.e. staff only. Other than that, nice job.

~James
FluxBB - Less is more

Re: Punmap - html sitemap with fancy urls UPDATED - final release

There are also some SQL injections

if (!empty($_GET["forum"])){
    $forum_id = $_GET["forum"];
    $result3  = $db->query("select forum_name from ".$db->prefix."forums where id=$forum_id");

No sanitizing of variables is done

$result4         = $db->query("select t.id topic_id, t.subject from ".$db->prefix."topics t where t.forum_id=$forum_id order by t.last_post desc");

Same unsanitized variable, different query

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Jeckyl, I still have to work a bit on the SQL to fix that. It will be done ;)

Smartys, I stopped worrying about injections long ago, since magic_quotes_gpc became widely used, but you are right, the injection possibility is there.

mmm... so... how should I check that variable?

maybe is_numeric($forum_id) is enough

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Smartys, I stopped worrying about injections long ago, since magic_quotes_gpc became widely used

Then you made a huge mistake: I don't need "s to abuse most SQL injections (including the one here) :P

mmm... so... how should I check that variable?

maybe is_numeric($forum_id) is enough

Just use intval ;)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Thanks for the fast reply.
Always learning :)

Just so I am aware on future occasions, could you give me an example of an injection that could be made in this case without the "s?

Re: Punmap - html sitemap with fancy urls UPDATED - final release

0 union select password from users where id = 2
;)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

I added

if(!intval($forum_id)){
     exit();
}

I think there's no use complicating it.
If you find any other issue, please reply.
Thanks Smartys

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Or just use

    $forum_id = intval($_GET["forum"]);
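
A small sketch of why assigning the result matters, compared with only testing it as in if(!intval($forum_id)). The input is a variation on the example above that starts with 1 instead of 0, and the unprefixed table name forums is just an assumption for the demo.

```php
<?php
// Variation on the example from the thread: no quotes needed for the injection.
$raw = "1 union select password from users where id = 2";

// Check-only approach: intval($raw) is 1, so the test passes,
// but $raw itself is unchanged and would still reach the query.
$passes_check = (bool) intval($raw);

// Assign approach: the variable that reaches the query is a plain integer.
$forum_id = intval($raw);

// Only the integer ends up in the SQL string.
$query = "select forum_name from forums where id=$forum_id";
echo $query, "\n";
```

With the check-only version the whole union string would still be pasted into the query; with the assignment only the leading 1 survives.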

10 (edited by pedrotuga 2006-12-23 13:53)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Smartys wrote:

0 union select password from users where id = 2
;)

mmmm damn, that wouldn't be difficult at all :S thank you.
But still, I am wondering: that would return the MD5 hash, which is not so dangerous in itself.


There is another issue that I think should be fixed with priority: the title parsing. The titles are allowing HTML. Which function should I use? I guess parser.php should have a suitable one...

Re: Punmap - html sitemap with fancy urls UPDATED - final release

MD5 can be brute-forced.
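
To make that concrete, here is a minimal sketch of a word-list lookup against an unsalted MD5 hash. crack_md5() and the word list are made up for the illustration, and the hash is generated on the spot rather than taken from any real board.

```php
<?php
// Hypothetical helper: tries each candidate word against an unsalted MD5 hash.
function crack_md5($hash, array $candidates)
{
    foreach ($candidates as $word) {
        if (md5($word) === $hash) {
            return $word; // the hash gives the password away
        }
    }
    return null; // not in the list
}

// Stands in for a hash read out through the injection.
$leaked   = md5('letmein');
$wordlist = array('password', '123456', 'letmein', 'qwerty');

echo crack_md5($leaked, $wordlist), "\n";
```

A real attempt would run through far bigger lists, but the loop is the same, which is why a leaked hash is not harmless.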

Re: Punmap - html sitemap with fancy urls UPDATED - final release

pun_htmlspecialchars

And as Bekko said, MD5 can be brute forced.
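
A quick sketch of what the escaping does to a topic title before it is printed. pun_htmlspecialchars needs PunBB loaded, so PHP's built-in htmlspecialchars() is used here as a stand-in; escape_title() is a made-up name.

```php
<?php
// Stand-in for pun_htmlspecialchars(), which needs PunBB loaded;
// PHP's built-in htmlspecialchars() is used here as an assumption.
function escape_title($subject)
{
    return htmlspecialchars($subject, ENT_QUOTES, 'ISO-8859-1');
}

// A title with markup in it is printed as text instead of being rendered.
$subject = '<b>Free stuff</b> & more';
echo '<li>' . escape_title($subject) . '</li>', "\n";
```

The URL part does not need this, since clean_url() already strips everything except letters and spaces.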

Re: Punmap - html sitemap with fancy urls UPDATED - final release

:) It is getting fixed!
Thanks guys.

I don't know if I will have time to fix the rest until next week or so. Xmas... boring :S

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Guys, just to notify everybody that this mod is complete :D

check it out in action here:
http://shareminer.com/forum/archive

15 (edited by Dr.Jeckyl 2007-01-01 09:56)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

very nice.

edit: what is the high limit before pagination of links? or is it set the same as Topics per page default and Posts per page default?

edit 2: make the board title clickable back to the forum homepage. not a big deal.


Re: Punmap - html sitemap with fancy urls UPDATED - final release

Dr.Jeckyl wrote:

very nice.

edit: what is the high limit before pagination of links? or is it set the same as Topics per page default and Posts per page default?

Adjust the variable $num_links to the value of your preference ;)
It's in the first lines of the script.
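
For anyone wondering how $num_links turns into pages, here is a rough sketch of the arithmetic. page_window() is a made-up helper that follows the same steps as the script; it is not code from punmap.php.

```php
<?php
// Hypothetical helper mirroring the pagination arithmetic in punmap.php.
function page_window($num_topics, $num_links, $requested_page)
{
    $requested_page = intval($requested_page);
    // At least one page, even for an empty forum.
    $num_pages = max(1, (int) ceil($num_topics / $num_links));
    // Out-of-range requests fall back to the first page.
    $page = ($requested_page < 1 || $requested_page > $num_pages) ? 1 : $requested_page;
    // First row for the "limit offset, count" clause.
    $offset = ($page - 1) * $num_links;
    return array($num_pages, $page, $offset);
}

list($pages, $page, $offset) = page_window(250, 100, 3);
echo "pages=$pages page=$page limit $offset, 100\n";
```

So with 250 topics and $num_links = 100 there are three pages, and page 3 starts at row 200.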

Dr.Jeckyl wrote:

edit 2: make the board title clickable back to the forum homepage. not a big deal.

Didn't think about that. Will do :)

17

Re: Punmap - html sitemap with fancy urls UPDATED - final release

And maybe fix the problem with moved posts (for example, link 2 here : http://shareminer.com/forum/General-talk-3/)
Thank you for sharing your work.

Re: Punmap - html sitemap with fancy urls UPDATED - final release

abclf wrote:

And maybe fix the problem with moved posts (for example, link 2 here : http://shareminer.com/forum/General-talk-3/)
Thank you for sharing your work.

mmm... didn't think about that either.
I will put that on the TODO list too.
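
One possible way to handle it, as a sketch only: if the broken links come from the redirect rows that PunBB leaves behind when a topic is moved, they can be filtered out in the topic query. This assumes those rows are the ones with a non-NULL moved_to column in the topics table; topic_list_query() is a made-up helper, not part of the mod.

```php
<?php
// Hypothetical helper: builds the topic query with redirect rows filtered out.
// Assumes the PunBB topics table marks moved topics with a non-NULL moved_to.
function topic_list_query($prefix, $forum_id, $offset, $num_links)
{
    $forum_id = intval($forum_id);
    return "select t.id topic_id, t.subject from ".$prefix."topics t"
         ." where t.forum_id=$forum_id and t.moved_to is null"
         ." order by t.last_post desc limit ".intval($offset).", ".intval($num_links);
}

echo topic_list_query('pun_', 3, 0, 100), "\n";
```

The count query used for the pagination would need the same condition, otherwise the number of pages could be off by the number of redirects.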

19 (edited by Dr.Jeckyl 2007-01-02 05:16)

Re: Punmap - html sitemap with fancy urls UPDATED - final release

this makes a great WAP access point too. tried it out on my new LG Fusic phone, loads fast and looks good.


Re: Punmap - html sitemap with fancy urls UPDATED - final release

Dr.Jeckyl wrote:

this makes a great WAP access point too. tried it out on my new LG Fusic phone, loads fast and looks good.

Cool! Didn't even think about that.
A good idea for a mod, like a fork of this one, or an option on this mod or something, would be an alternative viewtopic.php with a minimalist template suited for WAP.

This only takes a little change to the .htaccess and an additional display-topic file.

21

Re: Punmap - html sitemap with fancy urls UPDATED - final release

This is very neat.

But a question: will using it incur you a pagerank penalty from Google for duplicate content?

If you run this on your site, alongside your vanilla punBB, you will be showing the same content to visitors and spiders, just with different re-written URLs.

The Google algo will be able to figure out that you are putting forward the same content in two places.

If you used it just as an archive, and not an alternative way of browsing active content, that would not present the same problem, of course.

Users concerned about this could, I guess, set up a robots.txt exclusion for this lo-fi version.

Re: Punmap - html sitemap with fancy urls UPDATED - final release

I don't think that's an issue. vBulletin and Invision have a lo-fi version and that's the one that is indexed by Google all the time. I guess that's because of the fancy URLs.
I don't think Google will pull your page rank down because of duplicated content.

OK, but there's something I should say about the example link: on my site I pointed the links to the printable topic version instead. The reason I did that is that I have ads on those pages, which I don't mind showing to those who come from Google. On the other hand, I want my regular users to browse a clean, ad-free forum.
But if you download this mod as it is, these alternative simplistic pages will point to your actual topic pages, which means you won't have duplicated content.
Valid XHTML and meaningful fancy URLs, how cool is that?

Since I've been using this for a while, it's actually a good time to share the results with you guys.
This worked: my site gets indexed by Google every day. The non-fancy URLs get indexed first while the fancy ones take a while, but once they get indexed they bring a lot of visits from Google. Pretty much every topic title on my site gets to the top of Google. So this thing really worked :D

23

Re: Punmap - html sitemap with fancy urls UPDATED - final release

Thanks for that info.

pedrotuga wrote:

I don't think that's an issue. Vbulletin and invision have a low-fi version and that's the one that is indexed by google all the time. I guess that's because of the fancy urls.
I don't think google will pull your page rank down because of duplicated content.

But if you have a look at:

http://www.google.com/support/webmaster … &type=

forums with lo-fi versions are the first item on the list when Google's Help Center talks about duplicate content:)

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices..

So it pays to be careful in setting up mods like this. Google has some suggestions on how to address any issues if they arise on that link.

It sounds like it is working well for you though, which is great.

Re: Punmap - html sitemap with fancy urls UPDATED - final release


Most software that offers that kind of option these days is very careful to avoid such issues by using robots.txt and meta tags to direct Google to spider the lo-fi version.
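
As a sketch of the meta tag side of that, a page could print a robots tag in its head depending on whether it is the version meant to be indexed. robots_meta() is a made-up helper, and which version gets which value is the site owner's call, not something the mod decides.

```php
<?php
// Hypothetical helper: returns a robots meta tag for the page head.
// $index says whether this version of the page should be indexed.
function robots_meta($index)
{
    $content = $index ? 'index, follow' : 'noindex, follow';
    return '<meta name="robots" content="'.$content.'" />';
}

// Example: keep this version out of the index but let its links be followed.
echo robots_meta(false), "\n";
```

Keeping "follow" in both cases means the spider can still walk the links on a page that is itself left out of the index.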

Re: Punmap - html sitemap with fancy urls UPDATED - final release

If you read the page you point to carefully, it mentions the forum lo-fi version as a non-malicious case of duplicate content. It also says that in those cases the spider tries to pick the best version. I wonder what they mean by "the best"...
If you read a bit further, this becomes an issue when content starts to be duplicated across the domain's domain (domain domain??? :D)

Also, what Google says is basically a utopia. They try to convince the world that they have control over what's spam or duplicated content, but in fact they don't.

If you don't believe me, test it yourself. Test both solutions: a sitemap and a completely mod_rewritten site. The second one might take a while longer to get indexed, but it will completely beat the first one when it comes to showing up on top of Google. I don't care so much about Google Sitemaps, nor even about page rank. What matters to me is how much traffic I get from Google, and I get a lot.

As a side note, I should mention that I laughed like crazy a few months ago when the web out there found out that deleting the Google sitemap from their sites would give them an immediate gain in the number of indexed pages.

My advice is: don't trust the Google blabla and go for the old techniques.

This is an open source project; if anybody feels like adding 'proper' regex filtering in the robots.txt, I even encourage it.