Topic: All those new mod_rewrite rules in 1.3dev .htaccess?

I was just browsing the dev tree for v1.3, and noticed the new .htaccess in the /extras folder.

There are quite a few new rewrite rules in there, like:

RewriteRule ^topic/([0-9]+)/page/([0-9]+)/?$ viewtopic.php?id=$1&p=$2 [L,NC]

They seem to be there to allow search engine friendly URLs, I guess.

I have a few questions:

(a) Will they slow down a site that uses them? There are quite a few rules - over 50 - for Apache to parse in there. Is there a server load issue users should be aware of?

(b) Have you tested them on a server with mod_security running?

As to (b), sometimes mod_security and mod_rewrite can interfere with each other, since they both want to do URL parsing/filtering.

If mod_security parses the URL first, and a user has an aggressive or very specific set of rules enabled, mod_security can theoretically mangle, rewrite or 'normalise' a URL request that mod_rewrite is expected to act upon.

I'm not sure exactly how Apache sequences the processing of a URL request when both modules are loaded at web server startup, but I suspect the load order of the modules in httpd.conf has some influence.

So you might want to recommend that users load mod_rewrite before mod_security in httpd.conf, so that mod_rewrite perhaps gets to have a go at the URL first and mod_security only works with the result.


Re: All those new mod_rewrite rules in 1.3dev .htaccess?

I don't see what users actually gain from URLs like "topics/222/page/2/", because Google will index PunBB's default URLs as well. And since there is a Google Sitemap mod, I don't see any need for this if the webmaster won't be able to make the URLs SEO friendly (add some text to them).

If the rules are well written (the most used rules at the beginning, each with a flag telling Apache not to parse any more rules), 50 rules won't be a big overhead for the server.
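
For illustration, a rough sketch of that ordering. Only the first rule is taken from the 1.3dev file quoted above; the other patterns, and the guess about which URLs get requested most, are assumptions and not the actual PunBB rules.

```apache
RewriteEngine On

# Assumed to be the most requested URLs, so they are tested first.
# [L] ends the current pass through the rule set once a rule matches.
RewriteRule ^topic/([0-9]+)/page/([0-9]+)/?$ viewtopic.php?id=$1&p=$2 [L,NC]
RewriteRule ^topic/([0-9]+)/?$ viewtopic.php?id=$1 [L,NC]
RewriteRule ^forum/([0-9]+)/?$ viewforum.php?id=$1 [L,NC]

# Rarely requested pages go last (hypothetical example).
RewriteRule ^help/?$ help.php [L,NC]
```

One caveat: in a .htaccess file, [L] only ends the current pass through the rules. Apache may run the rewritten URL through the set again, though a target like viewtopic.php matches none of these patterns and falls straight through.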

http://www.info-mob.com/forum/ - Croatian forum only, don't bother if you don't speak Croatian :)

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

sirena wrote:

(a) Will they slow down a site that uses them? There are quite a few rules - over 50 - for Apache to parse in there. Is there a server load issue users should be aware of?

I really don't know. They will of course add some overhead, but I doubt Apache parses the rules every time. It will of course have to execute the regular expressions but the actual parsing must be cached.

I will keep the mod_security problem in mind. However, it's more a server configuration issue than a PunBB issue. Most users run their forums hosted by large hosting companies and one must assume they are aware of the issue.

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

I'm not so comfortable using such a .htaccess (but maybe I'm wrong), so I suggest a method that needs only one rewrite rule and a new PHP file.

The rewrite rule is generic, like this one (to be optimized, with the right matching pattern for each field):
(this example only covers topic, post and forum viewing; to be extended)

RewriteRule ^(forum|topic|post)/?([A-Za-z0-9-]*)/?([A-Za-z0-9-]*)/?$ rewriter.php?rewriteselector=$1&id=$2&p=$3 [PT,NC]

I've called the new file rewriter.php. It contains a switch-case statement that includes the right PHP file and rewrites some variables if necessary (like translating the id into a pid when viewing a post).

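// Read the selector from $_GET rather than relying on register_globals;
// lowercase it because the rewrite rule matches case-insensitively ([NC]).
$rewriteselector = isset($_GET['rewriteselector']) ? strtolower($_GET['rewriteselector']) : '';

switch ($rewriteselector) {
    case 'forum':
        include('viewforum.php');
        break;
    case 'topic':
        include('viewtopic.php');
        break;
    case 'post':
        // When viewing a post, the id from the URL is really a post id.
        $_GET['pid'] = $_GET['id'];
        include('viewtopic.php');
        break;
    default:
        include('index.php');
        break;
}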

I see no drawbacks in this system, and I think it should be faster than the current .htaccess, but I'm really new to PunBB, so maybe I haven't seen the problems with this approach.
What do you think about this idea?

Mike

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

You can create a rewrite map that does exactly the same thing as your rewriter.php, and does it inside the rewrite process.

Whether the rerouting is done internally in Apache or externally through one catch-all PHP script, it's still doing the same thing, and I'd tend to bet on Apache's solution being the faster implementation.

I've seen diagrams of Apache's request routing, and whether or not you use rewrite rules, the request goes through the same process, so I doubt you'd notice one extra step in your page loading.
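
A rough sketch of such a map, assuming you can edit the server or virtual host configuration (RewriteMap cannot be declared in .htaccess). The map name punbbscripts, the file path and the URL scheme are made up for illustration:

```apache
# httpd.conf or a <VirtualHost> block. RewriteMap is not allowed in .htaccess.
RewriteEngine On
RewriteMap punbbscripts txt:/etc/apache2/punbb-scripts.map

# Contents of /etc/apache2/punbb-scripts.map (key, whitespace, value):
#   forum  /viewforum.php
#   topic  /viewtopic.php

# Look up the script for the first path segment, falling back to /index.php
# when the map has no entry. In server context the pattern sees the leading
# slash. No [NC] here, since txt map lookups are case-sensitive.
RewriteRule ^/(forum|topic)/([0-9]+)/?$ ${punbbscripts:$1|/index.php}?id=$2 [PT,L]
```

Since most PunBB users are on shared hosting with only .htaccess available, this is probably an option for people running their own server rather than a general replacement.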

6 (edited by guardian34 2006-12-19 01:45)

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

sirena wrote:

(a) Will they slow down a site that uses them? There are quite a few rules - over 50 - for Apache to parse in there. Is there a server load issue users should be aware of?

AFAIK, it probably won't be parsing all 50 rules every time; the [L] flag should stop the parsing after the correct rule has been found for that URL.

Edit: miketk, why not use three rules for that? Like MadHatter said, something has to do the work; why do you think a PHP file would do any better? (Odds are that PHP is being run through an Apache module too.)
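
Something along these lines, as a sketch only. The patterns are narrowed to digits, which is an assumption about PunBB's ids, and whether an empty p parameter is harmless depends on how viewforum.php and viewtopic.php read it:

```apache
RewriteEngine On

# One rule per URL type instead of a generic rule plus rewriter.php.
# The optional second number is the page; p is passed empty when it is absent.
RewriteRule ^forum/([0-9]+)/?([0-9]*)/?$ viewforum.php?id=$1&p=$2 [L,NC]
RewriteRule ^topic/([0-9]+)/?([0-9]*)/?$ viewtopic.php?id=$1&p=$2 [L,NC]

# Posts are addressed by pid, so no PHP-side variable shuffling is needed.
RewriteRule ^post/([0-9]+)/?$ viewtopic.php?pid=$1 [L,NC]
```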

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

@guardian34: I thought that evaluating a regex could be more resource-intensive than a switch-case statement, so I was asking. Do you or MadHatter have some links on this topic? I've searched online but found nothing that compares the two approaches.
Thanks,

Mike

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

miketk, isn't your above rule using a regex?

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

It's going to use a regex regardless. That is just how it evaluates the match pattern, so even an exact-match pattern (one with no wildcards at all) still goes through the regex engine. And as mentioned, the [L] flag makes a matching rule the last one tested (much like a break in a switch statement). Since a switch is implemented as a simple test condition (like an if statement) at the machine level, then yes, technically a switch would be "faster", but it is nowhere near as compact or expressive a matching syntax (which is why regexes exist in the first place). To test the complex conditions a regex can handle, you're going to go through the same test cases whether you have a huge switch statement or a simple regex pattern.

This is the official documentation for the rewrite engine. I think how it works is pretty plainly stated there.

Web servers are fairly complex applications. Whether you're using Apache or IIS, a lot of extra things happen between request and response. Adding a URL rewrite to a request is not going to affect a web server the way a heavy processing script would (an infinite loop in code, for example). Your bottleneck is almost always going to be slow internet speeds, rather than a client waiting on a web server to process some 200+ rewrite rules.


Re: All those new mod_rewrite rules in 1.3dev .htaccess?

guardian34 wrote:

miketk, isn't your above rule using a regex?

Yes, you're right, but it's only one. I know the current system stops as soon as a rule matches, but in the worst case it has to go through 50+ rules.
Maybe I'm just being too paranoid.

@MadHatter: thanks for your link, but I already knew how it works. I was looking for a document that explains which system is faster.

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

Will it be possible to turn the good looking urls off?
I prefer having the normal URLs with all the parameters, plus a sitemap that targets the same pages. That way users browsing the forum don't trigger any rewriting on the server. The sitemap then only serves as an entry point for Google, since all the forum navigation links stay in the old format. This can be more SE friendly. Google does index dynamic URLs; even though they say they don't index many of those per site, they do. But indexing the site is one thing, and the relevance it gets in search results is another.
For example, I don't want to have 500000 pages indexed if none of those show up on the first page of any search. I'd rather have 100 pages that constantly get visitors from Google, then let those visitors find their way through the site navigation.
That's why I think URLs with content matter when it comes to being SE friendly.

I am using PunBB 1.2... I don't know if I will upgrade to 1.3 when it comes out. Anyway, I guess the database model will be untouched.
What about the old URLs? Will they still work? We can't forget that there are links spread across the web pointing to 1.2 forums, and those should not be broken.

So... can I start writing a sitemap, or will it be incompatible with PunBB 1.3?
BTW... anybody interested?

12 (edited by guardian34 2006-12-20 02:36)

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

pedrotuga wrote:

Will it be possible to turn the good looking urls off?

Yes

Edit:

pedrotuga wrote:

What about the old URLs? Will they still work? We can't forget that there are links spread across the web pointing to 1.2 forums, and those should not be broken.

Look for yourself: http://dev.punbb.org/browser/branches/p … /.htaccess

pedrotuga wrote:

Anyway, I guess the database model will be untouched.

Check that too: http://dev.punbb.org/browser/branches/p … update.php

Re: All those new mod_rewrite rules in 1.3dev .htaccess?

pedrotuga wrote:

So... can I start writing a sitemap, or will it be incompatible with PunBB 1.3?
BTW... anybody interested?

You may or may not need to rewrite parts of it to work with 1.3.