Aha, that's an important point I missed! As you probably gathered, I thought it ran every time any BBCode was processed. I see the benefit of doing that now, thanks for replying.

I was having a browse of the parser implementation for inspiration, and wondered what the purpose is of lowercasing the simple BBCode tags in preparse_bbcode() (from includes/parser.php)?

Namely the following snippet:

// Change all simple BBCodes to lower case
$a = array('[B]', '[I]', '[U]', '[/B]', '[/I]', '[/U]');
$b = array('[b]', '[i]', '[u]', '[/b]', '[/i]', '[/u]');
$text = str_replace($a, $b, $text);

I understand fully what it's doing there; that wasn't what piqued my curiosity. What I'm curious about is the decision to do that first and then run a case-sensitive regexp in do_bbcode(), rather than just using a single case-insensitive regexp in do_bbcode(). I ran a brief profile (PHP 5) for comparison, and running str_replace() first nearly doubles the processing time: roughly 20s for the case-insensitive regexp alone vs 35s with str_replace() first, over 100,000 iterations, comparing:

$pattern = array('|\[b\](.*?)\[/b\]|is');
$replace = array('<strong>$1</strong>');
$the_string = preg_replace($pattern, $replace, $the_string);

against:

$upper = array('[B]', '[/B]');
$lower = array('[b]', '[/b]');
$the_string = str_replace($upper, $lower, $the_string);

$pattern = array('|\[b\](.*?)\[/b\]|s');
$replace = array('<strong>$1</strong>');
$the_string = preg_replace($pattern, $replace, $the_string);
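
For anyone wanting to repeat the comparison, here is a minimal sketch of the kind of timing harness I mean. It is not the exact script I ran: the function names (parse_insensitive, parse_lowercase_first, time_it) and the sample string are made up for illustration, and the absolute timings will depend on the machine and the PHP version.

```php
<?php
// Variant 1: a single case-insensitive regexp.
function parse_insensitive($text)
{
    return preg_replace('|\[b\](.*?)\[/b\]|is', '<strong>$1</strong>', $text);
}

// Variant 2: lowercase the tags first, then run a case-sensitive regexp.
function parse_lowercase_first($text)
{
    $text = str_replace(array('[B]', '[/B]'), array('[b]', '[/b]'), $text);
    return preg_replace('|\[b\](.*?)\[/b\]|s', '<strong>$1</strong>', $text);
}

// Hypothetical helper: seconds taken to call $fn on $text $iterations times.
function time_it($fn, $text, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn, $text);
    }
    return microtime(true) - $start;
}

// Made-up sample input containing both upper- and lower-case tags.
$sample = 'Some [B]bold[/B] text and some [b]more bold[/b] text.';
$iterations = 100000;

printf("case-insensitive regexp: %.3fs\n", time_it('parse_insensitive', $sample, $iterations));
printf("str_replace() first:     %.3fs\n", time_it('parse_lowercase_first', $sample, $iterations));
```

Both variants should produce identical output for the same input, so any difference is purely in the time taken.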

Please bear in mind I'm not trying to assert a 'right' or 'wrong' way, just curious about the reasoning, and whether it could be considered to have any real impact on performance? :)

Looking good, mindplay :)
Firefox users can make use of the BBCode extension for quicker post formatting.

4

(9 replies, posted in PunBB 1.2 show off)

Rewozz wrote:

Firefox interprets css differently than IE, and I think that sucks.

Firefox interprets CSS according to the specification and recommendations set out by the W3C. IE interprets CSS in its own rather peculiar manner.

Code first for Firefox (or any of the Gecko-based browsers) and then fix/hack your CSS for IE if necessary :)

5

(300 replies, posted in PunBB 1.2 discussion)

Rickard wrote:

Penfold: The problem with that solution is that it adds quite a lot of overhead for something that is rather trivial. You can still break validation by posting a "dodgy character" like Connorhd said (i.e. a non-iso-8859-1 character in a iso-8859-1 forum). There's just no way to prevent that from the server side.

My concern was just that, as far as I'm aware, a character not in the document's character encoding will just be ignored by an XML parser (replaced with a ? and so on), whereas malformed HTML will halt the parser completely and display an XML rendering error-page, as it's considered a fatal error.

I do appreciate the extra processing that would be required, but I'm in two minds whether it can be considered a trivial issue. It would be great future-proofing, if and when people want to switch to using XHTML 1.1+ (which must be sent as application/xhtml+xml). Though proper XHTML support in IE is probably quite far in the future, so it's not too pressing an issue.

It's not something I'm too concerned about, however, and as it's still safe to send it as text/html it's certainly not going to stop me wanting to use PunBB over the other boards :)

Paul wrote:

I think the best any cms system can do is advertise itself as being valid XHTML 1.0 Strict "out of the box".

Sure, I think that's the best position one can be in - especially for PunBB's aim of being fast and lightweight :)

6

(300 replies, posted in PunBB 1.2 discussion)

Yes, as far as I can see angle brackets are escaped to their entity equivalents, so it's safe from that problem (the same applies to ampersands and so on).
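
To illustrate the escaping I mean, here is a small sketch. Plain htmlspecialchars() stands in for whatever the forum uses internally; that it behaves the same way is my assumption, and the sample post is made up.

```php
<?php
// Escape markup characters before any BBCode is converted, so that only
// tags generated by the parser can reach the page as real HTML.
$post = '<script>alert("hi")</script> & [b]bold[/b]';
$escaped = htmlspecialchars($post, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; [b]bold[/b]
```

The BBCode itself passes through untouched, so the remaining risk is purely in how the parser nests the tags it generates.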

I don't know if it's the cleanest solution (unlikely: regexps generally give me nightmares :D), but the BBCode parser could be altered to include the various cases of bad nesting of tags.

I'm thinking something like:

$pattern = array(
    '#\[b\]\[i\](.*?)\[/b\]\[/i\]#is',
    '#\[i\]\[b\](.*?)\[/i\]\[/b\]#is',
    // ...
);

$replace = array(
    '<strong><em>$1</em></strong>',
    '<strong><em>$1</em></strong>',
    // ...
);

However, as there are quite a lot of tags and thus many possible malformed patterns, I can't see it as the most manageable solution. I guess it needs a regexp that will only perform the parsing if it finds an opening and closing tag of the same type and not a tag of another type in between, if that makes sense.
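
A rough sketch of that idea, in case it helps. This is only my guess at how it could work, not anything taken from the PunBB parser: the function name parse_nested_bbcode and the tag map are hypothetical, it only handles [b] and [i], and it assumes PHP 5.3+ for the closure. It converts a pair only when no other [b] or [i] tag sits between the opening and closing tag, and repeats from the innermost pair outwards until nothing more matches.

```php
<?php
// Convert only tag pairs that contain no other [b]/[i] tag, working from
// the innermost pair outwards until nothing more matches.
function parse_nested_bbcode($text)
{
    // Hypothetical tag map, covering just two of the simple tags.
    $html = array('b' => 'strong', 'i' => 'em');

    // \1 forces the closing tag to match the opening one; the negative
    // lookahead rejects any [b] or [i] tag (opening or closing) in between.
    $pattern = '#\[(b|i)\]((?:(?!\[/?(?:b|i)\]).)*)\[/\1\]#is';

    do {
        $text = preg_replace_callback($pattern, function ($m) use ($html) {
            $tag = $html[strtolower($m[1])];
            return '<' . $tag . '>' . $m[2] . '</' . $tag . '>';
        }, $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo parse_nested_bbcode('[b][i]nested correctly[/i][/b]'), "\n";
// prints: <strong><em>nested correctly</em></strong>

echo parse_nested_bbcode('[b][i]nested badly[/b][/i]'), "\n";
// prints: [b][i]nested badly[/b][/i]
```

The badly nested input is left as literal text rather than being repaired, which at least keeps the output well-formed. The repeated passes obviously cost more than a single preg_replace(), so I can't say whether it would be acceptable performance-wise.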

7

(300 replies, posted in PunBB 1.2 discussion)

Connorhd wrote:

i think that will give you a BBcode error in 1.2

Do you mean it should display a message informing the user of 'incorrect' BBCode?
/edit - Ah yes, I see what you mean - just tried entering the code tags incorrectly. That's a nice feature.

Currently, with my bold and italic example above, it just outputs HTML that is malformed in the same way as the BBCode, without returning an error.

8

(300 replies, posted in PunBB 1.2 discussion)

I must say 1.2 is looking fantastic. Great work guys.

I do have one question, however. Sorry if it's been asked before!
Will the BBCode parser be XHTML-safe when 1.2 reaches a final release? By that I mean will it be outputting well-formed HTML?

The reason I ask is that I would like to integrate it into an (XHTML) site that sends itself with a MIME type of application/xhtml+xml and thus requires tags to be correctly nested.
e.g.

[b][i]The tags are not correctly nested[/b][/i]

will cause an XML rendering error in Gecko browsers etc.
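
To show what I mean, here is a minimal sketch that checks the output with PHP's own XML parser. The helper name is_well_formed is made up, the tag-by-tag str_replace() is a deliberately naive stand-in rather than the actual PunBB parser, and it assumes the SimpleXML/libxml extensions are available.

```php
<?php
// Hypothetical helper: true if $fragment parses as well-formed XML once
// wrapped in a single root element.
function is_well_formed($fragment)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $result = simplexml_load_string('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result !== false;
}

// Naive tag-by-tag replacement, which keeps whatever nesting the poster typed.
$bbcode = array('[b]', '[/b]', '[i]', '[/i]');
$html = array('<strong>', '</strong>', '<em>', '</em>');

$good = str_replace($bbcode, $html, '[b][i]The tags are correctly nested[/i][/b]');
$bad = str_replace($bbcode, $html, '[b][i]The tags are not correctly nested[/b][/i]');

var_dump(is_well_formed($good)); // bool(true)
var_dump(is_well_formed($bad));  // bool(false)
```

A browser parsing the page as application/xhtml+xml is just as strict as this check, which is why the second case would end up as an error page rather than a slightly odd-looking post.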