Topic: PunBB produces invalid XHTML
I tried validating a forum page and got loads of errors, which I traced back to Microsoft's browser submitting Windows-1252 ("Windows Latin") text as if it were ISO-8859-1 in new forum messages.
I fixed the problem; see the code below. Since my customized PunBB code differs too much from the official tree, I can't give you a patch file. The name 'demoronize' is inspired by Demoroniser. Windows-1252 and ISO-8859-1 only differ in the byte range 0x80 to 0x9F: Windows puts typographic characters there, while ISO-8859-1 only has control codes, which the validator rejects. This bugfix only corrects the most common offending characters and replaces the rest of that range with question marks, since I was too lazy to look up their Unicode code points.
Extra function in include/functions.php (the . in "&"."#8364;" is there because otherwise the forum would show a € in this post):
// remove non-ISO-8859-1 characters from $text, return the cleaned $text
function demoronize($text)
{
    // common Microsoft (Windows-1252) characters that are not in ISO-8859-1
    $search = array("#\x91#", "#\x92#", "#\x93#", "#\x94#", // smart quotes
                    "#\x95#", "#\x96#",                     // bullet, en dash
                    "#\x80#"                                // euro sign
    );
    $replace = array("'", "'", "\"", "\"",
                     "-", "-",
                     "&"."#8364;"
    );
    $text = preg_replace($search, $replace, $text);
    // replace the remaining Windows-only characters (0x80-0x9F) with "?"
    // (0xA0 is a no-break space, which is valid ISO-8859-1, so it is kept)
    return preg_replace("#[\x80-\x9f]#", "?", $text);
}
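Since the function above leaves most of the 0x80 to 0x9F range as question marks, here is a sketch of a fuller variant that fills in the missing code points. It maps every byte Windows-1252 defines in that range to a numeric character reference for the matching Unicode character, so curly quotes, the euro sign, dashes and so on survive instead of being flattened to ASCII; only the five bytes Windows-1252 leaves undefined become "?". The name demoronize_cp1252() is hypothetical, and the table should be checked against the official cp1252 mapping before relying on it.

```php
<?php
// Hypothetical fuller variant of demoronize(): converts Windows-1252 bytes
// 0x80-0x9F to numeric character references so the text stays ISO-8859-1.
function demoronize_cp1252($text)
{
    static $map = null;
    if ($map === null) {
        // Windows-1252 byte => Unicode code point
        $codepoints = array(
            0x80 => 0x20AC, // euro sign
            0x82 => 0x201A, // single low-9 quotation mark
            0x83 => 0x0192, // latin small f with hook
            0x84 => 0x201E, // double low-9 quotation mark
            0x85 => 0x2026, // horizontal ellipsis
            0x86 => 0x2020, // dagger
            0x87 => 0x2021, // double dagger
            0x88 => 0x02C6, // modifier letter circumflex
            0x89 => 0x2030, // per mille sign
            0x8A => 0x0160, // S with caron
            0x8B => 0x2039, // single left-pointing angle quote
            0x8C => 0x0152, // OE ligature
            0x8E => 0x017D, // Z with caron
            0x91 => 0x2018, // left single quote
            0x92 => 0x2019, // right single quote
            0x93 => 0x201C, // left double quote
            0x94 => 0x201D, // right double quote
            0x95 => 0x2022, // bullet
            0x96 => 0x2013, // en dash
            0x97 => 0x2014, // em dash
            0x98 => 0x02DC, // small tilde
            0x99 => 0x2122, // trade mark sign
            0x9A => 0x0161, // s with caron
            0x9B => 0x203A, // single right-pointing angle quote
            0x9C => 0x0153, // oe ligature
            0x9E => 0x017E, // z with caron
            0x9F => 0x0178, // Y with diaeresis
        );
        $map = array();
        for ($byte = 0x80; $byte <= 0x9F; $byte++) {
            // 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252
            $map[chr($byte)] = isset($codepoints[$byte])
                ? '&#' . $codepoints[$byte] . ';'
                : '?';
        }
    }
    // strtr() with an array replaces all keys in a single pass
    return strtr($text, $map);
}
```

It could be called wherever demoronize() is called. Because strtr() with an array does all replacements in one pass, the "&" it inserts is never rewritten by a later replacement, and bytes from 0xA0 upwards (valid ISO-8859-1) are left untouched.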
In post.php, add the following after the "validate bbcode syntax" block (somewhere around line 170):
// and make sure the message only contains ISO-8859-1 characters
$message = demoronize($message);
In include/parser.php (only the last line is new):
// Convert applicable characters to HTML entities
$text = pun_htmlspecialchars($text);
$text = demoronize($text); // clean up non-ISO-8859-1 characters in posts already stored in the database
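In parser.php, demoronize() runs after escaping, and for new posts it runs a second time on text post.php has already cleaned. The sketch below checks both points, using PHP's built-in htmlspecialchars() as a stand-in for pun_htmlspecialchars(). That substitution is an assumption: PunBB's own helper may leave existing numeric entities such as the euro reference alone, in which case the order matters less. The demoronize() here is a compact copy of the function above with the 0x80 to 0x9F range.

```php
<?php
// Compact copy of demoronize() as given above, with the 0x80-0x9F range.
function demoronize($text)
{
    $search  = array("#\x91#", "#\x92#", "#\x93#", "#\x94#", "#\x95#", "#\x96#", "#\x80#");
    $replace = array("'", "'", "\"", "\"", "-", "-", "&#8364;");
    $text = preg_replace($search, $replace, $text);
    return preg_replace("#[\x80-\x9f]#", "?", $text);
}

// Sample input as a Windows browser might submit it: euro sign, curly quotes.
$raw = "Price: 5\x80 \x93net\x94";

// Escape first, then demoronize: the inserted entity reaches the page intact.
$good = demoronize(htmlspecialchars($raw, ENT_QUOTES, 'ISO-8859-1'));

// Demoronize first, then escape: htmlspecialchars() re-encodes the "&",
// so the reader would see the literal text "&#8364;" instead of a euro sign.
$bad = htmlspecialchars(demoronize($raw), ENT_QUOTES, 'ISO-8859-1');

// A second pass over already-cleaned text changes nothing, because the
// cleaned text contains no bytes in 0x80-0x9F any more.
$twice = demoronize($good);
```

With a plain escaper, escaping first is therefore the safer order, and running the function again on stored posts costs nothing but a few regex passes.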