I don't know about the others. Is it UTF-8? Does it care about multibyte characters?
Look at PunBB, search.php:
$keywords = (isset($_GET['keywords'])) ? strtolower(trim($_GET['keywords'])) : null;
In the UTF-8 version, strtolower() breaks Cyrillic text BEFORE the real search even starts.
I've made a small test script with this fragment for illustration (testmb.php):
<?php
mb_internal_encoding('utf-8');
$keywords = (isset($_GET['keywords'])) ? strtolower(trim($_GET['keywords'])) : null;
$keywords_len = strlen($keywords);
$author = (isset($_GET['author'])) ? strtolower(trim($_GET['author'])) : null;
$keywords_mb = (isset($_GET['keywords'])) ? mb_strtolower(trim($_GET['keywords'])) : null;
$keywords_len_mb = mb_strlen($keywords_mb);
$author_mb = (isset($_GET['author'])) ? mb_strtolower(trim($_GET['author'])) : null;
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>UTF-8 text transformation test</title>
<style>
FORM {padding: 0; margin: 0; float: left; overflow: hidden; width: auto;}
FIELDSET, DIV, P {padding: 0; margin: 0; overflow: hidden; width: auto; clear: both;}
SPAN.lbl {position: absolute; left: 10px;}
P {padding: 10px; position: relative;}
INPUT {float: left; margin-left: 10em;}
SPAN {font-weight: bold;}
</style>
</head>
<body>
<form action="testmb.php" method="get">
<fieldset>
<p><label><span class="lbl">Keywords:</span> <input name="keywords" type="text" value="<?php echo htmlspecialchars(isset($_GET['keywords']) ? $_GET['keywords'] : '') ?>" /></label></p>
<p><label><span class="lbl">Author:</span> <input name="author" type="text" value="<?php echo htmlspecialchars(isset($_GET['author']) ? $_GET['author'] : '') ?>" /></label></p>
<p>
<input type="submit" value="OK" />
</p>
</fieldset>
</form>
<div>
<h3>Like in PunBB (non-multibyte):</h3>
<p><span>keywords:</span> <?php echo $keywords ?></p>
<p><span>keywords len:</span> <?php echo $keywords_len ?></p>
<p><span>author:</span> <?php echo $author ?></p>
<h3>Multibyte:</h3>
<p><span>keywords(mb):</span> <?php echo $keywords_mb ?></p>
<p><span>keywords len(mb):</span> <?php echo $keywords_len_mb ?></p>
<p><span>author(mb):</span> <?php echo $author_mb ?></p>
</div>
</body>
</html>
Screenshot 1 (Windows): broken characters
Screenshot 2 (Unix): case not changed
Maybe it depends on the locale or the PHP version... but in both cases the result is wrong, because PHP's core string functions are not multibyte-aware. It should use the mbstring extension!
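For what it's worth, here is a minimal sketch of how the search.php fragment could lowercase its input safely. The utf8_lower() helper is a hypothetical name of mine, not something that exists in PunBB. It assumes the input is UTF-8 and falls back to ASCII-only lowercasing when mbstring is not loaded.

```php
<?php
// Hypothetical helper (not part of PunBB): lowercase UTF-8 text.
// Uses mbstring when it is loaded; otherwise only A-Z is touched,
// so multibyte sequences are never corrupted.
function utf8_lower($text)
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($text, 'UTF-8');
    }
    // Fallback: byte-wise translation of ASCII letters only.
    return strtr($text, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Same shape as the line from search.php, with the helper swapped in.
$keywords = isset($_GET['keywords']) ? utf8_lower(trim($_GET['keywords'])) : null;
```

Without mbstring the fallback leaves the case of Cyrillic letters unchanged rather than breaking them, which is at least the harmless failure of the two.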
Edited: I've added strlen/mb_strlen to the example code.
Live example with Russian text: http://tlogr.com/testmb.php?keywords=%D … 0%B0%D0%BD
P.S. I found out why and when the default behaviour on Windows differs from Unix. In phpBB3:
// Enforce ASCII only string handling
setlocale(LC_CTYPE, 'C');
When I copy this part into the test script, both installations behave the same (as in the Unix screenshot).
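A small sketch of what that locale line changes, assuming the script file itself is saved as UTF-8 and mbstring is loaded. The sample word is my own. With LC_CTYPE set to 'C', strtolower() only maps A-Z, so the Cyrillic bytes pass through untouched (the Unix screenshot behaviour), while strlen() still counts bytes rather than characters.

```php
<?php
// Enforce ASCII-only string handling, as in the phpBB3 snippet quoted above.
setlocale(LC_CTYPE, 'C');

$word = "ПРИВЕТ Test"; // sample input (assumption: this file is saved as UTF-8)

$byte_lower = strtolower($word);             // byte-oriented: only A-Z is lowercased
$mb_lower   = mb_strtolower($word, 'UTF-8'); // code-point aware: Cyrillic is lowercased too

var_dump($byte_lower === "ПРИВЕТ test"); // bool(true)
var_dump($mb_lower === "привет test");   // bool(true)

// Each Cyrillic letter takes 2 bytes in UTF-8: 6 * 2 + 1 space + 4 ASCII = 17 bytes.
var_dump(strlen($word));             // int(17)
var_dump(mb_strlen($word, 'UTF-8')); // int(11)
```

So the 'C' locale does not make the search case-insensitive for Cyrillic; it only stops strtolower() from mangling the bytes.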