1

(62 replies, posted in News)

Smells like Mambo and Joomla! to me.

One of the hosting servers I use limits the number of successive requests from the same address. If the requests are coming too fast, I see 403 Forbidden.

The cause appears to be the following JavaScript redirect function (present in admin_maintenance.php and many files in the PunBB Migration Tool):

<script type="text/javascript">window.location="foo.php"</script>

When the successive operations are not database-intensive, each page is small and loads quickly, so this redirect fires several times per second, and my host does not like the browser requesting pages that often.

I solved the problem by making sure that redirects happen 1 second after the page has loaded:

<script type="text/javascript">setTimeout(function() { window.location = "foo.php"; }, 1000);</script>
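If the same delay is needed in many files, it could be wrapped in a small helper. This is only a sketch: the name redirectAfter and its two optional parameters are hypothetical and exist only so that the function can be tried outside a browser.

```javascript
// Hypothetical helper: go to `url` only after `delayMs` milliseconds.
// `navigate` and `schedule` are optional and default to the browser behaviour.
function redirectAfter(url, delayMs, navigate, schedule) {
  const go = navigate || function (target) { window.location = target; };
  const later = schedule || setTimeout;
  return later(function () { go(url); }, delayMs);
}
```

In a page this would be called as redirectAfter("foo.php", 1000); a longer delay may be needed if the host's limit is stricter.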

3

(20 replies, posted in PunBB 1.2 troubleshooting)

I found this page, PHP UTF-8 cheatsheet, which is essentially a quick guide for converting PHP applications to UTF-8.

However, it mentions neither the importance of issuing SET NAMES 'utf8' to the database nor the PCRE_UTF8 /u modifier for regular expressions.

4

(1 replies, posted in Archive)

http://www.artlebedev.ru/kovodstvo/51/

Renaming the scripts post.php, register.php and login.php did not help me deal with spam. Bots appear to look for link text like "Post new topic" and "Register". However, when I put a duplicate link inside an HTML comment ahead of the real one, à la

<!-- <li id="navregister"><a href="register2.php">Register</a></li> --><li id="navregister"><a href="register.php">Register</a></li>

bots started to follow register2.php because that link appeared first. It appears that they try to match the first link that has the word "Register".

To hide the genuine (second) link from the bots, I encoded the script names in the genuine links, as well as link text like "Register" and "Post new topic", using SGML entities (à la e-mail address encoding).

<!-- <li id="navregister"><a href="register.php">Register</a></li> -->
<li id="navregister"><a href="re.php">Register</a></li>

The spambots got lost. They now follow a non-existent link to register.php and get a 404. Real browsers will go to re.php. I did the same to post.php.
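The listing above seems to show the genuine link with its entities already decoded. A minimal sketch of how such entity-encoded strings could be generated, assuming decimal numeric character references for every character (the post does not say which form was used; toNumericEntities is a hypothetical name):

```javascript
// Hypothetical helper: turn every character of `text` into a decimal
// numeric character reference such as &#114; (for "r").
function toNumericEntities(text) {
  return Array.from(text, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  }).join("");
}

// Build the genuine link with both the script name and the link text encoded.
const link = '<a href="' + toNumericEntities("re.php") + '">' +
  toNumericEntities("Register") + "</a>";
console.log(link);
```

A browser decodes the references and shows an ordinary "Register" link, while a bot that matches on the raw page source would not find the literal word.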

I am also thinking of adding an automated ban on clients that claim to be MSIE yet do not load any CSS files from the forum (am I right that a normal MSIE loads CSS files under all circumstances?). The spambots that visit my forum claim to be MSIE 6 on Windows 2000.
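A rough sketch of that heuristic, assuming the access log has already been parsed into records with ip, userAgent and path fields (a hypothetical shape, as is the function name). A real browser may serve stylesheets from its cache without requesting them again, so the result is better treated as a list of suspects than as a ban list.

```javascript
// Hypothetical log record shape: { ip, userAgent, path }.
// Returns the addresses that claim to be MSIE but never requested a .css file.
function suspectedBots(requests) {
  const claimsMsie = new Set(); // addresses whose User-Agent mentions MSIE
  const loadedCss = new Set();  // addresses that fetched at least one stylesheet
  for (const { ip, userAgent, path } of requests) {
    if (/MSIE/.test(userAgent)) claimsMsie.add(ip);
    if (/\.css(\?.*)?$/i.test(path)) loadedCss.add(ip);
  }
  return [...claimsMsie].filter(function (ip) { return !loadedCss.has(ip); });
}
```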

I moved topics to an archive and set that "archive" forum read-only for all (unchecked Post Topics and Post Replies). However, a quick test shows that the users can still edit their own posts.

Is this behaviour by design, and how can I change it so that no edits are allowed in read-only forums?

I had the same issue when converting from phpBB. The styling problems are due to improperly nested quote tags.

IMHO, PunBB tries to store only valid input, so that the nesting is correct and all open tags are closed. It enforces this when users enter their messages. However, this cannot be enforced if the initial post data is not entered by users but comes from a board with laxer requirements (phpBB and IPB do not seem to mind improper nesting).

I had to read the whole board (fortunately, it was not large) and re-save faulty messages with proper nesting.
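On a larger board the faulty messages could be located with a script instead of by reading. The following is a simplified sketch of a stack-based nesting check; the function name and the tag list are assumptions, and it does not skip the contents of [code] blocks as a real BBCode parser would.

```javascript
// Hypothetical checker: true when every BBCode tag in `text` is closed
// in the reverse order in which it was opened.
function isProperlyNested(text) {
  const tagPattern = /\[(\/?)(quote|b|i|u|url|code)(?:=[^\]]*)?\]/gi;
  const open = []; // stack of tag names that are currently open
  let match;
  while ((match = tagPattern.exec(text)) !== null) {
    const closing = match[1] === "/";
    const name = match[2].toLowerCase();
    if (!closing) {
      open.push(name);
    } else if (open.pop() !== name) {
      return false; // closing tag does not match the most recently opened tag
    }
  }
  return open.length === 0; // anything left on the stack was never closed
}
```

Messages for which this returns false would be the candidates for re-saving by hand.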

8

(25 replies, posted in Archive)

beotiger wrote:

??, ??? ???????????? utf, ?????????? ??????, ? ??????? ??? ? ???? ????????. ?? ????? ???? ???????? ????????? ???????? ???? ?????????, ??? ????? ?????? ???????? - utf-8, ????? ????? ?????????????????? ?????? ?? 2 ?? 6. (???????, ?????? 4-5).
?????? ???? ? ?? ?? ?? ? ?????????? cp1251 ? utf-8. ???????? ????? 5-7 ???.

? ????????? ????? utf ????? ???????????? ASCII ????????? ? ?????? ???? -
????????, ???:

&#1062;&#1045;&#1053;&#1058;&#1056;

?????????. ?? ?????? ??????? ?????? ?????????? 7 ????????. ??... ??? ????????? ????? ????? ????? ????? ??? text, ? ?? ??????? ?????????? ?????, ??? ????? ?????????.

???????? ?? ???????? ??? &#uxxxx; ??? &#dddd; - ??? Character entity references ? ?? Unicode ? ?? UTF-8. ??? ????? ???????????? ?????? ??? SGML/HTML/XHTML/XML, ? ??????????? (character sets) ?? ????????.

UTF-8 ? ?????? ??? ????????? ??? ? ??????????? ???? ?? ??????? ????? Latin-1 ???????? ??? ?????, ? ????? ?? ????. ??? ????????? ?????? - ??? ????? ?? ??????. ???? ??? ?????? ? ???? ?????? ?? ??????, ?? RFC 3629 ??? ?????????????????? ???????. ???????? ????????? Unicode ? MySQL ????????? ??? RFC.

9

(20 replies, posted in PunBB 1.2 troubleshooting)

To summarize what I have seen so far in this forum:

1) 'lang_encoding' => 'utf-8' in /lang/English/common.php
2) ini_set("default_charset", "utf-8"); in /config.php
3) $db->query("SET NAMES 'UTF8'"); in /include/dblayer/common_db.php
4) utf8_general_ci collation for the database
5) varbinary(20) for the 'word' column in the 'search_words' table

IMHO (correct me if I am wrong here)

(1) puts charset=utf-8 in meta tags.
(2) makes PunBB output charset=utf-8 in HTTP headers.
(3) and (4) allow inserting UTF-8 into the database.
(5) gets rid of the error "Unable to insert search index words".

However, there is this code in search_idx.php

    $text = preg_replace($patterns, ' ', ' '.strtolower($text).' ');

which is not Unicode-safe. This is exactly the step where the text gets mangled; after that, junk is inserted into the search_words table instead of the lowercase words.

In search.php, there is this code

strtolower(trim($_GET['keywords']))

and this passes the junk to the search function.

strtolower alters individual bytes (it assumes that every character is a single byte), so it is not suitable for UTF-8 strings, where a non-ASCII character spans several bytes and changing one of them corrupts the character. When I look into the database with phpMyAdmin, I see question marks inside black lozenges instead of Russian words.
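The effect can be reproduced outside PHP. The sketch below is a simulation, not PHP's actual implementation: it assumes a single-byte locale in which bytes 0xC0 to 0xDF are capital letters (as in Windows-1251) and applies that rule to the UTF-8 bytes of a Russian word. The function name bytewiseLower is hypothetical.

```javascript
// Simulates a single-byte lowercase routine: ASCII A-Z plus the 0xC0-0xDF range
// that holds capital letters in code pages such as Windows-1251 (an assumption).
function bytewiseLower(text) {
  const bytes = new TextEncoder().encode(text); // the UTF-8 bytes of the string
  const lowered = bytes.map(function (b) {
    const isUpper = (b >= 0x41 && b <= 0x5a) || (b >= 0xc0 && b <= 0xdf);
    return isUpper ? b + 0x20 : b;
  });
  // Decoding the altered bytes as UTF-8 turns invalid sequences into U+FFFD.
  return new TextDecoder("utf-8").decode(lowered);
}

console.log(bytewiseLower("ABC"));      // plain ASCII is lowercased as expected
console.log(bytewiseLower("ЦЕНТР"));    // lead bytes are altered, the text is corrupted
console.log("ЦЕНТР".toLowerCase());     // character-aware lowercasing for comparison
```

The lead byte 0xD0 of each Cyrillic letter falls into the "capital letter" range and becomes 0xF0, which no longer forms a valid sequence with the byte that follows it; the replacement character U+FFFD is what is usually drawn as a question mark in a black lozenge.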

The solution posted by Julik is

mbstring.func_overload = 7
mbstring.internal_encoding = UTF-8

Setting mbstring.func_overload to 7 replaces the string functions with their multibyte equivalents automatically. I tested it and found it working with PHP5 and .htaccess (because it is INI_PERDIR in PHP5). In PHP4, that parameter is INI_SYSTEM, which is not usable in a shared hosting environment. In addition, using that parameter carries a performance hit. I only need some of the functions to be UTF-8-aware, not all of them.

String functions that do not operate on individual characters in strings should be (IMHO) UTF-8 safe, because UTF-8 is backwards-compatible with ASCII and never uses ASCII byte values inside multibyte sequences. Sorting should work too, to some extent (alphabetical case-sensitive Unicode sorting). Functions that change case, however, are not UTF-8 safe. This is why I replaced strtolower with mb_strtolower wherever it occurred.

I just changed varbinary(20) back to varchar(20) and rebuilt the index. The forum still works and I did not get the pesky "Unable to insert search index words" error. So, it looks like varbinary is not necessary. Now I think that varbinary(20) was used to accommodate the characters mangled by the strtolower function.

I will do another install (with the latest release instead of the 1.3-dev this time) and then will post my own instructions on how to make PunBB work with UTF-8 (including searching).

10

(20 replies, posted in PunBB 1.2 troubleshooting)

Tried some of the changes above.

Looks like the function strtolower was killing the multibyte characters, so the regex was not really the problem. I had to replace strtolower with mb_strtolower in both search_idx.php and search.php. After doing that, the words are inserted into the database properly and searching works too.

PHP5, MySQL 4.1, Apache 2.2, Win32. Will try my changes later on PHP 4.x on Apache 1.3.x / Unix.

So, an mbstring-based fork may be desirable, because hosts do run mbstring.