1 (edited by Fire Fusion 2006-06-03 16:03)

Topic: Are there any disadvantages to running the forums in UTF-8

I switched it to UTF-8 using the following method. Is anything bad going to happen?

XML wrote:

1. In the LANG folder, open common.php

2. Replace this

'lang_encoding' => 'iso-8859-1'

with this

'lang_encoding' => 'utf-8'

Re: Are there any disadvantages to running the forums in UTF-8

Not as long as you're running MySQL 4.1+ and have the mbstring stuff in PHP.

Re: Are there any disadvantages to running the forums in UTF-8

Well, PunBB doesn't understand that the content is UTF-8, so you might run into some problems with string lengths. For example, the maximum allowed username length is 25 (IIRC), but if the username has a lot of non-ASCII characters and it's in UTF-8, PunBB will count each of these as two (or more) characters, one per byte, and thus 25 might turn into 18 or whatever. There are some other minor issues, but nothing showstopping.
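To make the byte-versus-character difference concrete, here is a minimal sketch. The helper name utf8_length is made up for illustration and is not part of PunBB; it uses mb_strlen when the mbstring extension is loaded and otherwise falls back to counting regex matches.

```php
<?php
// Hypothetical helper, not part of PunBB: counts characters rather than bytes.
function utf8_length($str)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback without mbstring: one match per UTF-8 character.
    return preg_match_all('/./su', $str, $matches);
}

$username = "J\xC3\xA9r\xC3\xA9mie"; // "Jérémie" written out as UTF-8 bytes

echo strlen($username), "\n";      // byte count: 9
echo utf8_length($username), "\n"; // character count: 7
```

A length check based on strlen would treat this 7-character name as 9 characters long, which is how a limit of 25 can shrink for names with many accented letters.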

"Programming is like sex: one mistake and you have to support it for the rest of your life."

4 (edited by foremind 2006-06-10 05:01)

Re: Are there any disadvantages to running the forums in UTF-8

I run my forum in utf-8, for simultaneous display of traditional and simplified Chinese characters. On Chinese-related web sites, traditional vs simplified Chinese encoding is always a sticky issue; utf-8 solves the problem cleanly.

In fact I think you don't even need the mbstring stuff if you don't care about the string length issues.

In functions.php, I do:

function pun_strlen($str)
{
        return mb_strlen($str, 'utf-8'); // and blindly hope that the function actually does its counting properly
}

In common.php I do:

// Load DB abstraction layer and connect
require PUN_ROOT.'include/dblayer/common_db.php';

// common_db.php would have given us a db connection handle
// make sure we talk in utf-8 from this point on
// this is to get around the issue of the mysqlclient client library
// defaulting to iso-8859-1, thus inserting the wrong stuff into the db
$db->query("set names utf8");

// Start a transaction
$db->start_transaction();

5 (edited by Jérémie 2006-06-16 14:54)

Re: Are there any disadvantages to running the forums in UTF-8

Jansson wrote:

Not as long as you're running MySQL 4.1+ and have the mbstring stuff in PHP.

I've been running it on 4.0 for some time (started with 4.1, then did a dump with --compatible=mysql40), no issues.

Re: Are there any disadvantages to running the forums in UTF-8

Is sorting with "special" characters done correctly?

Re: Are there any disadvantages to running the forums in UTF-8

Haven't tested that far. You know, we still have a major search bug with single quotes (and they are quite important in several languages), so proper alpha sorting isn't really at the top of the list :)

8 (edited by gog 2006-06-22 10:41)

Re: Are there any disadvantages to running the forums in UTF-8

I have a UTF-8 board.

1. Edit /include/dblayer/common_db.php and put this line at the end of the file, just after:

$db = new DBLayer($db_host, $db_username, $db_password, $db_name, $db_prefix, $p_connect);

Add

$db->query("SET NAMES 'UTF8'");

This will make sure that the MySQL connection handles everything as UTF-8.

If you are running MySQL >= 4.1, edit the search_words table in the database so that the "word" field is varbinary, not varchar. Also check that all your collations are set to UTF-8. If you are running MySQL < 4.1, everything is fine.
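A rough sketch of what that column change could look like, built as a string so you can inspect it before running it. The helper name is made up, and the column length of 20 and the NOT NULL DEFAULT '' part are assumptions about the schema, so check your own table definition first.

```php
<?php
// Hypothetical helper: builds the ALTER TABLE statement for the search_words table.
// Assumptions: the column is 20 characters wide and NOT NULL; verify against your schema.
function search_words_alter_sql($db_prefix, $length = 20)
{
    return 'ALTER TABLE '.$db_prefix.'search_words MODIFY word VARBINARY('
        .intval($length).") NOT NULL DEFAULT ''";
}

// 'punbb_' is just an example table prefix.
echo search_words_alter_sql('punbb_'), "\n";
```

Back up the database before running the resulting statement through your MySQL client or through $db->query().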

And finally, change common.php in your language folder to declare the proper encoding, which gets inserted into <head></head>:

'lang_encoding' => 'utf-8'

Maybe we could also send UTF-8 header to the client so that this META code page stuff isn't really necessary?
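Sending the header from PHP could look roughly like this. It is only a sketch: content_type_header is a made-up helper, and where exactly this would go in PunBB is left open. The one hard requirement is that header() runs before any page output.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a given encoding.
function content_type_header($encoding)
{
    return 'Content-Type: text/html; charset='.$encoding;
}

// Assumption: in a real board this value would come from lang_encoding in the language file.
$lang_encoding = 'utf-8';

// Headers can only be sent before any output has started.
if (!headers_sent()) {
    header(content_type_header($lang_encoding));
}
```

Browsers give the HTTP header priority over the META tag, so the two should always declare the same encoding.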

Also, don't forget to re-save all of your language files as UTF-8. This doesn't really matter if you use the English language files, because the plain ASCII characters they contain are encoded the same in iso-8859-1 and utf-8. But if you use something else then you have to do this. For example, I had a Croatian translation in code page win-1250 and I had to open every file and change its encoding for everything to work well.
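Instead of opening every file by hand, the conversion could be scripted with PHP's iconv extension. This is a sketch only: convert_file_to_utf8 is a made-up helper, and it overwrites the file in place, so work on a copy.

```php
<?php
// Hypothetical helper: re-encodes one file in place from $from to UTF-8.
function convert_file_to_utf8($path, $from)
{
    $text = file_get_contents($path);
    if ($text === false) {
        return false;
    }
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        return false;
    }
    return file_put_contents($path, $converted) !== false;
}

// The letter "č" is the single byte 0xE8 in win-1250 and two bytes in UTF-8.
echo bin2hex(iconv('WINDOWS-1250', 'UTF-8', "\xE8")), "\n"; // c48d
```

You would call the helper once per language file, for example in a loop over glob() on your language folder. Don't run it twice on the same file, because that would convert text that is already UTF-8 a second time and garble it.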

Most users don't use language-specific characters in their username, because if you find yourself abroad and try to log in to the board, chances are you won't find the appropriate characters on the keyboard :)

http://www.info-mob.com/forum/ - Croatian forum only, don't bother if you don't speak Croatian :)