2005-12-27 11:04

I don't know; I made this patch maybe a couple of years ago. At this point I would suggest pushing Rickard to do it internally and in the proper fashion.

2004-09-03 15:29

If your messages are stored in ISO now, the best option is to make a text dump of the database and simply convert it with a text editor. Alternatively, use client encoding control if your database supports it (i.e. it must be either MySQL 4.x or PostgreSQL). MySQL 3.22 does not support UTF-8 properly, so you might get garbled search results and wrong output from string functions such as lower() (if they are used).
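If you would rather script the conversion than do it in an editor, here is a minimal PHP sketch. It assumes the dump is in ISO-8859-1 (adjust to your actual charset); the function name convert_dump and its parameters are hypothetical, and it relies on the iconv extension being available.

```php
<?php
// Sketch only: convert a text dump to UTF-8 line by line.
// Assumption: the source is a single-byte charset such as ISO-8859-1,
// so splitting on newlines cannot cut a character in half.
function convert_dump(string $in, string $out, string $from = 'ISO-8859-1'): int
{
    $src = fopen($in, 'rb');
    $dst = fopen($out, 'wb');
    if ($src === false || $dst === false) {
        throw new RuntimeException('Cannot open dump files');
    }

    $lines = 0;
    while (($line = fgets($src)) !== false) {
        $converted = iconv($from, 'UTF-8', $line);
        if ($converted === false) {
            throw new RuntimeException('Conversion failed at line ' . ($lines + 1));
        }
        fwrite($dst, $converted);
        $lines++;
    }

    fclose($src);
    fclose($dst);
    return $lines; // number of lines converted
}
```

Load the converted file into a database created with a UTF-8 encoding, and keep the original dump around until you have checked the result.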

2004-09-03 15:28

Thanks, got it working (albeit a little slowly). The cause was exactly this: exit was called before my footer could wrap up the node tree.
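For anyone hitting the same thing: PHP skips auto_append_file when the script calls exit(), but a callback registered with ob_start() still runs when the buffers are flushed at shutdown. Below is a minimal sketch of that approach. The function names and the <forum> wrapper element are made up for illustration, and capture_wrapped() exists only so the callback can be tried in isolation.

```php
<?php
// Hypothetical callback: wraps the captured forum HTML in an XML element.
// Any "]]>" in the HTML is split so that the CDATA section stays valid.
function wrap_forum_output(string $html): string
{
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $html);
    return '<forum><![CDATA[' . $safe . ']]></forum>';
}

// Helper for trying the callback out: runs $render and returns its output
// after it has passed through wrap_forum_output().
function capture_wrapped(callable $render): string
{
    ob_start();                    // outer buffer collects the final result
    ob_start('wrap_forum_output'); // inner buffer: callback fires on flush
    $render();
    ob_end_flush();                // callback runs, result goes to outer buffer
    return ob_get_clean();
}
```

In a real setup the auto_prepend_file would just call ob_start('wrap_forum_output'); nothing is needed in the footer, because the callback fires even when the forum code calls exit.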

2004-09-02 00:27

The next techie question :-) I am trying to buffer punBB and stick it into my XML output, but apparently, regardless of the buffer settings I use, punBB flushes everything out before my footer (auto_append_file) does its job.

What are my options?

2004-08-29 21:29

Sure it is not. However, instead of declaring that 'punBB is not multibyte-safe' you can make a code fork (not a big one, mind you).

What I thought is that, essentially, you write in the readme: 'if you need multibyte support, substitute file X with file Y and you are all set; you need extension Z and extension K for this to work'. People who are going to need this (for example, Asian PHP installs) usually have the extension built in and are usually familiar with the issue.

PostgreSQL, for instance (as well as recent versions of MySQL), has calls to determine the current client charset and to change it. The only problem I have encountered so far is with the indexing feature (which is apparently related to the regexps; it seems that mbstring does not overload the PCRE extension).

I will send you the code - just diff it against yours (mine will surely run slower, because I never could grasp regexps).

BTW, a good candidate for a UTF-8 solution is this very forum - just look at those fancy ????? in my post in the Russian section :-)

2004-08-29 14:21

Looks like I nailed it.

The regexes in the split_words function were destroying all multibyte text.

So far I have made the following fixes.

1. Went to the .htaccess and set the following:

php_value mbstring.internal_encoding 'UTF-8'
php_value mbstring.func_overload '7'

This way the mbstring extension will overload all string manipulation functions and make them multibyte-safe for us. I assume it is possible to include these calls in the locale files of punBB (make a variable called, say, $mbstring_encoding, and set it to 'UTF-8' or whatever). If the mbstring extension is compiled in, the respective calls should be made (see http://www.php.net/manual/en/function.m … coding.php).

2. Replaced regexes in split_words with array walks and string functions.
3. Replaced your regexes in the index insertion SQL queries with pg_escape_string() calls (it is multibyte-safe and also a native libpq function, so it should be faster). Essentially, it is a good idea to add a qstr handler to the DBLayer (along the lines of $db->quote($words_array)).
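A different route to the same result as step 2 is to keep a regex but add PCRE's /u modifier, which makes the pattern and the subject be treated as UTF-8. The sketch below is a hypothetical stand-in for split_words, not the punBB code and not the array-walk version described above. It also passes the encoding to mb_strtolower explicitly instead of relying on mbstring.func_overload, which was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Hypothetical replacement for split_words(); names are for illustration.
function split_words_utf8(string $text): array
{
    // Lowercase with an explicit encoding if mbstring is available.
    if (function_exists('mb_strtolower')) {
        $text = mb_strtolower($text, 'UTF-8');
    }

    // With /u, \p{L} and \p{N} match letters and digits in any script,
    // so we split on every run of characters that is neither.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // e.g. the input was not valid UTF-8
    }

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($words));
}
```

Called on a Russian phrase with punctuation, this returns whole words rather than broken byte sequences, which is what the search index needs.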

Seems to work, let me know if you need the code.

2004-08-29 13:02

I tried it with all the well-behaved browsers I have (Firefox and Safari). I have a habit of developing for good user agents before anything even gets to IE.

There may (!) be a problem with the fact that the error screen I get is sent as ISO charset, but the POST data should still be accurate.

I have to find out what is going on with the words before they get to the database. Instead of your regex I tried substituting a generic database quoting routine from ADODB, with the same result, so I guess the characters get mangled even before they are quoted for the database. Everything else just works fine: posting (if your post contains one long word), user lists, etc.

How can I trace variables that go through this script? I tried putting print_r there, but your script evidently suppresses all output.
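One option that does not depend on page output is to write to the PHP error log. A minimal sketch follows; the helper name debug_words is hypothetical. It logs each word with its byte length and a hex dump, so mangled multibyte sequences are visible even when the log viewer itself garbles the text.

```php
<?php
// Hypothetical debugging helper: one log line per word, with raw bytes.
function debug_words(array $words): string
{
    $lines = [];
    foreach ($words as $word) {
        // bin2hex() shows the raw bytes regardless of the encoding.
        $lines[] = sprintf('%s [%d bytes] %s', $word, strlen($word), bin2hex($word));
    }

    $report = implode("\n", $lines);
    error_log($report); // goes to the configured PHP error log
    return $report;
}
```

For example, a correctly encoded Cyrillic "я" shows up as the two bytes d18f; anything else at that position means the text was damaged before it reached this point.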

Probably this is just a case of some PHP array_ function not being multibyte-safe.

2004-08-28 21:31

Tried it, still a no go.

Here is what I get when I look at my PostgreSQL statement log (I enabled it specifically for this case). Please note that the queries before this one are logged with the proper letters I am using (Russian UTF-8 in this case). What comes out of this regex certainly doesn't look healthy to me.

Aug 28 23:26:19 exile PostgresCluster[3218]: [16-1] LOG: statement: SELECT id, word FROM punbb_search_words WHERE word IN('XX','XX')
Aug 28 23:26:19 exile PostgresCluster[3218]: [17-1] ERROR: Unicode characters greater than or equal to 0x10000 are not supported

In place of XX I see 2 garbage characters of unknown origin. What is this query actually supposed to do? Sorry to be so anal, I am just wondering.

2004-08-28 10:22

If you can afford it you can use PostgreSQL.

Then you can map your users table into a view looking like the punbb_users table. Then you can define rules for INSERT and UPDATE on punbb_users and your systems will be transparently integrated. An hour of careful work is needed though.

For encryption, look for the function punbb_hash in the punBB sources; there you can change the mechanism used for passwords and cookies to something of your taste, or even substitute a call to your own hashing routine.

If you want to stay with MySQL and your table, you are in for some serious hacking on the punBB source (I am not sure that MySQL supports rules and triggers, although it definitely supports views in the latest version).

2004-08-28 09:55

http://punbb.org/forums/viewtopic.php?id=4509

????? ??? ???? - ????? ???????? ???? ?????? ????? ?????????????? ? ??????? punBB

2004-08-28 09:46

Hello, a wonderful engine and wonderful people here :-)

I installed punBB on my site. Did some little modifications to it as well (needed to integrate it into our own login system, but thanks to the advent of PostgreSQL this was a matter of making a view and a couple of triggers). The question follows.

I am storing my database in UTF-8. I used a Russian localization and changed the locale definitions there to ru_RU.UTF-8, after which I converted the localization files themselves into UTF-8. The board was working OK, but when creating a new message (or a new topic) with more than one word in it, the engine gave me a "Cannot create search index" error (or the like), referring to line 127 of search_idx.php.

For now I have worked around this problem by reinstalling the Russian language packs and hacking the DBDriver of PunBB to set the client encoding of the database connection accordingly. However, there is "one more thing": all of the output from the site goes into a UTF-8 encoded XSL template. If I leave the forum engine in Win-1251, I am in for all kinds of problems when submitting forms (even if I convert the forum output afterwards with ob_get_contents()). I cannot use Win-1251 (most XML processors do not support it, and the main part of the site is already UTF-8 anyway).

The question follows - how can I modify the regex at line 127 so that it is multibyte-safe, or maybe it is possible to overload the regex engine with mbstring functions so that they can become multibyte-safe automatically?

Thanks in advance, and keep up the good work. PunBB certainly thrilled me :-)

I am running v. 1.1.5

Posts found: 11

1 Reply by Julik 2005-12-27 11:04

Re: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)

2 Reply by Julik 2004-09-03 15:29

Re: Utf-8? ... (3 replies, posted in PunBB 1.2 troubleshooting)

3 Reply by Julik 2004-09-03 15:28

Re: Buffering punBB (2 replies, posted in PunBB 1.2 troubleshooting)

4 Topic by Julik 2004-09-02 00:27

Topic: Buffering punBB (2 replies, posted in PunBB 1.2 troubleshooting)

5 Reply by Julik 2004-08-29 21:29

Re: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)

6 Reply by Julik 2004-08-29 14:21

Re: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)

7 Reply by Julik 2004-08-29 13:02

Re: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)

8 Reply by Julik 2004-08-28 21:31

Re: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)

9 Reply by Julik 2004-08-28 10:22

Re: Question, Merge Forum Database to My Members Based DB? (4 replies, posted in PunBB 1.2 discussion)

10 Topic by Julik 2004-08-28 09:55

Topic: ?????????? ?? ???-???? ? ????? ????????? (punBB ? ???????) (0 replies, posted in Archive)

11 Topic by Julik 2004-08-28 09:46

Topic: Indexing is not unicode-safe (20 replies, posted in PunBB 1.2 troubleshooting)
