Topic: search words containing swedish characters not working?
PunBB Forums → PunBB 1.2 bug reports → search words containing swedish characters not working?
I never seem to get any matches when searching for words with åäö in it.
In these forums or somewhere else? If it's in these forums, I know why. It's because the locale isn't set to Swedish on the server.
I found out on my forum first and I thought it was a bug in the old version I'm running. So I tried here but with the same result.
So it doesn't have anything to do with PunBB? How can I fix it?
It has to do with PunBB. Are you using the Swedish language pack? If not, you should. If you are and it still doesn't work, there's a problem with your server setup: for some reason, the call to setlocale() fails. That usually means the Swedish locale isn't installed.
Yes, I'm using the Swedish language pack. I guess I can't do anything about it then, because it's not my server.
Contact the server administrator. He might be able to help you out.
Yes, I will.
EDIT: I dug a little bit myself and found that
setlocale( LC_CTYPE, 'sv_SE' );
didn't return anything but
setlocale( LC_CTYPE, 'sv_SE.ISO_8859-1' );
returns the correct locale, but search isn't working anyway.
EDIT2: Now I see you've already thought of that in the new langpack Rickard.
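For anyone checking this on their own server: setlocale() accepts a list of candidate names and returns the first one the system accepts, or false if none of them is available. The following is only a minimal sketch. The helper name first_available_locale() is made up for illustration, and the candidate spellings are assumptions that vary between systems; they are not taken from the language pack.

```php
<?php
// Hypothetical helper: try several spellings of a locale name and report
// which one (if any) the system accepts for LC_CTYPE.
function first_available_locale(array $candidates)
{
    // setlocale() tries each array element in order and returns the name
    // of the first locale it could set, or false if all of them failed.
    return setlocale(LC_CTYPE, $candidates);
}

// Candidate spellings are assumptions; adjust them to what your system ships.
$result = first_available_locale(array('sv_SE.ISO8859-1', 'sv_SE.ISO_8859-1', 'sv_SE', 'swedish'));

if ($result === false) {
    echo "No Swedish locale could be set\n";
} else {
    echo "LC_CTYPE is now: $result\n";
}
```

Checking the return value against false matters here, because a failed call leaves the previous locale in place without raising an error.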
The new langpack?
I meant the latest Swedish language pack.
This is a bug. I can't use search with Chinese words.
machen: That doesn't tell me much. The search feature has a hack built in to allow searches in two-byte character sets such as Chinese. It should work. It has worked before :)
I wonder if it is working when lang_multibyte is set true...
Yeah, lang_multibyte must be true for searches to work in multibyte languages such as Chinese.
No more ideas regarding my problem? As I said, setlocale() seems to be working, but åäö searches still don't work.
How do you know setlocale() works? Try putting the following piece of code after the inclusion of common.php in e.g. index.php:
dump(setlocale(LC_CTYPE, NULL));
Then run the script. It should output the currently active locale which in your case should be Swedish.
I assumed it worked since it returned the locale and not false when setting the locale.
The output from dump () is C
yes C.
C is the default locale (POSIX). What it means is that setlocale() failed to set the Swedish locale. Why that happens I don't know, but most likely it means you don't have it installed.
I might be wrong, but how about setting lang_multibyte to true? My idea is that languages other than English behave much like multibyte ones. If you view the source of a Swedish website, it might show the special characters as "?", and "?" is in the noise match list and is replaced with ' '. So you won't lose anything by trying it.
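The same check can be run outside PunBB. The sketch below assumes nothing about PunBB itself: it uses the documented behaviour that passing "0" as the locale leaves the setting untouched and only returns the current one. The function name current_ctype_locale() is hypothetical.

```php
<?php
// Passing "0" as the locale does not change anything; it only returns
// the name of the locale currently active for the given category.
function current_ctype_locale()
{
    return setlocale(LC_CTYPE, '0');
}

$active = current_ctype_locale();
if ($active === 'C' || $active === 'POSIX') {
    echo "Still on the default locale ($active); the Swedish locale was not applied\n";
} else {
    echo "Active LC_CTYPE locale: $active\n";
}
```

What this prints depends on the server and the PHP version, so run it in the same environment as the forum rather than on a different machine.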
jacobswell: That's not a good idea. Getting the locale installed shouldn't be a problem. lang_multibyte is a hack that ignores the search index and does a "raw" search in the posts table. It's many times slower than a regular search. There really isn't much I can do though, seeing as PHP lacks native Unicode support.
What I mean, pardon me if I'm wrong, is this: when we post an article, is it saved into MySQL with the Swedish characters, or as something like "Tack f? ditt bes? p?Yahoo! Sverige"? If the latter, then each "?" is replaced with ' ' (a space) by the noise match filter in search.php, and we can't get proper results. I think we can check like this:
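To illustrate why a raw search is slower than an indexed one, here is a toy sketch in plain PHP. It is not PunBB's code: the data, the function names and the in-memory arrays are all made up, standing in for the posts table and the search index.

```php
<?php
// Toy data standing in for the posts table (post id => message).
$posts = array(
    1 => 'search is not working',
    2 => 'locale is not installed',
    3 => 'search words with swedish characters',
);

// Indexed search: build a word => post ids map once; each query is then a lookup.
function build_index(array $posts)
{
    $index = array();
    foreach ($posts as $id => $message) {
        foreach (array_unique(explode(' ', $message)) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}

function indexed_search(array $index, $word)
{
    return isset($index[$word]) ? $index[$word] : array();
}

// "Raw" search: every query scans every message for the substring.
function raw_search(array $posts, $needle)
{
    $hits = array();
    foreach ($posts as $id => $message) {
        if (strpos($message, $needle) !== false) {
            $hits[] = $id;
        }
    }
    return $hits;
}
```

Both approaches find the same posts for a whole word, but the raw scan has to read every message on every query, which is what makes it expensive on a large forum.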
1. find in search.php
// Filter out non-alphabetical chars
$noise_match = array('^', '$', '&', '(', ')', '<', '>', '`', '\'', '"', '|', ',', '@', '_', '?', '%', '~', '.', '[', ']', '{', '}', ':', '\\', '/', '=', '#', '\'', ';', '!', '?');
$noise_replace = array(' ', ' ', ' ', ' ', ' ', ' ', ' ', '', '', ' ', ' ', ' ', ' ', '', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' , ' ', ' ', ' ', ' ', ' ', ' ', ' ');
$keywords = str_replace($noise_match, $noise_replace, $keywords);
2. add this code after that
exit($keywords);
3. Finally, we can check how $keywords has changed. I think every "?" character is replaced with a space if the latter case is right.
Here is an example: Rickard's signature is shown in notepad.exe as "Nice catch blanco ni?, but too bad your ass got saaaaaaaaaaaaaacked!".
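What the noise filter does to a "?" can be seen in isolation. The sketch below uses a shortened, illustrative subset of the noise list quoted above, and the function name filter_noise() is made up; it only shows how str_replace() behaves with paired arrays.

```php
<?php
// Shortened, illustrative subset of the noise list quoted above.
$noise_match   = array('?', '!', ',', '.');
$noise_replace = array(' ', ' ', ' ', ' ');

// Hypothetical wrapper: each entry in $match is replaced by the entry at
// the same position in $replace.
function filter_noise($keywords, array $match, array $replace)
{
    return str_replace($match, $replace, $keywords);
}

// The mis-encoded string from the post above: every "?" and "!" becomes a space.
echo filter_noise('Tack f? ditt bes? p?Yahoo! Sverige', $noise_match, $noise_replace), "\n";
```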
jacobswell: That's not his problem. The Swedish characters are saved correctly into his database. The problem is that in searchidx.php, PunBB divides the message into words, and in that process it uses preg_replace(), which in turn relies on the locale being correctly set. If it isn't, any words that contain the Swedish characters å, ä and ö will be ignored and thus won't end up in the search index.
I see, thanks for your kind explanation.
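The locale dependence can be reproduced with a few lines of PHP. This is a hypothetical stand-in for the word-splitting step, not the code in searchidx.php, and it assumes the text is stored as ISO-8859-1 (where å is the single byte 0xE5). The locale name in the second half is also an assumption.

```php
<?php
// Hypothetical stand-in for the word-splitting step; NOT PunBB's actual code.
function split_into_words($text)
{
    // Split on runs of non-word characters. Without the /u modifier, whether
    // a byte above 127 counts as a word character depends on LC_CTYPE.
    return preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// "små ord" encoded as ISO-8859-1: å is the single byte 0xE5.
$text = "sm\xE5 ord";

// In the C locale, 0xE5 is not a word character, so "små" breaks apart.
setlocale(LC_CTYPE, 'C');
print_r(split_into_words($text));

// Locale name is an assumption; setlocale() returns false if it is not installed.
if (setlocale(LC_CTYPE, 'sv_SE.ISO8859-1') !== false) {
    print_r(split_into_words($text));
}
```

If the second print_r() never runs, the locale could not be set, which is the situation described in this thread.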
I do have the swedish locale installed. I'm running FreeBSD and the locales are in /usr/share/locale.
Are you sure the actual locales are installed? I have a bunch of directories in my /usr/share/locale, but most of them are empty or more or less empty.
yes
[henke@flinta:~] $ ls /usr/share/locale/sv_SE.ISO8859-1
LC_COLLATE LC_MESSAGES LC_NUMERIC
LC_CTYPE LC_MONETARY LC_TIME