Topic: Search words are not indexed if an apostrophe is used (l'éléphant)
For instance, I'm pretty sure that l'éléphant is not indexed.
Let's try!
"éléphant" is not indexed but "l'éléphant" is ...
That is not very useful.
By the way, I found this "bug" when I ran into a problem with the Global Topic plugin: http://punbb.org/forums/viewtopic.php?id=6592
Could it be fixed?
Thanks for your attention.
It's because it takes l'éléphant as one word, not two. As for the Global Topic plugin, I thought I had fixed that; I'll look into it.
I will look into this.
I'm not sure I understand what the problem is. Searching for either l'éléphant or éléphant yields proper results for me.
Not here. Try to search for this post: http://punbb.org/forums/viewtopic.php?pid=37199#p37199 via the string "éléphant".
I just tested it, doesn't work.
Aha, that's what you mean. PunBB removes any ' it finds before it indexes the word, e.g. l'éléphant will be indexed as léléphant. The reason for this is that you want searches for e.g. didn't to return posts containing both didn't and didnt. This will be difficult to get around.
In my view, a word like "éléphant" is searched for much more often than "didn't" or "I'm"... Perhaps this could be handled differently depending on the language...
Or maybe use MySQL's search and index (fulltext search) capability instead of building it by hand in PHP. Would MySQL be slower or more resource-hungry than PHP in that area?
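As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not PunBB's actual PHP code, and the function name `index_words_stripping` is made up for this example. It deletes apostrophes before splitting the text into words:

```python
def index_words_stripping(text):
    """Hypothetical indexer: delete apostrophes, then split on whitespace."""
    cleaned = text.lower().replace("'", "")
    return cleaned.split()


words = index_words_stripping("L'éléphant didn't move")
print(words)                # ['léléphant', 'didnt', 'move']
# The article and the noun collapse into one token, so "éléphant" is never indexed.
print("éléphant" in words)  # False
```

The same stripping is what lets "didn't" and "didnt" end up as the same indexed word.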
No, it would not. The problem is what to do with PostgreSQL and SQLite.
Use their own search capability if they have one? If they don't, use the old PHP-by-hand way (when they are used)? Or add an admin option to use one way or the other?
If only it were that easy. The thing is, search.php would have to be more or less completely rewritten to work with MySQL fulltext indexing.
Any news on this? Right now, the search is friendly to people who can't write English, but very unfriendly to some languages, including some widely used ones.
Well, the only "solution" I can think of is to not strip out '. The problem with that is that all the stopword lists will have to be reviewed.
Any plans on this?
Sorry to bump this rather old topic, but wouldn't replacing ' with a [space] instead of a [null] solve this issue?
The apostrophe is also used a lot in Italian, and having the apostrophed article merged with the following noun/adjective drastically reduces the efficiency of the search engine. Anyway, this is a common issue in many software packages (MySQL fulltext search is affected by this too).
No, because then it would be treated as two words.
Well, they actually are two words (but I am probably missing something).
l'éléphant is now treated as léléphant.
Replacing ' with [space], we would get "l éléphant" and the "l" would be ignored.
With "don't" we would have "don" and "t"... and they would probably both be ignored.
(Sorry to bother you, I'm just trying to understand.)
True. In the case of "l'éléphant", it would be good because the "l" doesn't affect the "meaning" of the word. On the other hand, "shouldn't" would be treated as "should" and "t", where "t" would be ignored, which inverts the meaning of the word.
You're right...
This is strictly language-dependent; it seems very hard to find a way around this problem.
Uh ? It's the first time ever I saw anything like it... are you sure about MySQL fulltext search ?
Well, MySQL does not have exactly the same issue, but as far as I know the ' (apostrophe) is not a word-boundary character, so the search engine is not 100% reliable. That means "l'éléphant" is treated as a single word, article and apostrophe included.
The only way I know to solve this problem is by recompiling MySQL with ' added as a word boundary (you should find tutorials about this on the web)... or you may do some voodoo programming in PHP, adding a [space] just after any apostrophe before updating your MySQL table.
In PunBB you just need to change an array in search_idx.php
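A minimal sketch of that pre-processing idea, in Python rather than PHP. Both function names are hypothetical, and the regex tokenizer is only a stand-in for an engine that counts apostrophes as word characters; it is not MySQL's real parser.

```python
import re


def add_space_after_apostrophes(text):
    """Insert a space after every apostrophe before the text is stored."""
    return text.replace("'", "' ")


def tokenize_keeping_apostrophes(text):
    """Stand-in tokenizer: apostrophes are treated as part of a word."""
    return re.findall(r"[\w']+", text)


raw = "l'éléphant"
print(tokenize_keeping_apostrophes(raw))
# ["l'éléphant"]
print(tokenize_keeping_apostrophes(add_space_after_apostrophes(raw)))
# ["l'", 'éléphant']
```

With the extra space, the noun becomes a token of its own and can be matched without the article.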
find:
$noise_replace = array('', '', '', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '', '', ' ', ' ', ' ', ' ', '', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' , ' ', ' ', ' ', ' ', ' ', ' ');
replace:
$noise_replace = array('', '', '', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '', ' ', ' ', ' ', ' ', ' ', '', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' , ' ', ' ', ' ', ' ', ' ', ' ');
Then you need to rebuild your search_words table.
Use at your own risk: this replaces all apostrophes with a space instead of an empty string. It should work but, as Rickard pointed out, it may have some drawbacks. My opinion is that for some languages (Italian, French, German, ...) the apostrophe can be safely replaced with a [space]. I'll give it a try and let you know.
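To show what changing one entry in the replacement array does, here is a small Python sketch. The lists are shortened, hypothetical stand-ins (the thread only shows $noise_replace, and the real lists are much longer), and `apply_noise` simply replaces each matched string with the entry at the same position, one pair at a time.

```python
# Hypothetical, shortened stand-ins for the parallel match/replace arrays.
noise_match = ["'", ".", ","]
replace_strip = ["", " ", " "]   # apostrophe removed (original behaviour)
replace_space = [" ", " ", " "]  # apostrophe turned into a space (the change)


def apply_noise(text, match, replace):
    """Replace each matched string with the entry at the same index."""
    for needle, substitute in zip(match, replace):
        text = text.replace(needle, substitute)
    return text


sample = "l'éléphant, ici."
print(apply_noise(sample, noise_match, replace_strip).split())
# ['léléphant', 'ici']
print(apply_noise(sample, noise_match, replace_space).split())
# ['l', 'éléphant', 'ici']
```

Only the entry paired with the apostrophe differs between the two replacement lists, which mirrors the single changed element in the find/replace above.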
PS: wow! I used to love Shadowrun!
OK, so it's not perfect, but at least the search now finds words containing the ' character. The thing is, right now PunBB doesn't find anything if there are any apostrophes in the search string.
If you don't mind having apostrophes in your search table, you may just remove the single quote from the search/replace arrays... but "l'éléphant" would be stored with the ' and "éléphant" would still not match.