Topic: Search words are not indexed if an apostrophe is used (l'éléphant)

For instance, I'm pretty sure that l'éléphant is not indexed.

Let's try!

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

"éléphant" is not indexed but "l'éléphant" is...

That is not so useful.

By the way, I found this "bug" when I ran into a problem with the Global Topic plugin: http://punbb.org/forums/viewtopic.php?id=6592

Could it be solved?

Thanks for your attention.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

It's because it takes l'éléphant as one word, not two. As for the Global Topic plugin, I thought I had fixed that; I'll look into it.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

I will look into this.

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

I'm not sure I understand what the problem is. Searching for either l'éléphant or éléphant yields proper results for me.

6 (edited by Jérémie 2005-03-12 01:55)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Not here. Try searching for this post: http://punbb.org/forums/viewtopic.php?pid=37199#p37199 via the string "éléphant".

I just tested it; it doesn't work.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Aha, that's what you mean. PunBB removes any ' it finds before it indexes the word. E.g. l'éléphant will be indexed as léléphant. The reason for this is that you want a search for e.g. didn't to return posts containing both didn't and didnt. This will be difficult to get around.
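The stripping behaviour described in this post can be illustrated with a minimal sketch. The function name strip_apostrophes is hypothetical and this is not PunBB's actual indexing code; it only assumes that every apostrophe is dropped before a word is stored.

```php
<?php
// Hypothetical helper, NOT PunBB's real code: drop every apostrophe
// from the text before it would be split into words and indexed.
function strip_apostrophes(string $text): string
{
    return str_replace("'", '', $text);
}

echo strip_apostrophes("l'éléphant"), "\n"; // léléphant
echo strip_apostrophes("didn't"), "\n";     // didnt
```

Under that assumption a search for "éléphant" has nothing to match, because the only stored word is "léléphant".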

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

From my point of view, a word like "éléphant" is searched for much more often than "didn't" or "I'm"... Perhaps the handling could depend on the language...

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Or maybe use MySQL's search and indexing (full-text search) capability instead of building the index by hand in PHP. Would MySQL be slower or more resource-hungry than PHP in that area?

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

No, it would not. The problem is what to do with PostgreSQL and SQLite.

11 (edited by Jérémie 2005-03-14 05:28)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Use their own search capability if they have one? If they don't, fall back on the old by-hand PHP approach (when those databases are used)? Or add an admin option to choose one way or the other?

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

If only it were that easy. The thing is, search.php would have to be more or less completely rewritten to work with MySQL fulltext indexing.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Any news on this? Right now, the search is friendly to English words like "can't" but very unfriendly to some other languages, including some widely used ones.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

No news?

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Well, the only "solution" I can think of is to not strip out '. The problem with that is that all the stopword lists will have to be reviewed.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Any plans for this?

17 (edited by Cubiq 2005-11-01 09:33)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Sorry to bump this rather old topic... but wouldn't replacing ' with a [space] instead of a [null] solve this issue?

The apostrophe is also used a lot in Italian, and having the apostrophized article merged with the following noun/adjective drastically reduces the efficiency of the search engine. Anyway, this is a common issue in many software packages (MySQL full-text search is affected by it too).

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Cubiq wrote:

Sorry to bump this rather old topic... but wouldn't replacing ' with a [space] instead of a [null] solve this issue?

No, because then it would be treated as two words.

19 (edited by Cubiq 2005-11-01 10:25)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Rickard wrote:

No, because then it would be treated as two words.

Well, they actually are two words (but I am probably missing something).

l'éléphant is currently treated as léléphant.

Replacing ' with [space], we would get "l éléphant", and the "l" would be ignored.

With "don't" we would have "don" and "t"... and both would probably be ignored.

(Sorry to bother you, I'm just trying to understand.)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

True. In the case of "l'éléphant", it would be good because the "l" doesn't affect the "meaning" of the word. On the other hand, "shouldn't" would be split into "shouldn" and "t", where "t" would be ignored, so the negation is lost and the meaning of the word changes.
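The trade-off discussed in the last two posts can be tried out with a rough sketch. The function name split_on_apostrophes and the rule that one-letter fragments are ignored are assumptions made for illustration; PunBB's real minimum word length and stopword handling may differ.

```php
<?php
// Hypothetical sketch, NOT PunBB's real code: replace each apostrophe
// with a space, split on whitespace, then drop one-letter fragments.
// The one-letter cutoff is an assumption chosen for this example.
function split_on_apostrophes(string $text): array
{
    $spaced = str_replace("'", ' ', $text);
    $words = preg_split('/\s+/', $spaced, -1, PREG_SPLIT_NO_EMPTY);
    // strlen() counts bytes, which is enough to spot single ASCII letters.
    return array_values(array_filter($words, function ($word) {
        return strlen($word) > 1;
    }));
}

print_r(split_on_apostrophes("l'éléphant")); // one word: éléphant
print_r(split_on_apostrophes("shouldn't"));  // one word: shouldn
print_r(split_on_apostrophes("don't"));      // one word: don
```

With this cutoff the French article disappears as hoped, while the English contractions leave a stem that no longer carries the negation. A longer minimum length would also drop "don".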

21

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

You're right...

This is strictly language-dependent; it seems very hard to find a way around this problem.

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Cubiq wrote:

Anyway, this is a common issue in many software packages (MySQL full-text search is affected by it too).

Huh? This is the first time I've ever seen anything like it... are you sure about MySQL full-text search?

23 (edited by Cubiq 2005-11-02 08:55)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Jérémie wrote:

Huh? This is the first time I've ever seen anything like it... are you sure about MySQL full-text search?

Well, MySQL doesn't have exactly the same issue, but as far as I know the ' (apostrophe) is not a word-boundary character, so the search engine is not 100% reliable. That means that "l'éléphant" is treated as a single word, article and apostrophe included.

The only way I know to solve this problem is to recompile MySQL with ' added as a word boundary (you should find tutorials about this on the web)... or you can do some voodoo programming in PHP, adding a [space] just after every apostrophe before updating your MySQL table.
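The PHP workaround mentioned here might look something like the sketch below. The function name space_after_apostrophes is hypothetical, and whether the result then indexes as two words depends on the MySQL configuration, which is not shown or verified here.

```php
<?php
// Hypothetical pre-processing step: insert a space after every apostrophe
// so that the article and the noun are separated by whitespace.
// Note that this changes the text itself, so it would presumably be
// applied to a copy used for searching rather than to the displayed post.
function space_after_apostrophes(string $text): string
{
    return str_replace("'", "' ", $text);
}

echo space_after_apostrophes("l'éléphant"), "\n"; // l' éléphant
```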

In PunBB you just need to change an array in search_idx.php

find:

$noise_replace =    array('',       '',      '',     '',     '',       '',       '',        '',       '',      '',     '',     '',       '',       '',        ' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  '',   ' ', ' ', ' ', ' ', '',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' ,  ' ', ' ', ' ', ' ', ' ', ' ');

replace:

$noise_replace =    array('',       '',      '',     '',     '',       '',       '',        '',       '',      '',     '',     '',       '',       '',        ' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  ' ',   ' ', ' ', ' ', ' ', '',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' ,  ' ', ' ', ' ', ' ', ' ', ' ');

Then you need to rebuild your search_words table.

Use at your own risk: this replaces all apostrophes with a space instead of a null character. It should work but, as Rickard pointed out, it may have some drawbacks. My opinion is that for some languages (Italian, French, German, ...) the apostrophe can be safely replaced with a [space]. I'll give it a try and let you know.
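To see what a one-entry change does, here is a small sketch using shortened, hypothetical lists. The real arrays in search_idx.php are much longer, and the variable names below (other than the idea of a match list paired with a replace list) are made up for the example. It relies only on PHP's str_replace, which pairs each search entry with the replacement at the same position.

```php
<?php
// Shortened, hypothetical noise lists (the real ones are much longer).
// The entry at position N in a replacement array is what str_replace()
// substitutes for the entry at position N in $noise_match.
$noise_match        = array("'", ',');
$replace_with_null  = array('',  ' '); // apostrophe removed outright
$replace_with_space = array(' ', ' '); // apostrophe becomes a space

$text = "l'éléphant";

$stripped = str_replace($noise_match, $replace_with_null, $text);
$spaced   = str_replace($noise_match, $replace_with_space, $text);

echo $stripped, "\n"; // léléphant
echo $spaced, "\n";   // l éléphant
```

Only the position that lines up with the apostrophe matters, so a change at the wrong index would alter how some other character is handled.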

PS: Wow! I used to love Shadowrun!

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

OK, so it's not perfect, but at least the search finds it when the ' character is included. The thing is, right now PunBB doesn't find it if there are any apostrophes in the search string.

25 (edited by Cubiq 2005-11-02 17:06)

Re: Search words are not indexed if an apostrophe is used (l'éléphant)

Jérémie wrote:

OK, so it's not perfect, but at least the search finds it when the ' character is included. The thing is, right now PunBB doesn't find it if there are any apostrophes in the search string.

If you don't mind having apostrophes in your search table, you can just remove the single quote from the search/replace arrays... but "l'éléphant" would be stored with the ' and "éléphant" would still not match.