Jérémie wrote:

Ok so it's not perfect, but at least the search term is found with the ' character. The thing is, right now PunBB doesn't find it if there are any apostrophes in the search string :(

If you don't mind having 's in your search table you may just remove the single quote from the search/replace arrays... but "l'éléphant" would be stored with ' and "éléphant" would not match anyway.

Jérémie wrote:

Uh? It's the first time I've ever seen anything like it... are you sure about MySQL fulltext search?

Well, MySQL doesn't have exactly the same issue, but as far as I know the ' (apostrophe) is not a word-boundary character, so the search engine is not 100% reliable. That means "l'éléphant" is treated as a single word, article and apostrophe included.

The only way I know to solve this problem is to recompile MySQL with ' added as a word-boundary character (you should find a tutorial about this on the web)... or you may do some voodoo programming in PHP, adding a [space] just after every apostrophe before updating your MySQL table.
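A minimal sketch of the PHP workaround, assuming the text goes through a helper before it is written to the fulltext-indexed column. The function name space_after_apostrophes is made up for illustration and is not part of PunBB or MySQL:

```php
<?php
// Hypothetical helper: insert a space after every apostrophe so that the
// article and the noun become separate tokens ("l' éléphant").
function space_after_apostrophes($text)
{
    return str_replace("'", "' ", $text);
}

echo space_after_apostrophes("l'éléphant"), "\n"; // l' éléphant
```

You would probably want to apply this only to the copy of the text that gets indexed, not to the text shown to users.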

In PunBB you just need to change one array in search_idx.php

find:

$noise_replace =    array('',       '',      '',     '',     '',       '',       '',        '',       '',      '',     '',     '',       '',       '',        ' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  '',   ' ', ' ', ' ', ' ', '',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' ,  ' ', ' ', ' ', ' ', ' ', ' ');

replace:

$noise_replace =    array('',       '',      '',     '',     '',       '',       '',        '',       '',      '',     '',     '',       '',       '',        ' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  ' ',   ' ', ' ', ' ', ' ', '',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' ,  ' ', ' ', ' ', ' ', ' ', ' ');

Then you need to rebuild your search_words table.

Use at your own risk: this replaces all apostrophes with a space instead of an empty string. It should work but, as Rickard pointed out, it may have some drawbacks. My opinion is that for some languages (Italian, French, German, ...) the apostrophe can be safely replaced with a [space]. I'll give it a try and let you know.

PS: wow! I used to love Shadowrun! :P
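To see what the one-entry change does, here is a small sketch. It assumes the two arrays are applied together with str_replace(), and it uses cut-down two-entry stand-ins rather than the full arrays from search_idx.php:

```php
<?php
// Cut-down stand-ins for $noise_match / $noise_replace (illustration only).
$noise_match       = array('`', '\'');
$noise_replace_old = array('',  '');   // apostrophe is removed
$noise_replace_new = array('',  ' ');  // apostrophe becomes a space

$text = "l'éléphant";

$old = str_replace($noise_match, $noise_replace_old, $text);
$new = str_replace($noise_match, $noise_replace_new, $text);

echo $old, "\n"; // léléphant
echo $new, "\n"; // l éléphant
```

With the space in place, the article and the noun are separate words by the time the text is split.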

3

(10 replies, posted in PunBB 1.2 bug reports)

Yes, it worked. Thank you man!

Gotta fuel your PayPal account ;)

4

(10 replies, posted in PunBB 1.2 bug reports)

Probably there's something I'm doing wrong.

I added the word "noi" to the stopwords, but if a post contains "noi." (noi + [period]), "noi" (without the period) is inserted into the search_words table. If instead the word "noi" appears alone (without a period), it is correctly stripped out.

You can try to copy some of your stopwords, put a period at the end of each of them and paste into a post. Looking at the search_words table you should see all the words even if they are stopwords.

I updated to 1.2.10 and executed the "Rebuild search index" but this issue still persists... any clue?

You're right...

This is strictly language-dependent; it seems very hard to find a way around this problem.

6

(10 replies, posted in PunBB 1.2 bug reports)

Rickard wrote:

Actually, that was an error. Adding it to the end of the match/replace arrays had some unwanted side-effects. I "re-removed" the period from those arrays and instead filtered it out with trim().

http://dev.punbb.org/changeset/284

Oh, I see... are the side effects related to the indexing of web domains (e.g. www.punbb.org)?

Mmmh, I guess there's something else that should be done for this issue then. Words in the stopwords file followed by a "." [period] are still inserted into the DB, and two-letter words followed by a period are treated as three-letter words (e.g. "dr." is inserted as "dr").
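One possible shape for a fix, as a sketch only: trim the periods first, then run the length and stopword checks on the trimmed word. This is not PunBB's actual code; the function filter_words and the minimum length of 3 are assumptions for illustration:

```php
<?php
// Hypothetical filter: strip leading/trailing periods BEFORE the length
// and stopword checks, so "noi." and "dr." are judged as "noi" and "dr".
function filter_words(array $words, array $stopwords, $min_len = 3)
{
    $kept = array();
    foreach ($words as $word) {
        $word = trim($word, '.');           // "noi." -> "noi", "word..." -> "word"
        if (strlen($word) < $min_len) {     // "dr" is now too short
            continue;
        }
        if (in_array($word, $stopwords)) {  // "noi" now matches the stopword
            continue;
        }
        $kept[] = $word;
    }
    return $kept;
}

print_r(filter_words(array('noi.', 'dr.', 'word...', 'forum'), array('noi')));
```

Because trim() only touches the ends of the word, a domain like www.punbb.org keeps its inner periods.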

Rickard wrote:

No, because then it would be treated as two words.

Well, they actually are two words (but probably I am missing something :) ).

l'éléphant now is treated as léléphant.

replacing ' with [space] we would get "l éléphant" and the "l" would be ignored.

With "don't" we would have "don" and "t"... and they would be probably both ignored.
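The reasoning above can be checked with a short sketch. It assumes words shorter than 3 bytes are dropped and does not model the stopword list; index_tokens is a made-up name, not a PunBB function:

```php
<?php
// Hypothetical tokenizer: apostrophe -> space, split on whitespace,
// then drop anything shorter than $min_len bytes.
function index_tokens($text, $min_len = 3)
{
    $text   = str_replace("'", ' ', $text);
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $kept   = array();
    foreach ($tokens as $token) {
        if (strlen($token) >= $min_len) {
            $kept[] = $token;
        }
    }
    return $kept;
}

print_r(index_tokens("l'éléphant")); // only "éléphant" survives
print_r(index_tokens("don't"));      // "t" is dropped, "don" survives
```

Under that assumed minimum, "don" would only be ignored if it is also in the stopword list.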

(sorry to bother you, I'm just trying to understand)

Sorry to bump this rather old topic... but wouldn't replacing ' with a [space] instead of a [null] solve this issue?

The apostrophe is also used a lot in Italian, and having the apostrophed article merged with the following noun/adjective drastically reduces the efficiency of the search engine. Anyway, this is a common issue in a lot of software (MySQL fulltext search is affected by it too).

9

(10 replies, posted in PunBB 1.2 bug reports)

Looking at the file search_idx.php in 1.2.10, the changes pointed out by Connorhd do not seem to be present

        $noise_match =         array('[quote', '[code', '[url', '[img', '[email', '[color', '[colour', 'quote]', 'code]', 'url]', 'img]', 'email]', 'color]', 'colour]', '^', '$', '&', '(', ')', '<', '>', '`', '\'', '"', '|', ',', '@', '_', '?', '%', '~', '+', '[', ']', '{', '}', ':', '\\', '/', '=', '#', ';', '!', '*');
        $noise_replace =    array('',       '',      '',     '',     '',       '',       '',        '',       '',      '',     '',     '',       '',       '',        ' ', ' ', ' ', ' ', ' ', ' ', ' ', '',  '',   ' ', ' ', ' ', ' ', '',  ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '' ,  ' ', ' ', ' ', ' ', ' ', ' ');

UPDATE: yes, adding "." and " " to the match/replace arrays works like a charm.

It would be great if alt+b and alt+i could add [ b ] and [ i ] bbcode automatically (and it would be a snap to develop). This could be implemented for other tags as well.

11

(10 replies, posted in PunBB 1.2 bug reports)

I don't know if by "next release" you meant v1.2.10 (it would have been the fastest bug fix ever...). Anyway, just to let you know that the bug is still there :P (or at least the search_words table is not fixed by the "Rebuild search index" tool).

12

(69 replies, posted in News)

analogue wrote:

In punbb-1.2.10-changed_files.zip the update file is 12_to_129_update.php. Need a fix ?

Yes, it looks like the file 12_to_1210_update.php is missing from the "changed files only" package... I guess you can grab the one in the full release, though

13

(10 replies, posted in PunBB 1.2 bug reports)

I just installed PunBB and I absolutely love it!

I was working on an improved version of the stopwords file for my language (Italian), but looking at the search_words table I found some oddities, most of them caused by punctuation.

I added the word "noi" ("we" in Italian) to stopwords.txt, but if a post contains "noi." (noi + [period]), the period is stripped but the word "noi" is inserted into the search_words table anyway.

The same strange behaviour happens when a post contains something like "word..." (word + [3 periods]). The word is stored in the database as "word.." (word + [2 periods]).

Another example is "dr.": it's a 2-letter word, so it should not be indexed, but instead it is counted as a 3-letter word, then the period is stripped out and "dr" (without the period) is inserted into the db.

Is this a bug or a feature? :)