26

Re: The search function

On a slight side track: are the db tables search_cache, search_matches and search_words only used for single-byte character sets, i.e. not UTF-8? If so, and everything has been converted to UTF-8 code-wise, is it safe to drop those tables and any reference to them throughout the code?


Cheers.

Re: The search function

At the moment PunBB 1.3 search cannot really find UTF-8 words. It only works with 8-bit characters. :(
I've tested Russian: http://www.pundemo.org/post67.html


Re: The search function

When searching for '*????*' it worked.

Re: The search function

It should be noted, of course, that part of the purpose of a beta would be to discover issues like this. ;)

Re: The search function

It should further be noted that, upon investigation, I discovered that the problem was most likely due to 1.3's UTF-8 support not being entirely finished. That is something it should be by the beta. ;)

Re: The search function

Hi, good that you're fixing the broken PunBB search. It ages every week on our busy forum and never finds relevant posts in proper order, no matter how often I rebuild the 50MB(?!) index.

MySQL fulltext search is neat. It has some half-documented quirks though: some common signs like - and & are not counted as characters belonging to words, and there may be charset limitations in some cases.
(My word was "Ki-Duk". It didn't match with the default minimum word length of 4, but it did with a minimum word length of 2. I searched with "Ki-Duk" (with the quotes), so it should have matched with a word length of 4, but - isn't treated as part of a word.)

A few tips: use " AGAINST ('xxx' IN BOOLEAN MODE)" and set the MySQL minimum fulltext word length (ft_min_word_len) to 2 characters; Chinese users have to set it to 1.
This automatically allows "" and + - searches. It's lovely, and it will blow PunBB's old method out of the water even with the quirks.
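
For illustration, here is a minimal Python sketch of what such a query might look like when built with a placeholder for the search term. The table and column names (posts, message, id) and the helper name are hypothetical, chosen only for the example; they are not taken from PunBB or from any mod. The search string itself is meant to be passed separately as a bound parameter by whatever database driver is in use.

```python
def boolean_mode_query(table="posts", column="message"):
    """Build a MATCH ... AGAINST query in boolean mode.

    The identifiers are hypothetical examples. The search term is left
    as a %s placeholder so the driver can bind it safely.
    """
    return (
        f"SELECT id FROM {table} "
        f"WHERE MATCH ({column}) AGAINST (%s IN BOOLEAN MODE)"
    )


sql = boolean_mode_query()
# Boolean mode understands operators such as +word, -word and "a phrase".
params = ('+"Ki-Duk" -remake',)
print(sql)
print(params)
```

Keep in mind that after changing the minimum word length on the server, the existing FULLTEXT indexes have to be rebuilt before the new setting affects searches.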

By default it matches UTF-8 fine on our hacked-to-UTF-8 1.2 board. It treats accented letters the same as non-accented ones (you want this!), until you double-quote the search.

Is there a hack/mod available that enables fulltext search in 1.2, btw?

Re: The search function

Oh, I just tested something: it seems we forgot to run the content we inject through the backend through the search indexing code, so my complaint about 1.2 searching may not be all that well grounded ;p

fulltext search rules though.

Re: The search function

pheldens wrote:

MySQL fulltext search is neat. It has some half-documented quirks though: some common signs like - and & are not counted as characters belonging to words

I don't know of a language where these glyphs are part of the alphabet, do you?

- is a word separator, and & is here a logical operator. It makes sense to me that they behave differently.

(My word was "Ki-Duk". It didn't match with the default minimum word length of 4, but it did with a minimum word length of 2. I searched with "Ki-Duk" (with the quotes), so it should have matched with a word length of 4, but - isn't treated as part of a word.)

Again, to me, it makes sense.

But I'm guessing it should be possible to ask MySQL to behave as one needs.

34 (edited by pheldens 2008-01-19 13:07)

Re: The search function

Jérémie: it may make sense to you, but it prevents '"Ki-Duk"' searches from working with the default fulltext index word length (4), which is counterintuitive. And 'Ki-Duk' doesn't return anything either, because the split words are shorter than 4 characters.

Just wanted to point out this behaviour to save you some headache. According to the MySQL bug tracker forum, - and & and ' can be hacked to be parts of words, but this can have consequences for MySQL operation. The only clean solution is in MySQL 5.1+ via the plugin module interface.
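
The behaviour described above can be sketched with a deliberately simplified model in Python. This is not MySQL's actual parser: it only assumes that the text is split on every non-word character and that pieces shorter than the minimum word length are never indexed. Stopwords, the apostrophe rules and charset handling are all ignored, and the function name is made up for the example.

```python
import re


def indexable_tokens(text, min_word_len=4):
    """Simplified model: split on non-word characters, then drop any
    token shorter than min_word_len (4 mirrors the default mentioned above)."""
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if len(t) >= min_word_len]


# With the default length, "Ki-Duk" splits into "Ki" and "Duk",
# and both pieces are too short to be indexed.
print(indexable_tokens("Ki-Duk"))      # []
# With a minimum length of 2, both pieces survive.
print(indexable_tokens("Ki-Duk", 2))   # ['Ki', 'Duk']
```

Under this model a quoted phrase search cannot help, because neither half of the word is in the index to begin with.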

35

Re: The search function

Jérémie wrote:

- is a word separator, and & is here a logical operator.

In the proper context of the English language, however, & is also a word unto itself, and the hyphen joins words rather than separating them. I don't know about other languages, but the hyphen is perfectly legitimate occurring somewhere within a word, and some words are actually incorrect without its use.

Code and language are two separate things. :D

36 (edited by artoodetoo 2008-01-25 16:49)

Re: The search function

elbeko wrote:

When searching for '*????*' it worked.

Oh, Jesus! Strange solution, isn't it?

Why doesn't it find "????"? It's a normal 4-character word.

IMHO, fulltext search isn't an "absolute weapon". I know how to mod the old word-table-based search, but fulltext search is a black box with its own rules that I cannot change. Most forum owners cannot modify MySQL settings on their hosting and, of course, they won't know what they have to do.

So, [I think] it's better to keep both options for MySQL: the traditional one and the new one.


Re: The search function

Yup, in fact you're right, I was mostly wrong.

It makes some sense for MySQL to handle these glyphs in a specific way, but the end user shouldn't have to read the MySQL manual to perform a search (not that I'm now against fulltext in 1.3; the old system was slow, resource-hungry, and buggy).

I don't have time to test several MySQL configurations, but I'm sure there's a way to make it behave nicely.

38

Re: The search function

It's not so much a case of being wrong. :) There are just so many differences between languages that practically every character will have some legitimate use in some language. You do have to remember that we English are awkward devils too, because our language consists of bits from just about every common language there is or was. :D

Re: The search function

MattF, what did you say? Is it "take it easy" or "there is a place for recast"?


40

Re: The search function

artoodetoo wrote:

MattF, what did you say? Is it "take it easy" or "there is a place for recast"?

Whereabouts? :) If you can quote the text, I'll let you know. :D

Re: The search function

Hi, I'm using the punres fulltext search mod in 1.2 now, and have dropped the 50MB search_ tables.

http://www.punres.org/viewtopic.php?id=1677&p=1