Topic: Regarding search_matches
I'm currently testing a somewhat hacked-up version of PunBB 1.2.6 for an internal project. We have about 2k test users at the moment, but expect to see 10k-25k users once we are done testing. I've noticed a problem with searching and have traced it back to the results stored in the search_matches table: for whatever reason, many matches are missing from it. For example, I ran an ugly, slow query to find all the posts containing the word 'dazed' (using WHERE message LIKE '%dazed%') and got 5 rows, one of which is a subject match. However, searching the search_matches table for the word_id corresponding to 'dazed' returns only 2 results.
I tried taking one of the omitted results, copying its message and subject verbatim, and posting it again in a test forum. This copy was then properly indexed in search_matches, which seems to indicate that nothing in the actual text of the affected posts is tripping up the search index updater. So I decided to rebuild the search index completely, which yielded the very same omissions in search_matches as before.
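For anyone who wants to reproduce the comparison, here is a rough sketch of the check I am describing: list the posts whose message contains a word but which have no matching row in search_matches. It is only an illustration. It assumes the stock table and column names (posts.id, posts.message, search_words.id, search_words.word, search_matches.post_id, search_matches.word_id) with no table prefix, uses an in-memory SQLite database as a stand-in for MySQL, and the function name find_unindexed_posts is made up for this example. The query uses joins only, since MySQL 3.23 has no subquery support, and it looks at message text only, not topic subjects.

```python
import sqlite3


def find_unindexed_posts(conn, word):
    """Return ids of posts whose message contains `word` (substring match)
    but which have no search_matches row for that word.

    Hypothetical helper; table and column names are assumed to follow
    the stock PunBB 1.2 layout without a table prefix.
    """
    word = word.lower()  # assumption: indexed words are stored lowercase
    pattern = "%" + word + "%"
    sql = """
        SELECT DISTINCT p.id
        FROM posts AS p
        LEFT JOIN search_words AS w
               ON w.word = ?
        LEFT JOIN search_matches AS m
               ON m.post_id = p.id AND m.word_id = w.id
        WHERE p.message LIKE ?
          AND m.post_id IS NULL   -- no index row for this post and word
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, (word, pattern))]


# Tiny made-up data set: posts 1, 2 and 4 mention 'dazed',
# but only posts 1 and 4 were indexed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, message TEXT);
    CREATE TABLE search_words (id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE search_matches (
        post_id INTEGER, word_id INTEGER, subject_match INTEGER
    );
    INSERT INTO posts VALUES
        (1, 'I felt dazed'),
        (2, 'Dazed and confused'),
        (3, 'nothing here'),
        (4, 'still dazed today');
    INSERT INTO search_words VALUES (10, 'dazed');
    INSERT INTO search_matches VALUES (1, 10, 0), (4, 10, 0);
""")

print(find_unindexed_posts(conn, "dazed"))
```

Note that LIKE '%dazed%' is a substring match, so it would also flag a post containing a longer word such as 'undazed', which an indexer would presumably store as a different word. Hits from this kind of check need a quick look by hand before they count as real omissions.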
I checked search_idx.php against a stock copy to see if I had made any mistakes (I made one very minor change to the noise_filter and noise_match arrays, which isn't causing this), but that isn't the problem. I also have not changed anything related to search word indexing in any of the scripts, nor have I modified the default layout of any of the search-related tables. I see no pattern in the matches that fail to make it into the search_matches table, but I'm happy to provide additional data if requested.
Any assistance is greatly appreciated. Currently I'm using an ugly hack to make search work (the aforementioned SELECT ... WHERE message LIKE '%keyword%'), and because of the load it can cause I'm having to impose a very strict search rate limit on normal users. We are testing on a fairly meager box before moving to production, and database load is a very important issue (both now, given the low-powered test machine, and later, because people will be breathing down my neck if I'm eating too many resources).
Thanks in advance!
Edit: Probably should mention for posterity that we are using MySQL 3.23.58.