Search options (Page 2 of 8)

2005-03-18 16:52

okay, I've lost interest in this project, and I just don't have enough spare time to invest in this now...

I've asked Rickard if he wanted to take over and make PunMod an official component of PunBB, but he doesn't have the time either.

so I'm looking for someone to adopt the project - if you're interested, send me an email, and I'll provide the latest version of the source code. It's fairly well structured and commented, so it shouldn't be too much work to figure it out - and I'll of course be happy to answer questions and provide help to get you started.

so... any takers?

2005-03-06 22:38

Certain characters are escaped, yes - like for example ' becomes \' ... well, it wouldn't work otherwise, so of course

2005-03-02 00:43

Please note, I have only done some basic testing and it seemed to work, so it might be a bit sudden to update the downloads page yet...

2005-03-01 23:37

The script has been updated for PunBB v1.2.x - see this thread...

2005-03-01 23:36

PunBB PHP Localizer (a PHP script for editing PunBB language file sets) has been updated to support PunBB v1.2.x - download here:

http://www.mindplay.dk/temp/localizer12.zip

This utility can no longer be used for editing PunBB v1.1.x language file sets - however, there is an import feature for backwards support, which will enable you to upgrade your 1.1.x language files for 1.2.x compatibility...

Please see instructions within the comments in "localizer.php", which will show you how to edit, create, upgrade or update your language file sets.

Please post any comments or questions in reply to this thread.

2005-02-20 10:09

I will, eventually - but for the time being, I haven't even had time to install and test PunBB v1.2 ... too busy in "real life", new girlfriend and stuff - sorry ... I will update it eventually, just don't know at the moment when I'll have time

2005-02-08 09:52

You can download it from the PunBB downloads page -

http://punbb.org/downloads.php

I haven't had time to test PunBB v1.2 yet, so I don't actually know whether it works with that version.

2005-01-29 14:31

do you have a specific example? what would you search and replace?

2005-01-17 14:50

yep - I need to finish the editor and make it compatible with v1.2 ... expect it'll be a while before I get there though, since my first priority at the moment is finishing PunMod; after which I will make the mod in PunMod format

2005-01-15 00:10

I'm not sure what use that would be? - it operates on entire lines only, you can't replace substrings within lines.

I could support this of course, but it might be risky - what if certain lines in newer versions contain that particular data you were searching for, but it's something you DON'T want replaced? All sorts of random things could happen if we allowed something like this ... also, there's no guarantee that mod authors would be careful enough to ensure that they don't accidentally replace stuff in five different places when they were trying to replace something in just two places.

I think I prefer the search to be as explicit as possible - it's the safest way to avoid carelessness/laziness from the mod developers ... a search and replace command could really cause havoc.
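To make the risk concrete, here is a minimal sketch (not PunMod's actual code) comparing whole-line matching with substring replacement. The function name `replace_whole_line` and the choice to ignore leading/trailing whitespace when comparing lines are assumptions made for illustration only.

```python
def replace_whole_line(source, search_line, replacement):
    """Replace only lines that equal search_line (surrounding whitespace ignored).

    Returns the new text and the number of lines replaced, so the caller
    can refuse to continue if the hit count is not what the mod expected.
    """
    wanted = search_line.strip()
    out = []
    hits = 0
    for line in source.split("\n"):
        if line.strip() == wanted:
            out.append(replacement)
            hits += 1
        else:
            out.append(line)
    return "\n".join(out), hits


source = "$limit = 50;\n$page_limit = 50;"

# Whole-line matching touches exactly one line.
patched, hits = replace_whole_line(source, "$limit = 50;", "$limit = 100;")

# A substring replace silently changes both lines.
careless = source.replace("limit = 50", "limit = 100")
```

Returning the hit count is one way a tool could detect that a search matched more (or fewer) places than the mod author intended.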

2005-01-14 20:48

neat

I'll integrate these changes into the next release of PunMod, thanks.

Please note that PunMod is still in beta phase, and some things will change - you will probably need to update the mod for the next version of PunMod ... as a minimum, you will be required to add ##UNSQL commands, you can read more about that in this thread, if you're not already following it:

http://punbb.org/forums/viewtopic.php?pid=30716#p30716

2005-01-14 12:17

I didn't know this type of tool even existed - looks very useful ... I'm not sure I have time to get into it now though, I think I should rather concentrate on finishing PunMod first

2005-01-14 08:08

Good, then that's final

Anyone had time to wrestle with the last test release yet? I would like to know how the backup/restore system works for you, and what you think of this solution in general...

2005-01-14 08:03

Well, would there be some way to get your IDs from the query as one comma-separated string, instead of as an array? You need to pass it in comma-separated form to subsequent SQL queries anyways, maybe there is no point in using arrays in the first place?

You should maybe try adding more timers around more blocks of code, not just SQL queries, to discover exactly where the bottleneck is, and then see if there's any way to optimize on it? Of course, for starters, try timing the SQL queries, then compare against the total time taken to generate the entire page, to see how much of the total time is spent by PHP vs how much time is spent by the SQL engine?
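As a rough illustration of the timing approach, the sketch below wraps blocks of code in named timers and compares the SQL time against the total. It is written in Python rather than PHP purely for illustration; `timed`, `timings`, `render_page` and the `run_query` callback are hypothetical names, not part of PunBB.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per label, e.g. "sql", "format", "total".
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def render_page(run_query):
    """Run a query, then build the comma-separated ID list for later queries."""
    with timed("total"):
        with timed("sql"):
            ids = run_query()
        with timed("format"):
            id_list = ",".join(str(i) for i in ids)
    return id_list


def sql_share():
    """Fraction of the total time spent in SQL (0.0 if nothing was timed)."""
    return timings["sql"] / timings["total"] if timings["total"] else 0.0
```

Calling `render_page` with the real query function and then reading `sql_share()` shows how much of the page time belongs to the SQL engine and how much to the surrounding script.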

2005-01-13 17:31

That's what I thought ...

And what do you think about dropping the ##RUN command? - I'm not too fond of this command myself, since it could cause all kinds of havoc ... besides, why would anyone really need it? If you can already make source code modifications, install new scripts and files, and flexibly alter the database, what possible use could anyone have for the ##RUN command?

2005-01-13 17:25

Then what is the problem? ... is it simply the number of search queries, which undoubtedly is also a lot higher on forums with that much activity? If so, maybe we could improve things by somehow caching common searches? On a huge forum with a certain topic, there must be lots of searches that get repeated over and over again?
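A minimal sketch of what caching repeated searches could look like, assuming results may be served slightly stale for a fixed number of seconds and that keyword order does not matter. `SearchCache`, `search_fn` and `max_age` are hypothetical names; nothing like this exists in PunBB as far as this thread shows.

```python
import time


class SearchCache:
    """Remember results of recent searches, keyed by normalized keywords."""

    def __init__(self, search_fn, max_age=300.0, clock=time.monotonic):
        self._search_fn = search_fn  # the real (slow) search function
        self._max_age = max_age      # seconds before an entry is considered stale
        self._clock = clock
        self._entries = {}           # normalized query -> (timestamp, results)

    @staticmethod
    def normalize(keywords):
        # Assumption: "Porter Stemmer" and "stemmer porter" are the same search.
        return " ".join(sorted(set(keywords.lower().split())))

    def search(self, keywords):
        key = self.normalize(keywords)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]
        results = self._search_fn(key)
        self._entries[key] = (now, results)
        return results
```

The time limit is the simplest way to keep cached results from drifting too far from the index; clearing the cache whenever a post is added would be stricter but would help less on a busy forum.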

2005-01-13 16:57

Maybe the simplest solution would be simply to enforce that SQL commands always appear in pairs - one for adding the modification, one for removal, e.g.:

##SQL mysql

ALTER TABLE `punbb`.`users`
  ADD COLUMN `mytext` text NULL;

##UNSQL mysql

ALTER TABLE `punbb`.`users`
  DROP COLUMN `mytext`;

Maybe this would be the most flexible and simple solution. Any comments on this?
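Enforcing the pairing could be as simple as a check like the one sketched below. This is an illustration only: it assumes each command starts a line, that the engine name follows the command, and that an ##UNSQL must come directly after its ##SQL with the same engine. `check_sql_pairs` is a hypothetical name, not a PunMod function.

```python
def check_sql_pairs(mod_text):
    """Return a list of problems; an empty list means every ##SQL has its ##UNSQL."""
    problems = []
    pending = None  # (line number, engine) of an ##SQL still waiting for its ##UNSQL
    for line_no, line in enumerate(mod_text.splitlines(), start=1):
        parts = line.split()
        if not parts or parts[0] not in ("##SQL", "##UNSQL"):
            continue  # SQL body or unrelated line
        engine = parts[1] if len(parts) > 1 else ""
        if parts[0] == "##SQL":
            if pending is not None:
                problems.append(f"line {pending[0]}: ##SQL without ##UNSQL")
            pending = (line_no, engine)
        else:
            if pending is None:
                problems.append(f"line {line_no}: ##UNSQL without ##SQL")
            elif pending[1] != engine:
                problems.append(f"line {line_no}: ##UNSQL engine does not match ##SQL")
            pending = None
    if pending is not None:
        problems.append(f"line {pending[0]}: ##SQL without ##UNSQL")
    return problems


EXAMPLE_MOD = """##SQL mysql

ALTER TABLE `punbb`.`users`
  ADD COLUMN `mytext` text NULL;

##UNSQL mysql

ALTER TABLE `punbb`.`users`
  DROP COLUMN `mytext`;
"""
```

A mod loader could refuse to install any mod for which `check_sql_pairs` returns a non-empty list.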

2005-01-13 12:30

Then how about indexing topics instead of posts? In search_matches, we currently have:

post_id <-> posts.id
word_id <-> search_words.id
subject_match

Suppose we now change it to:

topic_id <-> topics.id
post_id -> posts.id
word_id <-> search_words.id
subject_match

And then store only one record per topic, instead of one per post - the post_id you store for every topic, would be simply the last occurrence of the word in each topic ... the way that "search for topics" works, it only displays each topic once, so the information about which other posts may have contained the same word in the same topic, is essentially never used, and as far as I can figure, could be safely discarded?

Of course then, "search for posts" wouldn't be possible - but the output from this mode of search is extremely confusing to browse anyways, as everything is displayed out of context ... I don't know anyone who uses that feature.

Even so, if you wanted to preserve this feature, you could do both - have two tables, one that indexes posts and one that indexes topics ... of course, this would make your search tables possibly take up a considerable amount of extra space. But searches for topics are definitely a lot more common than searches for posts - I would guess at least 90% of all searches are probably searches for topics; if those 90% of all searches were working on a table that is probably one fourth of the size (assuming an average of 4 posts per topic, which is probably low anyways), that should speed things up considerably.
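The idea of keeping both tables can be sketched with an in-memory SQLite database. The table name `topic_matches`, the demo data, and the reading of "last occurrence" as "highest post id" are assumptions for illustration; only `search_matches`, `posts` and the column names come from the description above.

```python
import sqlite3


def make_demo_db():
    """Tiny stand-in schema: 4 posts in 2 topics, all containing word 7."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER);
        CREATE TABLE search_matches (post_id INTEGER, word_id INTEGER, subject_match INTEGER);
        CREATE TABLE topic_matches (topic_id INTEGER, post_id INTEGER,
                                    word_id INTEGER, subject_match INTEGER);
    """)
    db.executemany("INSERT INTO posts (id, topic_id) VALUES (?, ?)",
                   [(1, 10), (2, 10), (3, 10), (4, 20)])
    db.executemany("INSERT INTO search_matches VALUES (?, ?, ?)",
                   [(1, 7, 1), (2, 7, 0), (3, 7, 0), (4, 7, 0)])
    return db


def build_topic_index(db):
    """Collapse the per-post index to one row per (topic, word)."""
    db.execute("DELETE FROM topic_matches")
    db.execute("""
        INSERT INTO topic_matches (topic_id, post_id, word_id, subject_match)
        SELECT p.topic_id, MAX(m.post_id), m.word_id, MAX(m.subject_match)
        FROM search_matches AS m
        JOIN posts AS p ON p.id = m.post_id
        GROUP BY p.topic_id, m.word_id
    """)
    return db.execute(
        "SELECT topic_id, post_id, word_id, subject_match "
        "FROM topic_matches ORDER BY topic_id, word_id"
    ).fetchall()
```

With the demo data, four rows in `search_matches` collapse to two rows in `topic_matches`, which is the kind of shrinkage the estimate above relies on.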

What do you think?

2005-01-12 21:13

Here's the current version:

http://www.mindplay.dk/temp/punmod095b2.zip

Be sure to test this version on a test installation of PunBB - please test at your own risk.

Be sure to read the instructions on the pages, in order to fully understand what's going on - the backup/restore system is a major change from the way previous versions worked, and has not been tested extensively yet, so be careful please.

The first thing you should do, is perform a backup, and then a restore, to verify that the restore operation is working and has all the permissions it needs to complete - once you've seen both backup and restore working, it should be safe to try applying the included mods; none of the included mods make any database modifications, only source code changes. The ##SQL and ##RUN commands are still supported at the moment, but will almost certainly be deprecated.

Important: you're welcome to test this on PunBB v1.2 if you like, but most likely the mods supplied with this distribution will work for v1.1.x only, because of the extensive changes in the v1.2 source code. PunMod has not yet been tested on v1.2.

Lastly, to reiterate: test at your own risk

2005-01-12 20:29

I'm still here

Major changes have been made to PunMod since the last release - it now works with a transparent backup system, which means it starts out by archiving your whole PunBB installation to a single backup file, from which it restores dynamically; if you don't understand what this means, it basically means you can now add/remove source code changes fully dynamically, e.g. add/remove mods as you please, or undo all mod changes and revert to your original PunBB installation.
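For readers who want a picture of the backup/restore cycle, here is a minimal sketch. It swaps in a standard zip archive for PunMod's custom archive format, and the function names `backup` and `restore` are assumptions; it only shows the principle of archiving the whole installation to one file and rebuilding the installation from it.

```python
import shutil
import zipfile
from pathlib import Path


def backup(install_dir, archive_path):
    """Archive every file under install_dir into a single compressed file.

    The archive must live outside install_dir, or it would try to include itself.
    """
    root = Path(install_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(root).as_posix())


def restore(archive_path, install_dir):
    """Throw away the current installation and rebuild it from the archive."""
    root = Path(install_dir)
    if root.exists():
        shutil.rmtree(root)
    root.mkdir(parents=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(root)
```

Applying a set of mods then amounts to restoring the original files and re-applying only the mods that are currently enabled, which is why removing a mod needs no mod-specific undo logic for source code changes.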

However, major work lies ahead - one of the next things I have to do, is of course test and integrate it with PunBB v1.2.

But there is also a major challenge in making database commands which can fully support both install and uninstall.

What this will probably mean, is that commands like ##SQL and ##RUN will have to be dropped. This may sound pretty drastic, but both are virtually impossible to undo, and therefore can't possibly be properly supported. Instead, the ##ADD command will be replaced with something more flexible, allowing for table/column/index creation with automatic undo.

Also done is a custom archive format support library (using zlib), currently only used for backup/restore, but which will be used for single-file mod support. Also, subfolders under the punmods folder are supported now, so you can already organize mods and their dependency files more neatly - not that this will be necessary really, once single file "mod archives" are fully supported.

Anyways, that's the status at the moment.

I'll put up a newer version for testing the new improvements at your request.

2005-01-11 11:02

How about separating into individual, numbered tables? Keeping maybe 10.000 records per table - since MySQL uses one physical file per table, maybe this will speed things up, as you will also get a considerably smaller index on each table?

2005-01-11 08:29

I meant, adding another indexed column, containing 0 for ids 0-999, 1 for ids 1000-1999, 2 for ids 2000-2999 etc. - this should give a small index for this column, which might help locate the correct group of records faster ... probably the underlying implementation already makes similar optimizations though
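The grouping value described here is just an integer division. A one-function sketch, where `bucket_for` and the default group size of 1000 are illustrative names and values:

```python
def bucket_for(record_id, bucket_size=1000):
    """Group number for an id: 0 for 0-999, 1 for 1000-1999, 2 for 2000-2999, etc."""
    return record_id // bucket_size


# A lookup would then filter on the small indexed column first, for example
# (hypothetical column name "id_group"):
#   ... WHERE id_group = <bucket_for(id)> AND id = <id>
```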

2005-01-10 22:42

Hmm, well still - there might be room for optimization.

I examined the SQL query times, and broke down some of the joint queries to see which part of them is taking up the time, and it seems (as you would expect) the bottleneck is the lookup in search_matches ... the bottleneck operation seems to be:

SELECT m.post_id FROM search_matches AS m WHERE m.word_id=...

So I wonder, can we speed this up somehow?

Some ideas: use a secondary index to group the records in thousands, or maybe tens of thousands - I'm not sure if this would just add more overhead, what do you think?

Or, separate the records into multiple tables with a thousand or ten thousand records in each - this way you'd get smaller indexes. Although again, this might just add more overhead?

Maybe such "clever tricks" have already all been tried...?

2005-01-10 21:18

Well hey, will you look at that ...

I guess I assumed wrong - even though the search word base is 30% smaller, this apparently makes no real difference in terms of performance at all. In fact, with the stemmer mod enabled, it's slightly slower - I guess the stemming operation itself takes a little time too.

To profile the search engine, I'm executing 20 specific searches in a row and measuring the total time taken - the 20 searches are executed over again 10 times, and the average total time for those 20 searches is calculated.

For added accuracy, I'm executing this test 10 times over, and the shortest total average execution time is the one I use.

Shortest execution time with the stemmer mod enabled: 4.090 seconds

Shortest execution time with the original search engine: 3.981 seconds

So it's about 2-3% slower.
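The measurement procedure and the final percentage can be written down as a short sketch. The function names and the `run_search` callback are hypothetical; the defaults of 10 repeats and 10 trials and the two timings are the ones quoted above.

```python
import time


def average_batch_time(run_search, queries, repeats=10, clock=time.perf_counter):
    """Run the whole batch of queries `repeats` times; return the mean batch time."""
    total = 0.0
    for _ in range(repeats):
        start = clock()
        for query in queries:
            run_search(query)
        total += clock() - start
    return total / repeats


def best_average(run_search, queries, trials=10, repeats=10, clock=time.perf_counter):
    """Repeat the averaged measurement `trials` times and keep the shortest one."""
    return min(
        average_batch_time(run_search, queries, repeats, clock)
        for _ in range(trials)
    )


def percent_slower(candidate, baseline):
    """How much slower `candidate` is than `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


# With the figures above: (4.090 - 3.981) / 3.981 * 100, a little over 2.7%.
stemmer_penalty = percent_slower(4.090, 3.981)
```

Taking the minimum of several averaged runs is a common way to reduce noise from other activity on the machine, since interference can only make a run slower, not faster.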

2005-01-10 20:14

Yes - I would add an indexed column to the 'search_words' table, containing the metaphone representation of the words.

Why is the search feature unattractive? It is working very well for us - we've got 11.000 posts and only about 15.000 words in the search index, that's not even 1.5 words per post, that's pretty good ... and it doesn't seem to grow significantly anymore.

I've tried to rebuild the search index with and without the stemmer mod - with stemming, the search index for our body of 11.000 posts is 15.026 words, without stemming it's 20.789 words, which is about 38% more (put the other way, stemming shrinks the index by roughly 28%) ... which probably means a speed increase of around 30%.

I may have some other ideas for optimization, I have to do some tests - it has to be a generic solution though, something that works for all SQL engines. I'll let you know if I find anything useful.


Posts found: 26 to 50 of 194

26 Reply by mindplay 2005-03-18 16:52

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

27 Reply by mindplay 2005-03-06 22:38

Re: PunBB PHP Localizer for PunBB v1.2.x (beta) (5 replies, posted in PunBB 1.2 modifications, plugins and integrations)

28 Reply by mindplay 2005-03-02 00:43

Re: PunBB PHP Localizer for PunBB v1.2.x (beta) (5 replies, posted in PunBB 1.2 modifications, plugins and integrations)

29 Reply by mindplay 2005-03-01 23:37

Re: PunBB Translation beta 1 (15 replies, posted in PunBB 1.2 discussion)

30 Topic by mindplay 2005-03-01 23:36

Topic: PunBB PHP Localizer for PunBB v1.2.x (beta) (5 replies, posted in PunBB 1.2 modifications, plugins and integrations)

31 Reply by mindplay 2005-02-20 10:09

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

32 Reply by mindplay 2005-02-08 09:52

Re: PunBB Translation beta 1 (15 replies, posted in PunBB 1.2 discussion)

33 Reply by mindplay 2005-01-29 14:31

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

34 Reply by mindplay 2005-01-17 14:50

Re: Basic post editor (work in progress) (16 replies, posted in General discussion)

35 Reply by mindplay 2005-01-15 00:10

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

36 Reply by mindplay 2005-01-14 20:48

Re: Multigroup mod for PunBB 1.2 (20 replies, posted in PunBB 1.2 modifications, plugins and integrations)

37 Reply by mindplay 2005-01-14 12:17

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

38 Reply by mindplay 2005-01-14 08:08

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

39 Reply by mindplay 2005-01-14 08:03

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

40 Reply by mindplay 2005-01-13 17:31

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

41 Reply by mindplay 2005-01-13 17:25

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

42 Reply by mindplay 2005-01-13 16:57

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

43 Reply by mindplay 2005-01-13 12:30

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

44 Reply by mindplay 2005-01-12 21:13

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

45 Reply by mindplay 2005-01-12 20:29

Re: Announcing: PunMod beta v0.9.2 (72 replies, posted in PunBB 1.2 discussion)

46 Reply by mindplay 2005-01-11 11:02

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

47 Reply by mindplay 2005-01-11 08:29

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

48 Reply by mindplay 2005-01-10 22:42

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

49 Reply by mindplay 2005-01-10 21:18

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)

50 Reply by mindplay 2005-01-10 20:14

Re: Improved search with Porter Stemmer Mod (40 replies, posted in PunBB 1.2 modifications, plugins and integrations)
