Topic: index update issues

I am fetching news from RSS feeds and posting them as new posts. This works pretty well and was not so difficult to code.
Now I am facing some problems with the indexing: the posts are not indexed and therefore don't appear in search results.
Updating the index takes some time. Right now I don't have many users or much traffic, but I guess it could cause errors if a lot of visitors are online.

Any tips on how to optimize the updating process?
Like... if I uncheck "empty before update" and then update, will it only add new stuff to the index?
Can I do it from the shell so I can automate the process with a crontab?

Also... what's the index itself? Is it a B-tree database index? Is it a set of files? What kind of files?

Re: index update issues

require PUN_ROOT.'include/search_idx.php';
update_search_index('post', $new_pid, $message);

2 lines of code to properly add stuff to the search index.
I'm not exactly sure what issue you're having with the search indexing; it shouldn't take too much time :P
And as for what it is...
http://punbb.org/docs/dev.html#search_matches
http://punbb.org/docs/dev.html#search_words
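Roughly, those two tables form an inverted index: ordinary database tables rather than separate files, where one table lists the distinct words and the other links each word to the posts it appears in. The toy model below uses PHP arrays in place of the tables. The column layout is a simplified assumption (the real tables may carry more columns, so check the dev docs above), and find_posts() is a hypothetical helper, not a PunBB function.

```php
<?php
// Toy model of the two index tables as PHP arrays (names simplified).

// search_words: each distinct word gets an id.
$search_words = ['punbb' => 1, 'rss' => 2, 'feed' => 3];

// search_matches: one [post_id, word_id] row per word occurring in a post.
$search_matches = [
    [10, 1], [10, 2],   // post 10 contains "punbb" and "rss"
    [11, 2], [11, 3],   // post 11 contains "rss" and "feed"
];

// Look up a word's id, then collect every post linked to that id.
function find_posts(string $word, array $words, array $matches): array
{
    $word = strtolower($word);
    if (!isset($words[$word])) {
        return []; // the word was never indexed, so no post can match
    }
    $word_id = $words[$word];
    $post_ids = [];
    foreach ($matches as [$post_id, $match_word_id]) {
        if ($match_word_id === $word_id) {
            $post_ids[] = $post_id;
        }
    }
    return $post_ids;
}

echo implode(',', find_posts('RSS', $search_words, $search_matches)), "\n"; // prints "10,11"
```

This is also why a post has to be indexed when it is created: a post with no rows in the match table can never be found by a search.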

(edited by pedrotuga 2006-10-14 16:56)

Re: index update issues

OMG!
I had no idea... does this index every damn single word?

Ok... so what should I do? This sounds pretty logical, but I have zero experience with searches and indexes... this is what I have in mind. Correct me if this is something that will bring my server down.

1. Put all the words, one by one, into an array.
2. Check whether each word already exists; if it doesn't, insert it.
3. Insert the search matches.

Now... sorry for my inexperience... is the "word" column unique? If it is, I could just insert every word and let the unique restriction at the database level do the rest. Or should I "SELECT ... WHERE word='blabla'" and then check whether any row is returned? That seems like it would take more time.
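The three steps above can be sketched in PHP against in-memory arrays instead of a database. split_words() and index_post() are hypothetical names, not PunBB's functions (include/search_idx.php has its own implementation), and the auto-increment id is faked with count().

```php
<?php
// Sketch of the three indexing steps using arrays instead of tables.
// split_words() and index_post() are hypothetical, not PunBB functions.

// Step 1: split a message into its unique, lowercased words.
function split_words(string $text): array
{
    // Anything that is not a letter or a digit counts as a separator.
    $parts = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($parts));
}

// Steps 2 and 3: add unknown words, then record one match per word.
// $words maps word => word_id; $matches is a list of [post_id, word_id].
function index_post(int $post_id, string $text, array &$words, array &$matches): void
{
    $post_words = split_words($text);

    // Step 2: only words that are not in the table yet need an insert.
    foreach (array_diff($post_words, array_keys($words)) as $w) {
        $words[$w] = count($words) + 1; // stand-in for an auto-increment id
    }

    // Step 3: link the post to every word it contains.
    foreach ($post_words as $w) {
        $matches[] = [$post_id, $words[$w]];
    }
}

$words = [];
$matches = [];
index_post(1, 'RSS feeds in PunBB', $words, $matches);
index_post(2, 'More RSS news', $words, $matches);
echo count($words), " words, ", count($matches), " matches\n"; // prints "6 words, 7 matches"
```

With a real database, step 2 is cheaper as one SELECT ... WHERE word IN (...) for the whole post than as one query per word. If the word column does have a unique key, MySQL's INSERT IGNORE would let the database drop the duplicates instead; check the actual schema before relying on that.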

help!

EDIT: crap... didn't even read the first two lines of your post. lol. Sorry. Anyway... I am still curious about how all these searches are performed... so, if it's not too much effort, I'd still like to know the answers to the questions I just asked.

Re: index update issues

Well, look at include/search_idx.php; that should answer all of your questions ;)
But your general idea is right :)

Re: index update issues

Yep...
Pretty interesting. Simple and killer :)

I have one more question, though. I am using HTML in the RSS news; I mean, I bypass all the parsing when posting news from RSS so the original HTML can be rendered.
This is ok... but I guess I can't just index HTML source code. So I need to post it unparsed and then parse it afterwards so it can be indexed.
So... I am a bit confused. I went through the source code and noticed that messages go through a lot of parsing (BBCode tags, HTML, autolinking, etc.), so I don't really know where to trigger the indexing.

Like... will calling parse_message($raw_html) before indexing be enough?
I checked the manual and it's kind of weird... isn't there any function that simply takes away the HTML tags rather than encoding or escaping them?
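One way to handle the HTML case is to store the post with its original HTML but hand the search index a plain-text copy. The helper below is a hypothetical sketch (html_to_index_text() is not part of PunBB). It uses the standard strip_tags() and html_entity_decode(), and first puts a space in front of every tag, because strip_tags() on its own would glue "<p>one</p><p>two</p>" into the single word "onetwo".

```php
<?php
// Hypothetical helper: turn RSS item HTML into plain text for the index.
function html_to_index_text(string $html): string
{
    // Put a space before every tag so adjacent elements do not merge
    // into one word once the tags are removed.
    $spaced = str_replace('<', ' <', $html);

    // Remove the tags themselves.
    $text = strip_tags($spaced);

    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}

echo html_to_index_text('<p>one</p><p>two &amp; three</p>'), "\n"; // prints "one two & three"

// Assumed usage with the call from the earlier reply (same variable names):
// update_search_index('post', $new_pid, html_to_index_text($raw_html));
```

The trade-off is that a tag in the middle of a word (un<b>believ</b>able) now splits it into pieces, which is usually acceptable for news content.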

Re: index update issues

pedrotuga wrote:

I checked the manual and it's kind of weird... isn't there any function that simply takes away the HTML tags rather than encoding or escaping them?

strip_tags()

Looking for a certain modification for your forum? Please take a look here before posting.

Re: index update issues

pogenwurst wrote:
pedrotuga wrote:

I checked the manual and it's kind of weird... isn't there any function that simply takes away the HTML tags rather than encoding or escaping them?

strip_tags()

Ok... that is exactly what I am not looking for.
That escapes all the HTML tags so they can be displayed on a web page. I am looking for a function that really takes away all HTML tags.

Re: index update issues

pedrotuga wrote:
pogenwurst wrote:
pedrotuga wrote:

I checked the manual and it's kind of weird... isn't there any function that simply takes away the HTML tags rather than encoding or escaping them?

strip_tags()

Ok... that is exactly what I am not looking for.
That escapes all the HTML tags so they can be displayed on a web page. I am looking for a function that really takes away all HTML tags.

I think you're confusing htmlspecialchars with strip_tags. strip_tags does exactly what its name says it does: strips out HTML tags.
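The difference between the two standard PHP functions fits in a few lines:

```php
<?php
$html = '<b>Breaking</b> news';

// strip_tags() removes the tags and keeps only the text between them.
echo strip_tags($html), "\n";        // prints "Breaking news"

// htmlspecialchars() keeps the tags but escapes them, so a browser
// shows them literally instead of rendering them.
echo htmlspecialchars($html), "\n";  // prints "&lt;b&gt;Breaking&lt;/b&gt; news"
```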

Re: index update issues

From the manual entry:

This function tries to return a string with all HTML and PHP tags stripped
