1 (edited by cuteseal 2005-03-22 10:23)

Topic: Related Topics

I created a related topics mod for PunBB 1.1.4...

http://www.shuttertalk.com/forums/

Click on any topic, and at the bottom of the page it'll display some related topics.

http://www.shuttertalk.com/forums/images/upload/related.gif

I know most people would probably be on 1.2.x by now, so I won't bother posting the mod until I port my board to 1.2.x. However, if anyone is interested in having a look at the code, send me an email. :D

Digital photography news, reviews, discussions and more!
http://www.shuttertalk.com

The online bible for all
http://www.publicbible.com

Re: Related Topics

Cool. Might I ask how it works? I mean, how do you determine what topics are related?

"Programming is like sex: one mistake and you have to support it for the rest of your life."

Re: Related Topics

Hey Rickard,

Sure thing. I was going to reuse your search algorithm, but decided on a quick-and-dirty solution instead.

I use MySQL's MATCH ... AGAINST function, which does a fulltext search. I concatenate the subject and roughly the first 10 words of the message and use that as the search string.

To do fulltext searching I also had to add two fulltext indexes, one on the topic subject field and one on the post message field. Supposedly there's a pretty hefty performance hit, but oh well, it's there to be used, right? :D
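
For anyone curious what that might look like, here is a rough sketch in Python rather than the mod's actual code. The table and column names (topics.subject, posts.message, topics.id) are assumptions, as are the LIMIT of 5 and the helper names. To keep it short, the query only matches against the subject index; matching the message index would work the same way.

```python
# Hypothetical schema: table and column names are assumed for illustration.
CREATE_INDEXES = [
    "ALTER TABLE topics ADD FULLTEXT (subject)",
    "ALTER TABLE posts ADD FULLTEXT (message)",
]

# Parameterized query; the %s placeholders are filled in by the DB driver.
RELATED_QUERY = (
    "SELECT t.id, t.subject, "
    "MATCH (t.subject) AGAINST (%s) AS score "
    "FROM topics AS t "
    "WHERE MATCH (t.subject) AGAINST (%s) AND t.id <> %s "
    "ORDER BY score DESC LIMIT 5"
)


def build_search_string(subject, message, max_words=10):
    """Concatenate the subject with the first max_words words of the message."""
    first_words = message.split()[:max_words]
    return " ".join([subject.strip()] + first_words)


def related_query_params(topic_id, subject, message):
    """Build the parameter tuple for RELATED_QUERY, excluding the topic itself."""
    search = build_search_string(subject, message)
    return (search, search, topic_id)
```

Passing the search string as a query parameter instead of pasting it into the SQL means quotes in a subject line don't break the query.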

I can send you the code if you want...

Re: Related Topics

Ah, OK. You don't have to send it. I was just curious :)

Re: Related Topics

No worries! :D

Re: Related Topics

I know you stated that it's for 1.1.4, but will you have a compatible version for 1.2.x?

Re: Related Topics

Looks like it :D

I know most people would probably be on 1.2.x by now, so I won't bother posting the mod until I port my board to 1.2.x.

Re: Related Topics

I've been thinking a bit about the related topics algorithm. At the moment, I just take the subject and the first 10 words of the message body, and use those as keywords when searching for related posts.

Can anyone suggest a smarter algorithm?  I guess it would be something like:
1.  Get a list of all words in post
2.  Filter out stopwords, non-alphabetic characters, and words shorter than 4 characters
3.  Get the 5/10/20(?) most frequent words
4.  Do search with those words as keywords
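
The first three steps above could be sketched roughly like this in Python. The stopword list is a tiny made-up sample, and the function name and defaults are only for illustration; the returned words would then be joined into the search string for step 4.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"this", "that", "with", "from", "have", "about", "would", "there"}


def top_keywords(text, limit=10, min_length=4):
    """Return the most frequent words in text, most frequent first."""
    # Step 1: split into lowercase alphabetic words (drops digits and symbols).
    words = re.findall(r"[a-z]+", text.lower())
    # Step 2: drop stopwords and words shorter than min_length.
    candidates = [w for w in words if len(w) >= min_length and w not in STOPWORDS]
    # Step 3: keep the `limit` highest-occurring words.
    return [word for word, _ in Counter(candidates).most_common(limit)]


post = "The camera lens is great. This lens beats that camera lens with ease."
print(top_keywords(post, limit=2))  # ['lens', 'camera']
```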

An extension would also be to get all words in all posts in that thread, not just the first post.

Hmm... alternatively, I'm wondering whether MySQL's MATCH function can be passed the entire content of the post and left to figure out which words are the important ones. That's the function of a fulltext search, right?

Any thoughts, people?

Re: Related Topics

It might be good to generate not only a list of single words, but also a list of two- and three-word phrases, of course after first filtering out "common" words such as prepositions, numbers, symbols, pronouns, and such. An additional thing you could do is weight each keyword (be it a single word or a phrase), based on some algorithm, for example:

w_ik = (-log(N_k/N))(f_k)(w_f)

(Y. Bao, S. Aoyama, X. Du, 2002)

where w_ik is the weight of keyword k in post i, N_k is the number of posts containing the keyword k, N is the number of posts in total, f_k is the frequency of keyword k in post i, and w_f denotes the importance of a keyword (perhaps two-word phrases are more important than single-word keywords).
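
As a quick illustration, the formula translates almost directly into Python. The natural logarithm is an assumption here (the formula doesn't name a base), and the function and parameter names are made up.

```python
import math


def keyword_weight(posts_with_keyword, total_posts, frequency_in_post, importance=1.0):
    """Compute w_ik = (-log(N_k / N)) * f_k * w_f for one keyword in one post."""
    if not 0 < posts_with_keyword <= total_posts:
        raise ValueError("need 0 < N_k <= N")
    rarity = -math.log(posts_with_keyword / total_posts)  # natural log assumed
    return rarity * frequency_in_post * importance


# A keyword found in 10 of 1000 posts, used 3 times in this post:
print(round(keyword_weight(10, 1000, 3), 2))  # 13.82
```

A keyword that shows up in every post gets a weight of zero, so very common words drop out on their own, while rarer words and words repeated within the post score higher.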

Of course, with the above weighting, you may need to redefine how you "match" for keywords. :)
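
For the phrase idea at the top of this post, a minimal Python sketch might look like the following. The stopword list is a made-up sample and the function name is hypothetical. Because common words are filtered out before the phrases are built, a phrase can join two words that were separated by a stopword in the original text.

```python
import re

# Small illustrative list of "common" words; a real list would be longer.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "it", "is"}


def candidate_phrases(text, max_words=3):
    """Return single words plus phrases of up to max_words consecutive words."""
    # Alphabetic tokens only, so numbers and symbols are dropped here too.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    phrases = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    return phrases


print(candidate_phrases("the digital camera of the year", max_words=2))
# ['digital', 'camera', 'year', 'digital camera', 'camera year']
```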