Topic: most used words in topics?

is there a way (mod/hack/etc) to do a 1 time search of all the posts and come up with a list of the top 10/20/50/100 words used? i have a project in mind that this info would be useful. of course it would have to adhere to the normal search criteria and functionality ie: use the stop words list and forum specific search.

any help is appreciated.

~James
FluxBB - Less is more

2 (edited by Jansson 2007-03-13 17:22)

Re: most used words in topics?

This query should do the trick:

SELECT COUNT(m.post_id), w.word FROM search_matches AS m INNER JOIN search_words AS w ON m.word_id = w.id GROUP BY m.word_id ORDER BY COUNT(m.post_id) DESC LIMIT 100

Change limit to how many results you want.

Re: most used words in topics?

Dr.Jeckyl wrote:

is there a way (mod/hack/etc) to do a 1 time search of all the posts and come up with a list of the top 10/20/50/100 words used? i have a project in mind that this info would be useful. of course it would have to adhere to the normal search criteria and functionality ie: use the stop words list and forum specific search.

any help is appreciated.

What Jansson said, keeping in mind that the query is not per forum.

Edited, as per what elbekko said

Re: most used words in topics?

It should hide stopwords, as they're not included in the table?

Re: most used words in topics?

elbekko wrote:

It should hide stopwords, as they're not included in the table?

Right, I wasn't thinking there for a second wink

Re: most used words in topics?

ok i tried this in both the db management plugin and through phpmyadmin and got an error on both. this is right after a repair/optimize session and a rebuild of the search index only a day earlier. here is the error phpmyadmin gives me:

Error

SQL query: Documentation

SELECT COUNT( m.post_id ) , w.word
FROM search_matches AS m
INNER JOIN search_words AS w ON m.word_id = w.id
GROUP BY m.word_id
ORDER BY COUNT( m.post_id ) DESC
LIMIT 100

MySQL said: Documentation
#1111 - Invalid use of group function

the db plugin only gives a generic "SQL Error". any ideas?

~James
FluxBB - Less is more

Re: most used words in topics?

Worked for me:

mysql> SELECT COUNT( m.post_id ) , w.word
    -> FROM search_matches AS m
    -> INNER JOIN search_words AS w ON m.word_id = w.id
    -> GROUP BY m.word_id
    -> ORDER BY COUNT( m.post_id ) DESC
    -> LIMIT 100;
+--------------------+----------+
| COUNT( m.post_id ) | word     |
+--------------------+----------+
|                  2 | moved    |
|                  1 | needed   |
|                  1 | data     |
|                  1 | section  |
|                  1 | fairly   |
|                  1 | spam     |
|                  1 | maybe    |
|                  1 | old      |
|                  1 | strategy |
|                  1 | hosting  |
|                  1 | bots     |
|                  1 | put      |
|                  1 | imported |
|                  1 | improved |
|                  1 | links    |
|                  1 | flooding |
|                  1 | portal   |
|                  1 | forum    |
|                  1 | and      |
|                  1 | page     |
|                  1 | us       |
|                  1 | again    |
|                  1 | removed  |
|                  1 | new      |
|                  1 | posting  |
|                  1 | anyway   |
|                  1 | doubt    |
|                  1 | brand    |
|                  1 | guests   |
|                  1 | enjoy    |
+--------------------+----------+
30 rows in set (0.06 sec)

Re: most used words in topics?

I don't know what's wrong with Jansson's query, if anything, but you might try Rickard's query for the same purpose.

Looking for a certain modification for your forum? Please take a look here before posting.

Re: most used words in topics?

Rickard's query seems a fair bit slower (both are on the same DB):

mysql> SELECT sw.word, COUNT(sm.post_id) AS hits
    -> FROM search_words AS sw
    -> INNER JOIN search_matches AS sm ON sw.id = sm.word_id
    -> GROUP BY sw.id
    -> ORDER BY hits DESC LIMIT 50;
+----------+------+
| word     | hits |
+----------+------+
| moved    |    2 |
| enjoy    |    1 |
| posting  |    1 |
| and      |    1 |
| imported |    1 |
| maybe    |    1 |
| guests   |    1 |
| new      |    1 |
| forum    |    1 |
| put      |    1 |
| spam     |    1 |
| removed  |    1 |
| portal   |    1 |
| bots     |    1 |
| fairly   |    1 |
| brand    |    1 |
| again    |    1 |
| flooding |    1 |
| hosting  |    1 |
| section  |    1 |
| doubt    |    1 |
| us       |    1 |
| links    |    1 |
| strategy |    1 |
| data     |    1 |
| needed   |    1 |
| anyway   |    1 |
| page     |    1 |
| improved |    1 |
| old      |    1 |
+----------+------+
30 rows in set (0.10 sec)

Re: most used words in topics?

that worked. thanks guys. big_smile

~James
FluxBB - Less is more