Topic: [HOWTO] - Create a stopwords list
The information below only applies to language pack authors.
All PunBB language packs contain a list of stopwords. These are words that help us humans communicate but don't necessarily "mean" anything on their own and have no real "search value". The English word "the" is a classic example. Other classics are "that", "this", "for" and "with". Stopwords are excluded from the search index, which makes searching faster and makes PunBB take up less space in the database. Stopwords are sometimes referred to as noise words. Here's how to create a stopwords list:
1. Try to obtain a list of the most common words in your language (Google is your friend here). This step isn't necessary, but it can help you a lot. See "Alternative to 1" below if you are unable to find such a list.
2. Go through the first 100 words or so and pick out words that you consider to be stopwords. Words that are shorter than three characters should be left out of the stopwords list as they are ignored by the search engine anyway. Also, stopwords must not contain spaces, quotes or any other "special characters".
3. You should now have a list of anything between 20 and 200 words. If your list is shorter than 20 words or longer than 200 words, start over :)
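The formal rules in steps 2 and 3 are easy to check mechanically once you have picked your candidates by hand. Here is a minimal Python sketch of such a check. It is not part of PunBB, and the function names are made up for illustration. It assumes that "special characters" means anything that isn't a letter, and that lowercasing and removing duplicates is what you want; adjust it if your language needs something else (some scripts use combining marks that str.isalpha() rejects).

```python
MIN_LENGTH = 3  # shorter words are ignored by the search engine anyway


def is_valid_stopword(word):
    """Return True if the word meets the formal rules from step 2."""
    if len(word) < MIN_LENGTH:
        return False
    # Assumption: a "special character" is anything that is not a letter.
    # str.isalpha() accepts letters such as "ö" or "é" and rejects
    # spaces, quotes, digits and punctuation.
    return word.isalpha()


def clean_candidates(words):
    """Lowercase, de-duplicate and filter hand-picked candidates, keeping order."""
    seen = set()
    result = []
    for word in words:
        word = word.strip().lower()
        if word in seen or not is_valid_stopword(word):
            continue
        seen.add(word)
        result.append(word)
    return result


# Hypothetical hand-picked candidates, including a few that break the rules.
candidates = ["the", "The", "of", "that", "don't", "with", "for", "in front", "this"]
stopwords = clean_candidates(candidates)
print(stopwords)

# Step 3: the finished list should hold between 20 and 200 words.
if not 20 <= len(stopwords) <= 200:
    print("List has", len(stopwords), "words; step 3 expects between 20 and 200.")
```

The script only enforces the formal rules. Deciding whether a word really is a stopword in your language is still up to you.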
Alternative to 1. If you already have a forum set up with posts in your language, you can run a database query to determine what the most common words are in your forum. The query looks like this:
SELECT sw.word, COUNT(sm.post_id) AS hits
FROM search_words AS sw
INNER JOIN search_matches AS sm ON sw.id = sm.word_id
GROUP BY sw.id
ORDER BY hits DESC
LIMIT 50
The query will display the 50 most common words currently in the forum. You can then use that list to decide which words should be included in the stopwords list. Please note that not all words in that list are stopwords. Some words might be very common even though they aren't stopwords.
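If you want to see what the query returns before running it on a live forum, you can try it on toy data. The Python sketch below runs the same query against an in-memory SQLite database. The two tables are simplified stand-ins containing only the columns the query uses, not the real PunBB schema, and the words and post IDs are made up.

```python
import sqlite3

QUERY = """
SELECT sw.word, COUNT(sm.post_id) AS hits
FROM search_words AS sw
INNER JOIN search_matches AS sm ON sw.id = sm.word_id
GROUP BY sw.id
ORDER BY hits DESC
LIMIT 50
"""

conn = sqlite3.connect(":memory:")

# Assumption: simplified tables with only the columns the query needs.
conn.executescript("""
CREATE TABLE search_words (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE search_matches (post_id INTEGER, word_id INTEGER);
""")

# Made-up sample data: three indexed words spread over four posts.
conn.executemany(
    "INSERT INTO search_words (id, word) VALUES (?, ?)",
    [(1, "with"), (2, "punbb"), (3, "install")],
)
conn.executemany(
    "INSERT INTO search_matches (post_id, word_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 3)],
)

rows = conn.execute(QUERY).fetchall()
for word, hits in rows:
    print(word, hits)

conn.close()
```

In this toy data "punbb" comes out as the second most common word, but it is obviously not a stopword. That is exactly the kind of word you have to weed out by hand.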
Important! The stopwords list in your language should NOT just be a translation of the English stopwords list. It should be a list of words that are considered stopwords in your language. A lot of stopwords in one language are also stopwords in another language, but just translating the English stopwords list doesn't help at all. It only makes things worse.