I've been using mice since they were invented, and my hands are pretty fucked up - so thanks, but I'll have to pass on that. Unless ... well, maybe I could just hang one on my wall. Let me think .. wink

How about this: smarter paging - so that it takes at least 2-3 replies to generate a new page ... for example, if you're displaying 10 posts per page, and 12 posts have been made, all 12 posts will be displayed on page 1, and there won't be a page 2 until the next post is made ... this way you'd never get a last page with only a single post or two on it, which is sometimes pretty annoying.

Good idea / bad idea? Perhaps I'm the only one who's ever been annoyed by that ... smile
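
Here's a rough sketch of how that paging rule could be computed. The `$slack` setting (how many extra posts the last page may absorb before a new page appears) and both function names are made up for illustration, they're not part of PunBB:

```php
<?php
// Number of pages when the last page may hold up to $slack extra posts.
function smart_num_pages($num_posts, $per_page, $slack)
{
    return max(1, (int) ceil(($num_posts - $slack) / $per_page));
}

// Offset and limit (e.g. for a SQL LIMIT clause) for a given 1-based page.
function smart_page_range($page, $num_posts, $per_page, $slack)
{
    $num_pages = smart_num_pages($num_posts, $per_page, $slack);
    $offset = ($page - 1) * $per_page;

    // Only the last page is allowed to run past $per_page
    $limit = ($page == $num_pages) ? max(0, $num_posts - $offset) : $per_page;

    return array($offset, $limit);
}
```

With 10 posts per page and a slack of 2, 12 posts give one page holding all 12, and the 13th post creates page 2 with three posts on it.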

153

(54 replies, posted in Feature requests)

pgregg wrote:

5. This isn't such a big hit on performance (or in php code) to do. See: http://www.pgregg.com/projects/encode/htmlemail.php  Source code available on that page.   If Rickard wants to use the code there, please do.

Unfortunately, this particular example probably won't increase safety much - lots of harvesters probably already decode %xx to chars before scanning for email addresses ... it's the simplest and most widely used hack, and it's only a few lines of code to decode all chars, before scanning the source for email addresses ... you must assume they're pretty dumb? I expect they'll put in at least a TINY bit more effort than that wink

For added safety, you have to make the source unreadable, which means you have to format the data, in the source, as something that can't be read directly out of the HTML source - what I did, for example, is a very simple solution, but will require the script to be parsed and executed ... probably the safest would be a technique similar to yours, but with some sort of added twist, for example XOR'ing all the values, or just adding one or subtracting two from each value - already then, the script would have to be parsed and executed to decode the result.
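
To illustrate the XOR twist, here's a minimal sketch, not the exact code from either of our sites. The function name `xor_cloak_email()` and the key value are assumptions of mine; the idea is just that the address never appears in the HTML source, only XOR'ed character codes that a script has to decode:

```php
<?php
// Hypothetical helper: emits a script that rebuilds the address in the browser.
function xor_cloak_email($email, $key = 7)
{
    $codes = array();
    for ($i = 0; $i < strlen($email); $i++) {
        // Store each character as its code XOR'ed with the key
        $codes[] = ord($email[$i]) ^ $key;
    }

    return '<script type="text/javascript">'
        . 'var c=[' . implode(',', $codes) . '],s="";'
        . 'for(var i=0;i<c.length;i++)s+=String.fromCharCode(c[i]^' . (int) $key . ');'
        . 'document.write(\'<a href="mailto:\'+s+\'">\'+s+\'</a>\');'
        . '</script>';
}
```

A harvester that only scans the source (even after decoding %xx sequences) sees a list of numbers and no "@" at all; it would have to actually run the script to get the address back.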

It's largely just a translation, as far as I can tell - the original was written in Java.

155

(9 replies, posted in PunBB 1.2 discussion)

Mod posted! smile

Here's a mod, which will integrate a Porter Stemmer into PunBB, thus
giving better search results and reducing the size of the search index
tables - see below for details. To try the enhanced search, go here.

Comments welcome! :)

Stemmer Mod version 1.3
===========

for PunBB v.1.1.x

Changes from initial release: problems would occur when rebuilding
the search index - fixed. Default search behavior "and" was added.
Fixed warnings when editing a post.

This modification is for English language boards only - it will
have no effect on boards where $language is not set to "en" in
"config.php".

By stemming keywords (reducing words to their basic form), you will
get improved search results - for example, a search for "explosive"
will also include posts with the word "explosion", and vice versa.
And as a side effect, your search tables will also become smaller,
as "explosion" and "explosive" will only be stored as one entry in
the keyword table; you would assume that this would also make
searches execute faster, but this is apparently not the case -
although my tests show a 30% smaller search table, there is also
some added overhead from the stemming operation itself, which
means that this mod makes searches about 2-3% slower.

To make this modification, first download the Stemmer class here:

http://www.chuggnutt.com/stemmer.php

Place the "class.stemmer.inc" file in your PunBB home folder, and
rename it to "class.stemmer.php" to avoid security issues.

Open "include/search_idx.php" and find this section of code:

    // Split old and new post/subject to obtain array of 'words'
    $words_message = split_words($message);
    $words_subject = ($subject) ? split_words($subject) : array();

Insert the following section of code after it:

    // Stem words:
    global $language, $stemmer;
    if ($language == 'en') {
        $words_message = $stemmer -> stem_list ($words_message);
        if (!$words_message)
            $words_message = array();
        $words_subject = $stemmer -> stem_list ($words_subject);
        if (!$words_subject)
            $words_subject = array();
    }

Now go to the top of the file, and find this statement:

if (!defined('PUN'))
    exit;

Insert the following code after it:

// Initialize the Stemmer:
if ($language == 'en') {
    global $stemmer;
    require ('class.stemmer.php');
    $stemmer = new Stemmer();
}  

Now open "search.php" and find this:

                    // Split up keywords
                    $keywords_array = preg_split('#[\s]+#', trim($keywords));

And insert the following code after it:

                    // Stem keywords
                    if ($language == 'en') {
                        require ('class.stemmer.php');
                        $stemmer = new Stemmer();
                        $keywords_array = $stemmer -> stem_list ($keywords_array);
                        unset ($stemmer);
                    }

That's it - now go to the admin control panel, and rebuild the search index!

Lastly, I would recommend the following small change - open "search.php" and
find this statement:

                $match_type = 'or';

Change it to:

                $match_type = 'and';

This will change the default search behavior to "and", which is how nearly
all other search engines work by default - the user will expect to be able
to narrow down the search by entering more keywords, as opposed to using "or"
by default, where entering more keywords widens the search; that is
not how other search engines work, e.g. Google uses "and" by default.
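
To show the difference between the two match types, here is a small sketch. It is not PunBB's actual search code; the function name `combine_hits()` and the sample data are made up, and it assumes you already have a list of matching post IDs per keyword:

```php
<?php
// $hits maps each keyword to the IDs of the posts containing it.
function combine_hits($hits, $match_type)
{
    $result = null;
    foreach ($hits as $post_ids) {
        if ($result === null)
            $result = $post_ids;
        else if ($match_type == 'and')
            $result = array_intersect($result, $post_ids); // must match every keyword
        else
            $result = array_merge($result, $post_ids);     // may match any keyword
    }

    return ($result === null) ? array() : array_values(array_unique($result));
}

$hits = array(
    'punbb'   => array(1, 2, 3),
    'stemmer' => array(2, 3, 4)
);
$and_result = combine_hits($hits, 'and'); // posts 2 and 3
$or_result  = combine_hits($hits, 'or');  // posts 1, 2, 3 and 4
```

With "and", every extra keyword can only shrink the result; with "or", it can only grow it.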

157

(9 replies, posted in PunBB 1.2 discussion)

I don't see how it could be confusing? The search feature as such won't change from the user's point of view at all - it'll simply function better, probably a lot more like the user would expect.

And there's no need to change any tables or alter the behavior of the code at all, the only change will be a couple of calls to the stemmer here and there.

I'll post a mod.

do they have one with a gel bump for the wrist? I'm a big fan, and I'd replace my mat instantly, but I am also concerned with my health wink

159

(54 replies, posted in Feature requests)

Frank H wrote:

some of the 'unsubscribe from spam list' things really seem to work ... but there are rumors that these are a 'confirmation' thingie wink

I'd say about 90% of them are. But it's probably not a big deal as you're using Hotmail, I hear it has really good spam filtering smile

160

(9 replies, posted in PunBB 1.2 discussion)

As an option, sure - why not? Most boards are in English anyway, but of course it should be entirely optional, and still work without it. You have an extremely fast search engine - apparently faster than in most PHP based boards - why not make it great as well? smile

161

(54 replies, posted in Feature requests)

So you're still considering it, but nothing can affect your decision - yeah, that makes sense wink

162

(9 replies, posted in PunBB 1.2 discussion)

You're so full of positive energy, Rickard - a real inspiration.

wink

163

(54 replies, posted in Feature requests)

Frank H wrote:

so, what are the proofs that using javascript really 'disables' the spambots?

well, we've been using the JavaScript cloak on this site for nearly three years now, and I have received very little spam on my email address which is shown there; sometimes maybe 1-2 mails per week or so, which is very good by today's standard, and I'm pretty sure these came from other sites where my email address might have been displayed. We do not use any kind of spam filtering on our server. I get as many virus mails as the next guy of course, but this will happen regardless, as anyone with your email address in their address book who gets hit by a virus, will start distributing your email address to other hosts and infect those etc., but that's another discussion really.

Frank H wrote:

I would never ever publish an email address of mine on the internet if I never want spam in it...

you and I are smart enough to know that this is the only way to be truly safe, but most people are not - so the question is, do we care about those less fortunate people, or is it "their own damn fault"? I am passionately against spam, and anything I can do to screw things up for the spammers, I am happy to do wink

Frank H wrote:

but, I don't see why one javascript would make it "safe" ... safety is more or less just a time issue IMHO ... how long will it take for those that can 'sense' that this kind of javascript is in action, and still parse the email addresses? I don't think too long... as spammers make money, and they want to get new fresh mails ... and when parsing... it's easy to find that javascript ...

easy to find, but certainly not easy to run - I doubt if any of them will really make their own JavaScript parser just to get those few extra email addresses ... there are MASSES of completely unprotected pages from which they can rip billions of email addresses daily; a few hundred or even a few thousand email addresses won't make any difference to these people.

even if they did care enough to do it, I doubt they'd be able to do it in the first place - a JavaScript parser is not simple ... bear in mind, in order to get the email address, you have to not just parse, but actually execute the JavaScript, and if they did that, they'd also be running tons of other scripts; and there would be loads of other stuff they'd have to take into account then, like stopping popups, emulating the browser object and the entire DOM to keep scripted menus and effects from breaking the execution, etc.

Frank H wrote:

IMHO, there's only one way to be "safe" from spam, and that's not to show the mail in any way...

still true, but still not an excuse to sit back and not do anything about the problem - quite the contrary smile ... and if the only other option is to not allow people to show their email addresses at all, then I'd rather take the second-best option and at least leave them with a choice.

Oh, I forgot to mention, the convert_smilies() function of course needs to be customized in advance before converting your forum, since different people use different smilies. Might of course be useful if the converter could actually read and interpret the smilies config file from MiniBB and use that instead, but might also be overkill - and also, I was too lazy. big_smile

Chacmool,

You've saved me loads of time by writing this converter, thank you very much! smile

Here's my contribution ...

The converter was only doing a half-perfect job of converting the message contents in posts - you seem to have overlooked the fact that carriage return and linefeed in MiniBB posts don't mean anything? In PunBB they do - thus, you must first discard any carriage returns and linefeeds before you convert <br> to linefeed, otherwise you get excess whitespace in some messages. The font color tag was not handled. After converting everything possible, I would strip and wipe any excess HTML at the end - this should maybe be optional, for those who allowed HTML in their posts when they were running MiniBB. And finally, for those using "hack_smilies.php" with MiniBB (I was), I made a little conversion function, which will detect smilie images and translate them back to ordinary smilies for PunBB to display.

Here's the modified "functions.php":

<?php
    function generatedtime($start, $finish){
        list( $start1, $start2 ) = explode( ' ', $start );
        list( $finish1, $finish2 ) = explode( ' ', $finish );
        return sprintf( "%.2f", ($finish1 + $finish2) - ($start1 + $start2) );
    } // end function generatedtime

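    // Convert HTML entities back to plain characters, then add slashes to quotes.
    // NOTE: the apostrophe entity pattern below is an assumption.
    function html_stuff($message){
        $pattern = array(
            '/&gt;/i',
            '/&lt;/i',
            '/&amp;/i',
            '/&quot;/i',
            '/&#0*39;/'
        );

        $replace = array(
            '>',
            '<',
            '&',
            '"',
            "'"
        );
        
        return addslashes(preg_replace($pattern, $replace, $message));
    }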

    function convert_smilies($message) {
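        $smilie_path = '/img/emos/'; // images in this folder will be converted to smilies - this path must include trailing and leading slashes, and may be partial (e.g. you can specify simply "/images/smilies/" instead of "http://www.mysite.com/myforum/images/smilies/")
        
        // map of which GIFs to convert to what smilies - note that where a GIF
        // is listed more than once, PHP keeps only the last value for that key
        // (e.g. 'grin.gif' ends up as ":grin:"), so remove the lines you don't want: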
        $smilie_map = array(
            'grin.gif'     =>  ":)",
            'sad.gif'      =>  ":(",
            'shock.gif'    =>  ":o",
            'lol.gif'      =>  ":D",
            'wink.gif'     =>  ";)",
            'tongue.gif'   =>  ":p",
            'biglol.gif'   =>  ":biglol:",
            'confused.gif' =>  ":confused:",
            'grin.gif'     =>  ":grin:",
            'mad.gif'      =>  ":mad:",
            'sad.gif'      =>  ":sad:",
            'wink.gif'     =>  ":wink:",
            'sad.gif'      =>  ":cry:",
            'lol.gif'      =>  ":lol:",
            'cool.gif'     =>  ":cool:",
            'hihi.gif'     =>  ":hihi:",
            'biglol.gif'   =>  ":rofl:",
            'tongue.gif'   =>  ":tongue:",
            'shame.gif'    =>  ":shame:",
            'shock.gif'    =>  ":eek:",
            'rolleyes.gif' =>  ":rolleyes:",
            'idea.gif'     =>  ":idea:",
            'neutral.gif'  =>  ":|"
        );
        
        $smilie_path = preg_replace('/\//i', '\\\/', $smilie_path); // escape slashes
        
        foreach($smilie_map as $img => $smilie)
        {
            $img = preg_replace('/\./i', '\\\.', $img); // escape dots
            $message = preg_replace('/\[img\][^\[]*'.$smilie_path.$img.'\[\/img\]/im', $smilie, $message);
        }
        
        return($message);
    }

    function convert_posts($message){
        $pattern = array(
            // returns and linefeeds
            '/\r/i',
            '/\n/i',
            
            // <br>
            '/<br>/im',
            
            // bold, italic, underline
            '/<b>/i', '/<\/b>/i',
            '/<i>/i', '/<\/i>/i',
            '/<u>/i', '/<\/u>/i',

            // Mail
            '/<a href="mailto:([^"]+)">([^<]+)<\/a>/im',

            // URLs
            '/<a href="http:([^"]+)"(?:[^>]*)>([^<]+)<\/a>/im',
            
            // Images
            '/<img src="([^"]+)"(?:[^>]*)>/im',
            
            // Font color
            '/<font color="(#[0-9a-f]{6})">([^<]*)<\/font>/im',
            
            // Illegal HTML
            '/<[^>]*>/'
        );
        
        $replace = array(
            // returns and linefeeds
             '',
             '',
             
             // <br>
             "\n",
             
            // bold, italic, underline
            '[b]', '[/b]',
            '[i]', '[/i]',
            '[u]', '[/u]',
            
            // Mail
            '[email=$1]$2[/email]',
            
            // URLs
            '[url=http:$1]$2[/url]',
            
            // Images
            '[img]$1[/img]',
            
            // Font color
            '[color=$1]$2[/color]',
            
            // Illegal HTML
            ''
        );

        return html_stuff(convert_smilies(preg_replace($pattern, $replace, $message)));
    }
?>

I only added convert_smilies() and modified the two arrays in convert_posts(), so if you already made other changes (mine is based on beta 3), you can simply copy/paste those.

Woot smile

166

(27 replies, posted in PunBB 1.2 troubleshooting)

"fucking" is quite high on all forum word lists, it's remarkable how rude people on forums can get ... ah, fuck it! wink

167

(9 replies, posted in PunBB 1.2 discussion)

Rickard wrote:

I would argue that searching for cat and getting a hit from the word cats means that the search engine is _inaccurate_

in that case, you'll be arguing against everyone who's ever implemented a search engine before you wink

think about it: suppose I'm searching for "PunBB", and an important post doesn't turn up in my results because the post said "PunBB's nice features" somewhere in it, so the word was indexed as "PunBBs". This is one of the first problems you run into when developing a search engine.

"More accurate" is not simply the same as "fewer results" - in many cases, leaving out results because people use a different form of a word (singular/plural etc.) gives you less accurate results. The technical aspects of searching are one thing, but you have to consider that searching is a human activity - we're not machines, and we do not want to have to learn how to write regular expressions or even simple wildcards, before we can get down to business - the computer has to do the dirty work for us, that's what it's there for wink ... Leaving out results because people fail to spell correctly is another cause for inaccurate results, which is why phonetic searching was invented. These (and many others) are the reasons why modern search engines like Google are so effective...

168

(54 replies, posted in Feature requests)

I agree, using Flash is overkill, and a bad idea for other reasons already mentioned here.

I still don't see the problem in using JavaScript though - yes, there are about 6% of users who have it disabled, but they will still be able to see (and thus copy/paste) the e-mail address displayed by the <NOSCRIPT> tag, and they will be able to use form-mail (again, use <NOSCRIPT> on the profile page) to contact persons.

With 60% of the world's email volume being spam today, I don't think it's a waste of time or effort for us to be having this discussion - being able to safely display your email address in public, without the fear of getting harvested by satan's little helpers, would be a great feature; it's simple to implement, and it's one of those crucial little features that could make PunBB stand out from the crowd smile

169

(9 replies, posted in PunBB 1.2 discussion)

The search engine in PunBB is remarkably fast, and this is of course a great strength. However...

Try a search for "cats" on this forum - one topic with the title "Wedding" shows up.

Now try a search for "cat" - a bunch of posts turn up, but the topic with the title "Wedding" is not among them.

This demonstrates that the search engine is apparently not very accurate.

What's missing is a stemmer - an algorithm that breaks down words to their basic forms ... before the post is indexed, and before a search is executed, you reduce words to their simplest form; "cats" becomes "cat", "flying" becomes "fly" etc. - similar words are then indexed as if they were the same, which means the search index tables become smaller, the search becomes even faster, and the search becomes a lot more accurate and usable.
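
Just to show the principle, here's a toy sketch. This is NOT the Porter algorithm, and the function names are made up; it strips a few common English suffixes and nothing more, where a real stemmer has many more rules:

```php
<?php
// Toy stemmer: strips a few common suffixes, longest first.
function toy_stem($word)
{
    $word = strtolower($word);
    foreach (array('ing', 'es', 's') as $suffix) {
        $len = strlen($suffix);
        // Keep at least three characters of the word
        if (strlen($word) > $len + 2 && substr($word, -$len) === $suffix)
            return substr($word, 0, -$len);
    }

    return $word;
}

// Reduce a word list to unique stems, as you would before indexing or searching.
function stem_words($words)
{
    return array_values(array_unique(array_map('toy_stem', $words)));
}
```

Here "cats" and "cat" both end up as "cat" and "flying" becomes "fly", so a search for either form finds posts indexed under the other.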

There's an open-source stemming algo available here:

http://www.chuggnutt.com/stemmer.php

The problem is that this particular stemming algo works only for one particular language, namely English.

Some research has been done into language-independent stemming, and from what I've heard, it should actually be possible, although I have no idea how, or if the technology is free or open - there's some basic information here:

http://www.dei.unipd.it/~ims/multilingual.html

The search could also be made more accurate by implementing a vector space - this of course comes with some performance hit, and the search index will grow. A vector space basically means that you record how many times a word occurs in a post, not just whether it occurs or not - and when searching, posts with a given keyword repeated a higher number of times rank higher. A more detailed vector space explanation is here:

http://www.perl.com/lpt/a/2003/02/19/engine.html
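
A minimal sketch of just the counting part described above (it is not a full vector space model, and the function names and the naive word splitting are my own assumptions): count how often the keyword occurs in each post, and sort by that count.

```php
<?php
// Count how often each word occurs in a post (naive split on non-alphanumerics).
function term_counts($text)
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Return the IDs of posts containing $keyword, most occurrences first.
function rank_posts($posts, $keyword)
{
    $scores = array();
    foreach ($posts as $id => $text) {
        $counts = term_counts($text);
        if (isset($counts[$keyword]))
            $scores[$id] = $counts[$keyword];
    }

    arsort($scores); // sort by count, descending, keeping the post IDs as keys
    return array_keys($scores);
}
```

In a real index you would store the counts at indexing time rather than recounting on every search, which is where the extra index size comes from.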

170

(9 replies, posted in PunBB 1.2 troubleshooting)

Made a small mod that removes the skin choice from the user profiles if only one skin is installed - Rickard said it'd be included in release 1.2.

171

(54 replies, posted in Feature requests)

Frank H wrote:

no, just render them when people register or change their email addys (not that often)

okay, so then you'll spend HD space and bandwidth on thousands of GIFs instead.

if something like this is implemented, it has to be optional, and IMO shouldn't be switched on by default.

172

(54 replies, posted in Feature requests)

that's maybe a bit overkill? it certainly will put considerable extra load on the server, having to generate and compress a GIF image every time an email address has to be displayed...

173

(54 replies, posted in Feature requests)

just add a function to functions.php, encode_email(email) or so, and reuse that. If there's a simple call to that function in one or two other places in the source code, that should hardly cripple the readability wink

174

(43 replies, posted in PunBB 1.2 discussion)

I think as long as you name your variables reasonably, and put enough comments in the code, the separation should not pose any problem for most PHP programmers? PunBB is well commented, and the variable names make sense, so as long as that style prevails ... smile

175

(54 replies, posted in Feature requests)

I don't see what harm a little extra safety and comfort can do, especially at the cost of a measly 200 bytes, but never mind, I'll just make a mod for it smile