(edited by a.rodier 2007-03-17 19:47)

Topic: Spam today on the web

These days I see a lot of forums implementing CAPTCHAs to fight spam bots and other web pollutants.
I believe there is a smarter and more robust solution, which is also lighter to implement because it doesn't need a graphics library.
Besides being unreadable for some people, CAPTCHAs are also ineffective against bad guys who register manually and use forums to promote their web sites.
Often the site is pornographic, or sells financial services or drugs, and I think this is the real problem. When you counter spam bots, you are fighting the soldiers, not the command.

The next step beyond fighting spam bots is fighting the people behind them, by analysing the content and the links they post.

PHP can download a web page. Coupled with its powerful regular expression functions, you can do a lot of things:
- Compare the content of the web site against a customisable list of banned words to compute a spam score.
- Track JavaScript redirections, and detect pop-up windows, viruses, spyware, etc.
Of course, PHP's User-Agent string can be changed to emulate an ordinary browser.
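A minimal sketch of the download and JavaScript checks, assuming PHP 7.1 or later. `fetch_page()` and `find_js_tricks()` are hypothetical helper names, and the regular expressions only catch the most common forms of `location` redirections and `window.open()` pop-ups, not obfuscated scripts.

```php
<?php
// Download a page while sending a browser-like User-Agent instead of PHP's default.
function fetch_page(string $url, string $userAgent): ?string
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 5, // seconds, so a slow site does not block the posting form
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Return the names of the suspicious JavaScript behaviours found in the HTML.
// These patterns are illustrative heuristics, not a complete detector.
function find_js_tricks(string $html): array
{
    $patterns = [
        'redirect' => '/(window|document|top)\.location(\.href)?\s*=|location\.(replace|assign)\s*\(/i',
        'popup'    => '/window\.open\s*\(/i',
    ];
    $found = [];
    foreach ($patterns as $name => $regex) {
        if (preg_match($regex, $html) === 1) {
            $found[] = $name;
        }
    }
    return $found;
}
```

For example, `find_js_tricks(fetch_page($url, 'Mozilla/5.0'))` would list `redirect` and/or `popup` for a page that uses those tricks (check for a `null` download first).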

We can also use online databases of blacklisted web site addresses, like the ones used against e-mail spam.
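Most of those mail blacklists are queried over DNS (a DNSBL lookup). Here is a minimal sketch, assuming a DNS-based list: `dnsbl.example.org` is a placeholder zone, not a real service, and the function names are hypothetical.

```php
<?php
// Build the DNS name to query: an IPv4 address is written with its octets
// reversed (1.2.3.4 -> 4.3.2.1.zone); a host name is simply prefixed to the zone.
function dnsbl_query_name(string $host, string $zone): string
{
    if (filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        $host = implode('.', array_reverse(explode('.', $host)));
    }
    return strtolower($host) . '.' . $zone;
}

// A listed entry resolves to an A record (usually 127.0.0.x); an unlisted one does not.
function is_blacklisted(string $url, string $zone = 'dnsbl.example.org'): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    return checkdnsrr(dnsbl_query_name($host, $zone), 'A');
}
```

Many URL blacklists expect the registered domain (without `www.` or other subdomains), which this sketch does not strip; check the documentation and usage policy of the list you choose.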

Matching the URLs themselves against regular expressions is also possible.
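A minimal sketch of URL matching, assuming the patterns are kept in a named list; both patterns below are hypothetical examples, and a real forum would load them from a text file or the database.

```php
<?php
// Return the names of the patterns that match the submitted URL.
function url_matches_patterns(string $url, array $patterns): array
{
    $hits = [];
    foreach ($patterns as $name => $pattern) {
        if (preg_match($pattern, $url) === 1) {
            $hits[] = $name;
        }
    }
    return $hits;
}

$urlPatterns = [
    'banned word'    => '#(casino|viagra|loans?)#i', // anywhere in the URL
    'affiliate link' => '#[?&](ref|aff|affid)=#i',   // tracking parameters in the query string
];
```

For example, `url_matches_patterns('http://example.com/shop?aff=42', $urlPatterns)` returns `['affiliate link']`.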

What do you think?

Re: Spam today on the web

Sounds to me like an interesting extension :)

Re: Spam today on the web

Maybe this simple algorithm could be included in PunBB without heavy code?

When a user submits a URL:
- Reject numeric URLs (a raw IP address instead of a host name).
- Check the URL against a list of prohibited terms.
- Download the home page (possibly with a fake user agent) and do a quick first analysis of its content: extract the main text and links with regular expressions, intersect them with an array containing a simple list of prohibited terms, and count the matches.

I think this simple algorithm could eliminate at least 90% of spammers, because offensive and advertising sites are often simple.
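The first two steps of the list can be sketched like this. It is a minimal sketch: `check_submitted_url()` is a hypothetical name, and a plain case-insensitive substring search stands in for the term analysis.

```php
<?php
// Return a rejection reason, or null when the URL passes the cheap checks.
function check_submitted_url(string $url, array $prohibitedTerms): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return 'malformed URL';
    }
    // "Numeric" URL: the host is an IP address (brackets stripped for IPv6 literals).
    if (filter_var(trim($host, '[]'), FILTER_VALIDATE_IP) !== false) {
        return 'numeric URL';
    }
    foreach ($prohibitedTerms as $term) {
        if (stripos($url, $term) !== false) {
            return "prohibited term: $term";
        }
    }
    return null; // the home page itself can be downloaded and analysed next
}
```

Running these checks first avoids downloading anything for the most obvious spam links.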

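// $url is the address submitted by the user.
$homePage = file_get_contents($url) ;
if ( $homePage === false )
{
    $homePage = '' ; // the page could not be downloaded
}

// of course, a better way is using a text file, or a database. This is just an example.
// Multi-word terms such as 'save money' never match single words; they would need a substring search.
$prohibited = array( 'lot', 'of', 'offense', 'text', 'here', /* ...... */ 'bad', 'finance', 'save money', 'investment', 'drugs' ) ;

// Remove scripts, styles and tags, then split the remaining text into lower-case words.
function extractPageContents($html)
{
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ;
    $text = strtolower(preg_replace('/<[^>]*>/', ' ', $text)) ;
    return preg_split('/[^a-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY) ;
}

$wordsOfPage = extractPageContents($homePage) ;

// array_intersect keeps every occurrence from the page, so repeated words raise the score.
$score = count(array_intersect($wordsOfPage, $prohibited)) ;

if ( $score > 100 )
{
    // call 911...
}
elseif ( $score > 50 )
{
    // call my mother
}
else
{
    // looks clean
}

etc... Simple start?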

Re: Spam today on the web

I doubt it would become a part of PunBB's core: it sounds like it could almost become a project in and of itself, sort of like SpamAssassin for webpages.
In any case, there will be anti-spam extensions released for 1.3 (e.g. integration with Akismet).

Re: Spam today on the web

OK,
Thanks for the link about Akismet.
I have just downloaded PunBB on my Debian Sarge machine with SQLite, and everything seems to work perfectly. I'll send you feedback regularly.
The last time I installed a forum (phpBB), I received about 100 spam messages a day, so I closed it.
If I need one, I now know I can write a very light and simple plugin for PunBB.
Thanks again.