Filtering Spam as a Web Service

why web services don't work

I noticed some comment spam on my site today. It seems to be geared towards breaking Bayesian filtering techniques: the comments were all garbage sentence fragments with random BBCode inserted into them.

A fair number of them were caught by Cognifty's built-in spam control, though.

I think this Bayesian-busting spam is meant to foil Web-service-style comment spam filters such as Akismet, Mollom, and Defensio. As you report garbage text as spam, the filter learns ordinary words as spam signals, which dilutes the power of the real spam words.

Take, for example:

"the tree, visit these trees took well and dream.
by themselves I noticed about forts a pair"

If you flag this as spam, you're simply diluting the pool of words that are true spam. Yes, this is spam in the sense that it is unwanted, but most anti-spam systems rely almost entirely on content filtering rather than behavior filtering, and the behavior is the unwanted part of spam.
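
To make the dilution concrete, here is a toy sketch in Python. It is not how Akismet or any real service computes its scores; the `spamicity` function, the tiny corpora, and the word sets are all made up for illustration, and the formula is a simplified per-message estimate of how strongly a word points to spam.

```python
def spamicity(word, spam_docs, ham_docs):
    """Toy estimate of P(spam | word) from per-message frequencies."""
    spam_rate = sum(word in doc for doc in spam_docs) / len(spam_docs)
    ham_rate = sum(word in doc for doc in ham_docs) / len(ham_docs)
    if spam_rate + ham_rate == 0:
        return 0.5  # word never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical training data: each message is reduced to a set of words.
spam = [
    {"cheap", "pills", "online"},
    {"buy", "pills", "now"},
]
ham = [
    {"the", "trees", "took", "years"},
    {"a", "pair", "of", "cheap", "shoes"},
    {"i", "noticed", "the", "dream"},
    {"visit", "the", "forts"},
]
# The garbage comment from above, reported as spam.
garbage = [
    {"the", "tree", "visit", "these", "trees", "took", "well", "and", "dream"},
    {"by", "themselves", "i", "noticed", "about", "forts", "a", "pair"},
]

diluted_spam = spam + garbage

for word in ("cheap", "noticed"):
    before = spamicity(word, spam, ham)
    after = spamicity(word, diluted_spam, ham)
    print(f"{word}: {before:.2f} -> {after:.2f}")
```

With these made-up corpora, "cheap" drops from about 0.67 to 0.50, while the perfectly ordinary word "noticed" climbs from 0.00 to 0.50. The real spam word gets weaker and a harmless word starts looking suspicious, which is exactly the trade the spammer wants you to make.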

Cognifty is, as far as I know, the only Web application framework with an anti-spam trust manager built into the core. It does not act as a plugin: you do not bolt it on to only the comment system, only the user registration system, or only certain parts of your site. It evaluates not only the entire post but also the behavior of the client. If a client is firing comments at you once every second for an entire minute, it really doesn't matter whether they pass content filtering or not; it's unwanted access to your site.
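
A behavior check like the one-comment-per-second case can be sketched as a sliding-window counter. This is only an illustration in Python, not Cognifty's actual code; the `RateTracker` class, its limits, and the client identifier are all assumptions made up for the example.

```python
from collections import defaultdict, deque


class RateTracker:
    """Hypothetical sliding-window counter of requests per client."""

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # client id -> recent timestamps

    def record(self, client_id, now):
        """Record a request at time `now` (in seconds).

        Returns True when the client has exceeded the limit
        within the current window.
        """
        stamps = self._seen[client_id]
        stamps.append(now)
        # Forget requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_requests


# One comment per second for a full minute from the same client.
tracker = RateTracker(max_requests=5, window_seconds=60.0)
flags = [tracker.record("203.0.113.7", float(t)) for t in range(60)]
```

Here the first five comments pass and every comment from the sixth onward is flagged, no matter what the text says. Content never enters into it, which is the point.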

The anti-spam engine is far from black and white. Cognifty grades each request and lets the application decide what to do. There's nothing worse than being stopped from posting your thoughts because they look like spam, whether you're actually talking about spam or about something entirely legitimate. With a purely content-based anti-spam system, you are taking the user out of the equation. A long-time user has established some trust with your site: you have given them a way to publish their thoughts or contribute ideas, and they have given your site great content or suggestions. You absolutely need to grade long-time registered users on a different scale than the rest of the world.
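
As a rough sketch of what grading on a different scale might look like, the Python below folds a content score, a behavior flag, and a user's track record into one grade and leaves the final call to the application. None of this is Cognifty's real API: `grade_request`, `decide`, the weights, and the thresholds are all invented for illustration.

```python
def grade_request(content_score, over_rate_limit, approved_posts):
    """Return a suspicion grade in [0, 1]; higher means more spam-like.

    content_score:   0..1 output of some content filter (assumed).
    over_rate_limit: True if the client is posting abusively fast.
    approved_posts:  how many of this user's posts were accepted before.
    """
    grade = content_score
    if over_rate_limit:
        # Abusive behavior is suspicious regardless of the text.
        grade = max(grade, 0.9)
    # Established users earn a discount, capped at 50%.
    trust = min(approved_posts, 50) / 100
    return grade * (1 - trust)


def decide(grade):
    """Example application policy; the thresholds are arbitrary."""
    if grade < 0.4:
        return "publish"
    if grade < 0.8:
        return "moderate"
    return "reject"


# The same borderline comment from an anonymous visitor and a regular.
anonymous = decide(grade_request(0.7, False, approved_posts=0))
regular = decide(grade_request(0.7, False, approved_posts=50))
```

Under these invented numbers the anonymous visitor's comment lands in moderation while the long-time user's identical comment is published. The framework only supplies the grade; what to do with it stays in the application's hands.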

This post sums up some of the major problems with using a highly content-centric, centralized rule system as an anti-spam engine:
http://www.slightlyshadyseo.com/index.php/dear-akismet-you-suck-and-were-getting-a-divorce/


