Finally, blogging as it should be: my blog has been spam free since I changed a few basic things. Even the Akismet queue is empty! For everybody else out there fighting the same problem, here is a small how-to on what worked for me:
Please do not expect a solution to all of your problems here, and be patient. Observing the situation is an important part of this how-to, because spam differs from site to site. I know this sounds strange, but it is a fact. The points I describe should also be easy to implement on other systems.
Common spam is spread automatically via large botnets. To succeed on as many systems as possible, the bots fill in every form field they find, so that they get past required fields. Even referrers might contain spam text.
Let’s have a closer look at those text fields and how they are abused. To work on a wide range of software, spammers use BBCode. If you are not using BBCode on your site, shoot on sight: reject every comment that contains it. A possible solution in PHP could look like this:
if (preg_match('|\[url(=.*?)?\]|is', $comment)) die('BBCode is not interpreted here.');
Calling die() in PHP might be a bit harsh here. Of course, you can replace it with any other call that deals with your spam. But I would recommend showing the user a short note explaining why the comment was rejected, as there are too many people out there who do not read instructions.
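As a gentler alternative to die(), the check can return a message that the comment form then displays. This is only a minimal sketch; the function name bbcodeRejection() and the message text are made up for illustration and do not come from any existing plugin:

```php
<?php
// Hypothetical helper: returns a rejection message if the comment
// contains a BBCode [url] tag, or null if the comment looks fine.
function bbcodeRejection(string $comment): ?string
{
    // Matches [url] and [url=...], case-insensitive, across line breaks.
    if (preg_match('|\[url(=.*?)?\]|is', $comment)) {
        return 'BBCode is not interpreted here. Please post plain links instead.';
    }
    return null;
}

// Usage: show the message next to the form instead of aborting the script.
$message = bbcodeRejection('Nice post! [url=http://example.com]click[/url]');
if ($message !== null) {
    echo $message, "\n";
}
```

Returning a value instead of terminating the script leaves it up to the calling code whether to show the note, log the attempt, or silently drop the comment.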
Another characteristic of spam is the set of words it loves to use; ‘casino’ and ‘viagra’ are two of them. A filter on such words will not solve the spam problem on its own, but it can stop some junk.
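Such a word filter could be sketched roughly as follows. The word list and the function name containsSpamWord() are assumptions for illustration; the list in particular should come from watching the spam your own site receives:

```php
<?php
// Hypothetical blacklist; adjust it to the spam your own site receives.
$spamWords = ['casino', 'viagra'];

// Returns true if any blacklisted word occurs in the comment
// (case-insensitive, matched on word boundaries).
function containsSpamWord(string $comment, array $spamWords): bool
{
    foreach ($spamWords as $word) {
        // preg_quote() keeps special characters in a word from
        // being interpreted as regular expression syntax.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $comment)) {
            return true;
        }
    }
    return false;
}

var_dump(containsSpamWord('Best online CASINO bonus!', $spamWords));
```

Matching on word boundaries avoids rejecting harmless words that merely contain a blacklisted one, at the price of missing plurals and deliberately misspelled variants.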
Comment spam usually contains a large number of links. To be exact, I have not seen many spam comments with fewer than 10 links. The number of links in a comment is easy to count. I count HTML links as well as bare URLs by looking at every href and http(s) occurrence, using PHP:
$linkCount = preg_match_all('|(href\s*=\s*[\'"]?)?(https?:)?//|i', $comment, $out);
If the number of links is too high, the comment can be rejected. A message to the author is a nice touch: bots do not read it, but humans do.
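Put together, the link check might look like the sketch below. The limit of 3 and the names MAX_LINKS, countLinks() and tooManyLinks() are assumptions chosen for illustration; the right threshold depends on what is normal for legitimate comments on your site:

```php
<?php
// Assumed limit; the right value depends on what is normal on your site.
const MAX_LINKS = 3;

// Counts link-like fragments: every "//", together with an optional
// href= prefix and an optional http: or https: scheme, counts once.
function countLinks(string $comment): int
{
    return (int) preg_match_all('|(href\s*=\s*[\'"]?)?(https?:)?//|i', $comment);
}

// Returns true if the comment carries more links than allowed.
function tooManyLinks(string $comment, int $max = MAX_LINKS): bool
{
    return countLinks($comment) > $max;
}

if (tooManyLinks(str_repeat('http://spam.example ', 10))) {
    echo "Too many links in your comment.\n";
}
```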
Trackbacks are communication between blogs, and even here spam hits us. Trackbacks are harder to check, because we have to parse the other website. I do this by fetching the page with curl and parsing the result to see whether my website is mentioned somewhere. If there is no link to my blog, there is no link on mine either.
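A rough sketch of such a trackback check is shown below. It is not the exact code running on this blog: the function names, the timeout and redirect limits, and the decision to accept only an actual href link (rather than any mention of the address) are assumptions made for illustration.

```php
<?php
// Pure check: does the fetched HTML contain a link to my blog?
// $myUrl is a placeholder for your own blog address.
function pageLinksTo(string $html, string $myUrl): bool
{
    // Look for the address at the start of an href attribute, ignoring case.
    $pattern = '~href\s*=\s*["\']?' . preg_quote($myUrl, '~') . '~i';
    return preg_match($pattern, $html) === 1;
}

// Fetches the page that sent the trackback; returns null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects ...
        CURLOPT_MAXREDIRS      => 3,    // ... but only a few of them
        CURLOPT_TIMEOUT        => 10,   // do not let slow hosts block the blog
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// A trackback is accepted only if its source page links back to my blog.
function trackbackIsGenuine(string $sourceUrl, string $myUrl): bool
{
    $html = fetchPage($sourceUrl);
    return $html !== null && pageLinksTo($html, $myUrl);
}
```

Keeping the HTML check separate from the download makes it possible to test the matching logic without any network access.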
The ideas presented here are easily implemented as WordPress plugins. Other people have already done so for some of them; the rest I built myself. Modifying the CMS itself is not always a good idea, as such changes can cause problems during updates or be forgotten by then.