Blacklisting comment spam

Everyone’s talking about it. Everyone’s getting it. The evil that is comment spam: blog comments or URLs that link to off-topic and usually questionable sites, posted with the sole purpose of improving those sites’ Google ranking.

I’ve been hit a few times recently, once by a huge HTML comment containing masses of links, similar to email spam. More common manifestations are short, innocent-looking comments posted with a dodgy URL, and rogue entries on my referrers page.

Well, it’s time to nip it in the bud. To start off, I’ve limited how often someone can post a comment, which should prevent the kind of robot attack that results in dozens of spam comments on the same post.
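A time limit like this could be sketched in Python as follows. This is only an illustration, not the code behind this site: the `CommentThrottle` name, the choice to key on the poster's IP address, and the 60-second interval are all assumptions made for the example.

```python
import time


class CommentThrottle:
    """Accept at most one comment per client every `min_interval` seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the limit can be tested without sleeping
        self.last_post = {}  # client key (assumed: IP address) -> time of last accepted comment

    def allow(self, client_key):
        """Return True and record the post if the client is outside the limit."""
        now = self.clock()
        last = self.last_post.get(client_key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted comment
        self.last_post[client_key] = now
        return True


throttle = CommentThrottle(min_interval=60.0)
first = throttle.allow("203.0.113.7")   # accepted
second = throttle.allow("203.0.113.7")  # refused: same client, within the interval
```

An in-memory dictionary is enough to show the idea, but it forgets everything on restart. A real blog would more likely store the last post time alongside the comments themselves.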

Following Simon Willison’s lead, I’ve also implemented a blacklist technique. Any comment or referral I judge to be spam will be deleted, and the offending domains will be blacklisted. Any future comments that contain links to those domains will be refused and the poster’s IP address logged. My blacklist is available at blacklist.txt. You are welcome to grab a copy of that file once every 24 hours and use it as part of your own comment spam prevention system. As part of a growing decentralised web of trust, other good folks have also been posting their blacklists.

If you start using a similar system, drop me a line and I will use your blacklist as well. Please don’t merge other people’s blacklists into your own public list. If I find non-evil URLs in someone’s blacklist, I will unsubscribe from it, so if you have merged in other lists, all your hard work may be undone by someone else’s carelessness or malice.
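For anyone building something similar, the blacklist check described above might look roughly like this in Python. It is a sketch under assumptions rather than the actual implementation: it assumes the blacklist file holds one domain per line (with blank lines and `#` comments ignored), and the function names and the `spam.example` domain are invented for the example.

```python
import re
from urllib.parse import urlparse

# Loose pattern for http(s) links inside comment text.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def parse_blacklist(text):
    """Parse the assumed format: one domain per line, '#' lines ignored."""
    domains = set()
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.add(line)
    return domains


def blacklisted_domains(comment_text, author_url, blacklist):
    """Return the blacklisted domains linked from a comment or its author URL."""
    candidates = URL_PATTERN.findall(comment_text)
    if author_url:
        candidates.append(author_url)
    hits = set()
    for url in candidates:
        try:
            # Drop trailing sentence punctuation before parsing.
            host = urlparse(url.rstrip(".,;:!?)")).hostname or ""
        except ValueError:
            continue  # malformed URL; skip it rather than crash
        host = host.lower()
        for domain in blacklist:
            # Match the domain itself or any of its subdomains.
            if host == domain or host.endswith("." + domain):
                hits.add(domain)
    return hits


blacklist = parse_blacklist("# my list\nspam.example\n")
hits = blacklisted_domains("Great post! http://www.spam.example/pills", None, blacklist)
refuse_comment = bool(hits)  # the caller would refuse the comment and log the IP
```

Matching on the host name rather than on a raw substring keeps a harmless domain like `notspam.example.org` from being caught by an entry for `spam.example`. An author URL given without `http://` has no parseable host here, so a fuller version would need to normalise it first.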