I blame Tonga

You may have noticed that Clagnut was down for most of last week. Referrer spam robots maxed out my database connections, which led my ISP to put my account on hold. Unfortunately, my ISP also cut off email and FTP access, which didn’t exactly help the situation.

The main culprits were pharmaceutical pedlars using sub-domains such as buy-fioricet.drop.to. Fortune City owns the drop.to domain (and others like it), and flogs the sub-domains to spammers. No doubt Fortune City would claim they don’t knowingly sell to spammers, but they are well aware that these lowlifes buy their services, as the offending web sites have been shut down. Unfortunately, shutting the sites down doesn’t stop the robots from effectively inflicting a denial-of-service attack on sites like mine.

Clagnut suffers from the robots because it has dynamic pages: each blog entry is pulled from a database when you view the page. The queries and database tables are well optimised so it’s not normally a problem, but if an army of robots turns up, as happened last week, then they use up all the available database connections, which in my case max out at 200 simultaneous connections. The irony is that all referrals listed here carry rel='nofollow' attributes, so the spammers won’t even gain any benefit from being shown.

As of this weekend I’m successfully fending off referrer spam robots with a blacklist of referrers which is checked before a database connection is made. In addition, a 403 Forbidden is given to all user agents claiming to have come from a .to domain, by using an .htaccess rule like this:

# Send 403 Forbidden when the referrer's host name ends in .to
RewriteCond %{HTTP_REFERER} ^https?://[^/:?]+\.to([/:?]|$) [NC]
RewriteRule .* - [F]
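
The blacklist check mentioned above runs in the page code rather than in .htaccess. As a rough illustration only, here is a minimal sketch in Python, which is not necessarily what Clagnut runs. The names BLACKLISTED_HOSTS, is_spam_referrer, handle_request and open_db are all hypothetical, and the only blacklist entry shown is the drop.to domain named earlier. The point is the ordering: the referrer is inspected first, and the database is only touched for requests that pass.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist; drop.to is the only domain taken from the post.
BLACKLISTED_HOSTS = {"drop.to"}


def is_spam_referrer(referrer):
    """Return True if the Referer value points at a blacklisted host."""
    if not referrer:
        return False  # no referrer at all: treat as a normal visit
    host = (urlsplit(referrer).hostname or "").lower()
    if not host:
        return False  # unparseable referrer: give it the benefit of the doubt
    # Match the blacklisted domain itself or any sub-domain of it.
    return any(host == bad or host.endswith("." + bad)
               for bad in BLACKLISTED_HOSTS)


def handle_request(referrer, open_db):
    """Return an HTTP status code; open_db is called only for clean requests."""
    if is_spam_referrer(referrer):
        return 403  # rejected before any database connection is made
    open_db()  # stand-in for connecting and pulling the blog entry
    return 200
```

Matching on the parsed host name, rather than searching the whole referrer string, avoids blocking a legitimate page that merely has the blacklisted text somewhere in its path or query string.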

Clearly the blacklist approach is not scalable, but what else can I do? The robots usually identify themselves as IE6 so I can’t filter that way, and I wouldn’t want to keep out legitimate robots such as search engines, so I’m not really sure what my next steps can be. Is there something I should be getting my ISP to do as well? Any help gratefully received…