Hopeful, but Wary
Last month was the worst month of referrer spam I’ve experienced so far on this site. (See here for an explanation of referrer spam.) The problem itself started a year and a half ago, in November 2003, with a grand total of 7 “hits” from a Paris Hilton site. It was #36 on my referrer list; the 35 referrers ahead of it were all legitimate and accounted for 1167 hits between them.
I didn’t even notice it.
December’s experience was almost identical: the Hilton spammer was #35, with another 7 hits, while the all-legitimate referrers ahead of it accounted for 1153 hits.
But then Hilton advanced: in January 2004, one Hilton referral was at #18 with 13 hits, while another Hilton spammer was at #18 with 8 hits (interestingly, the RIAA seemed to spam my site just in front of the first Hilton site, coming in at #17 with 14 hits–certainly faked, since I doubt the RIAA linked to me from their main page).
This went on for a few months, but the spammers stayed out of the top 10–until April 2004. Suddenly, the top three referrers were spammers with a total of 333 hits, with others beginning to litter the spots below. For a few months they died down, scattering several dozen hits in the lower rankings, until in July, someone mysteriously spammed me 1556 times. It was mysterious because the spammer made it look like the links were coming from the IAEA, and I know the International Atomic Energy Agency was not linking to me from their home page. The only thing I can figure is that it was a spammer testing out a new script before using it to get referrers to point to his site. I am assuming it was Burnham, but I may be wrong–there is a lot of scum out there.
Spam referrals continued to persist in the lower rankings, sometimes hitting as high as the #4 spot, but outrageous spam held off–until October, when the infamous Burnham Internet Sales Cockroaches hit full-force, dominating not only the top 5 spots, but also 25 of the top 27 (and probably more that I couldn’t pin down as Burnham), their top 25 spam referrals totaling a record 3758 hits. This prompted me to blog about them, reporting on their scumminess, and I am happy to report that Google still ranks that post of mine as #1 in a search for “Burnham Internet Sales,” with their own site ranking #2. Score one for the little guy! Try to spam me to get high Google rankings at my expense, you do so at your own risk!
Although Burnham stopped harassing me, other spammers started glomming on, and since late last year my stats have been clogged with the lowlifes. I tried everything I could to get around them. I tried IP Deny in my site’s control panel, but it didn’t work. I tried to create an .htaccess filter myself, and it worked, but only for about 24 hours, after which the spammers blew through it like tissue paper. I wasn’t publishing a “top referrer” list which the spammers hope they will appear on, nor was my stats page open to public view (it’s in a password-protected directory), so I wasn’t encouraging them in any way. But this is a blog, and blogs are spam magnets. Spammers care little if one site they’re decimating isn’t getting them any profit; it costs them next to nothing to do, and they probably enjoy defecating over sites that deny them a return. So I got more and more spam, and there was no way to block it.
Here’s where I get to the “Hopeful, but Wary” part. Last month, as I mentioned at the top, was the most excessive month of referrer spam. 24 of the top 25 referrers were spammers, accounting for more than 10,000 fake hits. So I looked into it again, and found a possible solution. There is an AwStats (the program used to get statistics) patch on SourceForge which claims to “check comment spam blacklist for referer spam.” The information on the patch is incredibly sketchy; I thank the author for making it, but I would think they could spend more than 50 seconds to type the instructions. There is no explanation beyond the title as to what it even does. But someone on a forum claimed it “scrubbed” the referrer logs of all spam, which sounded great to me.
What was clear was that the patch used the same blacklist that MT-Blacklist used to delete comment and trackback spam. All I would have to do was add the referral spammers to the blacklist (easy enough to do with MT-Blacklist’s interface–and many of the spammers were already there).
It turns out that I had even seen this before and blogged on it, but had not used it because it required admin skills I frankly do not have. As I said, the instructions were sketchy at best, and I could not come close to deciphering them. But after May’s spam numbers drove me to search some more, I found someone who suggested the web host’s tech people install it instead of the site owner. Worth a shot, I thought, so I asked–and they did it.
At first, I was disappointed; I expected the existing spam to melt away, leaving nice, clean stats behind, but alas that was not to be. Before the patch was installed, the top 26 spots were filled by spammers who had hit me a total of 867 times, and that was just after 5 days. After the patch was installed, the spam was still there. Failure again, I thought. But half a day later I noticed something: the spammers I had added to the blacklist had not hit me in that time. Other spammers, not on my blacklist, had hit me again. So I added more spammers from the referral logs to the blacklist, and they stopped adding hits to the list as well.
If my understanding now is correct, the patch does not clean up old spam, but it stops new spam that you’ve blacklisted from getting in. And that will do quite nicely in the interim. I’ll just have to keep an eye on the stats (which I usually do anyway) and blacklist spammers as soon as they hit me. With luck, the flood will return to the relative trickle that I was experiencing a year ago.
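For the curious, here is a rough Python sketch of the behavior as I understand it. This is not the actual patch, which I have not read; the blacklist entries, the function names, and the assumption that the log is in Apache “combined” format are all mine, made up for illustration.

```python
import re

# Hypothetical blacklist entries, for illustration only. The real
# MT-Blacklist file and its format are not reproduced here.
BLACKLIST_PATTERNS = [r"spam-example\.test", r"cheap-pills"]

# Assumes Apache "combined" log lines, which end with:
#   "REQUEST" STATUS BYTES "REFERRER" "USER-AGENT"
REFERRER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')


def extract_referrer(line):
    """Return the referrer field of a log line, or None if it can't be parsed."""
    match = REFERRER_RE.search(line.strip())
    return match.group(1) if match else None


def is_blacklisted(referrer, patterns=BLACKLIST_PATTERNS):
    """True if any blacklist pattern matches the referrer URL."""
    return any(re.search(p, referrer, re.IGNORECASE) for p in patterns)


def filter_new_hits(lines, patterns=BLACKLIST_PATTERNS):
    """Drop incoming log lines whose referrer is blacklisted.

    Only lines passed in here are checked, so hits that were already
    counted stay counted, which matches what I saw with the old spam.
    """
    kept = []
    for line in lines:
        referrer = extract_referrer(line)
        if referrer and referrer != "-" and is_blacklisted(referrer, patterns):
            continue  # skip the spam hit so it never reaches the stats
        kept.append(line)
    return kept
```

The point of the sketch is just that the filter runs on hits as they come in, so adding a spammer to the blacklist only affects what gets counted from then on.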
I’m also checking out ways that AwStats might be tinkered with to get rid of older spam, too. Perhaps by removing the AwStats log file for the month, a new one will be generated automatically, this time fully applying the blacklist filter. I’m almost afraid to try it, though, for fear of screwing things up.
Still, hopeful news. But I’m not counting my chickens yet–spammers have blown through defenses before. Time will tell. I’ll keep you posted.

As I’ve mentioned before, I’m a big fan of Dr. Dave’s referrer karma. Instead of working on the stat side, his script checks each referrer for a link to your site, and then black or whitelists the sites it has checked so that it won’t have to do so again. Even visitors from sites that get mistakenly blacklisted will still see your page: the script throws an error, which makes the stats program ignore the hit with the referrer in it, and then redirects the user to the page sans referrer value. It sounds like a combo of this script you found and referrer karma could really slim down the spam.
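The check-then-remember idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not Dr. Dave’s actual code; the class name, the `fetch` parameter, and the simple “does the page mention my domain” test are assumptions.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=5):
    """Download a page and return its text (default fetcher)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class ReferrerChecker:
    """Hypothetical sketch: verify a referrer once, then remember the verdict."""

    def __init__(self, my_domain, fetch=fetch_page):
        self.my_domain = my_domain
        self.fetch = fetch          # injectable so it can be tested offline
        self.whitelist = set()
        self.blacklist = set()

    def is_legitimate(self, referrer):
        # Already checked: answer from the cached lists.
        if referrer in self.whitelist:
            return True
        if referrer in self.blacklist:
            return False
        # First visit from this referrer: look for a link back to us.
        try:
            html = self.fetch(referrer)
        except Exception:
            html = ""               # unreachable pages are treated as spam
        legit = self.my_domain in html
        (self.whitelist if legit else self.blacklist).add(referrer)
        return legit
```

A real version would also need the redirect-without-referrer step for blacklisted hits, which is left out here.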
I like Paris Hilton.