Referrer Spam Blitz
Well, I’ve pretty much given up on using my stats in any meaningful way. The referral spam I’m getting is simply too overwhelming. As I’ve stated before, the spammers don’t gain a damned thing–not a single link, no Google Juice, no benefit whatsoever, zilch. But that doesn’t stop them from sending massive amounts of referral spam. The top 107 “referring” pages in my stats are all spammers, the most prolific being a child porn site and some strange, undefined quasi-fanfic site; those 107 referrers slammed this site with 6,850 fake referrals in just two weeks. Only a tiny handful of the top 1000 referrers, responsible for over 20,000 hits on my site, are legitimate; hunting them down within the steaming morass of festering spam is hardly worth it.
Until the developers of AWStats (go ahead, tell them) take their heads out of the sand and finally realize that their product is worthless until they institute a simple spam-filtering subroutine, I’m just going to have to do without stats except in the most limited of ways (like checking how many times a specific file was downloaded–for example, my brother’s fanfic novel, Harry Potter and the Veil of Mystery, is still enjoying about 160 downloads a month, half a year after I began offering it here). It’s too bad; stats are fun.

Pity. I enjoy statistics myself; I remember how I used to pore over the stats of people who used the programs I wrote in my university computer club days.
I still don’t *quite* get how this ‘blog referral’ thing works. Is there some recognised service – a protocol, a web address, an RFC? – that blogging software uses to register the site from which visitors came? Or are you talking about the ‘referring-host’ header (or whatever it is) that comes along with the rest of the headers in an HTTP connection to your blog host? Just curious.
Is the referral spam, which is the one you are concerned about in this post, caused by robots that scan the page? Do they hit every link? Do they hit links with white text on a white background? If so, can one disregard referrers that hit those “fake links”, or disregard referrers that hit > N links, where N is reasonably big? How many links are being hit by the referrer spam people? Isn’t it easy to identify them by their behavior? I would think there is much code out there that does that.
Brad:
It’s the referring-host header, if I read you correctly; HTTP itself calls it the Referer header (yes, spelled with a single “r”).
Referral spam has no direct relation to blogs; it can hit any web site that measures statistics, and probably does. It focuses on blogs more than other sites, though, because blogs have been known to keep “top referrers” lists (see the blog post I linked to on referral spam). One feature blogs used to sport was a list on the main page of the web sites that linked to the blog and sent it visitors, with links back to those sites, as a kind of “thanks and back atchya” reward. Blog writers want to be read; when other sites link to their blog, they want to reward that. The automated top-referrers list did that.
Spammers caught on. They created automated programs that did nothing but send repeated requests for blog sites to send them a particular page. This imitates the action of a person visiting the blog. When someone visits a blog page (or any web page, for that matter), certain data is given to the web page by the visitor, including the IP address of the visitor, the visitor’s OS, browser–and, among other things, the referring page. That is, the link that brought them to the new web site. That’s a referral. Let’s say you have a blog. You link to my site. John Q. visits your site, and clicks on your link to me. When I check my stats, I can see John Q.’s visit, including the note that he came to me via a link from your site. That’s the referral.
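To make that concrete, here is a rough sketch of where the referral ends up on the server side. It assumes the web server writes the common Apache "combined" log format; the sample log line, the IP address, and the example domains are made up for illustration and are not from my actual logs.

```python
import re

# Apache/NCSA "combined" log format (an assumption about the server setup):
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_hit(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Hypothetical log entry: John Q. arriving via a link on your blog.
sample = (
    '203.0.113.7 - - [10/Apr/2005:13:55:36 +0900] '
    '"GET /archives/post.html HTTP/1.1" 200 5120 '
    '"http://brads-blog.example/links.html" "Mozilla/5.0"'
)

hit = parse_hit(sample)
print(hit["ip"], "came from", hit["referer"])
```

Everything in that line except the time, status, and size was supplied by the visitor's machine, which is the root of the problem.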
Referrals can be faked. Actually, all the visitor’s information can be faked. Most people don’t fake it because they don’t care or don’t know how. Your computer is sending this data to the web page you’re visiting. You control your computer. So if you know how, you can tell your computer to give any information. That’s what the spammers do. Their automated programs act like any visitor and request a page, offering fake information, especially the referral. So a spammer’s automated program can just hammer away at your site, pretending to be a real person, faking a link from the spammer’s page, and within hours it can make it seem like the spam site sent hundreds or even thousands of visitors to your blog.
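If you want to see why the server can't tell the difference, here is a tiny sketch using Python's standard urllib. It only builds a request object and never sends anything; the URLs and the `build_request` helper are hypothetical names for illustration.

```python
from urllib.request import Request

def build_request(url, claimed_referrer):
    """Build (but do not send) a request whose Referer is whatever the client says."""
    return Request(url, headers={"Referer": claimed_referrer})

# The "referring" site never has to link to the blog at all:
# the header is just a string chosen on the client side.
req = build_request("http://some-blog.example/", "http://never-linked.example/")
print(req.get_header("Referer"))
```

Nothing in HTTP checks that claim, so the stats package records it as a genuine referral unless it is told otherwise.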
Do they get a payback? Probably 99.99% of the time, no. But these automated programs don’t cost much to make or buy, and they cost next to nothing to operate–actually, you pay for it, because your site is the one paying for the bandwidth to send the data the spammer asks for (but no one sees, of course). The spammers don’t care if they ruin your site, they would gleefully burn your entire domain to the ground if they could make half a buck off it.
YKW:
Yes, you’d think there would be a lot of code that could do that. In theory, it shouldn’t be hard, and it would be even easier to write code that retroactively filters out referrals containing strings you designate.
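Just to show how little it would take, here is a minimal sketch of that kind of retroactive string filter. This is not how AWStats or any other stats package works; the blocked strings, the function names, and the sample referrers are all invented for the example.

```python
from collections import Counter

# Hypothetical strings the site owner has designated as spam markers.
BLOCKED_STRINGS = ["casino", "pills"]

def is_spam(referrer, blocked=BLOCKED_STRINGS):
    """True if the referrer URL contains any designated string (case-insensitive)."""
    lowered = referrer.lower()
    return any(s in lowered for s in blocked)

def top_referrers(referrers, blocked=BLOCKED_STRINGS):
    """Count referrers, skipping empty ones ("-") and anything flagged as spam."""
    counts = Counter(
        r for r in referrers
        if r and r != "-" and not is_spam(r, blocked)
    )
    return counts.most_common()

# Made-up referrer values, as they might be pulled from already-recorded logs.
recorded = [
    "http://friend.example/links",
    "http://friend.example/links",
    "http://cheap-pills.example/",
    "-",
    "http://casino-win.example/x",
    "http://other.example/",
]

for url, hits in top_referrers(recorded):
    print(hits, url)
```

A substring list like this is crude, and spammers can change domains faster than you can update it, but even this much would make a top-referrers table readable again.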
But the code isn’t out there. Nobody’s doing it. I don’t know, maybe the referral spam is only hitting blogs, and the stats people don’t give a damn.
I don’t have a blog, but I think I felt a smidgin of the hatred you have for spammers today when I cleaned out my Yahoo ‘bulk spam’ folder. Simply amazing, the amount of garbage sent out by these parasites. Ugh.
I was not exactly sure what this referral spam was trying to do, so I typed “What is referral spam” into Google and it pointed me to
http://www.wired.com/news/culture/0,1284,56017,00.html
which said these folks are trying to get onto the lists that some blogs publish of the sites sending them visitors. I’ve never seen such a list, and I guess they are not going to stay around as long as the referral spam keeps coming.
I then typed “killing referral spam” into Google and got:
http://www.coldforged.org/archives/2005/01/25/killing-referral-spam/
http://blogs.x2line.com/al/archive/2005/03/26/858.aspx
http://ekstreme.com/phplabs/crawlercontroller.php
http://www.floggin.us/archives/2005/02/13/my-attempt-at-killing-referral-spam/
I would think programmers would view this as a juicy challenge and go right at it.
And yet….