
New Spam Type May Destroy Blog Statistics

July 2nd, 2004

When running a blog, one has to deal with the inevitable annoyance of spam. Not email spam: blogs don’t reveal your email address (unless you foolishly post it on your site). But blog comment spam will get you. Blog comment spam is when advertisers use automated software to pummel your blog with fake comments, usually on long-ago archived posts. It can be as innocuous as a single line of text (e.g., “Love your blog!”) with a link to the spam site, or as obnoxious as a list as long as your arm, full of dozens or even more than a hundred links to spam sites, many or most of them obscene. Spammers post it primarily so that your site will host one or more links to theirs, improving their search engine ranking and thus better advertising their scummy business. Software add-ons like MT-Blacklist or comment registration sequences help stem the tide.

As if that weren’t enough, now we have a completely new type of blog spam rearing its ugly head: “Referrer Spam” (sometimes called “referral spam” or “stat spam”).

Referrer spam is a fairly new phenomenon. I first noticed it in January when, according to my statistics, a Paris Hilton site seemed to send 13 people to my site via a link from their site. Curious as to why they were linking to me, I cautiously checked them out (never by clicking the link in my referral logs, since then they would see where I came from; instead, I copied the URL and pasted it into a new window). But their site had no link to mine. Puzzled, I set it aside and forgot about it until March.

Then a nude-celebrity web site seemed to send 12 people my way via an apparent link, and again their site seemed to have no such link. WTF? But I dismissed it as an anomaly once more.
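The manual check described above, looking at the “referring” page to see whether it actually links here, can be automated. Below is a minimal Python sketch, assuming you have already fetched the referring page’s HTML by some means (fetching is left out); the helper names are hypothetical. A page with no link back is a strong hint of a fake referrer, but not proof, since a real page may have dropped the link after the visit.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_site(html, site_host):
    """Return True if any <a href> in `html` points at `site_host` or a subdomain of it.

    Hypothetical helper: relative links are ignored, because they point back
    at the referring site itself, not at ours.
    """
    collector = LinkCollector()
    collector.feed(html)
    site_host = site_host.lower()
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == site_host or host.endswith("." + site_host):
            return True
    return False


# A "referrer" whose page never mentions blogd.com is suspect.
print(links_to_site('<a href="http://www.blogd.com/archives/">Luis</a>', "blogd.com"))  # True
print(links_to_site('<a href="http://example.org/">elsewhere</a>', "blogd.com"))       # False
```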

But then in April, my site was “visited” 156 times by a dating service, 125 times by another celebrity porn site, and 105 times by a generic porn site. That set off alarm bells, and I started looking into it more. And I found out what it was: Referral Spam.

So what is it? Referral spam is when spammers hit your site with a fake referrer. (A “referrer” is the page whose link sent a visitor your way; the visitor’s browser reports it to your server, and nothing checks that the report is true.) That is, the spammers have an automated program “look” at your web site while falsely telling your site that it came from the spammer’s web page. This is counted as a hit on your site and shows up on your web site’s statistics page.
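To make the mechanism concrete: the referrer travels in the HTTP request’s Referer header (the standard really does spell it that way), and the web server writes it into its access log verbatim. Stats packages then simply tally that field. The Python sketch below assumes an Apache-style “combined” log format and uses made-up sample lines; it illustrates how a claimed referrer ends up in your numbers, not how any particular stats package works.

```python
import re
from collections import Counter

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_referrers(lines):
    """Tally the Referer field of each log line; '-' means no referrer was sent."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("referer") not in ("", "-"):
            counts[m.group("referer")] += 1
    return counts


# Made-up sample lines. Nothing in the log records whether the first visitor
# really followed a link; the Referer value is whatever the client claimed.
sample = [
    '10.0.0.1 - - [02/Jul/2004:08:00:00 -0500] "GET /archives/000123.html HTTP/1.1" 200 4096 '
    '"http://www.yourpetcrafts.com/" "Mozilla/4.0 (compatible)"',
    '10.0.0.2 - - [02/Jul/2004:08:05:00 -0500] "GET / HTTP/1.1" 200 8192 "-" "Mozilla/4.0 (compatible)"',
]
print(count_referrers(sample).most_common())  # [('http://www.yourpetcrafts.com/', 1)]
```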

So why do they do it? Well, first of all, it’s cheap and easy for them, so they’ll do it for even a negligible payback. But they do get a payback: a link to their site on your statistics page, or you clicking on the link to visit their site and maybe buying something, or–the holy grail for these scumwads–if your web site automatically displays your top referrers, they get a big, fat, free link on your main page, one that will increase their search engine standings and may get dozens or hundreds of people following the link on your site to their page. In other words, free advertising for them, at your expense.

So it benefits them just a little, but it annoys, inconveniences, and sometimes costs you, and it will soon make all web site statistics completely useless. It’s annoying because when you look at your stats, you have to watch all those porn sites and other spam invade your numbers. It’s inconvenient because those hits are not “real” visits, so in the totals you can’t tell what’s real and what’s not. It costs you because these idiots sometimes hit your site thousands of times, which uses up bandwidth, and bandwidth can cost money.

But most of all, we are only seeing the beginning of this. At the end of last August I got my first blog comment spam. Within three months, it had become too much to handle without MT-Blacklist. And now, even with MT-Blacklist, I still have to delete several comment spams by hand every day, and that’s after MT-Blacklist stops dozens more each day. That is prompting me to look into setting up comment registration.

Following that model, we can expect referral spam to explode in the same way over the next several months. Already, in just the first eight hours of July, 8 of the 12 referrals to my site came from fake sites. And it’s not just sites that throw 150 hits “from” one spam site; there are also a multitude of single hits. Many of them are not even identifiable as spam. You might get a couple of hits from “www.yourpetcrafts.com” (I just made that one up), a site that sounds innocuous, but when you go to it, it immediately redirects you to a heavy-duty triple-X porn site. Apparently, the porn and other spam sites are buying up innocent-sounding domain names to use as conduits to the real, and more clearly offensive, spam sites.

The end game of this is that web site statistics will soon be completely meaningless. As fake referrals from spammers make up more and more of a web site’s traffic, it will become impossible to tell how many of your site’s hits come from spammers and how many come from real people visiting your site. Not that the spammers give a damn: it hardly costs them anything, and while their return is probably negligible, they would gladly make your life a living hell for a nickel or two.

These people have to be stopped.

Articles on this can be found at Wired News, Spyware Info, and M5 Computer Security, among other places.

Update (June 2005): FYI, referrer spamming has become devastating over the past year. In May 2005, the top three “referrers” to my site were online gambling spammers; together they accounted for more than 4500 fake link visits to blogd.com. The next three, also gambling spam sites, left another 3000 fake referrals between them. In fact, only one of the top 25 referrers was a legitimate site (http://blo.gs), and it was #21 on the list. The other 24 “referrers” accounted for almost 10,000 false hits. Quite a progression from those first spammers leaving only a dozen fake hits almost a year and a half ago.

Categories: BlogTech
  1. Marcel
    July 2nd, 2004 at 17:26 | #1

    Luis,

    I’m convinced that website statistics software can sort out most referral spam; it’s just a matter of which sensors are used to collect the statistics data.

    Logfile analysis usually can’t sort it out, but I wonder whether a referral spam script would be able to load a JavaScript beacon. To do so, the spammer would have to parse your page and then interpret and execute the JavaScript; it’s hard to imagine that simple spam robots will be able to do that any time soon. There’s a lot of free (or even public domain) website statistics software out there that uses JavaScript sensors.
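    Marcel’s beacon idea can be sketched as follows, under stated assumptions: each page carries a small piece of JavaScript that requests a tiny image such as /beacon.gif?ref=… with the URL-encoded document.referrer in the query string (the path and the ref parameter name are hypothetical). Referrers are then counted only from beacon requests, so a robot that fetches the page but never runs the script adds nothing. As Luis replies below, spammers may well find a way around this; a script could, for instance, request the beacon URL directly once it is known.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical beacon path. Each page's JavaScript is assumed to request
# /beacon.gif?ref=<URL-encoded document.referrer> when the page is rendered.
BEACON_PATH = "/beacon.gif"


def beacon_referrers(request_paths):
    """Count referrers reported through the beacon, ignoring ordinary page hits."""
    counts = Counter()
    for raw in request_paths:
        parts = urlsplit(raw)
        if parts.path != BEACON_PATH:
            continue  # plain page fetch: a robot that never runs JavaScript stops here
        ref = parse_qs(parts.query).get("ref", [""])[0]
        if ref:  # empty when the visitor arrived without a referrer
            counts[ref] += 1
    return counts


requests = [
    "/archives/000123.html",                   # spam robot: page only, no beacon
    "/index.html",                             # real visitor's page fetch...
    "/beacon.gif?ref=http%3A%2F%2Fblo.gs%2F",  # ...followed by its beacon
]
print(dict(beacon_referrers(requests)))  # {'http://blo.gs/': 1}
```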

    You can only prevent comment spam by configuring your posting software appropriately. There are techniques like image-code verification to confirm that a human is posting; against humans posting inappropriate content, only an editor’s review before release helps. Machine-posted spam may increase if you use well-known templates from popular blogging software.

    As for our site, we haven’t had any problems with spam posts so far, so we simply opened real-time link posting to anybody.

    Hope you don’t feel spammed by this comment.

    I’m real. :-)

    Marcel

  2. Luis
    July 2nd, 2004 at 17:49 | #2

    Marcel:

    I would figure that web stat software can do a fair job of filtering, but I am not hopeful, for two reasons: first, it is free software and likely not updated very often; second, spammers tend to catch on to defense systems quickly and beat them almost as quickly.

    However, even basic filtering will be quite a problem. The stats could perhaps filter out multiple visits, but given the spammers’ ability to spoof the IP address and other visitor info, that won’t help with the single-hit stuff, which is just as damaging. For example, one porn site might acquire 100 innocuous-sounding domains and have them redirect to the porn site; the spammer’s software would then automatically send fake hits “from” each of those sites to catalogued blogs and other sites, seemingly at random. At the end of one week, your blog stats might show 5 or 6 hits from each of 100 fake sites, all redirecting to the one spam site. How could stats software filter that out?
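    Luis’s scenario can be worked through with a toy filter. The Python sketch below flags any referrer whose hit count exceeds a threshold (the cut-off of 50 and the example domains are hypothetical). It catches a single 156-hit flood like the April one, but 100 throwaway domains at 5 hits each slip through untouched, even though together they add 500 fake hits. A threshold low enough to catch them would also start flagging real sites that link to you.

```python
from collections import Counter

THRESHOLD = 50  # hypothetical cut-off: more hits than this from one referrer looks like a flood


def flag_floods(referrer_counts, threshold=THRESHOLD):
    """Return the set of referrers whose hit count exceeds `threshold`."""
    return {ref for ref, hits in referrer_counts.items() if hits > threshold}


# One flooding "referrer" (like April's 156-hit dating site) plus a real one.
flood = Counter({"http://dating-spam.example/": 156, "http://blo.gs/": 3})
print(flag_floods(flood))  # {'http://dating-spam.example/'}

# Luis's scenario: 100 innocuous-looking domains, 5 hits each.
spread = Counter({f"http://innocuous-{i}.example/": 5 for i in range(100)})
print(len(flag_floods(spread)), sum(spread.values()))  # 0 500
```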

    I don’t know about “javascript beacons,” but I’m sure they could find a way to defeat them. They’re on the ball, the spammers: I’ve heard of people implementing what they believed were difficult barriers, only to have spammers defeat them in just a few days.

    As for the distorted-text registration system, not only does it add yet another hoop for regular people to jump through, it is also defeatable. Someone in a spam discussion I read suggested that porn spammers could easily set up software that, when presented with the text test, would simply copy the image and show it to one of the multitudes of surfers constantly hitting free porn pages, requiring them to solve it to see the free porn. The software would then take the answer back to the blog, enter it, and get past that level of security.

    I think that spamming goes hand in hand with a site’s ranking in search engines (I really started getting spammed after I began showing up in the top ten for a variety of topics), and I think that spammers especially prey on those who post about spam (those pages tend to get more comment spam, for example), though that’s not gonna stop me…
