
New Spam King: Texas Holdem

May 7th, 2005

Not that these slimeballs are anything new–they’ve been infesting my comments with spam for some time–but now they’ve added referral spam to their repertoire, and have overwhelmingly hammered my site–more than 3,000 referral spam hits in just five days. And they’re rotating IP addresses, so it’s completely screwing up my unique-visitor stats.

Please, please, please someone finally come up with an app or plugin for your site that will filter out referral spam from the log files, and one that does not require programming expertise to use. The bandwidth they’re eating up I can handle, but they are making a hash of my stats, and the fact that they are doing that to me on my dime is all the more frustrating.

What we need is a programmer who can design an app that will work like MT-Blacklist, except that instead of filtering the blog comment database, it filters a site’s referral logs, deleting from those logs all hits that match specified parameters. This would at least allow for logs to be cleaned up, and give you a chance to really know who’s visiting your site. Ideally, the people who make AWStats should have built this into their software long ago.
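To make the idea concrete, here is a rough sketch in Python of what such a log filter might look like. It is only an illustration under some assumptions: that the server writes Apache's Combined Log Format, that spam can be recognized by a keyword in the referrer or by a known IP address, and that the names `filter_log` and `is_spam` (and the sample log lines) are made up for this example rather than taken from any existing tool.

```python
import re

# Assumed layout (Apache Combined Log Format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)


def is_spam(line, referer_patterns, blocked_ips):
    """Return True if the line's IP or referrer matches the blacklist."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # keep anything we cannot parse rather than lose data
    if match.group("ip") in blocked_ips:
        return True
    referer = match.group("referer")
    return any(p.search(referer) for p in referer_patterns)


def filter_log(lines, referer_terms, blocked_ips=()):
    """Yield only the log lines that do not match the blacklist.

    referer_terms are plain keywords or URL fragments, matched
    case-insensitively anywhere in the referrer field.
    """
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in referer_terms]
    blocked = set(blocked_ips)
    for line in lines:
        if not is_spam(line, patterns, blocked):
            yield line


# Hypothetical sample data: one spam hit, one search-engine hit, one direct hit.
sample = [
    '203.0.113.5 - - [07/May/2005:10:00:00 +0900] "GET / HTTP/1.1" 200 5120 '
    '"http://texas-holdem.example/" "Mozilla/4.0"',
    '198.51.100.7 - - [07/May/2005:10:00:05 +0900] "GET /archives/1.html HTTP/1.1" '
    '200 2048 "http://www.google.com/search?q=blog" "Mozilla/5.0"',
    '192.0.2.9 - - [07/May/2005:10:00:09 +0900] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0"',
]

clean = list(filter_log(sample, ["texas-holdem"]))
```

In practice the same function could read the real access log line by line and write the surviving lines to a new file, which the stats program would then process in place of the raw log. Matching on keywords rather than IP addresses matters here, since the spammers rotate their addresses.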

And yes, I’ve tried setting up an .htaccess file–and it doesn’t work. It stops the spammers for a matter of hours, but then they just blast right through it.

Categories: Computers and the Internet
  1. Brad
    May 8th, 2005 at 08:56 | #1

    I’m a Unix System Administrator of about 22 years standing, so I know bits and pieces about computers, but I’m not too ‘with it’ when it comes to the web side of things. Give me an 80×24 dumb terminal any day … (gives VT100 a hug).

    Anyway, could you give a couple of paragraphs maybe just detailing exactly what ‘referral spam’ is? I’m *vaguely* aware that visitors to your sites have a ‘referred by’ line in the http header … I thought you needed to code to pick out that line and use it, maybe Apache or other web servers do that more automatically these days.

    If ‘referral spam’ is just forging the ‘where I came from’ header, how is that spam? And how are they trying to get you to visit their site, buy their viagra anyway?

    I don’t want to take up your time, but a couple of paragraphs from you that shed some light on how you web people use these ‘referral logs’, and how spam skews that use, would be most appreciated. Just enough to supply the missing bit of information so I can connect all the dots?

    Brad

  2. Luis
    May 8th, 2005 at 10:26 | #2

    Actually, I blogged on referral spam here a while back, but here’s something a bit more clear and comprehensive. When someone creates a link from their page to yours and visitors follow that link, the recipient site’s logs can see that, as you mentioned, in the header. This information is logged, and when the statistics program tallies everything up, it shows what sites linked to your site (“referred” to it) and how many visitors followed that link. Since people with web sites want more visitors, they value incoming links. To reward others for linking to them, they may publicly display a list of the sites that send the most visitors their way, each listing being a link back to that site. The idea is, more people will follow that new link back, thus rewarding the initiator for the original link, encouraging more people to make more links. So people have these “Top Referrer” links.

    Another benefit of having incoming links is that it builds up your search engine muscle. If 10 people link to your site, your site will rank a certain degree on search engine results. If 100 people link to you, your site will be higher on the list. So incoming links not only bring more visitors over the links, it also means your site will be higher-ranked on search engines like Google, which means even more people will come to you.

    Well, the spammers found a way to subvert this. Spammers want visitors, naturally, but of course nobody wants to link to the scum. So they find ways to make you link to them.

    One of them is blog comment spam. Automated programs locate blogs and hammer them with fake messages bearing URLs, because comments allow visitors to leave a link to a web site of their choosing. An automated program on my site (MT Blacklist) stops around 500 such spams every day. A few get by, and I delete those manually.

    If I didn’t clean up the spam or block it with that software, then every single post on this blog would be polluted with dozens if not hundreds of spam messages. Some spammers will hit you with a hundred comments to one post, others will leave one comment each on one hundred messages. If it weren’t for the automated software, I’d have to shut down comments entirely.

    Another type of spam is Trackback spam. That’s when another blog links to you in their post, as if they were talking about something and linked to your post as an information source. Some sites will publicly post such trackbacks, so this would also be a link back to the originator’s site—so spammers found that, and hammered it as well. My automated software stops maybe 400 of those per day, again leaving a few to be cleaned up by hand.

    But at least these types of spam can be cleaned up with existing apps.

    Then finally, there’s Referral spam (also known as “referrer” or “referer” spam). This is intended for sites which post “top referrer” lists with links to each referrer. The spammers never really link to your site, they have automated programs which fake their IP addresses and the “link” that they followed. They can make it look like there’s a link on their web site and 1000 people came through it to your site. Cool! So many visitors! You’re at the top of my referrer list, here’s a link back to ya buddy! Of course, there never was a link to your site, there never were any visitors—they just want to fool you into linking back to them, for the traffic and the search engine boost.

    The thing is, most blog sites don’t have referrer lists, only a very tiny percent—fewer than ever before, because bloggers have caught on and don’t want to reward spammers. But the spammers don’t care. This costs them virtually nothing, so they happily spew spam at everyone 24/7. The problem is, referrer spam completely screws up the whole idea of stats. You run a site, you want to know how many people are reading it, you want to know where they’re coming from. But with referrer spam, all that is destroyed. To find just five actual referrers, I have to wade through one or even two hundred spam listings.

    These days, my stat program records 1500 unique visitors per day, and about 22,000 per month. But I have no way of knowing how many are spammers because I can’t get the data on what IP addresses they used to make which comments/trackbacks/referrals. So maybe I get 1450 unique visitors a day, maybe I get 1000 or maybe even less. It’s impossible for me to tell. And with current tools, there’s no way I can filter out the spam and see the pristine stats.

    What would be great is if someone made an app which would reside in the web site (like the stats program) which would allow a site operator to identify a URL, IP address or even a keyword; it would then scan the logs for those terms and delete any entries that match, allowing the stats program to then tabulate only the legitimate visitors, giving the site operator a clean view of who’s really coming to look at the site.

    Okay, I’m gabby. But that’s the problem, in a nutshell.

  3. Brad
    May 8th, 2005 at 17:50 | #3

    Luis, thanks for your time. It’s all clear now. Well, mostly.

    What amazes me is that people actually DO this so much. I would have thought the returns would have been infinitesimal for the spammers. But ‘infinitesimal’ is still greater than zero, and as you say, it costs them practically nothing, so they go ahead and do it … breaking the elegance and utility of the internet in the process. It’s hard for me to wrap my head around the sort of scale required for these buggers to make a profit out of this. That’s one of the biggest lessons out of reading your blog today.

    The link to your earlier spam blog entry was much appreciated. Most interesting. While reading that entry I was bouncing up and down shouting “I KNOW! I KNOW! WHY DON’T YOU USE ONLY-HUMAN-DECIPHERABLE IMAGES LIKE YAHOO DOES?” only to have you and Marcel chat about that in the comments. Drat and double drat. It really is *interesting* to read about the warfare out there – you make a better suit of armor, I’ll make a better arrow – but I don’t have a blog site that’s been attacked by those arrows.

    Not quite sure what the ‘javascript beacon’ is that Marcel was talking about. A click-on-this-to-enter-a-blog-comment widget that is created, or implemented, via javascript executed at the browser, rather than just being an embedded URL? Otherwise, I don’t get him/that.

    Also not quite sure what the ‘Trackback Spam’ is that you defined here. I grok the other two types but this one eludes me. Say I have my own blog … and I put up an entry one day, saying “That Luis is a pretty cool dude” and include a URL to your blog. How does that touch your blog, or affect you or your statistics?

    I love statistics, and when I had a couple of programs I wrote for the computer club back at Uni (so long ago) I loved keeping logs of who accessed the programs. Gave me a real sense of satisfaction. So sorry that these creeps are corrupting all that. This internet thing should WORK the way it’s supposed to!

    Thanks for your time, most fascinating.

    Brad

  4. Luis
    May 8th, 2005 at 18:50 | #4

    There was a story recently about a spammer who was arrested. He sent out as many as 10 million spam messages each day. Made $750,000 a month. But he wasn’t arrested for that–he was arrested because just 53,000 of them did not properly identify him as the mailer, which went against statute.

    With automation and enough bandwidth, massively spamming people is a cinch. There was an ad out somewhere I recall bragging about having a database of 55,000 blogs that could be spammed for a small amount of money. Once you’re on the list, they just sell it and sell it and sell it some more until you’re getting spammed by every piece of scum out there. Goes with the territory. They spam because we link, because our sites can be manipulated by the outside.

    One method to avoid spamming is to change the cgi script which controls comments. Doesn’t work–the spammers lose the script, they scan and pick it up again no matter what you name it or how you try to protect it. You create an .htaccess script and block their sites by name and IP, they get stopped… for a few hours, and then they find ways to blast through your lockouts like tissue paper. I’ve tried it all, and they’ve defeated it all.

    Very soon–I intended to do this already, really–I’m upgrading to MT version 3, and with that, I may pick up some of the new safeguards involved, either typepad registration or the image-deciphering check. I hate to do it, but the spam pressure is just getting to be too much.

    It’s also discouraging when I visit other sites that don’t clean up spam. You go to a blog, it’s active and ongoing, and yet every post is choked with spam, and the owner isn’t doing anything about it. Or you see a site with a Top-10-referrers list, stuffed with spam, of course, and the blog owner just seems oblivious to it. These are the people who make the spammers happy and encourage them to go on. Frustrating… like the 1-in-1000 goobers who you know are buying from spammers, the people who make it all possible for them.

    As for Trackback, I’m not 100% up on how it works either, but I’m pretty sure it involves the blog software–that when someone posts with a link to one of your posts (not just the site in general), their blog software sends out a ping to your blog software, kind of a community thing. It doesn’t affect stats, I don’t think–it just lets people know who’s talking about you.

  5. May 14th, 2005 at 00:08 | #5

    I admit I didn’t read most of the above but this referral spamming is being stopped at the host level, not the individual user account. At least my host (textdrive) is with mod_security.

  6. Luis
    May 14th, 2005 at 00:11 | #6

    Interesting–but I don’t understand… could you be more specific? Thanks in advance!

  7. May 14th, 2005 at 07:48 | #7

    neither do I… but I’ll let you know when i find out more. I just moved to the new host, but you can search the forums over there where the staff discuss pretty much everything they’re doing regarding spam and security.

    http://forum.textdrive.com/viewforum.php?id=15

  8. May 14th, 2005 at 14:10 | #8

    check out this forum thread

    http://forum.textdrive.com/viewtopic.php?id=65

    instead of blocking referrals via individual .htaccess files they’re using a master list shared across all servers (something to do with mod_security on apache). I checked my first day’s stats and not a single bad link. Normally I have 300 plus bogus referrals on the first day of the month in awstats on my old host.
