
Archive for the ‘BlogTech’ Category

Fun with Blogging

August 23rd, 2006 2 comments

This is one of the fun things about blogging–you never know what’s going to click with people, and sometimes it’s just plain weird. Like my fourth blog post after starting this whole shebang was an offhand remark about my eyelid twitching, and suddenly people swarmed to it. Even today, three years later, I still get several comments a week on a subsequent post on eyelid twitching, and it’s up to almost 900 comments so far. That one post is my main draw from Google. Go figure.

But that’s what blogging is like. You make long, arduous, well-thought-out posts on serious subjects, but then your stats show people crowding to read your pithy comments on errant bodily functions. With a blog, you never know what people are going to want.

Just before heading off to Karuizawa, I wrote a post mainly consisting of whinging about whether or not I should get my Powerbook screen replaced, and then a short recollection of the Apple store people giving me a big bag to protect my backpack from the rain. After getting several useful comments with advice on the repairs, I forgot all about the specific post–only to later notice a big spike in visitors around that time. For some bizarre, unknown reason, the Mac news aggregator MacSurfer.com picked up the blog post (usually someone has to recommend it), and then two other Mac sites (like this one) picked it up under the topic of “Opinion,” without comment. I gotta figure that they just automatically, mechanically pick random stories from MacSurfer, because the post is pretty ditzy, to be quite honest. But hundreds of people came just for that. I always hate to think people have come to my blog and left disappointed, but then, people are still raving about my eyelid twitching thing. Maybe I’m just not that good a judge of what people like.

Categories: BlogTech Tags:

Diary Rescue

August 12th, 2006 Comments off

Hey, whaddaya know… my first “diary rescue” over at DKos, on the Moralities post. I’ve been cross-posting a few of my entries over there, but the non-recommended diaries disappear off the page so fast–there are so many posts by so many people–that it seems few people get the chance to read them. The only way you can really get an appreciable audience is to get “rescued.” And, it does encourage one somewhat. (Tip of the hat to Paul for the original suggestion.)

Categories: BlogTech Tags:

Three Years

August 2nd, 2006 4 comments

…And that makes three. Three years of blogging daily, without a break in that whole time. I started on August 2, 2003, and noted the yearly milestones for year one and year two. I’m also coming up on the two-thousandth-post mark, though not quite so soon–probably I’ll hit that late September or early October. Blog readership is down a bit recently, but that may be largely because I put trackbacks to rest, and as a result spam, now limited to comment spam and referrer spam, has trickled down to just three or four thousand a week. About 15% of my visitors are returning readers, and according to Google Analytics, several hundred people visit regularly (more than half from the U.S., another quarter from Japan, and the rest from mostly English-speaking countries, naturally).

And so on to year number four…

Categories: BlogTech Tags:

Good Web Sites

July 30th, 2006 5 comments

Here are some web sites I regularly visit, but don’t link to on my sidebar, at least not at present. They’re worth a regular visit:

Cosmic Buddha: Japan blogger with a good sense of humor.
Crooks and Liars: why the heck don’t I have them up on my sidebar yet?
Engadget: a good general tech blog.
FG: a good blog on Japan’s happenings (don’t make me print their full title).
Pharyngula: a good science blog; often takes on the creationists.
SeanPAune: some good stuff here.
The Straight Dope: fun facts for the day.
TUAW: The Unofficial Apple Weblog, good Apple tech blog.

Most of these will probably get on to the LinkBoard, once I find the time to clean up the blog some. Should be soon–I just taught my last class of the semester, and with just one final exam to give, some grading to do, and the graduation ceremony, there’s not much left before my one-month summer vacation. Ah, the college professor’s life!

Categories: BlogTech Tags:

Now I Remember….

July 28th, 2006 Comments off

My luck with web hosts has not been spectacular, though it has probably been about par for the course. Although my current web host has done the best so far, two that I stuck with before treated me pretty badly before I finally left. In every case, what has torn it for me with web hosts has been flaky service–site outages and things just generally falling apart. But the two hosts that really let me down went farther than that.

If you like reading people vent, read on below the fold–I won’t inflict my incessant whinging on those visiting the main page. Essentially, I go into detail about how the two web hosts were very, very sucky, and how I had to revisit one of them just the other day, in a way that brought back all my memories of just how sucky the suckers sucked. If you for some bizarre reason don’t like to read other people moaning and bellyaching (what’s wrong with you?), then just go on to the next post.

Categories: BlogTech, People Can Be Idiots Tags:

Links to 3 Again

July 6th, 2006 Comments off

As someone pointed out, my filters have been aggressive as of late, limiting the number of links allowed. I’ve reset the limit to 3 again; hopefully, attacks have subsided enough that I won’t get deluged…

Categories: BlogTech Tags:

RSS Back at Full Blast

June 30th, 2006 4 comments

After a 1-month experiment with trimmed RSS feeds, I have reset the feeds to display full stories (or at least, much, much longer excerpts of them). For those of you monitoring the site by feed readers, thanks for your patience.

Categories: BlogTech Tags:

Trackback Cleanup

June 18th, 2006 Comments off

Wow! Turning off the trackbacks really made a difference in clearing out the spammers. I’m still getting about one hundred comment spam a day (MT-Blacklist gets about 75% of them, SpamLookup the other 25%), but the trackbacks seem to have been the real monster here, as I noted before. And it showed up like crazy in the AwStats: immediately after I deactivated the trackback script, the number of “visitors” dropped by a bit more than half. This may represent the clearest picture of visitors I’ve gotten from AwStats in quite some time.

But it also shows how lopsided spam has become. Half of my site’s “visitor” traffic was trackback spam. Maybe 25% of all traffic volume, in megabytes, was spam–the spam, at least, did not download movies or anything. Just made smaller hits–but there were so damned many of them. For a long time, I’d known that about a quarter of my site’s traffic was spam. Recently, I could see that my “visitor” traffic had inexplicably risen by about 30% to 40%. This explains why.

When I look at the “recent visitors” in my domain’s CPanel, I can still see them constantly hitting the site–but now all they’re getting are “500”-type server errors. Maybe in time they’ll even stop trying, once they see there’s no trackback there anymore.

Turning the damned things off was the smartest maintenance move I’ve made so far. If you still use trackbacks, consider shutting them down. They’re not worth it.

Categories: BlogTech Tags:

Trackback Off

June 16th, 2006 1 comment

So much for that failed blog element. Trackbacks are supposed to be an automated linking system between blogs that refer to each other. Say Blog A has an interesting post, and the author of Blog B writes an entry referring to it, complete with a link. When Blog B publishes the entry, its blog software sends a “trackback ping” to Blog A, informing it of the reference. Blog A will often catch this ping and automatically make a link right back to Blog B within the original entry’s comment section. A nice idea–you link to me, I link back to you, we know we’re talking about each other. Community. Cooperation. Swell.
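The ping itself is nothing fancy: per the TrackBack spec, it is a small form-encoded HTTP POST with four fields. A minimal sketch of what Blog B's software would send (the endpoint URL and blog names here are made up for illustration):

```python
from urllib.parse import urlencode

def build_trackback_ping(ping_url, title, url, excerpt, blog_name):
    """Build the HTTP POST a trackback client sends: a form-encoded
    body of the four fields the TrackBack spec defines."""
    body = urlencode({
        "title": title,
        "url": url,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return ping_url, headers, body

# Blog B pings Blog A's (hypothetical) trackback endpoint:
ping_url, headers, body = build_trackback_ping(
    "http://blog-a.example/mt/mt-tb.cgi/42",      # hypothetical endpoint
    "My reply to Blog A",
    "http://blog-b.example/2006/06/reply.html",
    "Blog A makes a good point, but...",
    "Blog B",
)
```

Blog A's script receives that POST and appends the link to the entry; the catch, as described below, is that nothing in the protocol verifies the sender.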

The problem: spam. What else? Now, maybe once a month, at most, I get a genuine trackback ping. In that same period, I get tens of thousands of fake trackback pings from spammers (called “spings”). What’s more, these fake pings show up in my referrals (in AwStats, not Google Analytics) and make a further hash of the numbers beyond what referral spam already does. It got so bad that my spam logs flooded and caused a server error; I had to purge the database, and within half an hour, 114 more trackback spams were filling it back up again. Comment spam was beginning to look like a minor annoyance by comparison.

Mind you, the filters were working–no trackbacks were getting through. But I have no real use for trackbacks anyway, so I got a special utility and closed off all the trackbacks for all the entries in the entire blog, and then I switched off the trackback script. And good riddance.

Categories: BlogTech Tags:

Spam Blogs

June 13th, 2006 1 comment

Well, if this doesn’t beat all. Spammers have a new trick: create fake blogs, steal blog post text from real blogs, and then load the blogs with ads for ringtones, pharmaceuticals, porn, and other spam commerce sites.

I found out about this by doing a Google blog search on the title of my last blog post, “Japanese Green Pheasant,” to see who else may have blogged on the bird. To my surprise, my own post appeared–three times. One was the original, and two more were spam plagiarists. Only a few lines of text are stolen, but it’s still theft, and it’s still spammers using my work to sell their crap.

Looking up other titles, I’ve found more of my posts stolen as well. So far, all the spam blogs I’ve found (and no, I won’t link to them) are on “yahoo-blogs.com”–but don’t be fooled, it’s not really Yahoo. A spammer apparently got ahold of that domain name, and uses it just for spamming.

Categories: BlogTech Tags:

Finally, Someone in Nigeria–er, Russia–Needs My Help

June 8th, 2006 2 comments

I don’t know what took them so long this time, but finally, five days after putting a virgin email address up on the main page, spam started coming in. A birthday present for me! Just one message so far–a Nigerian variant–but it’s beginning. I am adding it in the comments of the original post, where I will collect all future spam.

This Nigerian variant pretends to be a Russian barrister representing an oil tycoon who got arrested for bribing politicians, and needs to funnel $18 million through my bank account. I sent him my credit card number right away.

Doonesbury Nigerian Spam

Oh, and the travel agency is still hotlinking to my image. Bless their hearts.

Categories: BlogTech Tags:

Web Host Alert

June 7th, 2006 Comments off

I’m not getting anything for this, but it could be a fairly good deal, so I thought I might put out the word. The web hosting service I use for this blog is Surpass Hosting. I signed on to this host 2 years ago when my old host went into the “too much trouble” mode that all web hosts seem to reach at some point. At the time, Surpass was having a 2-for-1 promotion, where if you signed up during the promotion, you got two accounts for the price of one. If you only have one domain, then it isn’t that great. But if you have several, like me, and especially if you’re hitting high bandwidth, then the deal is very good. I signed up, and have been more or less satisfied with the service. There were two periods where I almost felt like leaving, but it never passed that threshold.

In the second half of June, they’re reviving the promotion for the first time since I signed on. The lowest-cost shared hosting deal is their “Power” plan, which has these specs:

$6 per month ($65 for a one-year contract)
5GB hard disk space
200GB monthly data transfer
10 add-on domains
Unlimited email, subdomain, and FTP accounts, and unlimited SQL databases

You always have to take the “unlimited” claims with a grain of salt; the real limit is that you can only use up a certain percentage of the server’s resources. Too many active scripts on your site could get you into trouble, but you have to do a pretty heavy amount of activity to get there.

“Shared hosting,” by the way, means that your web site is one of a few hundred that inhabit the same server computer at the web host’s facility. It’s usually enough for a regular person’s web site, like a blog or something else casual. However, there is a greater chance that another user on the same server will behave badly and drag down your site’s uptime and performance. The next step up is “Virtual” or “VPS” hosting, which also divides a single server among many accounts (often up to two or three dozen), but each gets a bigger share and each account resembles a private server in some ways. These accounts can be more expensive (starting at around $50/mo.), but allow greater access to site resources. Then you have “Dedicated” hosting, where you get a server machine all to yourself, with the CPU and hard drive dedicated to serving your site, with no one elbowing in on the resources. These are the most expensive, ranging from a hundred to several thousand dollars a month, depending on the package and the bandwidth you get.

As for the other specs: “transfer” means how much data can be uploaded by you/downloaded by visitors each month. My blog is pretty active (around 30,000 unique visitors/mo.), and I have several multi-MB files that get downloaded a hundred times each or so every month; my transfer amounts to about 30GB/mo., and is growing. “Add-on domains” means that in addition to the domain that dominates your account (mine is blogd.com), you can also have other domains run from the same account. Each one inhabits a subdomain, but appears to the world as an independent domain. For example, I have the domain “xpat.org” settled within my “blogd.com” account. Its real address is “xpat.blogd.com,” a subdomain, but if “xpat.org” is directly accessed, it’ll act like its own domain–just not at the moment, though, as I currently have it sleeping, and it redirects to this blog’s main page.

As I mentioned, Surpass is OK as web hosts go, but like all hosts, it has had its rough spots. Soon after I signed on in 2004, a string of hurricanes hitting Florida exposed a gap in their backup facilities, but that got resolved. And late last year/early this year, I had enough site slowdown and script failure problems to almost make me move, but I stayed on and that got resolved. Both are good signs–the bad sign is when the problems don’t get resolved, and your host shows little interest in doing anything about it. But even with Surpass, you gotta keep on their butts about it. Eventually, my latest problems were solved when they moved me onto a new server; the problems arose from another account on the shared server being a CPU hog.

One of the better points about Surpass is the rather quick response time to support ticket requests; they tend to answer in a few minutes, and often the solution is not far behind. But as with any host, different people have different experiences, and there is never any guarantee that any host won’t go south at any given time.

As usual, you can find user experiences and a variety of good advice on Web Hosting Talk forums.

Ah, and the tech support just got back to me: they upgraded my account. When I signed up, ten bucks a month got me 7GB of hard disk space and 75GB traffic; under the present deal, the same amount gets you 10GB and 400GB. So I asked if they’d up me to the present levels, and they did. Fair enough, but apparently you gotta ask in order to get it….

Categories: BlogTech Tags:

Feeds Click Through, Please

June 1st, 2006 3 comments

Starting today, as an experiment in statistical accuracy, I am resetting the RSS feed to show only a short excerpt of each blog story, meaning people who read this blog by feed readers will have to click through to the actual blog page to read the whole posts. If this is too annoying for some, please let me know. For the time being, at least, I would like to know how many people are actually coming to read via RSS. Thanks for your cooperation.

Categories: BlogTech Tags:

Comments Getting Through?

May 27th, 2006 1 comment

As a result of even greater waves of spam (I’m not the only one–a Google Blog Search reveals that one or more spammers are being even more obnoxious than usual), I turned on a few more protective layers of my spam filters. I don’t want to shut out legit users, however, so I’ve set up a throwaway account for you to email if your comment is blocked. Just drop me a line and let me know if you’re having any trouble with it. The throwaway is aptly set as: throwaway2 at blogd dot com. Thanks!

Categories: BlogTech Tags:

Google Analytics Review, Part 1

May 20th, 2006 Comments off

Okay, I’ve been using Google’s beta web site statistics service for a little over a week now, enough to get data for the time-range analysis which is part of the software. This is a basic layman’s report on how Google Analytics (GA) works, what it offers, and how useful it is for a non-commercial blogger like me. This first part will give an overall review. Later I will go over all of the different statistical analyses the service provides, or at least the ones not concerning marketing. Keep in mind that I am still relatively new to GA, and so there may be features or tricks that I have not yet come across. (If you know any, please comment!)

First, the service seems to be aimed at (but by no means limited to) commercial sites, especially ones that use Google’s AdWords service, in that a good portion of the analyses are set up to measure the performance of that service. If you’re not advertising, then a lot of GA’s data sets will be empty for you. However, that still leaves quite a few highly useful statistics for you to peruse.

GA, like Google’s GMail, Calendar, and other services, makes extensive use of Javascript. There are a lot of toggled menus and pop-up windowlets that will expand your choices without having to regenerate the page view. When clicked on, graphs will acquire or lose labels, and pie charts will explode specific segments. In short, the app is designed in a very sexy way, much better than a simple web page with buttons and static graphic elements. Of course, there is still much that requires the page to be rebuilt, so some things take more time than a stand-alone app independent of a browser. But it’s worth hanging around for.

The way it works: because GA is not resident on your web site or domain, you need to append a small script to the end of every HTML page on your site. Blogs use templates to generate each page, so it’s easy to install in that sense. Every time a page so tagged is viewed by someone, the script sends the user’s data to your GA account, where it is accordingly tabulated to create the display. This system, while necessary for an off-site service, has a rather glaring flaw: files that cannot carry the GA script are not counted. As a result, GA cannot see anyone who only monitors your RSS feed, nor can it detect when non-HTML files (such as images or movies) are accessed by direct external links, including hotlinking. This creates a rather large blind spot for the service. GA does offer a way to track specific file views and downloads, but only if the referrer is from within your site.
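The tagging step is mechanical: the snippet just has to land on every generated page, which a template makes trivial. A rough sketch of what a template processor might do (the script URL here is a placeholder, not Google's actual tag):

```python
def tag_template(html: str, snippet: str) -> str:
    """Insert a tracking snippet just before </body>, so every page
    generated from the template reports its views to the stats service."""
    marker = "</body>"
    if marker not in html:
        return html + snippet          # no body tag: just append
    return html.replace(marker, snippet + "\n" + marker, 1)

page = "<html><body><p>Post</p></body></html>"
snippet = '<script src="http://example.invalid/tracker.js"></script>'  # placeholder
tagged = tag_template(page, snippet)
```

Anything the template never touches (feeds, images, directly linked files) never gets the snippet, which is exactly the blind spot described above.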

On the other hand, the script tag does allow you to choose precisely which HTML files on your site you want to have tracked. For example, a huge amount of traffic on my site is generated by spammers, who focus primarily on the scripts that are not content pages in and of themselves. Spammers are constantly accessing my comment and trackback scripts without going through the actual pages of the blog, which are what I am interested in. As a result, very, very little of the spam that hits my site gets recorded, and despite the blindness to RSS visitors (which may constitute as much as 1/3 of my visitors), GA gives me a much truer view of the real people who come and read my blog.

Ideally, GA would be perfect as an on-domain script (like AwStats), which would directly monitor all traffic on the site–but still give you the ability to dictate which files are tracked and which are not. Even better would be a way for the data to be tracked from within the domain, and then compiled as a data package which an application resident on your computer could regularly download, allowing you to analyze the data far more flexibly and quickly.

Another downside to GA is the restricted filtering ability. GA allows you to dictate filters on incoming data; for example, if a spammer is hitting a page in your site, you can specify to GA through a powerful filter engine exactly what to block from coming in. That’s the good part; the bad part is that once the data has hit you, you can’t edit it out of the data you have already collected. This means that any spammer or generator of bogus data will have a permanent impact on your GA stats until you notice them and go to the trouble of applying a specific filter. If the files you track are frequently hit by spammers who constantly fake and/or rotate IP addresses, domain names, and other data, it will be a constant game of catch-up for you. The ability to purge the existing database of spammer activity you have noticed would be a vast improvement.

GA does offer you the ability to temporarily filter data as it is generated in the current display, but this ability is far too limited. First, it only allows you to include or exclude a single keyword. If you want to see only one element isolated from all others, it’s very good. But if you want to exclude more than one data element, it is more or less useless. One example: when GA lists all referrals (visitors who followed links to your site from outside sites), it does not differentiate search engines from other referrals (a rather glaring omission, in my opinion). The temporary filter will allow you to only exclude one keyword; that allows me, for example, to exclude all hits coming from Google, but I cannot also exclude all the other search engines, such as Yahoo, MSN, and so on–not at the same time. One other problem with these filters is that they are discarded as soon as you leave the particular analysis you apply the filter to; the filter cannot be remembered, and so must be re-input every time.
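For illustration, the multi-keyword exclusion I wish the temporary filter offered is a one-liner over the referral data (the referral records below are made up; GA provides no such option):

```python
def exclude_referrers(referrals, blocked_keywords):
    """Drop any referral whose source matches any blocked keyword --
    e.g. screening out all the search engines at once."""
    return [r for r in referrals
            if not any(k in r["source"] for k in blocked_keywords)]

referrals = [
    {"source": "google.com", "visits": 500},
    {"source": "search.yahoo.com", "visits": 80},
    {"source": "search.msn.com", "visits": 40},
    {"source": "macsurfer.com", "visits": 300},
]
# Exclude Google, Yahoo, and MSN in one pass, leaving true referrals:
non_search = exclude_referrers(referrals, ["google", "yahoo", "msn"])
```

GA's temporary filter handles only the first keyword of a list like that, and forgets it as soon as you navigate away.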

That pretty much wraps up all the major shortfalls I have noted so far; after that, it’s all gravy. As I mentioned, a sexy interface, tons of options, lots of useful and interesting data. There are dozens of useful breakdowns, lists, and charts. Almost every piece of data can be analyzed in cross-section–for example, when I view the chart showing new vs. returning visitors, I can break down either group by their region, browser type, or the keyword they used to find my site via a search engine. For example, most of the people who find me via a Google search for “eyelid twitch” only visit once; fewer than 5% return for another visit (within the same week, at least). The data GA collects is very flexibly viewable in these respects. You can also specify a range of dates within which to view data.

GA also has a Help Center which covers a surprising number of topics. Usually such “help” areas are lacking, leaving you in the dark about how to use the software. GA breaks that trend, explaining a wide range of features and issues, and doing a very good job at that. The explanations are not too technical for the casual user, usually favoring a complete omission of the hacker-level stuff. Support forums, in the form of Usenet groups tracked by Google’s Beta Group search engine, exist to highlight any specific requests or exchange of information between users.

Next: What you can see, and how you can see it.

Categories: BlogTech Tags:

Google Analytics: Initial Reaction

May 12th, 2006 6 comments


The first data set came in for this site from GA, and while it’s incomplete (only a half-day’s report), it does give some interesting insight.

One huge drawback is that GA can’t track RSS feeds. According to AwStats, fully 1/3 of the visits to my site are people who get a feed of the blog, and don’t visit the blog directly. But when someone reads the feed, that means they aren’t actually visiting any pages on the site, and RSS feeds can’t run Javascript, which is how GA tracks visitors. So that’s a rather significant blind spot. It matters to me because I’d like to know for certain that all the hits on the RSS feed are actual people, and not bots or spam or whatever. Other data on it would be nice too, of course.

Of course, part of this is my own fault: I’m too generous with the RSS feed. Most feeds only let you glimpse the first few lines of text of any entry/article; I opened the floodgates, letting people see everything, all the content, entire posts, including photos, in the RSS feed alone. This is relevant because if I limited the feed to a snippet, RSS visitors would have to click “read more” and visit the actual web page in order to see my whole posts. Depending on how things go, I might temporarily (or even permanently) change to snippets rather than full feed. The good side of that would be that more people would visit the actual pages, giving me a better view of who is actually reading things on my site. The down side is that people who enjoy reading in RSS might be annoyed that I’m making them click links all of a sudden. Are there any RSS readers out there who would like to give feedback on that before I decide?
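If I do switch to snippets, the trimming itself is simple. A minimal sketch of cutting an entry down to an excerpt (the tag-stripping regex is deliberately crude, just enough for illustration):

```python
import re

def feed_excerpt(entry_html, max_words=50):
    """Strip tags and truncate an entry to a short excerpt, the way a
    trimmed feed forces readers to click through for the full post."""
    text = re.sub(r"<[^>]+>", " ", entry_html)   # crude tag strip
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"
```

The trailing "[...]" is what would nudge feed readers onto the actual page, where GA can finally see them.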

On the other hand, there is a positive blind spot for GA: it doesn’t track spambot-driven accesses to comment and trackback scripts, which is mostly how the spammers attack. Simply by nature, GA ignores most spam.

Of the data that does come in, GA seems to claim some pretty amazing abilities, tracking data I had no idea was possible. For example, GA claims to be able to detect the screen resolution and color depth settings of each visitor. According to the stats, about 52% of visitors to this site over the past 12 hours had a screen resolution of 1024×768; 14% had 800×600, and 13% had 1280×1024. How can GA tell that? It also claims to be able to tell if someone is using DSL, Dial-up, or “Corporate,” whatever the last one means.

GA also does a fun trick with the IP addresses of visitors, detailing which country, region, and even city the hit apparently came from. I’m not sure how accurate this is when you get down to the city–the IP address would be from the ISP, not the user, after all. 12 visitors appeared to come from Saint Petersburg, FL, but I think that’s where my blog host is, so they might be entering into the data somehow. After that, in the past 12 hours, 12 people came from New York, 7 from Denver, 6 each from L.A., Washington D.C., and St. Louis, 5 each from Oakland and Sacramento, 4 each from Plano, Atlanta, Austin, and Wilmington, and so on. Apparently visitors are coming from locales like Schaumburg, Mililani, Panorama City, Colchester, Poughkeepsie, Buzzard’s Bay, Glen Carbon, Sedro Woolley, Cockeysville (no, I did not make that up), Halethorp, and Plymouth Meeting. A shout-out to my peeps in Plymouth Meeting!

Another cute trick GA does is give you the ability to view cross-sections, stat breakdowns on any one group. For example, I can see what screen resolutions were used to view my birdwatching in Japan page, or see which cities people came from to view my eyelid-twitching page (apparently there’s an outbreak in Schaumburg).

But as I’ve mentioned, I’m playing with a very limited amount of data–I don’t even have a full day’s stats yet. A better view will come after a few weeks or longer, when I can start really tracking returning visitors and various trends. One thing is for certain: GA is an incredibly powerful analytical tool, limited only by which data it is unable to track at all.

Categories: BlogTech Tags:

Google Analytics

May 11th, 2006 Comments off

Yes! I (finally) got my invite to Google Analytics (GA), formerly Urchin Stats. It requires you to insert a script into each web page on the site (simple to do with a blog site–just edit the templates), and according to theory, the stats should start being generated within 24 hours.

I’ve been looking forward to this as a way of finally discounting referral, trackback, and comment spam in my stats; up until now, spammers have been throwing my numbers off, and even with AwStats, nice as it is, there is no way to weed them out. I’ve had to simply guess at how many visitors actually come to my site. GA has filters that can–hopefully–clean out the unwanted spam and show me how many actual visitors are coming through.

Problem is, I’ve been looking at the GA site and the filters seem powerful but complex. Also, they mention that you can only screen incoming data–though their “exclude” filter seems like you can at least temporarily screen out existing data. It’s not 100% clear from the help files, so I’ll have to wait for data to come in before I can figure it out. So we’ll see how that goes. It’ll probably take several days as I try new things with various data sets as they arrive. Here’s hoping….

Categories: BlogTech Tags:

Contextual Spam

May 11th, 2006 Comments off

I notice that the blog comment spammers are trying a new trick: contextual comment spamming. It’s still automated, but it’s smarter. Instead of just sending a few dozen obvious spam comments, the spammers try to trick you by making their spam appear to be an actual message. The first one I got related a news story about an artistic photo shoot in Spain, and the second talked about how to deal with Hamas. Except for the placement of three or four spam links, they would appear to be valid comments. So far, the spam comes one at a time–smart, otherwise it would look suspicious.

Of course, there are problems related to automating the spam: the topic of the comments, so far, is not exactly in line with the topic of the blog posts. The photo shoot news story comment spam was posted to a blog entry about blogging (though there was a tangential reference to the press media), and the Hamas comment spam was posted to an entry about Air America Radio. Who knows, maybe they do have software to try to match topics and it’s just not doing a very good job. But I imagine that sometime soon, they’ll have enough fake comments on all kinds of topics, and smart enough spam software so that the comment and blog topic will be closely matched.

Not that it’s causing me any problems–I review comments in my email software, and links show up as blue and underlined; the spam links stand out. But if one were to moderate comments using a browser, the links might be less apparent, or the blog author might gloss over them, and the fact that it is spam could be missed. I’m also not sure how the spam filters are doing with these messages; I haven’t checked to see if my filters are blocking out more of these.
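That link-counting giveaway is easy to automate, for a moderator who doesn't get the blue-underline cue. A toy heuristic (not any filter I actually run), keying on the three-or-four planted links that distinguish contextual spam from a real comment:

```python
import re

def looks_like_link_spam(comment_html, max_links=2):
    """Flag comments carrying more than a couple of links -- the one
    giveaway in contextual spam that otherwise reads like a real comment."""
    links = re.findall(r'<a\s[^>]*href=', comment_html, flags=re.IGNORECASE)
    return len(links) > max_links

spammy = ('Great photo shoot story! <a href="http://s1.example">x</a> '
          '<a href="http://s2.example">y</a> <a href="http://s3.example">z</a>')
```

A heuristic this crude would also flag a legitimately link-heavy comment, which is why it would belong in a moderation queue, not an auto-delete rule.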

In one way, this is a good sign–it means that spam blocking is effective enough that the spammers are having to go to a lot of trouble to try to get their spam through.

Categories: BlogTech Tags:

No Links for the Time Being, Please

May 1st, 2006 1 comment

Just a note: the current spam attack is prolonged enough that I’m getting tired of moderating the few that get through the blog’s usual defenses. So, for the moment, I’m setting the blocking software so that active links cannot be posted in comments–that is, you can’t use the “a href” tag. You can, of course, simply paste URLs in comments without the link tags, so they don’t become a link. I’ll set it back to accepting active HTML links when the spam attack dies off. Thanks for your patience.
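For the curious, the neutering amounts to stripping the tags while keeping the visible text. A sketch of the idea (not the actual blocking software's code):

```python
import re

# Matches opening and closing <a> tags, but not e.g. <abbr>
A_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

def strip_active_links(comment_html):
    """Remove <a> tags but keep their text, so a pasted URL survives
    as plain text instead of a clickable link."""
    return A_TAG.sub("", comment_html)
```

The spammer's payload (the clickable link) is gone, but a human reader can still copy and paste the URL, which is exactly the compromise described above.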

Categories: BlogTech Tags:

Spam Weeks

April 29th, 2006 Comments off

[Chart: total, MT Blacklist, and SpamLookup spam counts over the past 18 days]

So I started to see quite a few comment spam slip through the blacklists and get into moderation. Not too many, between 4 and 6 a day, but much more than usual. I wondered if it was due to some spammer figuring out a way past the blocking software, so I checked the numbers, and found that instead it was simply due to brute force. I quickly punched out the chart above, tracking numbers for the past 18 days (since I last purged my spam lists). On the chart, the black line is the total spam (including trackback spam); the blue line shows the number blocked by MT Blacklist, and the red line shows the ones blocked by SpamLookup. SpamLookup tends to block mostly trackback spam, about 95% of what it gets, whereas MT Blacklist gets the comment spam.

You can see that for the first week, it was going pretty much as it has been since I checked the numbers last August, just depositing a few hundred spam per day. Then there was a sudden spike on the 18th, and a more steady attack pattern after the 23rd. Considering that only 4-6 spam get through to moderation among floods of almost 2000 a day, that’s not bad–a block rate of up to 99.7%. And, of course, not a single one gets past moderation.
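The arithmetic behind that block-rate figure, using the peak numbers from the chart:

```python
def block_rate(total_per_day, leaked_per_day):
    """Fraction of daily spam the filters catch before moderation."""
    return (total_per_day - leaked_per_day) / total_per_day

# At the flood's peak: ~2000 attempts a day, 6 slipping into moderation.
rate = block_rate(2000, 6)   # 1994/2000 = 0.997, a 99.7% block rate
```

And since moderation catches the rest, the published-spam rate stays at zero regardless.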

This made me think back to the days before mass comment spam. I installed MT Blacklist in December 2003, a little more than two years ago, because one spammer posted 50 comment spams on my site. Back then, 50 in one day was massive, the first time that sort of thing ever happened. And just four months previous to that, I got my first comment spam ever, a single comment, just a month after I started daily blogging. Now, just a few years later, 300 comment spams and 100 trackback spams per day is low traffic. Thank god that the vast majority of it gets filtered out in the background…

Categories: BlogTech Tags: