Okay, I’ve been using Google’s beta web site statistics service for a little over a week now, enough to get data for the time-range analysis which is part of the software. This is a basic layman’s report on how Google Analytics (GA) works, what it offers, and how useful it is for a non-commercial blogger like me. This first part will give an overall review. Later I will go over all of the different statistical analyses the service provides, or at least the ones not concerning marketing. Keep in mind that I am still relatively new to GA, and so there may be features or tricks that I have not yet come across. (If you know any, please comment!)
First, the service seems to be aimed at (but by no means limited to) commercial sites, especially ones that use Google’s AdWords service, in that a good portion of the analyses are set up to measure the performance of that service. If you’re not advertising, then a lot of GA’s data sets will be empty for you. However, that still leaves quite a few highly useful statistics for you to peruse.
GA, like Google’s Gmail, Calendar, and other services, makes extensive use of JavaScript. There are a lot of toggled menus and pop-up windowlets that expand your choices without the page having to be regenerated. When clicked on, graphs will acquire or lose labels, and pie charts will explode specific segments. In short, the app is designed in a very sexy way, much better than a simple web page with buttons and static graphic elements. Of course, there is still much that requires the page to be rebuilt, so some things take more time than they would in a stand-alone app independent of a browser. But it’s worth hanging around for.
The way it works: because GA is not resident on your web site or domain, you need to append a small script to the end of every HTML page on your site. Blogs use templates to generate each page, so it’s easy to install in that sense. Every time a page so tagged is viewed by someone, the script sends that visitor’s data to your GA account, where it is tabulated to create the display. This system, while necessary for an off-site service, has a rather glaring flaw: files that cannot carry the GA script are not counted. As a result, GA cannot see anyone who only monitors your RSS feed, nor can it detect when non-HTML files (such as images or movies) are accessed by direct external links, including hotlinking. This creates a rather large blind spot for the service. GA does offer a way to track specific file views and downloads, but only when the link that was followed is on a page within your site.
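To make the blind spot concrete, here is a toy model of the rule described above. It is only a sketch of the idea, not GA’s actual code: the `Request` class, its two fields, and all the paths are made up for illustration, and it assumes that a hit is counted only when the served page carries the tracking script and the client actually runs it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str            # hypothetical path on the site
    carries_tag: bool    # was the tracking script part of what was served?
    runs_scripts: bool   # does the client execute scripts at all?


def is_counted(request: Request) -> bool:
    """A hit reaches the stats only if the tag is present and gets executed."""
    return request.carries_tag and request.runs_scripts


# Made-up traffic: one real page view and three requests the tag never sees.
requests = [
    Request("/archives/some-post.html", carries_tag=True, runs_scripts=True),
    Request("/index.xml", carries_tag=False, runs_scripts=False),        # RSS feed
    Request("/images/photo.jpg", carries_tag=False, runs_scripts=True),  # hotlinked image
    Request("/cgi-bin/comments.cgi", carries_tag=False, runs_scripts=False),
]

counted = [r.path for r in requests if is_counted(r)]
print(counted)
```

In this model only the tagged blog page shows up; the feed, the image, and the comment script are invisible, which is both the weakness and, as it turns out, a strength.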
On the other hand, the script tag does allow you to choose precisely which HTML files on your site you want to have tracked. For example, a huge amount of traffic on my site is generated by spammers, who focus primarily on the scripts that are not content pages in and of themselves. Spammers are constantly accessing my comment and trackback scripts without going through the actual pages of the blog, which are what I am interested in. As a result, very, very little of the spam that hits my site gets recorded, and despite the blindness to RSS visitors (which may constitute as much as 1/3 of my visitors), GA gives me a much truer view of the real people who come and read my blog.
Ideally, GA would be perfect as an on-domain script (like AWStats), which would directly monitor all traffic on the site but still give you the ability to dictate which files are tracked and which are not. Even better would be a way for the data to be tracked from within the domain, and then compiled as a data package which an application resident on your computer could regularly download, allowing you to analyze the data far more flexibly and quickly.
Another downside to GA is the restricted filtering ability. GA allows you to dictate filters on incoming data; for example, if a spammer is hitting a page in your site, you can specify to GA through a powerful filter engine exactly what to block from coming in. That’s the good part; the bad part is that once bogus data has been recorded, you can’t edit it out of what you have already collected. This means that any spammer or generator of bogus data will keep adding to your GA stats until you notice them and go to the trouble of applying a specific filter, and whatever they added before that point stays in permanently. If the files you track are frequently hit by spammers who constantly fake and/or rotate IP addresses, domain names, and other data, it will be a constant game of catch-up for you. The ability to purge the existing database of spammer activity you have noticed would be a vast improvement.
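The difference between filtering on the way in and cleaning up afterwards can be sketched in a few lines. This is a hypothetical model, not GA’s filter engine: the `HitLog` class, its method names, and the `spam.example` referrer are all invented here, and the `purge` method stands for the wished-for feature, which GA does not offer as far as I can tell.

```python
import re


class HitLog:
    """Toy hit store: filters apply only to hits recorded after they are added."""

    def __init__(self):
        self.exclude_patterns = []  # compiled regexes checked on every new hit
        self.stored = []            # hits that made it into the stats

    def add_exclude_filter(self, pattern):
        self.exclude_patterns.append(re.compile(pattern))

    def record(self, hit):
        # A hit matching any filter is dropped before it is stored.
        if any(p.search(hit["referrer"]) for p in self.exclude_patterns):
            return False
        self.stored.append(hit)
        return True

    def purge(self, pattern):
        # Hypothetical retroactive clean-up: remove matching hits already stored.
        before = len(self.stored)
        self.stored = [h for h in self.stored if not re.search(pattern, h["referrer"])]
        return before - len(self.stored)


log = HitLog()
log.record({"page": "/post.html", "referrer": "spam.example"})  # stored: no filter yet
log.add_exclude_filter(r"spam\.example")
log.record({"page": "/post.html", "referrer": "spam.example"})  # blocked from now on
print(len(log.stored))  # the first spam hit is still in the data
```

Without something like `purge`, the hit that arrived before the filter existed stays in the numbers for good.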
GA does offer you the ability to temporarily filter data as it is generated in the current display, but this ability is far too limited. First, it only allows you to include or exclude a single keyword. If you want to see only one element isolated from all others, it’s very good. But if you want to exclude more than one data element, it is more or less useless. One example: when GA lists all referrals (visitors who followed links to your site from outside sites), it does not differentiate search engines from other referrals (a rather glaring omission, in my opinion). The temporary filter will allow you to exclude only one keyword; that allows me, for example, to exclude all hits coming from Google, but I cannot also exclude all the other search engines, such as Yahoo, MSN, and so on, at the same time. One other problem with these filters is that they are discarded as soon as you leave the particular analysis you apply the filter to; the filter cannot be remembered, and so must be re-entered every time.
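What I would like the temporary filter to do is easy to describe in code. The sketch below is not how GA works; the function name and the sample referrers are made up. It simply shows a multi-keyword exclusion, built by joining the keywords into one case-insensitive pattern, next to the single-keyword case GA currently allows.

```python
import re


def exclude_keywords(referrers, keywords):
    """Return the referrers that contain none of the given keywords."""
    if not keywords:
        return list(referrers)
    # One alternation pattern covers every keyword; escape them so dots stay literal.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [r for r in referrers if not pattern.search(r)]


# Made-up referrer list mixing search engines and an ordinary site.
referrers = [
    "www.google.com",
    "search.yahoo.com",
    "search.msn.com",
    "someblog.example.org",
]

only_google_removed = exclude_keywords(referrers, ["google"])
all_engines_removed = exclude_keywords(referrers, ["google", "yahoo", "msn"])
print(only_google_removed)
print(all_engines_removed)
```

With a single keyword, Yahoo and MSN are still mixed in with the real referrals; with the list, only the ordinary site is left.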
That pretty much wraps up all the major shortfalls I have noted so far; after that, it’s all gravy. As I mentioned, a sexy interface, tons of options, lots of useful and interesting data. There are dozens of useful breakdowns, lists, and charts. Almost every piece of data can be analyzed in cross-section: when I view the chart showing new vs. returning visitors, I can break down either group by their region, browser type, or the keyword they used to find my site via a search engine. For example, most of the people who find me via a Google search for “eyelid twitch” only visit once; fewer than 5% return for another visit (within the same week, at least). The data GA collects is very flexibly viewable in these respects. You can also specify a range of dates within which to view data.
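For anyone curious what a cross-section amounts to, it is just a two-level count: split the visits by one property, then tally a second property inside each group. The sketch below is my own illustration, not anything taken from GA; the `cross_section` function, the field names, and the handful of sample visits are all invented.

```python
from collections import Counter


def cross_section(visits, segment_key, breakdown_key):
    """Group visits by segment_key, then count breakdown_key values per group."""
    table = {}
    for visit in visits:
        table.setdefault(visit[segment_key], Counter())[visit[breakdown_key]] += 1
    return table


# Made-up visits, purely to show the shape of the result.
visits = [
    {"visitor": "new", "browser": "Firefox", "keyword": "eyelid twitch"},
    {"visitor": "new", "browser": "Safari", "keyword": "eyelid twitch"},
    {"visitor": "returning", "browser": "Firefox", "keyword": "eyelid twitch"},
    {"visitor": "new", "browser": "Firefox", "keyword": "another search"},
]

by_browser = cross_section(visits, "visitor", "browser")
by_keyword = cross_section(visits, "visitor", "keyword")
print(by_browser["new"])
print(by_keyword["returning"])
```

Swapping the second argument for region or keyword gives the other breakdowns in the same way.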
GA also has a Help Center which covers a surprising number of topics. Usually such “help” areas are lacking, leaving you in the dark about how to use the software. GA breaks that trend, explaining a wide range of features and issues, and doing a very good job at that. The explanations are not too technical for the casual user, and usually leave out the hacker-level stuff entirely. Support forums, in the form of Usenet groups tracked by Google Groups (itself in beta), exist for users to raise specific requests and exchange information with one another.
Next: What you can see, and how you can see it.