I don’t know if you are familiar with www.alexa.com. I wasn’t until yesterday. I don’t recall how I got there, but the site looks like a search engine such as Google, until you notice that you can enter the URL of a website and Alexa gives you a ranking for it. I thought it was something like Google’s PageRank, but publicly available. The interesting part starts when you find out that it describes the website’s traffic in detail.
At first I suspected that they use some heuristic methods to estimate the traffic, based on the connectivity of the site or other available data, and then present the result as actual traffic information. I mean, it couldn’t be real traffic data. How are they supposed to obtain it? To be honest, I was puzzled for a moment: “Am I missing something fundamental about the internet?” But that lasted only an instant.
Digging further into Alexa I found out what it was about… the magic was gone… The statistics are built from information they collect by spying on users who install the Alexa toolbar! Leaving aside the matter of spyware, I wonder what relation the statistics collected from Alexa toolbar users have to real traffic. You can say that this is the way statistics work. You can’t measure the whole population. Instead you sample a few of them, a representative subset. And the magic keyword is representative. Are the Alexa toolbar users representative of web surfers as a whole? I hardly think so. I mean, I don’t care if they sample their employees and get results from that; the problem is that they don’t mention it clearly.
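
To make the sampling worry concrete, here is a small sketch in Python. Everything in it is made up for illustration: the two groups of surfers, the site names, the visit counts and especially the toolbar install numbers are my assumptions, not data from Alexa. It only shows how a sample that over-represents one kind of user can distort a site's apparent share of traffic.

```python
# All figures below are hypothetical and chosen only to illustrate sampling bias.
# Assumption: one small group installs the toolbar far more often than the rest.
GROUPS = {
    "webmasters": {
        "users": 1_000,
        "toolbar_users": 200,  # assumed: 20% of this group has the toolbar
        "visits_per_user": {"seo-blog.example": 10, "news.example": 2},
    },
    "everyone_else": {
        "users": 99_000,
        "toolbar_users": 99,   # assumed: 0.1% of this group has the toolbar
        "visits_per_user": {"seo-blog.example": 0, "news.example": 5},
    },
}


def total_visits(groups, count_key):
    """Sum visits per site, counting either all users or only toolbar users."""
    totals = {}
    for group in groups.values():
        for site, per_user in group["visits_per_user"].items():
            totals[site] = totals.get(site, 0) + group[count_key] * per_user
    return totals


def traffic_share(visits):
    """Convert absolute visit counts into each site's fraction of the total."""
    total = sum(visits.values())
    return {site: count / total for site, count in visits.items()}


# Share of traffic across the whole (hypothetical) population.
true_share = traffic_share(total_visits(GROUPS, "users"))
# Share of traffic as seen through toolbar users only.
toolbar_share = traffic_share(total_visits(GROUPS, "toolbar_users"))

for site in true_share:
    print(f"{site}: real {true_share[site]:.1%}, toolbar {toolbar_share[site]:.1%}")
```

With these invented numbers the niche site gets only a sliver of the real visits, yet it dominates the toolbar sample, simply because the people who visit it are the ones who install the toolbar. Whether the real toolbar audience is skewed like this I can't say, which is exactly why it should be stated clearly.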