Jason Calacanis recently wrote "I’m sick of the Technorati 100" and has offered a prize of $50K advertising credit or $10K cash for the first person who can offer a "better" ranking list that addresses his many concerns with the Technorati 100. Of course, since he requires that any list submitted be based on trailing 12 months data, there aren't very many who can actually enter his contest with a responsive submission. 12 months of blog data is a tremendous amount of data and if you haven't been gathering it on a continuous basis over the last 12 months, there is no way to catch up today!
In any case, Jason has some very valid concerns (and some not so valid) with the Technorati 100 list as he understands it. In this note, I'll take a look at these concerns and talk about how we at PubSub addressed similar concerns when designing the PubSub LinkRanks system that we use to provide our subscribers a means to filter subscription results. (At PubSub, you can create a subscription to any keyword(s) or URI reference(s) occurring in the "Top 1%" of blogs, or the "Top 10%", etc.) While our LinkRank system has been in place for well over a year now, we don't currently provide a means for bloggers to discover their LinkRank. This will change soon... In the meantime, if you're interested, you can inspect the LinkCounts data on which our LinkRanks rely.
Jason's concerns with the Technorati 100 list include:
1. It counts sources not links. Jason may be over-reacting here. Technorati may be counting sources rather than raw links because it is simply too easy to game a system that is focused only on raw links. If you wanted to get a high rank in a link-oriented system, you would simply ask some friends to write a few blog entries that each had several hundred links to your blog. It wouldn't take too many such entries to push your InLink numbers into the stratosphere... It would be much harder to talk a large number of sites into linking to you; otherwise, you would have to build a "link farm." Nonetheless, Jason has a point in saying that InLink numbers shouldn't be completely ignored. In most cases, multiple InLinks from a blog will indicate that the blogger is repeatedly finding interesting postings. Thus, we should probably allow some additional "credit" to InLinks after the first from any single blog. The interesting question is: "How much credit?"
What we do for PubSub LinkRanks is count the number of links from a single source and then decay the value of all InLinks after the first one. We use a progressive scale for this decay. Thus, a second link is worth less than a first link and a third link is worth less than a second, etc. Because the decay is pretty rapid, this allows a reasonable number of links to be recognized while not leaving enough room for the link spammers to game the system too much.
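The progressive decay described above can be sketched in a few lines. This is a minimal illustration, not PubSub's actual formula: the decay factor and the geometric schedule are assumptions made for the example. The key property is that total credit from any single source converges to a fixed ceiling, so a spammer posting hundreds of links gains almost nothing over someone posting two or three.

```python
def repeated_inlink_credit(link_count, decay=0.5):
    """Total credit for `link_count` InLinks from one source blog.

    The first link is worth 1.0; each subsequent link is worth a
    fraction (`decay`) of the previous one. With decay=0.5, credit
    converges toward 2.0 no matter how many links are posted, which
    limits how much a single source can inflate a blog's rank.
    (Illustrative constants only -- the post doesn't give PubSub's.)
    """
    credit = 0.0
    value = 1.0
    for _ in range(link_count):
        credit += value
        value *= decay
    return credit

# 1 link -> 1.0 credit; 2 links -> 1.5; 100 links -> just under 2.0
```

A geometric schedule is only one choice; any curve that decays faster than the spammer can post achieves the same anti-gaming effect.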
2. It is not updated often. I haven't been able to find any specific information on how frequently the Technorati 100 is updated so I'll take Jason's word that it isn't updated often enough for whatever purposes he might have. However, my personal feeling is that rankings should be updated both frequently and infrequently! Any ranking will only have utility when computed for some specified period of interest and we must recognize that different people will have different periods of interest depending on their specific purposes. Sometimes, I want to see who were the "Most Prominent Bloggers of 2004" but at other times I want to know "Who's on top today?" No single ranking period can answer both these needs.
The PubSub LinkRanks are computed daily based on a sliding window a couple of weeks wide. Thus, they are best suited for showing who is "on top today" or at least who has been "on top" recently. We've chosen this strategy intentionally to favor bloggers who are making recognized current contributions to the conversation. This seems to map well to the typical blogger's desire to hear the latest buzz as well as to the "values" that are inherent to the kind of prospective search we do. While today we only support the use of Daily LinkRanks in filtering subscriptions, in the future, we might allow users to select the period of time over which the LinkRanks are computed. What we would do is compute an average LinkRank over a variety of periods. Thus, 30-day, 60-day, six-month, or yearly LinkRanks would provide distinct tools to accomplish different filtering and discovery tasks. The Daily LinkRanks would, as now, continue to show a great deal of day-to-day fluctuation while the period-specific averaged LinkRanks would tend to show less variability as their base period lengthened.
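The idea of averaging Daily LinkRanks over user-selected periods could look something like the sketch below. The helper name and the sample numbers are hypothetical; the point is simply that a one-day window reproduces today's volatile rank while longer windows smooth the day-to-day fluctuation.

```python
from statistics import mean

def averaged_linkrank(daily_ranks, window_days):
    """Average the most recent `window_days` Daily LinkRanks.

    `daily_ranks` is a list with the newest value last. A window of 1
    reproduces today's rank; a 30-day or yearly window would show far
    less variability, as described in the post. (Hypothetical helper,
    not PubSub's actual API.)
    """
    recent = daily_ranks[-window_days:]
    return mean(recent)

# Made-up week of Daily LinkRanks, newest last:
history = [120, 95, 140, 60, 80, 75, 70]
today = averaged_linkrank(history, 1)    # just today's rank
weekly = averaged_linkrank(history, 7)   # smoother, whole-week view
```

The trade-off is exactly the one described above: short windows reward current contributions, long windows reward sustained prominence, and no single window serves both purposes.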
3. People drop off the list for no reason. Once again, I think Jason has the core of a good idea here -- a ranking system should be reasonably transparent or at least work based on easily understood principles -- but I think we'll find that in any really useful list, there will be some number of "unexplained disappearances." The problem here is, of course, link spammers. No matter how good the objective measures upon which the ranking system is based, it is inevitable that spammers will figure out how to game the system. This implies a requirement that the maintainers of the system carefully monitor results and, from time to time, based on subjective opinions, eliminate various bloggers from the rankings or from the data that feeds the rankings. You should not be surprised to see rankings restated from time to time. Unfortunately, this manual grooming of the lists can put the maintainers in the very uncomfortable position of having people demand that they explain why "really-obnoxious-poker-spam.blogspot.com" was dropped from the rankings. What we'd all like to do is be open and honest. However, as Google, Altavista, and others have found over the years, some spammers have lawyers and they can make life very difficult for you when you openly adjust the way your system handles them.
I think we must accept that any useful ranking system must incorporate some amount of "subjective" and even "unexplained" variability... No purely objective system will stay free of gamers and spammers for very long. Unfortunately, this means that any really good ranking system is going to require continuous monitoring and tweaking by real and expensive human beings... The machines can't do all the work for us. So, when you pick the ranking system that you decide to rely on, be sure that you have some idea of who are the people adjusting things "behind the curtain." Also, appreciate that their job is neither easy nor pleasant...
4. It’s based on the number of links for all time. Jason is actually not quite correct in this attack on the Technorati 100; nonetheless, it is an interesting point to explore. A "for-all-time" ranking system rewards people simply for having been blogging longer than others. It gives weight to seniority, not quality. In a "for-all-time" system, a blog that accumulates 1,000 InLinks over the last five years is given the same rank as one that has generated 1,000 InLinks since it was first created 10 days ago. This just doesn't make sense. Imagine a blog that carried links to pictures of Janet Jackson's "wardrobe event" at the Super Bowl and as a result gained tens of thousands of InLinks in a matter of hours. Imagine also that that blogger hasn't had much to say since that event that anyone has found to be worthy of an InLink. Does it make sense that years later all those stale links should be lifting the rank of the now boring or even dormant blog over that of people blogging interesting content today? I don't think so. One important question that a ranking system should answer is: "What have you done for me lately?"...
Although the Technorati 100 is not based on a "for-all-time" system of weighting links, it is based on "ageless" links. I'm sure there are some uses for such ranking systems, but I must say that this attribute of the Technorati 100 is the one that does the most to keep me from finding it useful. Apparently, the Technorati system only considers links that are still visible on the blog when they scrape it. (Unlike PubSub, which is feed oriented, Technorati scrapes blog pages...) However, they give all such links an equal weighting in their ranking -- no matter how old they might be. What this means is that you can give your blog more say in the Technorati system simply by showing more history on your blog! Also, it means that if you abandon your blog, your links will continue indefinitely to have weight in the Technorati system. Given that a massive number of blogs are abandoned, any ranking system based on ageless link weights will have a persistent bias towards bloggers that used to be popular, whether or not they are still popular.
PubSub does not provide a "for-all-time" ranking system, nor do we base our LinkRanks on "ageless" links. As mentioned before, the Daily PubSub LinkRanks are computed using a window of only a couple of weeks of LinkCounts data. Thus, very old InLinks have no impact on current Daily LinkRank. If a blog is abandoned, its influence on our rankings will rapidly disappear. We decay the value of recent InLinks according to their age in somewhat the same way that we decay the value of multiple InLinks from a single site (see discussion above). What we do is give more value to an InLink created today than to one created yesterday, and less value to a two-day-old InLink, etc., until an InLink created a couple of weeks ago has no value to contribute to a blog's rank. The result is a much more accurate and current measure of a blog's current popularity, importance, impact, whatever...
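The age-based decay just described can be sketched as follows. The post says value falls from full credit today to zero at the edge of the roughly two-week window, but doesn't specify the shape of the curve, so the linear decay below is an assumption made for illustration.

```python
def age_weight(age_days, window_days=14):
    """Weight of a single InLink as a function of its age.

    Full value (1.0) for a link created today, declining to zero at
    the edge of the sliding window, after which the link contributes
    nothing. Linear decay and a 14-day window are assumptions; the
    post only says the window is "a couple of weeks" wide.
    """
    if age_days >= window_days:
        return 0.0
    return 1.0 - age_days / window_days

def linkrank_contribution(link_ages):
    """Sum the age-weighted values of a blog's InLinks (ages in days)."""
    return sum(age_weight(age) for age in link_ages)

# A fresh link, a week-old link, and a month-old (expired) link:
linkrank_contribution([0, 7, 30])  # 1.0 + 0.5 + 0.0 = 1.5
```

Under any such scheme, an abandoned blog's contribution decays to zero within one window width, which is exactly the "What have you done for me lately?" property the post argues for.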
5. Why 100? Why not? Jason seems to prefer 500, however, there are undoubtedly people who want just the Top 10 and others who want the Top 1000. The reality is that as the Blogosphere grows, the number of folk usefully said to be "at the top" also grows. A few years ago, the Top 100 bloggers would have included every blogger. Today, the Top 100 is such a small group it doesn't have much meaning. Even if the Top 500 is a meaningful group today, it won't be tomorrow... Thus, we probably shouldn't be worrying about any arbitrary and unchanging numeric cut-off. We should be seeking a flexible metric that grows along with the Blogosphere itself. Something like "Top 1%" or "Top 10%" seems to make a great deal more sense to me. Individual users should be the ones that decide how many are in whatever they consider to be the "Top."
The PubSub LinkRank system reports the absolute numeric "LinkRank" for millions of individual blogs as well as the relative percentile rank for each of those blogs. Thus, you can discover that your blog has an absolute rank of 2,045 and is also in the top 1% of all blogs that we rank. Because relative percentile ranks are pretty easy for people to understand, these are the ranks that we offer for use in subscription filters at PubSub. An example of the dual ranking method can be seen in Business 2.0's recent story that gives a "Preview" of the PubSub LinkRank system. Note: The actual graphs that we'll be publishing will not look quite like what you see in the article.
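Mapping an absolute rank to a relative band is simple division, as this sketch shows. The band thresholds and the total-blog count in the example are illustrative, not PubSub's actual figures; the point is that "Top 1%" stays meaningful as the Blogosphere grows, while "Top 100" does not.

```python
def percentile_band(absolute_rank, total_ranked):
    """Convert an absolute rank (1 = highest) into a relative band.

    Unlike a fixed cut-off such as "Top 100", these bands scale with
    the total number of blogs being ranked. The specific thresholds
    (1%, 10%, 25%, 50%) are assumptions for this example.
    """
    fraction = absolute_rank / total_ranked
    for band in (0.01, 0.10, 0.25, 0.50):
        if fraction <= band:
            return f"Top {int(band * 100)}%"
    return "Below top 50%"

# A blog with absolute rank 2,045 among two million ranked blogs:
percentile_band(2045, 2_000_000)  # "Top 1%"
```

As the post notes, this lets each user decide how exclusive their own "Top" should be: a "Top 1%" filter today and in five years selects the same relative slice of an ever-larger population.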
These are just a few thoughts on Jason's challenge. His metrics for a new ranking system are interesting even though limited. I think that the PubSub LinkRanks system addresses a number of his concerns; however, I think there are many other concerns that would be addressed by a really good ranking system. But, this post is already getting a bit long... For some other folks' perspectives on the subject, consider reading Mary Hodder's recent post. Interesting thoughts are also found in the posts of Stowe Boyd and Kevin Burton among many others.
It will be interesting to see what comes from the debate that Jason has sparked. I'm certainly hoping that we'll get some good ideas that we can incorporate into the ongoing process of making our own LinkRanks a more useful tool.