Oscar Wilde said: "Consistency is the last refuge of the unimaginative." But it's clear that he was never responsible for an online service. Online, consistency is a virtue.
I recently wrote about grabPerf, the third-party response time monitor, which shows PubSub.com consistently delivering the fastest response time of any of the major search services. We even beat Google! In my note, I explained that PubSub delivers results so fast because we're designed from the ground up to support Internet-scale prospective search -- the most common kind of search performed in the blogging world. Another aspect of our design, however, is that we deliver very consistent response time. We're not just fast, we're consistent.
In the world of online systems, consistency is terribly important in maintaining a good user experience. Users can get used to the idea that a site might be slow. (After all, people actually use Technorati, IceRocket and Blogpulse -- consistently the slowest services in the grabPerf listings.) However, users are often very frustrated when a service is sometimes fast and sometimes slow. They hate inconsistency of response. This is a lesson I learned back in the 1970s, when IBM offered what was essentially "flat" response time in their transaction processing systems. Users loved it, since they rapidly learned exactly how long it would take to get a response from the system and adjusted for it. I wondered for a long time how in the world IBM could deliver such consistent response time even though machine loads changed drastically throughout the day. Well, someone finally explained that they had a timer set on each transaction, and the system wouldn't release results until the timer had expired. As long as the timer was set to some length longer than the maximum internal processing time, users saw dead flat, consistent response time -- and they were happy.
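The timer trick is simple enough to sketch in a few lines of Python. This is only an illustration of the idea as it was described to me, not IBM's actual implementation: the names `flat_response`, `work`, and `target_seconds` are invented here, and the clock and sleep functions are parameters only so that the sketch can be tried without really waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def flat_response(
    work: Callable[[], T],
    target_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `work`, then hold the result until `target_seconds` have elapsed.

    Hypothetical helper: if `work` takes longer than the target, the result
    is released immediately, so the target must exceed the worst-case
    processing time for the response to look flat.
    """
    start = clock()
    result = work()
    remaining = target_seconds - (clock() - start)
    if remaining > 0:
        sleep(remaining)  # pad fast transactions up to the fixed target
    return result
```

A fast transaction and a slow one both come back after roughly `target_seconds`, which is all the user ever sees.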
It's too bad that the slower search engines don't do the same thing. It would make them much more pleasant to use. Of course, at PubSub, we don't need to do that sort of thing. As long as we deliver results in less than 1 second, users can't really tell the difference. I'll explain why we're able to do this below:
The grabPerf system monitors major search engines (and some not so major) and produces a wide variety of reports on the responsiveness of those engines. grabPerf can be considered sort of the "poor man's" version of the more professional and comprehensive monitoring services provided by Gomez. One of the more interesting charts that one can get from grabPerf is a chart of search engine responsiveness by hour. You can see one of these charts for PubSub on the right.
You'll note that the responsiveness shown in the PubSub chart is very flat. What this tells us is that it doesn't matter what time of day you fetch results from PubSub; you'll get pretty much the same response time from us. This is what one would expect from a service that implements prospective search algorithms like we do, since serving the results of such a system is almost trivial. The hard work was done when the new data arrived, not when the request for data was made. Thus, it is really easy for us to ensure that we have enough web server power to handle any reasonable load without bogging down. (In fact, we use two dual-processor Intel servers as web servers. It's more than we need, but we like it like that. Over-configuration results in consistent response time.)
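To make the "work happens on arrival" point concrete, here is a toy sketch in Python. It is not PubSub's code: the class `ProspectiveIndex`, its method names, and the all-keywords-must-appear matching rule are assumptions made purely for illustration, and a real system would not loop over every subscription for each new document.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ProspectiveIndex:
    """Toy prospective matcher: the work is done on arrival, not on request."""

    def __init__(self) -> None:
        # subscription id -> lowercase keywords that must all appear
        self._subscriptions: Dict[str, Set[str]] = {}
        # subscription id -> documents already matched, waiting to be served
        self._results: Dict[str, List[str]] = defaultdict(list)

    def subscribe(self, sub_id: str, keywords: List[str]) -> None:
        self._subscriptions[sub_id] = {k.lower() for k in keywords}

    def publish(self, document: str) -> None:
        """The expensive step: match a new document against subscriptions."""
        words = set(document.lower().split())
        for sub_id, keywords in self._subscriptions.items():
            if keywords <= words:  # every keyword occurs in the document
                self._results[sub_id].append(document)

    def fetch(self, sub_id: str) -> List[str]:
        """The cheap step: hand back results computed at publish time."""
        return list(self._results.get(sub_id, []))


index = ProspectiveIndex()
index.subscribe("wilde", ["oscar", "wilde"])
index.publish("Oscar Wilde on consistency")
index.publish("Search engine response times")
print(index.fetch("wilde"))
```

Note that `fetch` does no searching at all. Its cost does not depend on how many documents have ever been published, which is why the request side of such a system is easy to keep flat.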
You wouldn't expect flat response time from a retrospective search engine unless it was very over-configured. The reason is that each incremental request to a retrospective search engine is very expensive. Each request can require the retrospective search engine to scan through the index records for hundreds of millions of documents and then filter, rank, sort, and display the results. Now, since requests don't arrive evenly throughout the day, the result is that at some times during the day, the retrospective system will be just barely able to keep up while at other times of the day it will have massively more processing power than it needs. Response time will vary throughout the day like in the chart at the right which shows Technorati's responsiveness.
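One rough way to see why "just barely able to keep up" hurts so much is the textbook M/M/1 queueing formula, in which mean response time is 1 / (service rate - arrival rate). Applying that simple single-queue model to a search engine is an assumption made here for illustration only, and the function name and rates below are invented, not measurements of any real service.

```python
def mean_response_time(service_rate: float, arrival_rate: float) -> float:
    """Mean time in an M/M/1 queue, in seconds.

    service_rate: requests per second the system can complete (hypothetical)
    arrival_rate: requests per second actually arriving (hypothetical)
    """
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Same capacity, different times of day (made-up rates).
for arrival in (1.0, 5.0, 9.0, 9.9):
    print(arrival, mean_response_time(10.0, arrival))
```

In this model, response time stays close to its minimum while the system is lightly loaded and then climbs steeply as arrivals approach capacity. When each request is expensive, the service rate is low, and the daily peak pushes the system into the steep part of the curve.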
Because retrospective queries are so expensive, the delta between the maximum configuration needed to handle the heaviest load and the minimum configuration needed for the lightest can, by itself, be larger than the entire configuration required to implement an equivalent, properly designed, prospective search system. (Note: While PubSub has only about 30 servers in total, some comparable retrospective services apparently have over 400 servers!)
Of course, raw machine power and good algorithms aren't enough to ensure that you deliver consistent response time. It is also very important to properly manage network connections, firewalls, peering arrangements, routers, etc. Once again, we can use the grabPerf data to demonstrate how we're working to ensure consistent and fast response time for our users. If you look at the chart on the right, you'll see that before August 20, our response time varied greatly. Depending on the day, it took us anywhere from 0.5 to 1.7 seconds to deliver results. That isn't acceptable -- too much variability. Then, suddenly, on August 20, we not only doubled the average speed with which we deliver results, but we also smoothed out much (but not all) of the variability. It wasn't easy to get that improvement! On that day, we moved all our machines from a co-location facility in Chinatown to one near Wall Street (Peer1), replaced our firewalls, changed all our IP addresses, changed our peering arrangements, and eliminated a stack of routers between us and the rest of the world. (Notice that grabPerf didn't even register any real downtime during the move!) The result is a much better user experience, and one that we'll do our best not only to maintain but to improve.
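If you want to put numbers on "too much variability" for your own service, a few lines of Python will do. This is just a sketch: the function name `summarize` is invented, and the sample lists are made-up placeholder values, not the grabPerf measurements behind the chart.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize response times (seconds): average, variability, and range."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),  # population standard deviation
        "spread": max(samples) - min(samples),
    }


# Placeholder daily averages, invented for illustration only.
before_move = [0.5, 1.7, 0.9, 1.3]
after_move = [0.5, 0.6, 0.5, 0.6]
print(summarize(before_move))
print(summarize(after_move))
```

The mean tells you how fast you are. The standard deviation and the spread tell you how consistent you are, and that is the number users feel.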
As I said at the top of this piece: Online, consistency is a virtue. While I can say all sorts of things on this subject, the most telling proof of the value of what we do is that we've had a number of our users notice the difference in our service and let us know that they appreciate it. That's what we're here for: satisfying user needs -- consistently.
bob wyman