Adam Bosworth has moved from BEA to Google and, after a lifetime of building enterprise applications, is beginning to discover both the joy and the challenge of building web-based consumer services. Part of the challenge, he recognizes, is that "The scale [of consumer applications] is orders of magnitudes more than is normally processed by a business process within even the largest corporation." Adam uses this as a jumping-off point to talk about how important it is to focus on simplicity in design -- not only to ensure that the system is usable by "mere mortals", but also to ensure that the required performance can be delivered.
The drastic scale difference between enterprise and consumer services is something that we've grappled with quite a bit at PubSub.com. We're constantly having potential investors look over our plans and then "wisely" tell us that we really should focus first on enterprise applications to pick up some "easy money" and fund our initial development. They tell us that only after having had success in the enterprise market should we even try to deliver the consumer service that we have at PubSub.com. The problem, of course, is that we know that the requirements of enterprises are vastly easier to meet than the requirements of a broadly used, consumer-oriented service. Thus, if we were to do as they recommend, it is almost inevitable that we would take short-term shortcuts that would let us meet the limited-scale needs of enterprise applications but would probably not help us address the vastly greater needs of the consumer application that is our real goal.
At PubSub.com, every day we read over 2.3 million RSS/Atom feeds -- often multiple times. We also read every posting in over 50,000 USENET newsgroups, and every one of thousands of SEC EDGAR filings. From these sources, we cull over 1.5 million unique new items every day that must be matched in near-real-time against tens of thousands of user subscriptions. As noted in a previous entry in this blog, we frequently match at rates of almost a quarter trillion matches per day... There are very, very few enterprise applications that would require matching at this scale. Nonetheless, and even though we're still running far below our capacity, we're constantly trying to make the system run faster. We must. If we don't make the system even faster than it is today, we won't be able to take the load once we really start to publicize our system and usage (hopefully) explodes.
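To put those daily figures in per-second terms, here is a rough back-of-the-envelope sketch. It assumes the load is spread evenly over a 24-hour day (real traffic is unlikely to be that even) and treats "almost a quarter trillion" as an upper bound of 250 billion. The function and variable names are purely illustrative.

```python
# Back-of-the-envelope conversion of the daily figures quoted above
# into per-second rates. Assumption: load is uniform across the day.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a per-day count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# "almost a quarter trillion matches per day", taken as an upper bound
matches_per_day = 0.25e12
# "over 1.5 million unique new items every day", taken as a lower bound
items_per_day = 1.5e6

matches_per_second = per_second(matches_per_day)
items_per_second = per_second(items_per_day)

print(f"matches/second (average): {matches_per_second:,.0f}")
print(f"new items/second (average): {items_per_second:.1f}")
```

Under those assumptions the averages come out to a little under 2.9 million matches per second and roughly 17 new items per second.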
As Adam suggests, to meet the tremendous scale requirements that we're building for, we've done our best to keep the system as simple as possible -- but no simpler. Frankly, the "flow chart" for PubSub.com isn't more than a handful of boxes. By keeping it simple, we've been able to apply a great deal of effort to optimizing each of the boxes to ensure that it meets the needs of a consumer application -- not the more easily met requirements of an enterprise application. I'm personally convinced that if enterprise applications had been our target, we would have given in to the luxury of the limited scale requirements of such systems and today we would have a much more complex design, offering many more "features," and we'd be happily running much more slowly than we do today. If we had designed first for enterprise systems, we would have little hope of ever being able to deliver the kind of service that "mere mortals", in all their vast numbers, will require.
bob wyman
Bob,
Why is it that the word Enterprise instills in people a sense of scale and robustness in the delivery of an application environment? People have become well attuned to the "failure" of corporate web servers, etc., where companies have not "capacity planned" their potential requirements.
It would appear that way too many "Corporate" application developers have ignored the rules of good software engineering, which should be mandatory for any global information delivery.
This does not rest simply with the simplicity of the transaction, but also with understanding the latencies that may exist within the networks, and the myriad of other interdependencies, many of which are totally out of the control of the developer.
You will recall the "good old days" when, at Digital, someone developed the ability to build 96-node clusters. Inherent in that was building 96-node clusters and pushing them until they broke.
Many are the businesses that have believed the vendor rhetoric on the scalability of the technology, and bet their business on the "numbers engineering" that has replaced the tried and true approach of understanding the internals, and pushing the edge of the envelope.
The underlying simplicity-of-design concept you espouse also rings a bell that I think you once rang: the SOFF (separation of form and function) principle. Smaller and more modular (and a good object structure) means simpler, and inherently more scalable (well, fewer inputs into the failure equation...).
The other mistake that seems to be made as a result of the lack of understanding of the "limits" is that things are overscoped. I worked on a DR project for a Telco once. They wanted an STM-4 (622 Mbps) site-to-site link. I pointed out that database updates were already being "mirrored" into the telephone network over three 9600 baud lines. Too often, the corporate sector has no comprehension of its own "core business".
As discussed offline, I have Enterprise application ideas for PubSub technology that will depend on adding in some extra functionality, but I won't even go there until the "size of the envelope" is determined.
Peter Q
Posted by: Peter Quodling | August 17, 2004 at 21:03