Cliff Gerrish writes on his Echovar blog:
The consumption strategy that makes the instant messaging model of Twitter work is to follow a core group and then track keywords of interest. Tracking keywords adds people you don’t follow into your stream and provides a proper level of noise and negative feedback into the information ecosystem. ... It’s tracking that makes a decentralized Twitter nearly impossible. ... This feature radically changes the shape of the social graph underlying the information stream. Since you don’t know who might use a tag you’re tracking, the regular RSS style contract around publication and subscription doesn’t work.
Gerrish's claim that tracking makes it nearly impossible to decentralize Twitter seems to have convinced Steve Gillmor of TechCrunch, who writes: "Decentralizing Twitter is unnecessary, if not impractical." Fortunately, both Cliff and Steve are wrong.
Distributed tracking of Twitter-like streams is easily accomplished using what are now well-known systems for distributed publish/subscribe. Certainly, it is easier to implement tracking if everything goes through a single choke-point in the network, but that isn't necessary. In fact, as long ago as the 1980s we had USENET-based systems that handled the distributed fan-out of messages (news posts) that were then "matched" against users' local subscriptions. (Yes, matching was normally trivial "topic-based" matching; however, "content-based" matching systems were often deployed locally.) What we could do in the '80s we can do today -- and do it better. After all, we've learned a great deal since then.
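To make the two kinds of matching concrete, here is a minimal sketch of what a single server might do with each message that reaches it through fan-out. Everything in it is an assumption made for illustration: the `LocalMatcher` class, its method names, and the example addresses are hypothetical, and it handles single keywords only, not short phrases.

```python
from collections import defaultdict
import re


class LocalMatcher:
    """Matches incoming posts against subscriptions held on one server."""

    def __init__(self):
        # Topic-based: author address -> local users following that author.
        self.followers = defaultdict(set)
        # Content-based: lowercase keyword -> local users tracking it.
        self.trackers = defaultdict(set)

    def follow(self, user, author):
        self.followers[author].add(user)

    def track(self, user, keyword):
        self.trackers[keyword.lower()].add(user)

    def match(self, author, text):
        """Return the set of local users who should receive this post."""
        recipients = set(self.followers.get(author, ()))
        for word in set(re.findall(r"[\w#@']+", text.lower())):
            recipients |= self.trackers.get(word, set())
        return recipients


matcher = LocalMatcher()
matcher.follow("alice", "bob@example.org")   # topic-based subscription
matcher.track("carol", "xmpp")               # content-based subscription

# Nobody here follows dave, but carol still gets the post via tracking.
hits = matcher.match("dave@example.net", "Federated XMPP pubsub is fun")
```

The point of the sketch is that the keyword index lives wherever the subscriber lives. The publisher does not need to know who is tracking what; it only needs to get the message to the servers that do the matching.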
Today, Twitter uses XMPP for both input and output. Via XMPP, Twitter offers topic-based "following" of tweets by pre-specified users and it offers content-based "tracking" of keywords and short phrases across all public tweets. Additionally, Twitter provides an experimental XMPP PubSub (XEP-0060) feed of all public tweets. Thus, almost all of what is needed to build a distributed Twitter is already in place. (Also, anyone who wants to offer a "competitive" tracking service already has what they need, since anyone can get easy access to the full stream of public tweets.)
If Twitter thought that decentralizing their service was a good thing, then, as long as decentralization only dealt with "public" tweets and as long as they were willing to start with a crude system (to be cleaned up later), they would only have to do two things that they haven't already done:
- Allow their users to "follow" Twitterers (?) that are hosted at other XMPP servers. (For example, instead of following user "foo" on Twitter, you might follow "foo@jabber.org".)
- Accept a stream of "public tweets" generated by other XMPP hosts. This would allow Twitter to implement both topic-based "following" and content-based "tracking" of tweets generated off Twitter.
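The two changes above can be sketched as a toy federation in which every server forwards its public posts to its peers and each peer does its own matching. This is a rough illustration under stated assumptions, not a description of how Twitter or any XMPP server works: the `Node` class, the `federate` helper, the full-mesh wiring, and the in-memory inboxes are all hypothetical.

```python
class Node:
    """One hypothetical server in a federation of Twitter-like services."""

    def __init__(self, host):
        self.host = host
        self.peers = []      # other nodes that receive our public posts
        self.follows = {}    # local user -> set of full addresses followed
        self.tracks = {}     # local user -> set of lowercase keywords
        self.inbox = {}      # local user -> list of (author, text)

    def add_user(self, name):
        self.follows[name] = set()
        self.tracks[name] = set()
        self.inbox[name] = []

    def post(self, name, text):
        """Publish a public post locally and fan it out to every peer."""
        author = f"{name}@{self.host}"
        self.receive(author, text)
        for peer in self.peers:
            peer.receive(author, text)

    def receive(self, author, text):
        """Match one public post against local follows and tracked keywords."""
        words = set(text.lower().split())
        for user in self.inbox:
            if author in self.follows[user] or words & self.tracks[user]:
                self.inbox[user].append((author, text))


def federate(*nodes):
    """Connect every node to every other node (full mesh, for simplicity)."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


twitter, jabber = Node("twitter.com"), Node("jabber.org")
federate(twitter, jabber)
twitter.add_user("alice")
jabber.add_user("foo")

twitter.follows["alice"].add("foo@jabber.org")   # following across servers
jabber.tracks["foo"].add("usenet")               # tracking across servers

jabber.post("foo", "hello from another server")
twitter.post("alice", "remember usenet fan-out?")
```

After these two posts, alice's inbox on twitter.com holds foo's post because she follows him, and foo's inbox on jabber.org holds alice's post because it contains a keyword he tracks, even though he does not follow her. No single server saw every subscription.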
Of course, more elegant and efficient systems can be designed. But a basic system is pretty easy to build. In fact, Twitter has already built most of it. However, even though the engineering delta might be small, the key thing that will probably keep Twitter centralized is Twitter's business plan -- whatever that might be... they probably see some advantage to keeping things centralized.
Anyway, additional levels of capability would come from doing things like writing a quick "following" XEP (XEP = XMPP Extension Protocol) to permit the exchange of "follow" information between services (this is different from presence subscriptions). This "Following XEP" would enable directed messages and support private tweets. Once the following XEP was in place, XMPP services would probably adopt the convention that "tweets" should be sent to a standard local JID: "twitter".
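Here is one way to picture why exchanging "follow" information matters. No such XEP exists in this post, so the sketch below invents everything: the `Service` class, the `handle_follow` notice, and the `network` lookup table are hypothetical stand-ins for whatever the real protocol would define. The idea it shows is only that once the author's server knows who follows whom, it can deliver a non-public post to exactly those followers instead of broadcasting it.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A hypothetical service that records follow notices from its peers."""

    host: str
    # local author name -> addresses of followers that announced themselves
    followers: dict = field(default_factory=dict)
    # posts delivered to users on this host: (recipient, author, text)
    delivered: list = field(default_factory=list)

    def handle_follow(self, follower, local_author):
        """Record a follow notice received from another service."""
        self.followers.setdefault(local_author, set()).add(follower)

    def send_private(self, local_author, text, network):
        """Deliver a non-public post only to hosts with known followers."""
        author = f"{local_author}@{self.host}"
        for follower in sorted(self.followers.get(local_author, ())):
            follower_host = follower.split("@", 1)[1]
            network[follower_host].delivered.append((follower, author, text))


a, b, c = Service("a.example"), Service("b.example"), Service("c.example")
network = {s.host: s for s in (a, b, c)}   # stand-in for server-to-server routing

a.handle_follow("bob@b.example", "alice")  # b.example tells a.example about bob
a.send_private("alice", "for followers only", network)
```

In this run the post reaches bob on b.example and nothing is sent to c.example, which has no followers of alice.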
Of course, not all XMPP servers would have the capability to do things like content-based tracking of tweets at high volume. But this is the case with everything on the web these days. Heck, anyone can build a search engine if only they pick up a copy of Lucene, but building a search engine as powerful as Google's or Yahoo!'s is a very different problem. Thus, even if we had a distributed Twitter, it is likely that actual users of Twitter.com would get better service than the users of most other distributed Twitter services. Even over time, there would probably be only a small number of (well-financed) players who could usefully support functions like tracking. Thus, we'd move from a Twitter monopoly to an oligopoly. Even so, it would be an improvement.
Distributed Twitter isn't hard -- it only requires the support of Twitter and some powerful partners.
bob wyman
Hey Bob,
thanks for the excellent post. Obviously, one of the major hurdles (beyond just a total lack of time) to getting a federated PubSub going at Twitter was the business model question, but I think (can't / don't speak for Twitter here) they get it; certainly my conversations at the time indicated as much.
Anyhow, I'm planning on devoting at least the next year to seeing services (i.e., not necessarily micro-blogging) XMPP-enabled, and thus real-time and federated. I definitely agree with your comments over on Cliff's blog re: handling thousands of Tweets per second (at least, thousands of deliveries per second, which obviously isn't the same as incoming messages). The scaling challenge, of course, remains even in the federated model -- presenting archival views isn't easy. ;-)
Posted by: Blaine Cook | May 13, 2008 at 14:51
As for the recent Twitter outages, I can't help but wonder how well they would have fared if Google App Engine had been around for them to build on.
Posted by: Greg Clinton | May 17, 2008 at 23:33
ya could just write a protocol from scratch, easier and more efficient than piggybacking on XMPP or HTTP. too bad all the real good protocol designers died in the late 90's
Posted by: Dankoozy | April 06, 2009 at 15:56