Questions have recently been raised about the state of the blogosphere's pinging infrastructure. I find the recent discussions to be a bit light on data, so I did a quick analysis of the pings that we receive daily at PubSub.com to provide some context and data for further discussion.
As would be expected, when one plots the number of pingers against a ranked ordering of the number of pings received per pinger per day, the result is a power-law curve. (Click on the top chart at the right for more detail.) This particular chart is based on an analysis of 1,171,122 pings received from 468,795 pingers. In this sample, 313,340 pingers pinged only once, while the most frequent pinger pinged 7,340 times.
The average number of pings per pinger was about 2.5. However, as the chart at the right clearly demonstrates, that average is heavily influenced by a small fraction of sites that pinged very frequently. The majority of sites (67%) pinged only once during this sample. These "single pingers" generated about 27% of all pings received. 99% of all pingers pinged 18 or fewer times, and together they generated only 67% of all pings. The remaining 33% of pings were generated by only 1% of all pingers -- many of them "spingers"...
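For anyone who wants to check the arithmetic or run the same summary on their own ping logs, here is a minimal sketch. The constants are the totals quoted above; the function names (`summarize_pings`, `share_of_pings_from_bottom`) and the list-of-counts input format are my own assumptions for illustration, not a description of how the PubSub.com analysis was actually run.

```python
from typing import Dict, List


def summarize_pings(counts: List[int]) -> Dict[str, float]:
    """Summarize per-pinger ping counts (one list entry per pinger)."""
    if not counts:
        raise ValueError("counts must not be empty")
    total_pingers = len(counts)
    total_pings = sum(counts)
    singles = sum(1 for c in counts if c == 1)
    return {
        "mean_pings_per_pinger": total_pings / total_pingers,
        "single_pinger_share": singles / total_pingers,
        # Each single pinger contributes exactly one ping.
        "single_ping_share": singles / total_pings,
    }


def share_of_pings_from_bottom(counts: List[int], pinger_fraction: float) -> float:
    """Share of all pings sent by the least active `pinger_fraction` of pingers."""
    ordered = sorted(counts)
    k = int(len(ordered) * pinger_fraction)
    return sum(ordered[:k]) / sum(ordered)


# Aggregate figures quoted in the post.
TOTAL_PINGS = 1_171_122
TOTAL_PINGERS = 468_795
SINGLE_PINGERS = 313_340

mean_pings = TOTAL_PINGS / TOTAL_PINGERS
single_pinger_share = SINGLE_PINGERS / TOTAL_PINGERS
single_ping_share = SINGLE_PINGERS / TOTAL_PINGS

print(f"mean pings per pinger: {mean_pings:.2f}")
print(f"pingers that pinged once: {single_pinger_share:.1%}")
print(f"pings from single pingers: {single_ping_share:.1%}")
```

Given a full list of per-pinger counts, `share_of_pings_from_bottom(counts, 0.99)` would give the share of pings generated by the least active 99% of pingers, the figure reported above as 67%.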
A comparison of the number of actual feed "updates" to the number of pings received from the top 1% of pingers indicates that very, very few of those in the top 1% actually generate anywhere near the number of new or updated entries that their pinging would indicate. The exceptions include newspaper sites that do generate an unusual number of updates during the day. On inspection, the vast majority of "fast pingers" turn out to be one of three things: sploggers or spammers; "legitimate" sites that appear to ping from timer-driven scripts; and sites that appear to incorrectly generate a ping whenever a comment is entered on an entry in their blogs.
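The comparison described above can be sketched roughly as follows. Everything here is hypothetical: the function name, the dictionary inputs, the example feed URLs and the ratio threshold are assumptions made for illustration. Only the cutoff of 19 pings comes from the data, since 99% of pingers pinged 18 or fewer times.

```python
from typing import Dict, List


def flag_suspect_pingers(
    pings: Dict[str, int],
    updates: Dict[str, int],
    min_pings: int = 19,      # above the 18-ping mark that covers 99% of pingers
    max_ratio: float = 5.0,   # hypothetical tolerance for pings per real update
) -> List[str]:
    """Return feeds that ping far more often than they actually update."""
    suspects = []
    for feed, ping_count in pings.items():
        if ping_count < min_pings:
            continue  # not a "fast pinger"
        update_count = updates.get(feed, 0)
        # A fast pinger with no observed updates is always suspect.
        if update_count == 0 or ping_count / update_count > max_ratio:
            suspects.append(feed)
    return sorted(suspects)


# Hypothetical one-day sample: pings received vs. new or updated entries seen.
sample_pings = {
    "http://example.com/splog": 500,
    "http://example.com/newspaper": 40,
    "http://example.com/personal": 3,
    "http://example.com/timer": 60,
}
sample_updates = {
    "http://example.com/splog": 2,
    "http://example.com/newspaper": 35,
    "http://example.com/personal": 3,
}

print(flag_suspect_pingers(sample_pings, sample_updates))
```

A check like this only flags candidates. As noted above, a busy newspaper site pings often for good reason, which is why the test compares pings to real updates rather than looking at ping volume alone.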
I believe that we'll see the proportion of "fast pingers" and "fake pingers" reduced in the future as spammers begin to realize that fast-pinging is not in their interests. Spammers and sploggers are beginning to learn that pinging too frequently makes them stick out in the logs of search services. We can expect that in the future they will seek to make themselves less visible and thus more difficult to filter out. Also, we'll probably see fewer timer-driven pingers in the future as the existing offenders are contacted one-by-one by various search services and as people tend to use less home-grown technology to do blogging and feed generation (i.e., more folk are moving to the common platforms like WordPress, TypePad, etc. that ping properly).
In summary: The existing "ping infrastructure" is not broken. The frequency of pings per pinger tends to follow the power law that we would expect in such an environment (R-squared = 0.92). Yes, there are some fast pingers out there. However, since we've got a power-law curve, they are expected. Additionally, it turns out that fast-pinging helps us to identify some spammers and sploggers. Thus, the high end of the scale is likely to be pulled back somewhat over time without changing the overall shape or slope of the curve too much.
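For readers who want to test the power-law claim against their own ping logs, here is one way to do it. This is a sketch under my own assumptions, not a record of how the R-squared figure above was computed: it counts how many pingers sent each number of pings, then fits a straight line to the log-log points by ordinary least squares. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def pinger_histogram(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (pings_per_pinger, number_of_pingers) pairs, sorted by ping count."""
    return sorted(Counter(counts).items())


def fit_log_log(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of log(y) = intercept + slope * log(x).

    Returns (slope, intercept, r_squared). All x and y must be positive.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("points must vary in both x and y")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical sample: many pingers ping once, very few ping often.
sample_counts = [1] * 600 + [2] * 160 + [3] * 70 + [5] * 25 + [10] * 6 + [40] * 1

slope, intercept, r_squared = fit_log_log(pinger_histogram(sample_counts))
print(f"slope={slope:.2f}  R-squared={r_squared:.2f}")
```

A negative slope with an R-squared close to 1 on the log-log plot is what a power law looks like. A least-squares fit on log-log data is a quick sanity check rather than a rigorous test.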
bob wyman