Counting is a funny business. Some tribal languages count just one, two and then many. For most of our history we had to live with human counts, with all their concomitant problems. Fortunately, at the height of the Industrial Revolution, a few people hit on the idea of using a machine to tabulate data. IBM, which grew out of that tabulating-machine business, went on to pioneer many concepts behind modern computing. Today we have very powerful computers that are, among other things, very good at counting.

However, the rise of massive data streams has complicated our calculus. What happens when we need to count millions of packets streaming through an Ethernet switch? Or in Shareaholic’s case, how do we compute the number of page views a web page receives over a stream that comprises many thousands of URLs per second? Maintaining an exact count for every item is very expensive, because the memory required grows with the number of distinct items seen.

We solve this problem with a count-min sketch.
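
To make the idea concrete, here is a minimal sketch of a count-min sketch in Python. It is an illustration of the general technique, not Shareaholic's production code: the class name, the `width` and `depth` defaults, and the choice of salted BLAKE2b hashes are all assumptions made for this example. The structure keeps a small fixed-size table of counters, hashes each item to one cell per row, and reports the minimum of those cells as the estimate.

```python
import hashlib


class CountMinSketch:
    """Approximate counter backed by a fixed-size table.

    The width and depth defaults are illustrative, not tuned values.
    """

    def __init__(self, width=2000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # Yield one (row, column) pair per row, using the row number as a hash salt.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the smallest cell is the best guess.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
for url in ["/home", "/blog", "/home", "/pricing", "/home"]:
    sketch.add(url)

print(sketch.estimate("/home"))
```

Memory use is fixed at `width * depth` counters no matter how many distinct URLs flow through. The trade-off is that an estimate can be higher than the true count when items collide, but it is never lower, so `/home` above is reported as at least 3.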
