Title
Flexible approximate counting
Abstract
Approximate counting [18] is useful for data stream and database summarization. It can help in many settings that allow only one pass over the data, want low memory usage, and can accept some relative error. Approximate counters use fewer bits; we focus on 8-bits but our results are general. These small counters represent a sparse sequence of larger numbers. Counters are incremented probabilistically based on the spacing between the numbers they represent. Our contributions are a customized distribution of counter values and efficient strategies for deciding when to increment them. At run-time, users may independently select the spacing (accuracy) of the approximate counter for small, medium, and large values. We allow the user to select the maximum number to count up to, and our algorithm will select the exponential base of the spacing. These provide additional flexibility over both classic and Csűrös's [4] floating-point approximate counting. These provide additional structure, a useful schema for users, over Kruskal and Greenberg [13]. We describe two new and efficient strategies for incrementing approximate counters: use a deterministic countdown or sample from a geometric distribution. In Csűrös all increments are powers of two, so random bits rather than full random numbers can be used. We also provide the option to use powers-of-two but retain flexibility. We show when each strategy is fastest in our implementation.
Year
DOI
Venue
2011
10.1145/2076623.2076655
IDEAS
Keywords
Field
DocType
additional flexibility,floating-point approximate counting,counter value,additional structure,flexible approximate counting,approximate counting,full random number,efficient strategy,data stream,customized distribution,approximate counter,count data,geometric distribution,floating point,relative error,implementation
Countdown,Data mining,Automatic summarization,Exponential function,Data stream,Computer science,Algorithm,Geometric distribution,Schema (psychology),Database,Approximation error,Kruskal's algorithm
Conference
Citations 
PageRank 
References 
0
0.34
15
Authors
2
Name
Order
Citations
PageRank
Scott A. Mitchell146177.77
David M. Day2324.68