Order statistics and estimating cardinalities of massive data sets
Authors: Frédéric Giroire
Affiliation: ALGO project, INRIA Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex, France; MASCOTTE, joint project CNRS-INRIA-UNSA, 2004 Routes des Lucioles, BP 93, F-06902, France
Abstract:
A new class of algorithms is introduced to estimate the cardinality of very large multisets using constant memory and a single pass over the data. It is based on order statistics rather than on bit patterns in binary representations of numbers. Three families of estimators are analyzed. They attain a standard error of $1/\sqrt{M}$ using M units of storage, which places them in the same class as the best known algorithms so far. The algorithms have a very simple internal loop, which gives them an advantage in terms of processing speed. For instance, a memory of only 12 kB and only a few seconds are sufficient to process a multiset with several million elements and to build an estimate with accuracy of order 2 percent. The algorithms are validated both by mathematical analysis and by experiments on real internet traffic.
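To illustrate the general idea of estimating cardinality from order statistics, here is a minimal Python sketch. It is not the paper's estimators (the abstract does not specify them); it uses the generic "k-th minimum value" technique instead: each element is hashed to a pseudo-uniform value in [0, 1), the k smallest distinct hash values are kept, and the number of distinct elements is estimated as (k - 1) divided by the k-th smallest value. The names `hash_to_unit` and `estimate_cardinality`, the choice of SHA-256, and the default `k = 256` are assumptions made for this illustration only.

```python
import hashlib
import heapq


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(stream, k: int = 256) -> float:
    """One-pass distinct-count estimate from the k-th smallest hash value.

    Generic k-minimum-values sketch, assumed here for illustration; memory
    use is O(k) regardless of the length of the stream.
    """
    heap = []     # max-heap (values negated) of the k smallest distinct hashes
    kept = set()  # same values, used to skip duplicates already in the heap
    for item in stream:
        h = hash_to_unit(item)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New value is smaller than the current k-th minimum: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hash values seen: the count is exact.
        return float(len(heap))
    return (k - 1) / -heap[0]


if __name__ == "__main__":
    # A multiset with 50,000 distinct elements, each appearing twice.
    data = [f"flow-{i % 50_000}" for i in range(100_000)]
    print(estimate_cardinality(data))
```

Because duplicates hash to the same value, repeated elements never change the stored minima, which is why the estimate depends only on the number of distinct elements. Larger values of `k` trade memory for accuracy.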
Keywords: Cardinality estimates; Algorithm analysis; Very large multisets; Traffic analysis
This article is indexed in ScienceDirect and other databases.