
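Research on Optimizing the Congressional Samples Algorithm for Approximate Aggregation Queries
Citation: HU Wen-yu, LIU Jian-hua, ZHANG Bai-li. Research on Optimizing the Congressional Samples Algorithm for Approximate Aggregation Queries[J]. Mathematics in Practice and Theory, 2013, 43(8).
Authors: HU Wen-yu, LIU Jian-hua, ZHANG Bai-li
Affiliations: 1. Department of Computer and Information Science, Fujian University of Technology, Fuzhou, Fujian 350108; School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096
2. Department of Computer and Information Science, Fujian University of Technology, Fuzhou, Fujian 350108
3. School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096
Funding: National Natural Science Foundation of China; Natural Science Foundation of Fujian Province; Research Start-up Fund of Fujian University of Technology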
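Abstract: Sampling is a general and effective approximation technique, and sampling-based approximate aggregation query processing is a common method in decision support systems and data mining tools. The key goal of such query processing is to return approximate answers correctly and efficiently while minimizing the approximation error. Building on an in-depth study of Congressional Samples, a representative sampling method for approximate aggregation queries, this paper identifies its shortcomings and the limits of its applicability, and proposes an optimized Congressional Samples method, the OptCongress algorithm. When the data within a group have a high-variance distribution, OptCongress overcomes the weakness of the original algorithm's simple uniform sampling and improves the quality of approximate aggregation answers. It also improves the original algorithm's allocation of sample sizes among groups: the original allocation algorithm lacks a rigorous formulation, which makes it difficult to evaluate theoretically. Finally, experimental comparisons verify the effectiveness and correctness of the optimized algorithm.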

Keywords: data mining; approximate aggregation queries; sampling; Congressional Samples
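The abstract refers to the original Congressional Samples method and its allocation of sample sizes across groups. As background, the sketch below shows the Basic Congress idea from that original method for a single grouping attribute: a "House" share proportional to each group's row count, a "Senate" share equal for every group, each group keeping the larger of the two, and all shares rescaled to fit the sample budget. This is a minimal sketch under stated assumptions (one grouping attribute, fractional sample sizes, a hypothetical function name `basic_congress_allocation`); it is not OptCongress, whose formula is not given in this abstract.

```python
def basic_congress_allocation(group_sizes: dict[str, int], budget: int) -> dict[str, float]:
    """Sketch of a Basic Congress-style allocation over one grouping attribute.

    House:  each group gets a share proportional to its row count.
    Senate: each group gets an equal share, budget / m.
    Each group keeps the larger of the two, and all shares are then
    scaled down so that they sum to the budget again.
    """
    total_rows = sum(group_sizes.values())
    m = len(group_sizes)
    raw = {
        g: max(budget * n / total_rows,  # House share (proportional)
               budget / m)                # Senate share (equal per group)
        for g, n in group_sizes.items()
    }
    scale = budget / sum(raw.values())
    # Fractional sizes; a real sampler would round them and cap each
    # at the group's row count (assumption, not shown here).
    return {g: s * scale for g, s in raw.items()}


if __name__ == "__main__":
    sizes = {"A": 8000, "B": 1500, "C": 500}  # hypothetical group row counts
    for group, n in basic_congress_allocation(sizes, budget=300).items():
        print(f"{group}: {n:.1f} sample rows")
```

With these hypothetical counts, group A receives about 163.6 sample rows and the small groups B and C about 68.2 each, whereas purely proportional (uniform) sampling would give 240, 45 and 15. Note that the allocation depends only on group sizes, not on the values in the aggregated column.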

Optimized Congressional Samples for Approximate Aggregation Queries
HU Wen-yu, LIU Jian-hua, ZHANG Bai-li. Optimized Congressional Samples for Approximate Aggregation Queries[J]. Mathematics in Practice and Theory, 2013, 43(8).
Authors: HU Wen-yu, LIU Jian-hua, ZHANG Bai-li
Abstract: Sampling is an efficient and widely used approximation technique. Its ability to answer aggregation queries approximately, yet accurately and efficiently, is of great benefit to decision support and data mining tools. Congressional Samples is a representative and influential sampling algorithm for approximate aggregation queries, but it is sub-optimal in some scenarios. We present OptCongress, an optimization of Congressional Samples. OptCongress proposes a new sample allocation algorithm that tries to minimize the MSE over the expected query distribution. This overcomes a weakness of the original Congressional Samples: its lack of a rigorous problem formulation, which leads to solutions that are difficult to evaluate theoretically. In addition, OptCongress addresses the original algorithm's neglect of the variance in the data distribution of the aggregated column(s), so that approximation errors are significantly reduced compared with the original Congressional Samples. Finally, a set of experiments on a modified TPC-H database demonstrates the correctness and effectiveness of the proposed technique.
Keywords: data mining; approximate aggregation queries; sampling; Congressional Samples
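The abstract argues that ignoring within-group variance inflates the MSE of approximate answers, but it does not state OptCongress's allocation formula. To illustrate why variance matters, the sketch below swaps in classical Neyman allocation from stratified sampling (sample size of group g proportional to N_g times sigma_g) and compares it with the proportional allocation that uniform sampling yields. This is a hedged illustration, not the paper's method: it assumes the per-group standard deviations sigma_g are known (in practice they would be estimated, e.g. from a pilot sample), ignores the finite-population correction, and all function names are hypothetical.

```python
def proportional_allocation(sizes: dict[str, int], budget: int) -> dict[str, float]:
    """Uniform-sampling baseline: n_g proportional to N_g."""
    total = sum(sizes.values())
    return {g: budget * n / total for g, n in sizes.items()}


def neyman_allocation(sizes: dict[str, int], sigmas: dict[str, float],
                      budget: int) -> dict[str, float]:
    """Neyman allocation (swapped in for illustration): n_g proportional to N_g * sigma_g."""
    weights = {g: sizes[g] * sigmas[g] for g in sizes}
    total = sum(weights.values())
    if total == 0:
        # Every group is constant; variance gives no guidance, so fall back.
        return proportional_allocation(sizes, budget)
    return {g: budget * w / total for g, w in weights.items()}


def sum_estimator_variance(alloc: dict[str, float], sizes: dict[str, int],
                           sigmas: dict[str, float]) -> float:
    """Total variance of the per-group SUM estimates N_g * (sample mean of g),
    without finite-population correction: sum over g of N_g^2 sigma_g^2 / n_g.
    Constant groups (sigma_g == 0) contribute nothing and are skipped."""
    return sum(sizes[g] ** 2 * sigmas[g] ** 2 / alloc[g]
               for g in sizes if sigmas[g] > 0)


if __name__ == "__main__":
    sizes = {"A": 8000, "B": 1500, "C": 500}     # hypothetical row counts
    sigmas = {"A": 1.0, "B": 20.0, "C": 5.0}     # hypothetical std. deviations
    for name, alloc in [("proportional", proportional_allocation(sizes, 300)),
                        ("neyman", neyman_allocation(sizes, sigmas, 300))]:
        print(name, round(sum_estimator_variance(alloc, sizes, sigmas)))
```

With these hypothetical inputs, the high-variance group B receives about 222 of the 300 sample rows, and the total variance drops from roughly 20.7 million under proportional allocation to (sum of N_g sigma_g)^2 / 300 = 5,467,500, the minimum for this objective.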
This article is indexed in Wanfang Data and other databases.