首页 | 本学科首页   官方微博 | 高级检索  
     检索      

基于LSH的高维大数据k近邻搜索算法
引用本文:王忠伟,陈叶芳,钱江波,陈华辉.基于LSH的高维大数据k近邻搜索算法[J].电子学报,2016,44(4):906-912.
作者姓名:王忠伟  陈叶芳  钱江波  陈华辉
作者单位:宁波大学信息科学与工程学院, 浙江宁波 315211
基金项目:国家自然科学基金(No.61472194,No.61572266);浙江省自然科学基金(No.LY13F020040);宁波市自然科学基金(No.2014A610023);“信息与通信工程”浙江省重中之重学科开放基金
摘    要:局部敏感哈希(LSH)及其变体是解决高维数据k近邻(kNN)搜索的有效算法.但是,随着数据规模的日趋庞大,传统的集中式LSH算法结构已经不能够满足大数据时代的需求.本文分析传统LSH方案的不足之处,拓展AND-OR结构,提出通过索引而不比较原始数据直接实现高维大数据k近邻搜索算法C2SLSH.理论分析和实验证明,C2SLSH在分布式平台下具有稳定的可扩展性,在保证同等精确率的情况下,处理速度大约是现有方法的3倍.

关 键 词:高维数据k近邻  局部敏感哈希  MapReduce  冲突计数排序  
收稿时间:2014-12-24

LSH-Based Algorith m for k Nearest Neighbor Search on Big Data
WANG Zhong-wei,CHEN Ye-fang,QIAN Jiang-bo,CHEN Hua-hui.LSH-Based Algorith m for k Nearest Neighbor Search on Big Data[J].Acta Electronica Sinica,2016,44(4):906-912.
Authors:WANG Zhong-wei  CHEN Ye-fang  QIAN Jiang-bo  CHEN Hua-hui
Institution:Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
Abstract:The locality sensitive hashing (LSH)and its variants are efficient algorithms to solve the k nearest neigh-bor (kNN)search problems on high-dimensional data.However,with the increase of large data size,the traditional central-ized LSH algorithms cannot meet the challenge of the big data era.Based on a new AND-OR construction,this paper propo-ses an algorithm (called C2SLSH)for the k nearest neighbor search on big data.Different to the traditional algorithms,the C2SLSH can directly get the results from an index without having to compare the original data.The theoretical analysis and experimental results show that the algorithm has stable scalability on a distributed platform.Furthermore,it is faster than the conventional methods for about three times with the same accuracy rate.
Keywords:kNN  LSH  MapReduce  collision counting sorting
本文献已被 万方数据 等数据库收录!
点击此处可从《电子学报》浏览原始摘要信息
点击此处可从《电子学报》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号