An Improved DBSCAN Algorithm Based on Similarity Measures
Cite this article: GUO Yan-jie, YANG Ming, HOU Yu-chao, MENG Ming. An Improved DBSCAN Algorithm Based on Similarity Measures[J]. Mathematics in Practice and Theory (数学的实践与认识), 2020, 0(6): 164-170
Authors: GUO Yan-jie  YANG Ming  HOU Yu-chao  MENG Ming
Affiliation: School of Science, North University of China, Taiyuan 030051, China; School of Information and Communication Engineering, North University of China, Taiyuan 030051, China
Funding: National Natural Science Foundation of China (61601412, 61571404, 61471325); North University of China Graduate Science and Technology Project (20181549).
Abstract: Traditional DBSCAN clusters high-dimensional data sets poorly and is sensitive to the choice of its parameters. To address both problems, this paper proposes an improved DBSCAN algorithm based on a similarity measure. The algorithm builds a similarity matrix between data points from geodesic distance and shared nearest neighbors, which overcomes the limitations of Euclidean distance on high-dimensional data and better reflects the true structure of the data set. The Eps and MinPts parameters are determined adaptively by analyzing the distribution of the data. Experimental results show that the proposed GS-DBSCAN algorithm clusters data with complex distributions effectively and achieves higher clustering accuracy on high-dimensional data than the comparison algorithms, confirming the accuracy and feasibility of the algorithm.

Keywords: DBSCAN  clustering  geodesic distance  shared nearest neighbor
This article is indexed in CNKI, VIP, and other databases.
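The abstract names three ingredients: geodesic distance, shared nearest neighbors, and adaptively chosen Eps and MinPts. It does not give the formulas. The following is a minimal Python sketch of how such a pipeline could fit together, not the authors' GS-DBSCAN. Every function name is hypothetical. The way geodesic distance and the shared-neighbor count are combined, and the rule that derives Eps and MinPts from the k-th nearest dissimilarity, are assumptions made for illustration only.

```python
import heapq
import math


def knn_lists(points, k):
    """For each point, its k nearest neighbours as (Euclidean distance, index)."""
    out = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(d[:k])
    return out


def geodesic_distances(knn):
    """Shortest-path lengths over the symmetrised kNN graph (Dijkstra per node)."""
    n = len(knn)
    adj = [dict() for _ in range(n)]
    for i, nbrs in enumerate(knn):
        for d, j in nbrs:
            adj[i][j] = d
            adj[j][i] = d
    geo = []
    for s in range(n):
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        geo.append(dist)
    return geo


def gs_dissimilarity(points, k):
    """ASSUMED combination: geodesic distance divided by (1 + shared kNN count)."""
    knn = knn_lists(points, k)
    geo = geodesic_distances(knn)
    nbr = [{j for _, j in nbrs} for nbrs in knn]
    n = len(points)
    return [[geo[i][j] / (1 + len(nbr[i] & nbr[j])) for j in range(n)]
            for i in range(n)]


def adaptive_params(diss, k):
    """ASSUMED rule: Eps = mean k-th smallest dissimilarity, MinPts = k + 1."""
    kth = [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(diss)]
    finite = [d for d in kth if math.isfinite(d)]
    return sum(finite) / len(finite), k + 1


def dbscan_precomputed(diss, eps, min_pts):
    """Plain DBSCAN on a precomputed dissimilarity matrix; -1 marks noise."""
    n = len(diss)
    labels = [None] * n  # None means "not visited yet"
    neighbours = [[j for j in range(n) if diss[i][j] <= eps] for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


# Toy data: two well-separated groups on a line plus one isolated point.
points = [(x, 0) for x in range(5)] + [(x, 0) for x in range(100, 105)] + [(50, 50)]
diss = gs_dissimilarity(points, k=2)
eps, min_pts = adaptive_params(diss, k=2)
labels = dbscan_precomputed(diss, eps, min_pts)
print(eps, min_pts, labels)
```

On this toy input the two groups end up in separate clusters and the isolated point is labelled noise. The mean-based Eps rule is sensitive to outliers, so a median or a knee point of the sorted k-distance curve may be a more robust choice on real data.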