
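The K-means Find Density Peaks Algorithm in Molecular Conformation Clustering
Cite this article: Guiyan Wang, Ting Fu, Hong Ren, Peijun Xu, Qiuhan Guo, Xiaohong Mou, Yan Li, Guohui Li. The K-means Find Density Peaks Algorithm in Molecular Conformation Clustering[J]. Chinese Journal of Chemical Physics, 2022(2):353-368.
Authors: Guiyan Wang  Ting Fu  Hong Ren  Peijun Xu  Qiuhan Guo  Xiaohong Mou  Yan Li  Guohui Li
Affiliations: Dalian Ocean University, Dalian 116029; Affiliated Zhongshan Hospital of Dalian University, Dalian 116001; State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023; Department of Ophthalmology, Aerospace Center Hospital, Beijing 100049; Liaoning Normal University, Dalian 116023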
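Abstract (translated from the Chinese): Clustering molecular conformations is the main way to find representative conformations in molecular dynamics simulation trajectories. It is a key step in analyzing complex conformational changes or intermolecular interaction mechanisms. As a density-based clustering algorithm, the find density peaks algorithm has been applied to molecular clustering because of its clustering accuracy. As simulation lengths grow, however, its low computational efficiency limits its applicability. This paper proposes the K-means find density peaks clustering algorithm, an extension of the find density peaks algorithm aimed at computational efficiency, to address its heavy resource consumption. In the K-means find density peaks algorithm, an efficient clustering algorithm (for example, K-means) first performs an initial clustering, and the resulting cluster centers are defined as weighted typical points. The weighted typical points are then clustered a second time by the find density peaks algorithm, and the points are refined into core points, boundary points, and refined halo points. The accuracy is similar to that of the find density peaks algorithm, while the computational complexity drops from O(n²) to O(n). KFDP is applied to the clustering of multiple simulation trajectories, with molecular conformations described by dihedral angles, secondary structure, and contact maps. Comparisons with the K-means and DBSCAN clustering algorithms confirm the advantages of the K-means find density peaks algorithm.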

Keywords: K-means  Find density peaks  Molecular clustering  DBSCAN
Received: 2021/11/30

K-means Find Density Peaks in Molecular Conformation Clustering
Guiyan Wang, Ting Fu, Hong Ren, Peijun Xu, Qiuhan Guo, Xiaohong Mou, Yan Li, Guohui Li. K-means Find Density Peaks in Molecular Conformation Clustering[J]. Chinese Journal of Chemical Physics, 2022(2):353-368.
Authors:Guiyan Wang  Ting Fu  Hong Ren  Peijun Xu  Qiuhan Guo  Xiaohong Mou  Yan Li  Guohui Li
Abstract: Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories. It is usually a critical step in interpreting complex conformational changes or interaction mechanisms. As a density-based clustering algorithm, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as simulation lengths grow rapidly with increasing computing power, the low computational efficiency of FDP limits its potential applications. Here we propose an efficiency-oriented extension of FDP, named K-means find density peaks (KFDP), to address its heavy resource consumption. In KFDP, the points are first clustered by a highly efficient clustering algorithm, such as K-means. The resulting cluster centers are defined as typical points, each carrying a weight that represents its cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP achieves accuracy comparable to FDP while its computational complexity is reduced from O(n²) to O(n). We apply KFDP to, and test it on, trajectory data of multiple small proteins described by torsion angles, secondary structure, or contact maps. Comparisons with K-means and with density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
Keywords:K-means find density peaks  Molecular clustering  Density-based spatial clustering of applications with noise
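
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the weighted density formula, so the Gaussian-kernel density, the cutoff `dc`, the choice of peaks by the largest product of density and separation, and all function names below are assumptions made for illustration. The refinement into core, boundary, and halo points is left out.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (stage 1). Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(points, centers)


def weighted_fdp(centers, weights, dc, n_clusters):
    """Density-peak clustering of weighted typical points (stage 2).

    Assumed density: rho_i = sum_j w_j * exp(-(d_ij / dc)^2).
    """
    m = len(centers)
    d = [[math.dist(a, b) for b in centers] for a in centers]
    rho = [sum(weights[j] * math.exp(-(d[i][j] / dc) ** 2) for j in range(m))
           for i in range(m)]
    order = sorted(range(m), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * m   # distance to the nearest denser typical point
    parent = [-1] * m   # index of that nearest denser typical point
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # convention for the densest point
        else:
            parent[i] = min(order[:pos], key=lambda j: d[i][j])
            delta[i] = d[i][parent[i]]
    # Assumed peak selection: largest rho * delta (stable sort keeps the
    # densest point first on ties, so it is always a peak).
    peaks = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    label = [-1] * m
    for c, i in enumerate(peaks):
        label[i] = c
    for i in order:  # parents are denser, so they are labeled earlier
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label


def kfdp(points, k, dc, n_clusters, seed=0):
    """K-means first, then weighted density peaks on the k centers."""
    centers, km_labels = kmeans(points, k, seed=seed)
    weights = [km_labels.count(j) for j in range(k)]  # cluster sizes
    typical_labels = weighted_fdp(centers, weights, dc, n_clusters)
    return [typical_labels[l] for l in km_labels]
```

In this sketch the quadratic pairwise-distance step runs only over the `k` typical points, while each K-means pass is linear in the number of conformations for fixed `k`, which is in the spirit of the O(n²) to O(n) reduction the abstract reports. A call such as `kfdp(points, k=10, dc=1.0, n_clusters=2)` takes a list of coordinate tuples and returns one cluster label per point.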