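UMAP-Assisted Fuzzy C-Clustering Method for Recognition of Terahertz Spectrum
Cite this article: YI Can-can, TUO Shuai, TU Shan, ZHANG Wen-tao. UMAP-Assisted Fuzzy C-Clustering Method for Recognition of Terahertz Spectrum[J]. Spectroscopy and Spectral Analysis, 2022, 42(9): 2694-2701.
Authors: YI Can-can  TUO Shuai  TU Shan  ZHANG Wen-tao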
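Affiliations: 1. Key Laboratory of Metallurgical Equipment and Control Technology, Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, Hubei
2. Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering, Wuhan University of Science and Technology, Wuhan 430081, Hubei
3. Precision Manufacturing Institute, Wuhan University of Science and Technology, Wuhan 430081, Hubei
4. College of Physical Science and Technology, Guangxi Normal University, Guilin 541004, Guangxi
5. School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, Guangxi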
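Funding: Supported by the National Natural Science Foundation of China (51805382), the Guangxi Key Research and Development Program (2020AB44003), the Director's Fund of the Guangxi Key Laboratory of Optoelectronic Information Processing (GD18206), and the National Science and Technology Major Project (2017ZX02101007-003).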
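Abstract: Terahertz (THz) radiation is low in energy, transient, and well suited to spectral analysis, which gives it broad prospects in substance identification. Existing THz-based identification methods have achieved some success, but they are prone to falling into local optima, which limits recognition accuracy. Uniform Manifold Approximation and Projection (UMAP) is a nonlinear dimensionality reduction method that assumes the data are uniformly distributed on a Riemannian manifold and can model manifolds with a fuzzy topological structure. UMAP reduces dimensionality by minimizing the cross-entropy between two topological representations, which optimizes the layout of the data representation in the low-dimensional space. In the traditional fuzzy C-clustering method (FCM), the initial cluster centers are usually assigned at random, and a poor choice of initial centers easily leads to incorrect clustering. This paper therefore proposes a UMAP-assisted fuzzy C-clustering algorithm. First, UMAP reduces the dimensionality of the input THz sample matrix. Next, suitable initial cluster centers are selected according to the principle of maximizing the distance between classes. Finally, fuzzy C-means clustering is applied. The proposed method relieves overcrowding between classes during clustering and also reflects the distance information between classes, which helps in choosing suitable initial cluster centers for the samples. To verify the reliability of the proposed clustering method, THz time-domain spectroscopy was used to measure four types of transgenic cotton seeds: Lu Mianyan28, Lu Mianyan29, Lu Mianyan36, and Zhongmian28. The UMAP-assisted fuzzy C-clustering algorithm was applied to the absorbance spectra of the seeds and successfully separated the four types, with an overall accuracy of 0.9833. This indicates that the proposed algorithm has good application prospects in THz spectral identification of substances.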

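Keywords: Terahertz time-domain spectroscopy  Substance identification  Transgenic cotton seeds  UMAP  Dimensionality reduction  Fuzzy C-clustering
Received: 2021-08-02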
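
The abstract names two objective functions without writing them out. For orientation, their usual textbook forms are sketched below. The notation is an assumption and does not come from the paper: $w_h(e)$ and $w_l(e)$ are the fuzzy edge weights in the high- and low-dimensional representations, $u_{ij}$ is the membership of sample $x_i$ in cluster $j$, $v_j$ is a cluster center, and $m > 1$ is the fuzzifier.

```latex
% Cross-entropy between the two fuzzy topological representations (UMAP)
C = \sum_{e \in E} \left[ w_h(e)\,\log\frac{w_h(e)}{w_l(e)}
    + \bigl(1 - w_h(e)\bigr)\,\log\frac{1 - w_h(e)}{1 - w_l(e)} \right]

% Fuzzy C-means objective, subject to \sum_{j=1}^{c} u_{ij} = 1 for every i
J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m}\, \lVert x_i - v_j \rVert^{2}

% Alternating updates that decrease J_m
u_{ij} = \left[ \sum_{k=1}^{c}
    \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
    \right]^{-1},
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

Because $J_m$ is non-convex, the alternating updates converge only to a local minimum that depends on the starting centers $v_j$. This is the sensitivity to initialization that the abstract describes.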

UMAP-Assisted Fuzzy C-Clustering Method for Recognition of Terahertz Spectrum
YI Can-can, TUO Shuai, TU Shan, ZHANG Wen-tao. UMAP-Assisted Fuzzy C-Clustering Method for Recognition of Terahertz Spectrum[J]. Spectroscopy and Spectral Analysis, 2022, 42(9): 2694-2701.
Authors: YI Can-can  TUO Shuai  TU Shan  ZHANG Wen-tao
Abstract: Terahertz (THz) waves, characterized by low energy, transient behavior, and strong spectral-analysis capability, have a promising future in material identification. Although existing THz-based substance identification methods have achieved some success, they are prone to falling into local optima, which results in low identification accuracy. Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimensionality reduction method, assumes that the data are uniformly distributed on a Riemannian manifold and can be used to model manifolds with fuzzy topology. UMAP optimizes the layout of the data representation in low-dimensional space by minimizing the cross-entropy between two topological representations. In the traditional fuzzy C-clustering method (FCM), the initial clustering centers are often given randomly. When the initial clustering centers are poorly chosen, the algorithm easily falls into a local optimum, leading to wrong clustering. To this end, this paper proposes a UMAP-assisted fuzzy C-clustering algorithm. First, UMAP is used to reduce the dimensionality of the input THz sample matrix. Then, based on the principle of maximizing the distance between categories, appropriate initial clustering centers are selected. Finally, the fuzzy C-means method is employed to perform the clustering analysis. The proposed algorithm can solve the overcrowding problem between categories in the clustering process and reflect the distance information between categories, which facilitates the selection of appropriate initial clustering centers. To verify the reliability of the proposed algorithm, four types of genetically modified cotton seeds (Lu Mianyan28, Lu Mianyan29, Lu Mianyan36, and Zhongmian28) were measured using THz time-domain spectroscopy. The UMAP-assisted fuzzy C-clustering method was then used to cluster the absorbance spectral data of the four types of seeds. The different cotton seeds were successfully separated, with a total correct rate of 0.9833. The result demonstrates that the proposed UMAP-assisted fuzzy C-clustering method has good application prospects in identifying materials from their THz spectra.
Keywords: Terahertz time-domain spectroscopy  Substance identification  Transgenic cotton seeds  UMAP  Dimensionality reduction  Fuzzy C-clustering method
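
The three-step pipeline in the abstract (reduce, pick well-separated initial centers, run fuzzy C-means) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The abstract does not say how "maximizing the distance between categories" is carried out, so a farthest-point (maximin) heuristic stands in for it here. The UMAP step is not executed: synthetic 2-D blobs play the role of the embedding, which in practice might come from something like `umap.UMAP(n_components=2).fit_transform(X)` in the umap-learn package. The function names `farthest_point_centers` and `fuzzy_c_means` are hypothetical.

```python
import math
import random


def farthest_point_centers(points, c):
    """Pick c initial centers that are far apart (maximin heuristic, an assumption)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Start from the point farthest from the overall mean.
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < c:
        # Next center: the point whose nearest already-chosen center is farthest away.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, q) for q in centers))
        )
    return [list(q) for q in centers]


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM alternating updates; returns (centers, memberships)."""
    c, dim = len(centers), len(points[0])
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)), rows sum to 1.
        u = []
        for p in points:
            inv = [max(math.dist(p, q), 1e-12) ** (-2.0 / (m - 1.0)) for q in centers]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            w_sum = sum(w)
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / w_sum for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Synthetic stand-in for a 2-D UMAP embedding of four seed classes (made-up data).
random.seed(0)
blob_means = [(0, 0), (10, 0), (0, 10), (10, 10)]
embedding = [
    (cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
    for cx, cy in blob_means
    for _ in range(30)
]

init = farthest_point_centers(embedding, 4)
centers, u = fuzzy_c_means(embedding, init)
# Harden the fuzzy memberships into one label per sample.
labels = [max(range(4), key=lambda j: row[j]) for row in u]
```

With well-separated blobs, the farthest-point seeds land in four different clusters, so the FCM iterations start near a good solution instead of depending on a random draw. On real spectra, the quality of the result would depend on how well the embedding separates the classes.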