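Local Preserving Projection Similarity Measure Method Based on Kernel Mapping and Rank-Order Distance
Affiliations: College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China; Qingdao Lanzhi Modern Service Industry Digital Engineering Research Center, Qingdao 266071, Shandong, China; China Tobacco Yunnan Industrial Co., Ltd., Technical Research Center, Kunming 650024, Yunnan, China
Funding: Supported by the National Key Research and Development Program of China (2018YFB1701704) and a China Tobacco Yunnan Industrial Co., Ltd. project (2019XX02)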
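Abstract: Near-infrared spectra are high-dimensional, highly redundant, nonlinear, and available only in small samples, which leads to the "curse of dimensionality" when spectral similarity is measured. To address this, a locality preserving projection algorithm based on kernel mapping and rank-order distance (KRLPP) is proposed. First, the spectral data are mapped into a higher-dimensional space by a kernel transformation, which preserves the nonlinear characteristics of the manifold structure. Then an improved locality preserving projection (LPP) algorithm reduces the dimensionality of the data: rank-order distance replaces the traditional Euclidean or geodesic distance and, by using the information of shared neighboring points, yields a more accurate local neighborhood relationship. Finally, spectral similarity is measured by computing distances in the low-dimensional space. The method both resolves the "distance failure" problem of high-dimensional space and improves the accuracy of the similarity measurement. To verify KRLPP, the optimal parameters, namely the number of nearest neighbors k and the reduced dimensionality d, were first determined from the change in the information residual of the dataset before and after dimensionality reduction. Next, KRLPP was compared with PCA, LPP, and INLPP in terms of both the projection quality of the reduced spectra and the classification performance of the resulting models. The results show that KRLPP distinguishes tobacco leaf positions well, and that its dimensionality reduction and its correct recognition rate for different positions are markedly better than those of PCA, LPP, and INLPP. Finally, five representative tobacco leaves were selected as targets from the leaf-blend formula of a cigarette brand; PCA, LPP, and KRLPP were each used to find similar leaves for every target among 300 tobacco samples kept for formula maintenance; and the leaves and blend formulas before and after replacement were evaluated in terms of chemical composition and sensory evaluation. LPP and KRLPP used the same dimensionality-reduction parameters, and PCA used the first 6 principal components. The results show that, compared with PCA and LPP, the replacement leaves and replacement formulas selected by KRLPP differ least from the originals in chemical components such as total sugar, reducing sugar, total nicotine, and total nitrogen and in sensory indices such as aroma, smoke, and taste, giving the highest similarity-measurement accuracy. The method can be applied to finding replacement raw materials for formulated products and can help enterprises maintain product quality.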

Keywords: near-infrared spectroscopy  locality preserving projection algorithm  kernel mapping  rank-order distance  similarity measure
Received: 2020-09-24
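
The rank-order distance that KRLPP substitutes for Euclidean distance when building local neighborhoods can be illustrated with a small sketch. It assumes one common formulation from the clustering literature, in which two samples count as close when the top entries of their ranked neighbor lists overlap; the exact variant used by the authors is not given in this record. The function names and the toy data are hypothetical.

```python
import math


def rank_lists(points):
    """For each point, list all point indices sorted by Euclidean distance, itself first."""
    order = []
    for i, p in enumerate(points):
        # The (j != i) key forces the point itself to rank 0 even if duplicates exist.
        ranked = sorted(range(len(points)),
                        key=lambda j: (j != i, math.dist(p, points[j]), j))
        order.append(ranked)
    return order


def rank_order_distance(order, a, b):
    """Symmetric, normalized rank-order distance between samples a and b (a != b)."""
    pos_a = {idx: rank for rank, idx in enumerate(order[a])}
    pos_b = {idx: rank for rank, idx in enumerate(order[b])}

    def asymmetric(src, pos_src, pos_dst, dst):
        # Walk src's list up to and including dst, summing each entry's rank in dst's list.
        return sum(pos_dst[order[src][i]] for i in range(pos_src[dst] + 1))

    d_ab = asymmetric(a, pos_a, pos_b, b)
    d_ba = asymmetric(b, pos_b, pos_a, a)
    # Normalize by the smaller of the two mutual ranks (at least 1 when a != b).
    return (d_ab + d_ba) / min(pos_a[b], pos_b[a])


# Hypothetical one-dimensional toy data, not taken from the paper.
points = [(0.0,), (1.0,), (1.2,), (10.0,)]
order = rank_lists(points)
print(rank_order_distance(order, 1, 2))
```

Because the measure depends only on neighbor rankings, it is insensitive to the absolute scale of distances, which is the property the abstract relies on when Euclidean distances lose contrast in high-dimensional space.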

Local Preserving Projection Similarity Measure Method Based on Kernel Mapping and Rank-Order Distance
QIN Yu-hua,ZHANG Meng,YANG Ning,SHAN Qiu-fu. Local Preserving Projection Similarity Measure Method Based on Kernel Mapping and Rank-Order Distance[J]. Spectroscopy and Spectral Analysis, 2021, 41(10): 3117-3122. DOI: 10.3964/j.issn.1000-0593(2021)10-3117-06
Authors:QIN Yu-hua  ZHANG Meng  YANG Ning  SHAN Qiu-fu
Affiliation:1. College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China; 2. Qingdao Lanzhi Modern Service Industry Digital Engineering Research Center, Qingdao 266071, China; 3. China Tobacco Yunnan Industrial Co., Ltd., Technical Research Center, Kunming 650024, China
Abstract:The high dimensionality, high redundancy, nonlinearity, and small sample size of near-infrared spectra cause the curse of dimensionality when spectral similarity is measured. To address this problem, a locality preserving projection algorithm based on kernel mapping and rank-order distance (KRLPP) is proposed. First, the spectral data are mapped to a higher-dimensional space through a kernel transformation, which preserves the nonlinear characteristics of the manifold structure. Then the dimensionality of the data is reduced by an improved locality preserving projection (LPP) algorithm in which the rank-order distance replaces the traditional Euclidean or geodesic distance, so that a more accurate local neighborhood relationship is obtained by sharing the information of neighboring points. Finally, spectral similarity is measured by calculating distances in the low-dimensional space. The method resolves the distance-failure problem of high-dimensional space and improves the accuracy of the similarity measurement. To verify the effectiveness of KRLPP, the best parameters, namely the number of nearest neighbors k and the dimensionality d of the reduced space, were first determined from the change in the information residual of the dataset before and after dimensionality reduction. Second, KRLPP was compared with the PCA, LPP, and INLPP algorithms in terms of the projection quality of the reduced spectra and the classification ability of the resulting models. The results show that KRLPP distinguishes tobacco leaf positions well, and that its dimensionality reduction and its correct identification rate for different leaf positions are significantly better than those of PCA, LPP, and INLPP. Finally, five representative tobacco leaves were selected as targets from the leaf-blend formula of a cigarette brand. PCA, LPP, and KRLPP were each used to find similar leaves for every target among 300 tobacco samples used for formula maintenance, and the leaves and blend formulas before and after replacement were evaluated in terms of chemical composition and sensory evaluation. LPP and KRLPP used the same dimensionality-reduction parameters, and the first 6 principal components were selected for PCA. The results show that, compared with PCA and LPP, the replacement leaves and replacement formulas selected by KRLPP differ least from the originals in chemical components such as total sugar, reducing sugar, total nicotine, and total nitrogen and in sensory indices such as aroma, smoke, and taste, and that KRLPP gives the highest similarity-measurement accuracy. The method can be applied to the search for alternative raw materials for formula products and can assist enterprises in maintaining product quality.
Keywords:Near-infrared spectroscopy  Locality preserving projection algorithm  Kernel mapping  Rank-order distance  Similarity measure
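
The last step described in the abstract, finding the most similar samples for a target once every spectrum has a low-dimensional representation, can be sketched as a nearest-neighbor lookup. This is a minimal illustration that assumes Euclidean distance in the reduced space (the abstract says only that a distance is calculated) and uses made-up coordinates; it does not reproduce the KRLPP projection itself, and `most_similar` is a hypothetical helper name.

```python
import math


def most_similar(target, candidates, top_n=1):
    """Return (index, distance) pairs for the top_n candidates closest to target."""
    scored = [(i, math.dist(target, c)) for i, c in enumerate(candidates)]
    # Smaller distance in the reduced space is read as higher spectral similarity.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]


# Hypothetical 2-D embeddings standing in for projected spectra.
target_leaf = (0.0, 0.0)
library = [(3.0, 4.0), (0.5, 0.0), (1.0, 1.0)]
for index, distance in most_similar(target_leaf, library, top_n=2):
    print(index, round(distance, 3))
```

In the study's setting the candidate list would hold the 300 formula-maintenance samples and the lookup would be repeated for each of the five target leaves.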