
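Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm (正交匹配追踪算法的近红外光谱定量分析)
Cite this article: LI Si-hai, LIU Dong-ling. Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm[J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2021, 41(4): 1097-1101.
Authors: LI Si-hai (李四海)  LIU Dong-ling (刘东玲)
Affiliations: College of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, Gansu, China; School of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, Gansu, China
Funding: Supported by the National Natural Science Foundation of China (81603407); the Natural Science Foundation of Gansu Province (1506RJZA046); the Lanzhou Science and Technology Plan Project (2018-3-41); and the Open Fund of the Gansu Provincial Key Laboratory of Chemistry and Quality Research of Chinese (Tibetan) Medicine in Universities (zzy-2018-05).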
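Abstract: Compressed sensing (CS) is an emerging signal compression and sampling technique, and orthogonal matching pursuit (OMP) is a greedy pursuit algorithm widely used for sparse signal reconstruction in compressed sensing. Near-infrared (NIR) spectral data are high-dimensional with few samples, and the signal is assumed to be sparse a priori. To further improve the flexibility and reliability of variable selection for small-sample NIR spectra, a novel spectral variable selection method based on compressed sensing theory, Orthogonal Matching Pursuit Based Variable Selection (OMPBVS), is proposed. OMPBVS performs a sparse reconstruction of the original spectral signal that shrinks the regression coefficients of the vast majority of variables to 0, thereby indirectly selecting spectral variables. Specifically, the spectral matrix serves as the sensing matrix and the response variable as the observation. The algorithm iteratively computes the inner products between the residual and the atoms and selects the atom with the largest inner product. At each iteration the signal is projected onto the subspace spanned by all atoms selected so far, and the coefficients of all selected atoms are updated so that the resulting residual is orthogonal to every selected atom. The residual computation is in essence a Gram-Schmidt orthogonalization, and the orthogonal projection reduces the number of iterations while preserving reconstruction accuracy. OMPBVS can reduce the spectral dimension to the order of the sample size, and its variable selection ability is comparable to that of LASSO. Compared with LASSO, however, OMPBVS optimizes its loss function by forward selection, which requires fewer iterations and allows the number of selected variables to be controlled exactly. Variable selection experiments were carried out on the beer dataset and the Wheat kernels dataset to compare six methods: PLS, MCUVE-PLS, CARS-PLS, WMSCVS, LASSOLarsCV, and OMPBVS. The beer dataset contains 60 samples, split by the Kennard-Stone (KS) method into 36 training samples and 24 test samples; the response variable is Original extract concentration. The Wheat kernels dataset contains 523 samples (415 training, 108 test), and the response is protein content. On the beer dataset, OMPBVS selected 2 variables with RMSEC = 0.2052 and RMSEP = 0.1598; on the Wheat kernels dataset, it selected 9 variables with RMSEC = 0.4502 and RMSEP = 0.4125. Its variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective method for NIR spectral variable selection and quantitative analysis. OMPBVS generalizes well in the small-sample setting, reduces the number of selected variables, and improves the robustness of variable selection. In addition, spectral preprocessing such as SNV and MSC can, to some extent, reduce the number of selected variables and improve model interpretability.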
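The iteration described in the abstract can be written out as follows. This is a sketch of the standard OMP recursion under assumed notation, not a formula taken from the paper: $X \in \mathbb{R}^{n \times p}$ is the spectral matrix with unit-norm columns $x_j$ (the atoms), $y \in \mathbb{R}^{n}$ is the response, $S_k$ is the set of selected variables after $k$ iterations, and $r_k$ is the residual. The absolute value in the selection step is the usual convention and is an assumption here.

```latex
% Assumed notation (not from the paper): X is n x p with unit-norm columns x_j,
% y is the response, S_k the selected index set, r_k the residual.
\begin{align*}
r_0 &= y, \qquad S_0 = \emptyset \\
j_k &= \arg\max_{j \notin S_{k-1}} \left| x_j^{\top} r_{k-1} \right| \\
S_k &= S_{k-1} \cup \{ j_k \} \\
\hat{\beta}_{S_k} &= \arg\min_{b} \left\| y - X_{S_k} b \right\|_2^2
  = \left( X_{S_k}^{\top} X_{S_k} \right)^{-1} X_{S_k}^{\top} y \\
r_k &= y - X_{S_k} \hat{\beta}_{S_k}
  \quad\Longrightarrow\quad X_{S_k}^{\top} r_k = 0
\end{align*}
```

The last line is just the normal equations of the least-squares refit, which is why the residual is orthogonal to every selected atom. Stopping after $K$ iterations leaves exactly $K$ non-zero coefficients, and since $X_{S_k}$ needs full column rank for the closed form to hold, $K$ cannot usefully exceed the number of samples $n$.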

Keywords: Near infrared spectroscopy  Variable selection  Compressed sensing  Partial least squares  Orthogonal matching pursuit
Received: 2020-03-02

Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm
LI Si-hai, LIU Dong-ling. Quantitative Analysis of Near Infrared Spectroscopy Based on Orthogonal Matching Pursuit Algorithm[J]. Spectroscopy and Spectral Analysis, 2021, 41(4): 1097-1101.
Authors: LI Si-hai  LIU Dong-ling
Institutions: 1. College of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China; 2. School of Pharmacy, Gansu University of Chinese Medicine, Lanzhou 730000, China
Abstract: Compressed sensing (CS) is a new technology for signal compression and sampling. Orthogonal matching pursuit (OMP), a greedy pursuit algorithm, is widely used for sparse signal reconstruction in the compressed sensing field. Considering that near-infrared (NIR) spectral signals are high-dimensional with small sample sizes and carry a sparsity prior, a novel NIR spectral variable selection method named Orthogonal Matching Pursuit Based Variable Selection (OMPBVS) is proposed on the basis of compressed sensing theory, to further improve the flexibility and reliability of variable selection for small-sample NIR spectra. By sparse reconstruction of the original spectral signal, OMPBVS compresses the regression coefficients of most variables to zero and thereby indirectly selects spectral variables. Specifically, the spectral matrix is taken as the sensing matrix and the response variable as the observation; the inner products between the residual and the atoms are computed iteratively, and the atom with the largest inner product is selected. During each iteration, the signal is projected onto the subspace spanned by all selected atoms, and the coefficients of all selected atoms are then updated so that the residual is orthogonal to all of them. The residual calculation is in essence a Gram-Schmidt orthogonalization, and the orthogonal projection reduces the number of iterations while preserving the accuracy of signal reconstruction. OMPBVS can reduce the spectral dimension to the scale of the sample size, and its variable selection capability is comparable to that of LASSO. Compared with LASSO, however, the loss function of OMPBVS is optimized by a forward selection algorithm, which reduces the number of iterations and allows precise control of the number of selected variables. Variable selection experiments were performed on the beer dataset and the Wheat kernels dataset to compare the performance of six variable selection methods: PLS, MCUVE-PLS, CARS-PLS, WMSCVS, LASSOLarsCV, and OMPBVS. The beer dataset contains 60 samples, divided by the Kennard-Stone (KS) method into 36 training samples and 24 test samples, and the response variable is Original extract concentration. The Wheat kernels dataset consists of 523 samples (415 training samples and 108 test samples), and the predicted value is protein content. On the beer dataset, the number of variables selected by OMPBVS, RMSEC, and RMSEP were 2, 0.2052, and 0.1598, respectively. On the Wheat kernels dataset, the number of selected variables, RMSEC, and RMSEP were 9, 0.4502, and 0.4125, respectively. Its variable selection ability and model performance were better than those of the other five methods, indicating that OMPBVS is an effective method for NIR spectral variable selection and quantitative analysis. OMPBVS has good generalization ability in the small-sample case, reduces the number of selected variables, and improves the robustness of variable selection. In addition, spectral preprocessing methods such as SNV and MSC can reduce the number of selected variables to a certain extent and improve the interpretability of the model.
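The selection loop described in the abstract (correlate atoms with the residual, pick the strongest, refit by orthogonal projection) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name `omp_select`, the mean-centering, the unit-norm column scaling, and the use of the absolute inner product are all choices made here for the example.

```python
import numpy as np


def omp_select(X, y, n_select):
    """Greedy OMP variable selection (illustrative sketch, not the paper's code).

    Returns (selected column indices, coefficients for centered X, final residual).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: center X and y so no intercept is needed, and scale the
    # columns (atoms) to unit norm so their inner products are comparable.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0.0] = 1.0            # guard against constant columns
    A = Xc / norms

    residual = yc.copy()
    chosen = []                          # selected atom indices, in pick order
    used = np.zeros(A.shape[1], dtype=bool)
    coef = np.zeros(0)

    for _ in range(n_select):
        # Correlate every atom with the current residual.
        score = np.abs(A.T @ residual)
        score[used] = 0.0                # never pick the same atom twice
        j = int(np.argmax(score))
        if score[j] < 1e-12:             # nothing left to explain
            break
        chosen.append(j)
        used[j] = True

        # Least-squares refit on all chosen atoms: this is the orthogonal
        # projection of y onto their span, so the new residual is
        # orthogonal to every chosen atom.
        coef, *_ = np.linalg.lstsq(A[:, chosen], yc, rcond=None)
        residual = yc - A[:, chosen] @ coef

    idx = np.array(chosen, dtype=int)
    # Undo the column scaling so coefficients apply to centered X.
    return idx, coef / norms[idx], residual
```

With the returned values, a prediction for new spectra `X_new` would be `y.mean() + (X_new - X.mean(axis=0))[:, idx] @ coef`. Because the loop runs at most `n_select` times, the number of selected wavelengths is fixed in advance, which is the exact-control property the abstract contrasts with LASSO. For real work, scikit-learn's `OrthogonalMatchingPursuit` estimator with its `n_nonzero_coefs` parameter is a tested alternative to a hand-written loop.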
Keywords: Near infrared spectroscopy  Variable selection  Compressed sensing  Partial least squares  Orthogonal matching pursuit
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.