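Analysis and Identification of Terahertz Tartaric Acid Spectral Characteristic Region Based on Density Functional Theory and Bootstrapping Soft Shrinkage Method
Cite this article: TANG Xin, ZHOU Sheng-ling, ZHU Shi-ping, MA Ling-kai, ZHENG Quan, PU Jing. Analysis and Identification of Terahertz Tartaric Acid Spectral Characteristic Region Based on Density Functional Theory and Bootstrapping Soft Shrinkage Method[J]. Spectroscopy and Spectral Analysis, 2022, 42(9): 2740-2745.
Authors: TANG Xin  ZHOU Sheng-ling  ZHU Shi-ping  MA Ling-kai  ZHENG Quan  PU Jing
Institution: College of Engineering and Technology, Southwest University, Chongqing 402160, China
Funding: Supported by the National Natural Science Foundation of China, General Program (31771670, 62005227), and the Natural Science Foundation of Chongqing, General Program (cstc2020jcyj-msxmX0300)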
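Abstract: Terahertz time-domain spectra carry not only the chemical and physical information of a sample but also background information from instrument noise, sample state, environmental parameters and other sources. This multivariate character can degrade model performance and lower prediction accuracy. Whether the characteristic information of the target component can be extracted from spectral data against a complex, overlapping and changing background, with redundant variables removed and characteristic spectral regions selected, is therefore critical to quantitative and qualitative terahertz analysis. Taking L-tartaric acid as the object of study, terahertz absorption spectra of 342 samples were collected at room temperature at six concentrations: 10%, 20%, 40%, 50%, 60% and 80%. Using the B3LYP method of density functional theory (DFT) with the 6-31G*(d,p) basis set, a single-molecule model of L-tartaric acid was optimized and its terahertz spectral response was simulated. The molecular vibration modes corresponding to the characteristic peaks were analyzed, yielding an absorption spectrum over the 0.2~1.6 THz band. The absorption peak positions in the measured spectrum largely coincide with those from the theoretical calculation. Bootstrapping soft shrinkage (BOSS) was used to select characteristic spectral regions from the terahertz absorption spectrum of L-tartaric acid, and was compared with three classical region-selection methods: competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE) and interval partial least squares (iPLS). The analysis shows that the effective regions selected by BOSS overlap best with the characteristic regions computed by DFT. Regression models of the L-tartaric acid spectra were built with five algorithms: full-spectrum PLS, CARS-PLS, MC-UVE-PLS, iPLS and BOSS. The experimental results show that all four region-selection methods improve prediction accuracy over the full-spectrum PLS model, with BOSS improving the most. Its root-mean-square error of cross-validation (RMSECV), root-mean-square error of prediction (RMSEP), training-set coefficient of determination (R2train) and test-set coefficient of determination (R2test) are 0.026 0, 0.026 0, 0.988 1 and 0.987 5 respectively, giving higher prediction accuracy and model stability than the other models and providing an effective method for rapid quantitative detection based on terahertz spectroscopy.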

Keywords: Terahertz spectrum  L-tartaric acid  Density functional theory  Spectral region selection  Bootstrapping soft shrinkage
Received: 2021-07-26

Analysis and Identification of Terahertz Tartaric Acid Spectral Characteristic Region Based on Density Functional Theory and Bootstrapping Soft Shrinkage Method
TANG Xin,ZHOU Sheng-ling,ZHU Shi-ping,MA Ling-kai,ZHENG Quan,PU Jing.Analysis and Identification of Terahertz Tartaric Acid Spectral Characteristic Region Based on Density Functional Theory and Bootstrapping Soft Shrinkage Method[J].Spectroscopy and Spectral Analysis,2022,42(9):2740-2745.
Authors:TANG Xin  ZHOU Sheng-ling  ZHU Shi-ping  MA Ling-kai  ZHENG Quan  PU Jing
Institution:College of Engineering and Technology, Southwest University, Chongqing 402160, China
Abstract:Terahertz time-domain spectra contain the chemical and physical information of samples and also carry background information related to equipment noise, sample status and environmental parameters. This multivariate character may affect model performance and reduce prediction accuracy. Extracting the characteristic information of the target component, eliminating redundant variables and screening the characteristic spectral regions from spectral data against a complex, overlapping and changing background is therefore of great significance for the quantitative and qualitative analysis of terahertz spectra. In this paper, THz absorption spectra of 342 L-tartaric acid samples were collected at room temperature at six concentrations: 10%, 20%, 40%, 50%, 60% and 80%. The B3LYP method of density functional theory (DFT) with the 6-31G*(d,p) basis set was used to optimize a single-molecule model of L-tartaric acid, and the terahertz spectral characteristics of this model were simulated theoretically. The molecular vibration modes corresponding to the characteristic peaks were analyzed, and the absorption spectrum in the 0.2~1.6 THz band was obtained. The absorption peak positions in the measured spectrum agree well with those from the theoretical calculation. Bootstrapping soft shrinkage (BOSS) was used to screen the characteristic spectral regions of the terahertz absorption spectrum of L-tartaric acid, and was compared with three classical region-screening methods: competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE) and interval partial least squares (iPLS). The analysis indicates that the effective spectral regions selected by the BOSS algorithm overlap best with the characteristic spectral regions calculated by DFT. Regression models of the L-tartaric acid spectra were built using the full-spectrum PLS, CARS-PLS, MC-UVE-PLS, iPLS and BOSS algorithms. The experimental results show that all four region-screening methods improve prediction accuracy compared with the full-spectrum PLS model. The prediction ability of the BOSS algorithm improves most significantly: its root-mean-square error of cross-validation (RMSECV), root-mean-square error of prediction (RMSEP), training-set coefficient of determination (R2train) and test-set coefficient of determination (R2test) are 0.026 0, 0.026 0, 0.988 1 and 0.987 5 respectively, giving higher prediction accuracy and model stability than the other models. This study thus provides an effective method for rapid quantitative detection based on terahertz spectroscopy.
Keywords:Terahertz spectrum  L-tartaric acid  Density functional theory  Spectral region  Bootstrapping Soft Shrinkage  
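
The four figures of merit quoted in the abstract (RMSECV, RMSEP, R2train, R2test) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the interleaved fold assignment and the toy data are assumptions made for illustration, and a one-variable least-squares line stands in for the PLS models so that the example stays dependency-free.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Least-squares line y = a*x + b (assumed stand-in for a PLS model)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def rmsecv(x, y, k=3):
    """k-fold cross-validated RMSE: every sample is predicted by a model
    fitted without it. Folds are interleaved (index modulo k), an assumption."""
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(len(x)):
            if i % k == fold:
                preds[i] = a * x[i] + b
    return rmse(y, preds)

# Hypothetical toy data: the six concentrations from the abstract as mass
# fractions, and a made-up, perfectly linear "absorbance" at one frequency.
conc = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8]
absorbance = [1.0 + 2.0 * c for c in conc]

a, b = fit_line(absorbance, conc)
fitted = [a * v + b for v in absorbance]
r2_train = r2(conc, fitted)          # R2 on the data used for fitting
cv_error = rmsecv(absorbance, conc)  # RMSECV on the same data
```

RMSEP and R2test follow by calling `rmse` and `r2` on a held-out test set that took no part in fitting or cross-validation. With real spectra the stand-in line would be replaced by a PLS model built on the selected spectral variables.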