
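Quantitative Analysis of Diabetic Blood Raman Spectroscopy Based on XGBoost
Cite this article: WANG Ming-xuan, WANG Qiao-yun, PIAN Fei-fei, SHAN Peng, LI Zhi-gang, MA Zhen-he. Quantitative Analysis of Diabetic Blood Raman Spectroscopy Based on XGBoost[J]. Spectroscopy and Spectral Analysis, 2022, 42(6): 1721-1727.
Authors: WANG Ming-xuan  WANG Qiao-yun  PIAN Fei-fei  SHAN Peng  LI Zhi-gang  MA Zhen-he
Affiliation: College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110819, China
Funding: Supported by the National Natural Science Foundation of China (11404054, 61601104); the Natural Science Foundation of Hebei Province (F2019501025, F2020501040, F2017501052); and the Fundamental Research Funds for the Central Universities (N172304032, 2020GFYD026).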
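Abstract: Blood carries a large amount of biological information, including hormones, enzymes, and blood glucose, and elevated blood glucose leads to diabetes. Diabetes has many complications, such as cerebral infarction, cerebral hemorrhage, kidney damage, fundus damage, and peripheral neuropathy. Routine blood component testing currently has a long analysis cycle and slow feedback, which makes rapid, continuous measurement difficult. Optical detection can identify the chemical composition and relative content of a substance from its spectrum, and because of its high sensitivity, broad applicability, and fast analysis it is increasingly used in non-invasive blood testing. With continuing advances in laser technology, Raman spectroscopy, an inelastic scattering spectroscopy technique, has been widely applied to blood analysis. To improve the prediction accuracy of Raman spectroscopy, the XGBoost algorithm is applied for the first time to predicting blood glucose concentration from blood Raman spectra. The 106 blood samples and their reference values were provided by the First Hospital of Qinhuangdao, Hebei Province. Raman spectra were measured with a Bruker MultiRAM spectrometer using a 1064 nm excitation source at 400 mW, a spectral resolution of 6 cm⁻¹, a scanning rate of 10 kHz, and a scanning range of 400 to 4000 cm⁻¹. Each sample was measured 10 times and the mean was taken as the raw spectrum to ensure accuracy and repeatability. The method requires no data preprocessing. The spectra were first randomly divided into a training set and a test set at a ratio of 7:3; the training set was used to train the model and determine its parameters, and the test set was used to assess its stability and prediction accuracy. After the XGBoost model was built, its parameters were optimized by grid search with k-fold cross-validation. Model evaluation metrics and the Clarke error grid were used to analyze the model's blood glucose predictions. Finally, the XGBoost model was compared with decision tree (DT), random forest (RF), and support vector regression (SVR) models. The results show that the XGBoost regression model performed best, with a coefficient of determination of 0.99999, a calibration-set mean squared error of 0.00749, a prediction-set mean squared error of 0.00717, and a residual predictive deviation (RPD) of 331.97318; all prediction points fell in zone A of the Clarke error grid. These results indicate that XGBoost achieves high prediction accuracy in the quantitative analysis of blood components by Raman spectroscopy, and because the data are not preprocessed, program running time can be shortened. The approach has broad prospects in quantitative analysis with Raman and near-infrared spectroscopy.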
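The parameter optimization step (grid search scored by k-fold cross-validation) can be sketched without any external library. This is only an illustration under stated assumptions: a toy one-dimensional nearest-neighbour regressor stands in for XGBoost, the hyperparameter `n_neighbors` and every function name here are hypothetical, and the abstract does not report the grid or the number of folds the authors used.

```python
import random
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices once and deal them into n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Stand-in model (NOT XGBoost): average the targets of the nearest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return mean(train_y[i] for i in nearest[:n_neighbors])


def cv_mse(xs, ys, folds, n_neighbors):
    """Mean squared error, averaged over the held-out folds."""
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        errors = [
            (knn_predict(train_x, train_y, xs[i], n_neighbors) - ys[i]) ** 2
            for i in held_out
        ]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


def grid_search(xs, ys, param_grid, n_folds=5, seed=0):
    """Try every parameter combination and keep the lowest cross-validated MSE."""
    folds = k_fold_indices(len(xs), n_folds, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_mse(xs, ys, folds, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic data for illustration only; not the paper's spectra.
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 for x in xs]
best_params, best_score = grid_search(xs, ys, {"n_neighbors": [1, 2, 5, 10]})
```

The same loop applies to any model with tunable parameters: only `knn_predict` and the parameter grid would change, while the fold construction and the scoring stay the same.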

Keywords: XGBoost; Raman spectroscopy; blood glucose; quantitative regression
Received: 2021-05-26

Quantitative Analysis of Diabetic Blood Raman Spectroscopy Based on XGBoost
WANG Ming-xuan,WANG Qiao-yun,PIAN Fei-fei,SHAN Peng,LI Zhi-gang,MA Zhen-he.Quantitative Analysis of Diabetic Blood Raman Spectroscopy Based on XGBoost[J].Spectroscopy and Spectral Analysis,2022,42(6):1721-1727.
Authors:WANG Ming-xuan  WANG Qiao-yun  PIAN Fei-fei  SHAN Peng  LI Zhi-gang  MA Zhen-he
Institution:College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
Abstract: Blood contains a great deal of biological information, including hormones, enzymes, and blood glucose. High blood glucose leads to diabetes, which has many complications, such as cerebral infarction, cerebral hemorrhage, kidney damage, fundus damage, and peripheral neuropathy. At present, routine blood component analysis takes too long and returns results slowly, so rapid and continuous measurement is difficult to achieve. Optical detection can identify the chemical composition and relative content of a substance from its spectrum. Because of its high sensitivity, broad applicability, and fast analysis, it is playing a growing role in non-invasive blood testing. With the continuous advancement of laser technology, Raman spectroscopy, an inelastic scattering spectroscopy technique, has been widely used in blood analysis. To improve the prediction accuracy of Raman spectroscopy, this paper applies the XGBoost algorithm for the first time to the prediction of blood glucose concentration from blood Raman spectra. The 106 blood samples and their reference concentrations were provided by the First Hospital of Qinhuangdao City, Hebei Province. A Bruker MultiRAM spectrometer was used to measure the blood Raman spectra. The 1064 nm excitation source was operated at 400 mW, the spectral resolution was 6 cm⁻¹, the scanning rate was 10 kHz, and the scanning range was 400 to 4000 cm⁻¹. Each sample was measured 10 times, and the mean was taken as the raw spectrum to ensure the accuracy and repeatability of the experiment. The method requires no preprocessing of the data. First, the spectral data were randomly divided into a training set and a test set at a ratio of 7:3.
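The two data-handling steps just described, averaging the 10 repeated scans of each sample and splitting the samples 7:3 at random, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed seed, and the rounding of the training-set size are assumptions, and the abstract does not say how the random split was seeded.

```python
import random


def average_scans(scans):
    """Average repeated scans of one sample, point by point along the spectrum."""
    n_scans = len(scans)
    return [sum(values) / n_scans for values in zip(*scans)]


def split_train_test(n_samples, train_fraction=0.7, seed=0):
    """Return shuffled (train, test) index lists in roughly a 7:3 ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


# 106 samples, as in the abstract; the indices refer to averaged spectra.
train_idx, test_idx = split_train_test(106)
```

Splitting by sample index after averaging keeps all repeated scans of one blood sample on the same side of the split, so the test set contains no scan of a sample the model was trained on.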
The training set was used to train the model and determine its parameters, and the test set was used to verify the stability and prediction accuracy of the model. The XGBoost model was then built, and grid search with k-fold cross-validation was used to optimize its parameters. Model evaluation metrics and a Clarke error grid were used to analyze the model's predictions of blood glucose concentration. Finally, the XGBoost model was compared with decision tree (DT), random forest (RF), and support vector regression (SVR) models. The experimental results showed that the quantitative regression model built with XGBoost performed best. Its coefficient of determination was 0.99999, the mean squared error of the calibration set was 0.00749, the mean squared error of the prediction set was 0.00717, and the residual predictive deviation (RPD) was 331.97318. All prediction points fell in zone A of the Clarke error grid. The results indicate that applying XGBoost to the quantitative analysis of blood components by Raman spectroscopy gives high prediction accuracy, and because the data are not preprocessed, the program's running time can be shortened. The approach has broad prospects in quantitative analysis with Raman and near-infrared spectroscopy.
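The evaluation quantities reported above can be computed as in the sketch below. It rests on several assumptions: RPD is taken as the sample standard deviation of the reference values divided by the root mean squared error; glucose values are taken to be in mg/dL; and zone A of the Clarke error grid uses the standard rule (prediction within 20% of the reference, or both values below 70 mg/dL). The abstract does not state the units or the exact formulas the authors used, and all function names are illustrative.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of the reference values divided by RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_true) ** 2 for t in y_true) / (n - 1))
    return sd / math.sqrt(mean_squared_error(y_true, y_pred))


def in_clarke_zone_a(reference, predicted, hypo_threshold=70.0):
    """Zone A: within 20% of the reference, or both below the threshold (mg/dL)."""
    if reference < hypo_threshold and predicted < hypo_threshold:
        return True
    return abs(predicted - reference) <= 0.2 * reference


def zone_a_fraction(y_true, y_pred):
    """Share of prediction points that fall in zone A."""
    hits = sum(in_clarke_zone_a(t, p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up values for illustration only; not data from the paper.
reference = [80.0, 120.0, 160.0, 200.0, 240.0]
predicted = [82.0, 118.0, 164.0, 196.0, 240.0]
metrics = {
    "mse": mean_squared_error(reference, predicted),
    "r2": r_squared(reference, predicted),
    "rpd": rpd(reference, predicted),
    "zone_a": zone_a_fraction(reference, predicted),
}
```

For values in mmol/L, `hypo_threshold` would be about 3.9 instead of 70. Under this RPD definition, a coefficient of determination very close to 1 implies a very large RPD, since both depend on the ratio of residual error to the spread of the reference values.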
Keywords:XGBoost  Raman spectroscopy  Blood glucose  Quantitative regression  