
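Identification of Boletus Species Based on Discriminant Analysis of Partial Least Squares and Random Forest Algorithm
Cite this article: CHEN Feng-xia, YANG Tian-wei, LI Jie-qing, LIU Hong-gao, FAN Mao-pan, WANG Yuan-zhong. Identification of Boletus Species Based on Discriminant Analysis of Partial Least Squares and Random Forest Algorithm [J]. Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2022, 42(2): 549-554.
Authors: CHEN Feng-xia  YANG Tian-wei  LI Jie-qing  LIU Hong-gao  FAN Mao-pan  WANG Yuan-zhong
Affiliations: 1. College of Resources and Environmental Sciences, Yunnan Agricultural University, Kunming 650201, Yunnan, China
2. Yunnan Institute for Tropical Crops Research, Jinghong 666100, Yunnan, China
3. College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming 650201, Yunnan, China
4. Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming 650200, Yunnan, China
Funding: National Natural Science Foundation of China, Regional Program (31660591)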
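Abstract: Boletes are well-known wild edible mushrooms with high culinary and economic value. Because there are many bolete species and they are hard to tell apart, an effective, rapid, and reliable species identification technique would offer one means of improving bolete quality. In this study, a total of 683 specimens of 7 wild bolete species were collected from different regions of Yunnan, their mid-infrared (MIR) and ultraviolet (UV) spectra were acquired, and the average spectral features of the different species were analyzed. Single-spectrum data processed with several preprocessing combinations (SNV+SG, 2D+MSC+SNV, 1D+MSC+SNV+SG, MSC+2D) were combined with two feature extraction methods (PCA, LVs) to build partial least squares discriminant analysis (PLS-DA) and random forest (RF) models, which were used together with data fusion strategies to identify bolete species; the approach has a degree of novelty. The results show that: (1) In both the MIR and UV average spectra, the absorption peaks of the different species differ little, with only subtle differences in absorbance. (2) Appropriate preprocessing can enhance the information carried by the spectral data; the best preprocessing combinations of the MIR and UV data for the PLS-DA and RF models were 2D+MSC+SNV, SNV+SG, 2D+MSC+SNV, and 1D+MSC+SNV+SG. (3) Among the single-spectrum models, the MIR models outperformed the UV models. With the best MIR preprocessing combination, 2D+MSC+SNV, the PLS-DA model reached an accuracy of 99.78% on the training set and 99.12% on the validation set, and the RF model reached 93.20% on the training set and 99% on the validation set. (4) Data fusion improved classification accuracy: with low-level fusion, the PLS-DA model reached 100% on the training set and 99.12% on the validation set, and the RF model reached 92.32% and 99.14%. (5) For the RF algorithm, mid-level data fusion based on latent variables (LVs) gave accuracies of 92.76% on the training set and 96.04% on the validation set; mid-level data fusion based on principal component analysis (PCA) gave 97.15% and 100%. (6) For PLS-DA, mid-level data fusion (LVs) gave accuracies of 100% on the training set and 99.56% on the validation set; with mid-level data fusion (PCA), both the training and validation sets reached 100%. PLS-DA and RF combined with data fusion strategies therefore identify bolete species well, and PLS-DA with mid-level data fusion (PCA) can serve as a low-cost, efficient technique for bolete species identification.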
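The preprocessing abbreviations in the abstract refer to standard chemometric steps: SNV (standard normal variate) and MSC (multiplicative scatter correction) reduce scatter effects, SG refers to Savitzky-Golay smoothing, and 1D/2D most likely denote first and second derivatives. The abstract does not state the authors' parameters or software, so the following is only a minimal NumPy sketch of SNV and MSC, assuming a hypothetical array with one spectrum per row; the function names are illustrative and not taken from the paper.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (assumed default: the mean
    spectrum) as spectrum ~ intercept + slope * reference, and is then
    corrected to (spectrum - intercept) / slope.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # np.polyfit returns coefficients from highest degree down.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


if __name__ == "__main__":
    # Synthetic stand-in for measured spectra: scaled and offset copies
    # of one hypothetical band shape (not real bolete data).
    base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
    raw = np.vstack([0.8 * base + 0.05, 1.2 * base - 0.10, 1.0 * base])
    print(snv(raw).std(axis=1, ddof=1))         # each row now has unit spread
    print(np.ptp(msc(raw), axis=0).max())       # rows collapse onto the reference
```

SG smoothing and derivatives are not shown here; in Python they are commonly computed with `scipy.signal.savgol_filter`, whose `deriv` argument selects the derivative order. The order in which the steps are chained (for example 2D+MSC+SNV) would follow the combination being tested.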

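Keywords: Boletes  Mid-infrared spectroscopy  Ultraviolet spectroscopy  Partial least squares discriminant analysis  Random forest  Data fusion
Received: 2021-01-04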

Identification of Boletus Species Based on Discriminant Analysis of Partial Least Squares and Random Forest Algorithm
CHEN Feng-xia, YANG Tian-wei, LI Jie-qing, LIU Hong-gao, FAN Mao-pan, WANG Yuan-zhong. Identification of Boletus Species Based on Discriminant Analysis of Partial Least Squares and Random Forest Algorithm [J]. Spectroscopy and Spectral Analysis, 2022, 42(2): 549-554.
Authors:CHEN Feng-xia  YANG Tian-wei  LI Jie-qing  LIU Hong-gao  FAN Mao-pan  WANG Yuan-zhong
Institution: 1. College of Resources and Environmental Sciences, Yunnan Agricultural University, Kunming 650201, China
2. Yunnan Institute for Tropical Crops Research, Jinghong 666100, China
3. College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming 650201, China
4. Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming 650200, China
Abstract: Boletes are well-known wild edible mushrooms of considerable culinary and economic value. There are many bolete species, and they are not easy to distinguish, so an effective, rapid, and credible species identification technique would help improve bolete quality. In this study, a total of 683 specimens of 7 wild bolete species were collected from different regions of Yunnan, the mid-infrared and ultraviolet spectra of the samples were obtained, and the average spectral characteristics of the different species were analyzed. Single-spectrum data from several preprocessing combinations (SNV+SG, 2D+MSC+SNV, 1D+MSC+SNV+SG, MSC+2D) were combined with two feature extraction methods (PCA, LVs) to build partial least squares discriminant analysis and random forest models, which were used with data fusion strategies to identify bolete species. The approach has a certain degree of innovation. The results show: (1) In both the mid-infrared and the ultraviolet average spectra, the absorption peaks of the different species differ little, and the absorbance shows only subtle differences. (2) Appropriate preprocessing can enhance the information in the spectral data. The best preprocessing combinations of the mid-infrared and ultraviolet data for the partial least squares discriminant analysis and random forest models are 2D+MSC+SNV, SNV+SG, 2D+MSC+SNV, and 1D+MSC+SNV+SG. (3) Among the single-spectrum models, the mid-infrared models outperform the ultraviolet models. With the best mid-infrared preprocessing combination, 2D+MSC+SNV, the partial least squares discriminant analysis model has an accuracy of 99.78% on the training set and 99.12% on the validation set, and the random forest model has an accuracy of 93.20% on the training set and 99% on the validation set. (4) The data fusion strategy improves classification accuracy. With low-level fusion, the partial least squares discriminant analysis model reaches 100% on the training set and 99.12% on the validation set, and the random forest model reaches 92.32% and 99.14%. (5) For the random forest algorithm, mid-level data fusion based on latent variables (LVs) gives 92.76% on the training set and 96% on the validation set; mid-level data fusion based on principal component analysis (PCA) gives 97.15% and 100%. (6) For partial least squares discriminant analysis, mid-level data fusion (LVs) gives 100% on the training set and 99.56% on the validation set; with mid-level data fusion (PCA), both the training set and the validation set reach 100%. Partial least squares discriminant analysis and random forest combined with data fusion strategies thus identify bolete species satisfactorily, and partial least squares discriminant analysis with mid-level data fusion (PCA) can be used as a low-cost, high-efficiency technique for identifying bolete species.
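The abstract distinguishes low-level from mid-level (intermediate) data fusion without defining them. Assuming the usual chemometric definitions, low-level fusion concatenates the preprocessed spectra of both instruments sample by sample, whereas mid-level fusion first extracts features from each block (here PCA scores) and concatenates those. The sketch below illustrates the idea with NumPy only; the array shapes, component counts, and function names are hypothetical and are not the authors' settings.

```python
import numpy as np


def pca_scores(X, n_components):
    """Scores of mean-centered data on its first principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def low_level_fusion(mir, uv):
    """Low-level fusion: join the two spectral blocks column-wise."""
    if mir.shape[0] != uv.shape[0]:
        raise ValueError("both blocks must describe the same samples")
    return np.hstack([mir, uv])


def mid_level_fusion(mir, uv, n_mir, n_uv):
    """Mid-level fusion: extract features per block, then join the features."""
    if mir.shape[0] != uv.shape[0]:
        raise ValueError("both blocks must describe the same samples")
    return np.hstack([pca_scores(mir, n_mir), pca_scores(uv, n_uv)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mir = rng.normal(size=(20, 100))  # 20 samples x 100 MIR variables (synthetic)
    uv = rng.normal(size=(20, 30))    # same 20 samples x 30 UV variables (synthetic)
    print(low_level_fusion(mir, uv).shape)
    print(mid_level_fusion(mir, uv, n_mir=5, n_uv=3).shape)
```

In a train/validation workflow, the feature extraction would normally be fitted on the training set only and then applied to the validation set; this sketch computes scores on a single matrix for brevity. The LVs variant would replace the PCA scores with latent-variable scores from a PLS model.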
Keywords:Boletus  Mid-infrared spectroscopy  Ultraviolet spectroscopy  Discriminant analysis by partial least squares  Random forest  Data fusion
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).