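Recognition of Food-Borne Pathogenic Bacteria by Raman Spectroscopy Based on Random Forest Algorithm
Cite this article: Wang Qi, Zeng Wandan, Xia Zhiping, Li Zhiping, Qu Han. Recognition of Food-Borne Pathogenic Bacteria by Raman Spectroscopy Based on Random Forest Algorithm[J]. Chinese Journal of Lasers, 2021(3):130-138.
Authors: Wang Qi  Zeng Wandan  Xia Zhiping  Li Zhiping  Qu Han
Affiliations: College of Computer Science and Information Engineering, Shanghai Institute of Technology; Military Veterinary Institute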
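Summary: Food and drug safety has long been a focus of public concern. Compared with traditional spectroscopic detection methods for food-borne pathogenic bacteria, Raman spectroscopy offers a wide detection range, flexible detection, and prominent spectral features. Taking common food-borne pathogenic bacteria as the research object, this study used a Raman spectrometer to collect 132 Raman spectra from samples of 11 food-borne pathogenic bacteria and proposes a classification model based on principal component analysis and the random forest algorithm. The experimental results show that the model combining principal component analysis with the random forest algorithm can distinguish the food-borne pathogenic bacteria, with a classification accuracy of up to 91.36%.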
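The random forest classifier named in the summary can be illustrated with a small standard-library sketch. It is not the authors' implementation: it omits the principal component analysis step, and the synthetic data, the class labels "A", "B", and "C", the number of trees, the tree depth, and every function name are assumptions made for illustration. It shows only the general idea of training each tree on a bootstrap sample, restricting each split to a random subset of features, and combining the trees by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree; every node tries only n_sub randomly chosen features.

    A leaf is a bare label and an inner node is a tuple, so labels must
    not themselves be tuples.
    """
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):       # feature randomness
        for t in sorted({row[f] for row in X})[:-1]:    # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                    # sampled features are constant
        return majority(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_sub, rng, depth + 1, max_depth)

    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, n_sub=2, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sampling randomness
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

# Hypothetical stand-in for reduced spectral features:
# 3 well-separated classes, 10 rows each, 4 features per row.
X, y = [], []
for label, center in (("A", 0.0), ("B", 5.0), ("C", 10.0)):
    for i in range(10):
        X.append([center + (i - 5) * 0.1 * (1 if j % 2 == 0 else -1) for j in range(4)])
        y.append(label)

forest = fit_forest(X, y)
print(predict_forest(forest, [4.75, 4.75, 4.75, 4.75]))
```

In practice a library implementation would normally be used instead; the sketch is meant only to make the voting and the two sources of randomness concrete.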

Keywords: spectroscopy  Raman spectroscopy  machine learning  food-borne pathogen detection  principal component analysis  random forest

Recognition of Food-Borne Pathogenic Bacteria by Raman Spectroscopy Based on Random Forest Algorithm
Wang Qi, Zeng Wandan, Xia Zhiping, Li Zhiping, Qu Han. Recognition of Food-Borne Pathogenic Bacteria by Raman Spectroscopy Based on Random Forest Algorithm[J]. Chinese Journal of Lasers, 2021(3):130-138.
Authors: Wang Qi  Zeng Wandan  Xia Zhiping  Li Zhiping  Qu Han
Institution: (College of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China; Military Veterinary Institute, Changchun, Jilin 130062, China)
Abstract:

Objective: Food and drug safety is of great concern to society. Food-borne pathogenic bacteria are bacteria that cause food poisoning or that use food as a vector of transmission. Quick and effective detection of these bacteria in food is therefore crucial to protecting public health. The culture separation method traditionally used to examine microorganisms depends on the medium used for culturing, separation, and biochemical identification. Detection of food-borne pathogenic bacteria by this route generally requires five to seven days and involves a series of procedures such as pre-enrichment, selective enrichment, microscopic examination, and serological verification. Traditional detection methods are therefore insufficient for preventing and controlling food-borne pathogenic bacteria. Raman spectroscopy, by contrast, is a nondestructive method that can rapidly and accurately identify the functional groups present in molecules. In this study, 11 food-borne pathogenic bacteria samples were used to construct a recognition and classification model based on a random forest algorithm and Raman spectra. The model is intended to address the low classification accuracy and long detection time of traditional methods for detecting food-borne pathogenic bacteria. The results of this study will help to ensure public health safety by rapidly and effectively detecting pathogens in food and drugs.

Methods: All of the food-borne pathogenic bacteria in this study were purchased from the China Center of Industrial Culture Collection. First, Raman spectra of the samples were measured over a Raman shift range of 500-1600 cm⁻¹. LabSpec 6.0 software was used for spectral collection, and each sample was measured 15 times. After screening, 132 Raman spectra were retained. In the preprocessing stage, min-max normalization was applied to the Raman spectral data, mapping the intensity to the range [0,1] for comparison. The Savitzky-Golay algorithm was used for smoothing and denoising to remove noise and fluorescence interference. Principal component analysis (PCA) was used to reduce the feature dimensionality of the high-dimensional sample data and avoid the problems caused by excessively high dimensions. In the model evaluation stage, K-fold cross-validation was used to check whether the model balanced underfitting and overfitting and to evaluate model stability. According to these criteria, the Raman spectral recognition model based on the random forest algorithm proposed in this study was able to effectively distinguish the different food-borne pathogenic bacteria among the collected samples.

Results and Discussions: K-nearest neighbors (KNN), logistic regression, support vector machine (SVM), decision tree, and random forest models were used to classify the preprocessed Raman spectral data of the food-borne pathogenic bacteria (Table 4). Under 10-fold cross-validation, the accuracy of the random forest model was better than that of the traditional machine learning algorithms. The decision tree model gave the worst results, with an accuracy of 82.63%. This is because a single decision tree is a weak learner, whereas the random forest combines the votes of many trees into a strong learner (Fig. 5). The classification ability of the random forest algorithm is therefore higher than that of a single decision tree classifier. Compared with traditional machine learning algorithms, the random forest algorithm adds two elements of randomness to model construction: sampling randomness and feature selection randomness (Table 2). Because the random forest is composed of decision trees, a higher correlation between the trees results in a higher error rate. Random sampling determines how far the correlation between the trees in the forest is reduced. Each tree in the forest considers a small number of randomly selected features, and the feature with the best splitting ability among them is used to split a node into left and right subtrees. This expands the effect of randomness and further enhances the robustness of the model. Because the two elements of randomness strongly reduce the variance of the model, the random forest generally does not need additional pruning. That is, it achieves better generalization and a stronger ability to avoid overfitting, resulting in low variance. In addition, the Savitzky-Golay filtering algorithm was used for denoising in the preprocessing stage of the Raman spectral data (Fig. 3) to give the model good resistance to interference.

Conclusions: Raman spectroscopy is a mature technology that is effective for the detection and classification of food-borne pathogenic bacteria. In this study, a Raman spectrometer was used to collect the spectral data of 11 food-borne pathogens. According to the spectral properties, the spectral data were normalized, smoothed, and denoised in the preprocessing stage, which facilitated model construction and training. In addition, a method was developed for identifying and analyzing food-borne pathogenic bacteria by Raman spectroscopy. The experimental results show that the classification model proposed in this study, PCA combined with the random forest algorithm, achieves higher accuracy on Raman spectral data than the single machine learning methods conventionally used for detecting food-borne pathogens. The new method also improves on the speed of manual identification of Raman spectra. However, the random forest model was prone to overfitting on sample sets with heavy noise. Future work to improve the accuracy of the model could optimize denoising in the data preprocessing stage and use the random forest algorithm to optimize feature selection. Only 11 types of food-borne pathogenic bacteria were used in this study. Additional samples could be introduced when constructing a later model to build a more complete Raman spectral database.
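The two preprocessing steps described in the Methods part of the abstract, min-max normalization to [0,1] followed by Savitzky-Golay smoothing, can be sketched with the Python standard library alone. The abstract does not state the filter settings, so the 5-point window and quadratic fit used here are assumptions, as are the function names and the choice to leave the two points at each edge unchanged. The weights (-3, 12, 17, 12, -3)/35 are the standard Savitzky-Golay smoothing coefficients for that window and order.

```python
def min_max_normalize(intensities):
    """Map a spectrum's intensities linearly onto [0, 1]."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:  # flat spectrum: avoid division by zero
        return [0.0 for _ in intensities]
    return [(v - lo) / (hi - lo) for v in intensities]

# Standard Savitzky-Golay smoothing weights for a 5-point window with a
# quadratic fit. Window length and order are assumed, not from the paper.
SG_WEIGHTS_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol_smooth_5(intensities):
    """Smooth with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are copied unchanged, which is a
    simplification; library implementations handle edges differently.
    """
    smoothed = list(intensities)
    for i in range(2, len(intensities) - 2):
        window = intensities[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS_5, window))
    return smoothed

def preprocess(intensities):
    """Normalize first, then smooth, in the order the abstract lists them."""
    return savgol_smooth_5(min_max_normalize(intensities))

# Hypothetical noisy intensity values standing in for one measured spectrum.
spectrum = [120.0, 135.0, 128.0, 190.0, 260.0, 250.0, 180.0, 140.0, 131.0, 125.0]
print(preprocess(spectrum))
```

Because the filter has negative weights, smoothed values can fall slightly outside [0,1] near sharp peaks. A quadratic input passes through the filter unchanged, which is a convenient sanity check for the coefficients.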
Keywords: spectroscopy  Raman spectroscopy  machine learning  food-borne pathogen detection  principal component analysis  random forest
This article is indexed in CNKI, VIP, and other databases.