Similar articles
1.
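This paper proposes a pattern recognition method for rice seeds based on terahertz time-domain spectroscopy. Rice seeds of 10 different brands, mixed to simulate adulteration, were used as samples. Using the terahertz time-domain spectra collected from the samples, Relief, random forest (RF), support vector machine recursive feature elimination (SVM-RFE) and minimum-redundancy maximum-relevance (mRMR) models were each built to select features from the spectral wavelengths. Classifiers were then designed to classify the samples processed by the four feature selection methods. The results show that an extreme learning machine optimized by the cuckoo search (CS) algorithm gave the best recognition on the spectral data extracted by RF feature selection, with an accuracy of up to 100%. The experiment offers a useful reference for identifying seed adulteration in forensic science.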

2.
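A total of 204 fingernail samples from 5 regions were collected from actual cases. A Hilbert transform filter was used to denoise the raw spectra, and principal component analysis (PCA) was then applied for dimensionality reduction. Naive Bayes, random forest and partial least squares discriminant analysis (PLS-DA) models were used to identify the region of origin of the fingernails, and the best preprocessing method and recognition model were selected according to the recognition rates and related metrics. The results show that preprocessing markedly improved the recognition rate obtained from the raw spectra, and that the Hilbert transform filter combined with PCA was the best preprocessing method. The random forest model was more stable and more accurate than the naive Bayes and PLS-DA models: with the best preprocessing method, its recognition rate was 94.88% on the training set and 93.47% on the test set. The method effectively reduces spectral noise and data redundancy, improves model recognition, and provides a reference for the rapid identification of the regional origin of fingernails in forensic science.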

3.
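Rapid examination of appetite suppressants seized at the scene can provide leads and direction for case investigation, and the use of machine learning algorithms for rapid, non-destructive examination of physical evidence is an important research topic in forensic evidence science. Infrared spectroscopy is the most classic rapid non-destructive examination method, and filters can effectively remove noise and background interference from the raw spectra, thereby improving model recognition. In this work, 291 samples of 4 kinds of appetite suppressants seized in actual cases were collected. A fast Fourier transform filter and a Hilbert transform filter were used to denoise the raw spectral data, and naive Bayes and random forest classification models were built for identification, in order to select the filter with the best denoising effect and to compare the recognition performance of the two models. The results show that filtering markedly improved the recognition rate and stability obtained from the raw spectral data. The Hilbert transform filter denoised better than the fast Fourier transform filter, and the random forest model was more accurate and more stable than the naive Bayes model. On data processed by the Hilbert transform filter, the random forest model achieved a recognition rate of 96.33% on the training set and 95.89% on the test set. By effectively filtering out spectral noise, the method improves the qualitative recognition ability of the model and offers a reference for the rapid identification of appetite suppressants in forensic science.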

4.
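A near-infrared spectral characteristic wavelength selection method based on an adaptive ant colony optimization algorithm   Total citations: 2, self-citations: 0, citations by others: 2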
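To improve the accuracy and applicability of near-infrared spectral prediction models while simplifying them, a method that selects characteristic wavelengths by adaptive ant colony optimization combined with partial least squares is proposed, and a mixed analysis model is built for the soluble solids content of apples from different producing areas. Fuji apples from Shandong, Shaanxi and Xinjiang were collected, their near-infrared spectra were acquired over 3800 to 14000 cm⁻¹, and soluble solids content, an important quality index, was measured. Exploiting the heuristic global search of the ant colony algorithm together with a Monte Carlo roulette-wheel random selection mechanism, characteristic wavelengths for apple soluble solids content were selected, and the analysis model was then built by partial least squares. Compared with the full-spectrum partial least squares model and the genetic-algorithm partial least squares model, the ant colony optimization algorithm selected the fewest wavelengths and gave the strongest predictive ability, with a prediction correlation coefficient R of 0.9708 and a root mean square error of prediction (RMSEP) of 0.5144. The results show that the adaptive ant colony optimization algorithm can effectively select near-infrared characteristic wavelengths and improve the robustness and applicability of the model.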
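
The roulette-wheel step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each wavelength carries a non-negative weight (for example a pheromone level) and draws wavelengths without replacement with probability proportional to that weight. The function name, the weights and the sampling-without-replacement choice are illustrative assumptions, not the authors' implementation.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(weights, n_picks, rng):
    """Draw n_picks distinct indices, each with probability proportional
    to its weight among the indices that have not been drawn yet."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(n_picks):
        # Cumulative weights of the still-available indices form the "wheel".
        wheel = list(accumulate(weights[i] for i in remaining))
        spin = rng.random() * wheel[-1]
        # First slot whose cumulative weight exceeds the spin position.
        pos = min(bisect_right(wheel, spin), len(remaining) - 1)
        chosen.append(remaining.pop(pos))
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical pheromone levels for 8 wavelengths.
    pheromone = [0.1, 0.1, 2.0, 0.1, 1.5, 0.1, 0.1, 3.0]
    print(roulette_select(pheromone, 3, random.Random(42)))
```

In a full ant colony optimization loop, each selected subset would be scored (for instance by the cross-validated error of a PLS model) and the weights updated accordingly; that part is omitted here.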

5.
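Based on the near-infrared spectral information of the traditional Chinese medicine rhubarb, a recognition model with optimizable parameters was built using the least squares twin support vector machine (LSTSVM) algorithm, programmed in MATLAB, to distinguish genuine rhubarb from counterfeits. The 98 rhubarb samples were randomly divided into a training set and a test set. For the 60 training samples, model parameters were optimized by leave-one-fifth-out cross-validation, and the optimal recognition model was built from the selected parameters and the near-infrared spectra of the training samples. The authenticity of the 38 test samples was then identified, with a recognition rate of up to 97.4%. The results show that LSTSVM is an effective recognition method that can rapidly authenticate rhubarb from its near-infrared spectra. In addition, the rhubarb samples were randomly divided into training and test sets 6 times, and the average recognition rate of the resulting models was 93.4%, indicating that recognition models built with LSTSVM are reasonably robust.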

6.
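In this paper, a tobacco leaf grade classification model was built using near-infrared spectroscopy combined with the least squares twin support vector machine (LSTSVM) algorithm. From 210 tobacco leaf samples covering three grades, 120 samples were taken as the modeling set and the remaining 90 as the prediction set. To build the optimal model, the spectral preprocessing methods and model parameters were screened and optimized. The optimal model achieved an average recognition rate of 95.56% on the prediction set, indicating that the method is effective for tobacco leaf grade classification. The algorithm was also compared with three common pattern recognition algorithms, SIMCA, PLS-DA and SVM. On the raw sample spectra and under the same conditions, LSTSVM predicted better than the other three.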

7.
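The stability competitive adaptive reweighted sampling variable selection algorithm was applied to qualitative analysis with support vector machines (support vector machine-stability competitive adaptive reweighted sampling, SVM-SCARS). The algorithm computes a stability value for each variable by repeatedly sampling the data and building models. Because the stability value assesses a variable's role in modeling more objectively and accurately, it is used as the criterion of variable importance. Variables are screened step by step in an iterative loop using adaptive reweighted sampling; an SVM model is built on the variable subset obtained in each loop, and the correct classification rate of cross validation (CCRCV) is used to evaluate the subsets and determine the optimal one. Combined with diffuse reflectance near-infrared spectroscopy, the algorithm was used to build a species identification model for wood commonly used in pulp and paper making, enabling rapid classification of 4 eucalyptus species and 2 acacia species. In total, 15 feature variables were selected to build the classification model, which classified the species with an accuracy of 97.9%. Compared with the full-spectrum model and the recursive feature elimination support vector machine model, SVM-SCARS selected fewer feature variables and gave better predictive performance and stability. The results show that SVM-SCARS can effectively optimize spectral feature variables and improve the robustness and applicability of near-infrared online analysis models in wood property analysis.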

8.
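In forensic science, the examination and identification of tire rubber particles is particularly important for solving traffic accident cases and some litigation cases. Because traditional sampling-based analysis destroys physical evidence, and in order to examine the differences between samples across multiple variables and dimensions, a method is proposed for the non-destructive identification of tire rubber based on infrared spectroscopy combined with the K-nearest neighbor algorithm. Samples of different brands were collected, and their spectra were subjected to automatic baseline correction and normalization, then smoothed and denoised with the Savitzky-Golay algorithm. Dimensionality reduction narrowed the 840 original features down to 5 discriminating features. Cross-validation was carried out using the training samples as test samples. With K set to 1, "feature 3" was taken as the main independent variable and "feature 4", "feature 5", "feature 2" and "feature 1" as covariates; distances between samples were computed with the features weighted by importance, and a classification model was built. The overall classification accuracy of the model was 83.56%, a good level of discrimination. After further analysis combined with the infrared spectra of the samples, the 73 types of samples were ultimately grouped into 10 classes. The results show that infrared spectroscopy combined with the K-nearest neighbor algorithm can identify and classify tire rubber particles, is broadly applicable and efficient, and offers a useful reference.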
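
As a rough illustration of the K = 1 classifier with importance-weighted features described above, the sketch below uses a weighted Euclidean distance. The distance form, the weights and the toy data are assumptions made for illustration; the abstract does not specify them.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature importance weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict_1nn(train_X, train_y, query, weights):
    """Return the label of the single nearest training sample (K = 1)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: weighted_distance(train_X[i], query, weights),
    )
    return train_y[nearest]


if __name__ == "__main__":
    # Toy two-feature data; brand labels are hypothetical.
    X = [[0.0, 0.0], [1.0, 10.0]]
    y = ["brand A", "brand B"]
    query = [0.8, 2.0]
    print(predict_1nn(X, y, query, [1.0, 1.0]))    # equal weights
    print(predict_1nn(X, y, query, [1.0, 0.001]))  # second feature down-weighted
```

Down-weighting a feature changes which training sample is nearest, which is why the choice of importance weights matters for a 1-nearest-neighbor model.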

10.
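To address the problems that traditional contact-based fatigue driving detection interferes with driving and that detection algorithms have low recognition rates, this paper proposes an eye state recognition method based on sparse representation. The K-SVD (K-means singular value decomposition) method is used to construct an overcomplete redundant dictionary from the input training set, and orthogonal matching pursuit (OMP) is used to obtain a sparse representation of each test image. The class of the test image, and hence its state, is then determined from the error between the reconstructed image and the test image. In the experiments, K-SVD and OMP were compared with other dictionary learning and sparse representation methods. The results show that the K-SVD dictionary learning algorithm combined with OMP achieved good recognition performance.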

11.
Many gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe disease or even death of the host. Identification of putative T4SEs has therefore become a very active research topic in bioinformatics because of its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been experimentally validated to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and the position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, a feature selection technique based on the random forest algorithm was used to remove redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to predict T4SEs. The experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.

12.
Rice blast is a serious threat to rice yield. Breeding disease-resistant varieties is one of the most economical and effective ways to prevent damage from rice blast. The traditional identification of resistant rice seeds has some shortcomings: it is time-consuming, costly and complex to operate. The purpose of this study was to develop an optimal prediction model for determining resistant rice seeds using Raman spectroscopy. First, support vector machine (SVM), BP neural network (BP) and probabilistic neural network (PNN) models were established on the original spectral data. Second, because the Raw-SVM model combined good recognition accuracy with a short running time, the support vector machine model was selected for optimization, and four improved support vector machine models (ABC-SVM (artificial bee colony algorithm, ABC), IABC-SVM (improved artificial bee colony algorithm, IABC), GSA-SVM (gravity search algorithm, GSA) and GWO-SVM (gray wolf algorithm, GWO)) were used to identify resistant rice seeds. The modeling accuracy and running time of the improved support vector machine models established on feature wavelengths and on full wavelengths (200–3202 cm⁻¹) were compared. Finally, five spectral preprocessing algorithms, Savitzky–Golay first derivative (SGD), Savitzky–Golay smoothing (SGS), baseline correction (Base), multivariate scatter correction (MSC) and standard normal variate (SNV), were used to preprocess the original spectra. The random forest algorithm (RF) was used to extract the characteristic wavelengths. The improved support vector machine models were then established after the different spectral preprocessing algorithms and RF feature extraction. The results show that the recognition accuracy of the optimal IABC-SVM model based on the original data was 71%. Among the five spectral preprocessing algorithms, SNV gave the best accuracy. The accuracy of the test set in the IABC-SVM model was 100%, and the running time was 13 s. After SNV preprocessing and RF feature extraction, the classification accuracy of the IABC-SVM model did not decrease, and the running time was shortened to 9 s. This demonstrates the feasibility and effectiveness of IABC in SVM parameter optimization, with higher prediction accuracy and better stability. Therefore, the improved support vector machine model based on Raman spectroscopy can be applied to the fast and non-destructive identification of resistant rice seeds.
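
For readers unfamiliar with SNV, the preprocessing step reported as the best performer here, the following is a minimal sketch of the standard definition: each spectrum is centered on its own mean and scaled by its own standard deviation. The sample (n − 1) standard deviation is an assumption, and the intensity values are made up for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in spectrum]


if __name__ == "__main__":
    # Two hypothetical spectra with the same shape but different scale/offset.
    a = [1.0, 2.0, 4.0, 2.0, 1.0]
    b = [5.0 + 3.0 * x for x in a]
    print(snv(a))
    print(snv(b))  # approximately identical to snv(a)
```

Because SNV works spectrum by spectrum, additive offsets and multiplicative scaling, which often come from scattering effects, are removed before the classifier sees the data.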

13.
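This study used a one-dimensional scale-invariant feature transform (SIFT) algorithm to find stable characteristic wavelengths in the near infrared spectroscopy (NIRS) spectra of tobacco leaves. The reproducibility rate and degree of reproducibility were calculated from the wavelengths selected on the spectra of sample precision tests, and an L9(3^3) orthogonal array was used to optimize the relevant SIFT parameters so that both were as high as possible. Based on the optimized parameters and the spectra of 10 representative samples measured on the master instrument, 10 sets of stable characteristic wavelengths were selected. Taking the spectral responses at the union of these wavelength sets as independent variables, a partial least squares (PLS) NIRS model for total alkaloids in tobacco leaves was built (SIFT-PLS for short). When this model was transferred directly to 3 slave instruments, the mean relative prediction error (MRE) for total alkaloids on all 3 slave instruments met the in-house control requirement of less than 6%. By contrast, after direct transfer of the full-spectrum model (WW-PLS), only 1 slave instrument met the requirement, and even after the slave spectra were corrected by piecewise direct standardization (PDS), the WW-PLS model gave an MRE below 6% on only 1 slave instrument. The NIRS model built on stable characteristic wavelengths selected by SIFT can be shared directly across the 3 slave instruments with no transfer set and no correction of the slave spectra or the spectral model, achieving a genuinely standard-free direct transfer of the NIRS model.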

14.
We applied the random forest method to discriminate among different kinds of cut tobacco. To overcome the declining resolution caused by column pollution and the subsequent deterioration of column efficacy at different testing times, we constructed combined peaks by summing the peaks over a specific elution time interval Δt. Both the original peaks and the combined peaks were considered when constructing the tree classifiers. A data set of 75 samples from three grades of the same tobacco brand was used to evaluate our method. Two parameters of the random forest were optimized using the out-of-bag error, and the relationship between Δt and the classification rate was investigated. Experiments show that partial least squares discriminant analysis was not suitable because of overfitting, and that the random forest with the combined features performed more accurately than naive Bayes, support vector machines, bootstrap aggregating and the random forest using only the original features.
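
The combined-peak construction can be sketched as follows, under the assumption that peaks are grouped into consecutive, non-overlapping windows of width Δt and their areas summed. The abstract does not say how the intervals are placed, so the windowing scheme, function name and data below are illustrative only.

```python
def combine_peaks(peaks, dt):
    """Sum peak areas inside consecutive elution-time windows of width dt.

    peaks: iterable of (elution_time, area) pairs.
    Returns a list of (window_start_time, summed_area), ordered by time.
    """
    combined = {}
    for time, area in peaks:
        window = int(time // dt)  # index of the window this peak falls in
        combined[window] = combined.get(window, 0.0) + area
    return [(w * dt, combined[w]) for w in sorted(combined)]


if __name__ == "__main__":
    # Hypothetical chromatographic peaks: (elution time in min, peak area).
    peaks = [(1.2, 10.0), (1.8, 5.0), (2.1, 7.0), (5.5, 1.0)]
    print(combine_peaks(peaks, 2.0))
```

Summing over a window makes the feature less sensitive to small retention-time shifts and to neighboring peaks merging as column resolution degrades, which is the motivation given in the abstract.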

15.
16.
In China's tobacco industry, tobacco leaves are classified and managed according to their cultivation areas and their position on the tobacco stalk. However, cultivation areas are sometimes mislabeled, intentionally or unintentionally, and leaves from different plant parts are sometimes blended before reaching the tobacco market. Such errors affect the style and quality of cigarettes. In the present work, more than 1000 Chinese flue-cured tobacco leaf samples, covering 12 genotypes and cultivated in 5 to 10 regions of China in 2003 and 2004, were discriminated by means of an improved and simplified KNN classification algorithm (IS-KNN) based on near-infrared (NIR) spectra. An original method for optimizing the number of significant principal components (PCs), based on error analysis and cross-validation, was proposed. Compared with the conventional pattern recognition methods KNN, NN, LDA and PLS-DA, IS-KNN adapts well to the discrimination of complicated Chinese flue-cured tobaccos. This work shows that the optimized number of PCs and the performance of the classification models are closely related to the complexity of the samples but not to the number of categories or samples. The results demonstrate the usefulness of NIR spectra combined with chemometrics as an objective and rapid method for the authentication and identification of tobacco leaves or other kinds of powder samples.

17.
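A growing body of research shows that the binding kinetics of a drug molecule with its target correlates strongly with its efficacy in vivo. Molecular design aimed at improving binding kinetics therefore offers a new direction for drug discovery. The goal of this work was to derive a general-purpose quantitative structure-kinetics relationship (QSKR) model for predicting the dissociation rate constant (koff) of drug molecules. Experimental dissociation rate constants for 406 ligand molecules were collected from the literature, and three-dimensional structural models of all ligand-target protein complexes were built by molecular modeling. A QSKR model for predicting ligand dissociation rate constants was then built with the random forest algorithm on protein-ligand atom pair descriptors. By examining how descriptor sets generated under different conditions (such as distance range, bin width and feature selection criterion) affect prediction accuracy, the optimal QSKR model was found to be the one obtained with a distance cutoff of 15 Å, a bin width of 3 Å and a feature selection variance level of 2. It achieved good prediction accuracy on two independent test sets (correlation coefficient 0.62). This work is a useful exploration of the key scientific problem of predicting drug dissociation rate constants and can inform subsequent research.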

18.
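Compared with statistical analysis and neural networks, support vector machines, which are based on structural risk minimization, offer better classification performance. For nonlinear classification, samples are first mapped into a higher-dimensional feature space, which often increases multicollinearity and redundant information, affects the sample distribution, and reduces the predictive performance of the linear support vector classifier (LSVC). This study proposes a nonlinear classification correlation analysis algorithm (NLCCA). Using kernel function techniques, and without needing the explicit form of the nonlinear mapping, it extracts classification-correlated components from the images of the samples in feature space, in order to remove redundant information and improve the sample distribution. The resulting combined NLCCA-LSVC classifier shows excellent predictive performance. Tests on simulated data and applications to two complex chemical pattern recognition problems gave satisfactory results, confirming the effectiveness of the algorithm.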

19.
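To improve the accuracy and stability of near-infrared quantitative analysis models for the active ingredient in chlorpyrifos emulsifiable concentrate, synergy interval partial least squares (siPLS) combined with a genetic algorithm (GA) was used to select feature variables, and cross-validation was used to determine the optimal number of principal component factors and the number of selected variables. The results show that the best-performing model was obtained with 81 variables selected from the full spectral region and 11 principal component factors. For the prediction set, the coefficient of determination R_p^2 was 0.972 and the root mean square error of prediction (RMSEP) was 0.353%. The study shows that selecting feature variables with siPLS combined with GA can substantially remove redundant and irrelevant information among the spectral variables of the pesticide formulation, reduce model complexity, and improve the accuracy and stability of the prediction model for the active ingredient.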
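
The two figures of merit quoted above can be computed as in the sketch below. It assumes the common definitions: RMSEP as the root of the mean squared prediction error, and R_p^2 as 1 − SS_res/SS_tot on the prediction set. Some papers instead report the squared correlation coefficient, and the abstract does not state which definition was used. The reference and predicted values are made up.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical reference and predicted contents (%), for illustration only.
    reference = [38.2, 40.1, 41.5, 39.0, 42.3]
    predicted = [38.5, 39.8, 41.9, 39.2, 41.8]
    print(f"RMSEP = {rmsep(reference, predicted):.3f}")
    print(f"R_p^2 = {r_squared(reference, predicted):.3f}")
```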
