Similar Documents
1.
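A portable near-infrared spectrometer was used to scan apple samples and acquire spectral data. Partial least squares (PLS) combined with a wavelength selection method based on particle swarm optimization (PSO) was used for multivariate statistical analysis of the apple data; a mathematical model was built and used to predict apple acidity. For PSO-based PLS and full-spectrum PLS, respectively, the correlation coefficients between predicted and measured acidity in the calibration set were 0.9880 and 0.9553, with root mean square errors of calibration (RMSEC) of 0.0197 and 0.0388; in the prediction set, the correlation coefficients were 0.9833 and 0.9596, with root mean square errors of prediction (RMSEP) of 0.0193 and 0.0304. Compared with full-spectrum PLS, PSO-based PLS not only greatly reduces the number of wavelength variables, and hence the computational load, but also markedly improves model performance and prediction accuracy. The method yields good quantitative models and is broadly applicable to rapid on-site or field analysis of apple acidity.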
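The figures of merit quoted above (correlation coefficient r, RMSEC on the calibration set, RMSEP on the prediction set) can be computed as follows. This is a minimal sketch that assumes RMSE divides by the number of samples n; some authors use n - 1 or a degrees-of-freedom correction for RMSEC. The sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: RMSEC on the calibration set, RMSEP on the prediction set."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    st = math.sqrt(sum((a - mt) ** 2 for a in y_true))
    sp = math.sqrt(sum((b - mp) ** 2 for b in y_pred))
    return cov / (st * sp)


# hypothetical acidity values, for illustration only
measured = [0.30, 0.42, 0.35, 0.51]
predicted = [0.32, 0.40, 0.36, 0.49]
print(round(rmse(measured, predicted), 4), round(pearson_r(measured, predicted), 4))
```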

2.
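To address the poor predictive performance of near-infrared calibration models caused by redundant variables, this paper proposes an iterative shrinking-window bootstrapping soft shrinkage (ISWBOSS) algorithm. The method partitions the variables into windows, randomly draws windows and builds sub-models from the variables they contain, and uses the normalized regression coefficients of the variables in each window as weights for subsequent weighted sampling, so that the variable space is gradually soft-shrunk. During the iterations the window size is also progressively reduced, allowing a finer search for feature variables. The method was validated on a corn dataset and compared with PLS models built with the full spectrum, a genetic algorithm, competitive adaptive reweighted sampling (CARS), and bootstrapping soft shrinkage (BOSS); the results show that the new method has clear advantages in both accuracy and stability. For corn protein content, compared with BOSS, ISWBOSS reduced the root mean square error of prediction from 0.0418 to 0.0103, needed fewer iterations to reach the optimal model, and ran more efficiently. The method offers useful guidance for improving the performance of near-infrared calibration models.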
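One weighted-sampling step of a window-based soft-shrinkage scheme like the one described above can be sketched as follows. The details are assumptions for illustration, not the published ISWBOSS code: window weights are taken as normalized sums of absolute sub-model regression coefficients, windows are drawn with a weighted bootstrap (`random.choices`), and the window size is halved between iterations. All function names and coefficient values are hypothetical.

```python
import random


def make_windows(n_vars, window_size):
    """Partition variable indices 0..n_vars-1 into contiguous windows."""
    return [list(range(s, min(s + window_size, n_vars)))
            for s in range(0, n_vars, window_size)]


def window_weights(abs_coefs, windows):
    """Normalized weight per window from absolute regression coefficients (assumed rule)."""
    totals = [sum(abs_coefs[i] for i in w) for w in windows]
    s = sum(totals)
    return [t / s for t in totals]


def weighted_window_draw(windows, weights, n_draws, rng):
    """Weighted bootstrap of windows; returns the sorted union of the drawn variables."""
    drawn = rng.choices(range(len(windows)), weights=weights, k=n_draws)
    return sorted({i for k in set(drawn) for i in windows[k]})


def shrink(window_size, factor=0.5):
    """Reduce the window size between iterations, never below one variable."""
    return max(1, int(window_size * factor))


rng = random.Random(0)
coefs = [0.0, 0.1, 0.0, 0.0, 2.0, 1.5, 0.0, 0.05]  # hypothetical |b| from sub-models
wins = make_windows(len(coefs), 2)
w = window_weights(coefs, wins)
subset = weighted_window_draw(wins, w, n_draws=4, rng=rng)
```

Windows whose variables carry no coefficient weight are never drawn, while low-weight windows are drawn ever more rarely; no region is removed by a hard threshold, which is what makes the shrinkage "soft".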

3.
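To improve the performance and reduce the complexity of multivariate calibration models, and following the ensemble approach from machine learning, a multivariate calibration algorithm based on subspace modeling with multiple linear regression as the base method was proposed, abbreviated SER-MLR. Two near-infrared quantitative applications and a comparison with full-spectrum partial least squares (PLS) verified its good overall performance: the algorithm is easy to interpret, builds concise and robust calibration models at lower computational cost, and is insensitive to overfitting.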
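The subspace-ensemble idea can be sketched as a random-subspace ensemble of MLR models whose predictions are averaged. The random variable subsets, the subset size, the number of members and the plain averaging rule are all assumptions here; the abstract does not specify how SER-MLR forms or combines its subspaces.

```python
import numpy as np


def fit_subspace_mlr(X, y, n_models=20, subspace_size=3, seed=0):
    """Fit MLR (with intercept) on random variable subsets; returns (indices, coefs) pairs."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], size=subspace_size, replace=False)
        A = np.column_stack([np.ones(len(X)), X[:, idx]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        members.append((idx, coef))
    return members


def predict_subspace_mlr(members, X):
    """Average the predictions of all member models."""
    preds = [np.column_stack([np.ones(len(X)), X[:, idx]]) @ coef
             for idx, coef in members]
    return np.mean(preds, axis=0)
```

Because each member regresses on only a few variables, ordinary least squares stays well-posed even when a spectrum has far more variables than samples, which is one way an MLR-based ensemble can sidestep the collinearity that usually rules out full-spectrum MLR.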

4.
Zhang Ruoqiu, Du Yiping. Journal of Instrumental Analysis (分析测试学报), 2020, 39(10): 1282-1287
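In practical multivariate calibration, many factors affect the predictive performance of partial least squares (PLS) models, and instrument noise, which is inherent in the spectral data, is an important one. Previous work has mostly used filters or smoothing methods to reduce the influence of instrument noise, but how instrument noise affects the PLS modeling process and the predictive ability of the model has rarely been reported. This paper explains and demonstrates how instrument noise enters the model through the computation of the first latent variable. A theoretical derivation of the PLS computation shows that the introduced noise has a cumulative effect on the PLS weight and loading vectors and is carried forward as subsequent latent variables are computed, thereby affecting the PLS model. The prediction error of the PLS model is also decomposed theoretically into the error of the ideal noise-free model itself and the error caused by noise propagation. The results show that instrument noise not only degrades the predictive performance of PLS models but also affects the selection of the optimal PLS model complexity.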
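The entry point described above can be seen in NIPALS PLS1, where the first weight vector on mean-centred data is w1 = X^T y / ||X^T y||, so additive noise E in X adds E^T y to the numerator before any later step runs. A minimal numpy sketch with simulated Gaussian noise (the noise model and data are assumptions, not taken from the paper):

```python
import numpy as np


def first_pls_weight(X, y):
    """First PLS1 weight vector w1 = Xc^T yc / ||Xc^T yc|| on mean-centred data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


rng = np.random.default_rng(1)
X_clean = rng.normal(size=(40, 100))
y = X_clean[:, :5].sum(axis=1)
X_noisy = X_clean + 0.05 * rng.normal(size=X_clean.shape)  # simulated instrument noise

w_clean = first_pls_weight(X_clean, y)
w_noisy = first_pls_weight(X_noisy, y)
# the shift in w1 propagates into the first score, loading and deflation,
# and from there into every later latent variable
print(np.linalg.norm(w_noisy - w_clean))
```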

5.
Cheng Zhong, Zhu Aishi. Chinese Journal of Analytical Chemistry (分析化学), 2008, 36(6): 788-792
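Spectral data typically have broad peaks, pronounced local effects, noise, many variables, and often severe multicollinearity among those variables. To address these problems, a local correction method for spectral data was improved and designed: window-smoothing-based piecewise orthogonal signal correction, combined with partial least squares regression (PLSR) for spectral preprocessing and quantitative analysis. The orthogonal components to be filtered out are initialized with the NIPALS algorithm, and orthogonal signal correction is then performed wavelength by wavelength over neighbouring segments. The denoised spectral matrix is used as the new predictor matrix, and PLSR is used to build a calibration model relating it to the property variables. Experiments on near-infrared diffuse reflectance spectra of wheat show that the method estimates the orthogonal components stably and removes noise effectively; the model predicts better than other methods, needs fewer PLS components, and is therefore more parsimonious.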
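The core orthogonal-signal-correction operation, removing from X a score direction that carries no information about y, can be sketched for one global component. This is a simplified variant assumed for illustration: it starts from the first principal-component score and orthogonalizes it to y directly, skipping the iterative weight refinement of full OSC, and it does not implement the paper's window-smoothed, wavelength-by-wavelength scheme. Because no weight vector is returned, it cannot be applied to new samples as written.

```python
import numpy as np


def osc_one_component(X, y):
    """Remove one y-orthogonal component from mean-centred X (simplified OSC)."""
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).reshape(-1, 1)
    # start from the score of the first principal component
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    t = (U[:, 0] * S[0]).reshape(-1, 1)
    # orthogonalize the score with respect to y
    t_orth = t - yc @ (yc.T @ t) / (yc.T @ yc)
    # loading for the orthogonal score, then deflate X
    p = Xc.T @ t_orth / (t_orth.T @ t_orth)
    X_corr = Xc - t_orth @ p.T
    return X_corr, t_orth.ravel()
```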

6.
Chen Zhao, Wu Zhisheng, Shi Xinyuan, Xu Bing, Zhao Na, Qiao Yanjiang. Chinese Journal of Analytical Chemistry (分析化学), 2014, (11): 1679-1686
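The aim was to establish robust near-infrared spectroscopy (NIR) quantitative models for the ethanol precipitation process of honeysuckle (Lonicera japonica, 金银花) and to provide a method for rapid evaluation of this process. Using NIR data for chlorogenic acid during honeysuckle ethanol precipitation, Bagging partial least squares (Bagging-PLS), Boosting partial least squares (Boosting-PLS) and partial least squares (PLS) models were built and their performance compared. On this basis, synergy interval partial least squares (siPLS) and competitive adaptive reweighted sampling (CARS) were each used to select spectral variables and build models, and the predictive performance of these models was examined. Bagging-PLS and Boosting-PLS (with the number of latent variables set to 10) both outperformed the PLS model. With siPLS, the bands selected were 820-1029.5 nm and 1030-1239.5 nm for the first batch and 820-959.5 nm and 960-1099.5 nm for the second batch. With CARS, 5-fold and 10-fold cross-validation were used for the two batches, respectively, and the subset with the minimum root mean square error of cross-validation (RMSECV) was taken as the final selection. After variable selection, the Bagging-PLS and Boosting-PLS models for chlorogenic acid content in both batches showed root mean square errors of prediction (RMSEP) lower by 0.02-0.04 g/L and prediction correlation coefficients higher by 4%-5%. In summary, Bagging-PLS and Boosting-PLS can serve as rapid prediction methods for NIR quantitative models of the honeysuckle ethanol precipitation process.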
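Bagging-PLS, as compared above, can be sketched with a compact NIPALS PLS1 as the base learner: each member is fitted to a bootstrap resample of the calibration set and the member predictions are averaged. The number of bags, the seed and the helper names are assumptions; Boosting-PLS and the siPLS/CARS variable selection are not shown.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1; returns the regression vector b and the centring constants."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - q * t           # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return b, x_mean, y_mean


def bagging_pls(X, y, n_comp, n_bags=25, seed=0):
    """Fit PLS1 models on bootstrap resamples (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)
        models.append(pls1_fit(X[idx], y[idx], n_comp))
    return models


def bagging_predict(models, X):
    """Average the member predictions."""
    return np.mean([(X - xm) @ b + ym for b, xm, ym in models], axis=0)
```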

7.
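Wavelet transform and multi-way partial least squares were combined to build near-infrared quantitative calibration models. The raw spectra are first decomposed by wavelet transform into a series of wavelet detail coefficients; a set of coefficients that are little affected by external factors and carry strong information is selected to form a three-way spectral array, and multi-way partial least squares is then used to build the calibration model. Experimental results show that near-infrared calibration models built this way have stronger predictive ability and are more robust.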
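The decomposition step can be illustrated with the orthonormal Haar wavelet, which is an assumption here because the abstract does not name the wavelet family. How the detail coefficients are selected and arranged into the three-way array, and the multi-way PLS fit itself, are not shown.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_details(signal, levels):
    """Detail coefficients for each level (finest first) plus the final approximation."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return details, approx
```

Because the transform is orthonormal, the sum of squares of all detail and approximation coefficients equals that of the original spectrum, so selecting coefficients amounts to choosing where in scale and wavelength the retained signal energy lies.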

8.
Application of the successive projections algorithm to the optimization of near-infrared spectral calibration models
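With the main aims of reducing the number of variables and speeding up calibration, a new variable extraction method, the successive projections algorithm (SPA), was used to optimize a near-infrared quantitative model for the alcohol content of Chinese liquor (baijiu). Outlier samples were removed with the T² ellipse method, making the model more representative and robust. Using only 1.17% of all variables (9 variables), the model achieved a prediction correlation coefficient of 0.9477, a good prediction result. The model was also compared with a partial least squares (PLS) calibration model built after wavelength selection by uninformative variable elimination, further demonstrating that the algorithm is practical and feasible.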
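The column-selection loop at the heart of SPA can be sketched as follows. In the full method, chains started from different initial columns k0 and of different lengths are usually compared through the validation error of an MLR model, which is omitted here, as is the T² outlier screening. `n_select` should not exceed the rank of X, otherwise all projected columns vanish.

```python
import numpy as np


def spa(X, k0, n_select):
    """Successive projections algorithm: return n_select column indices starting from k0."""
    Xp = np.array(X, dtype=float)
    selected = [k0]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        # project every column onto the orthogonal complement of the last selected column
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        Xp[:, selected] = 0.0  # exclude columns that are already selected
        # the column with the largest remaining norm is the least collinear with the chain
        selected.append(int(np.argmax(np.linalg.norm(Xp, axis=0))))
    return selected
```

For example, with columns c0, c1 = 2·c0 and an orthogonal c2, starting from c0 the collinear column c1 projects to zero and c2 is chosen next, which is how SPA suppresses collinearity among the selected wavelengths.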

9.
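Variable selection is an important step in spectral modeling. This study proposes a new variable selection method, self-weighted variable combination population analysis (AWVCPA). First, the variable space is sampled by binary matrix sampling (BMS). Next, two information vectors (IVs), the variable occurrence frequency (Fre) and the partial least squares regression coefficients (Reg), are weighted to obtain a contribution value for each spectral variable, so that the influence of both kinds of IV on spectral modeling is taken into account. Finally, wavelength points with small contributions are removed with an exponential decay function (EDF), which accomplishes feature variable selection. Using near-infrared data sets of beer and corn, partial least squares (PLS) models were built to predict yeast concentration in beer and oil content in corn and compared with other variable selection methods. Under the same conditions, models based on AWVCPA variable selection achieved the best prediction accuracy. For yeast concentration in beer, compared with the full-spectrum PLS model, the root mean square error of prediction (RMSEP) fell from 0.5348 to 0.1457, an improvement in prediction accuracy of 72.7%; for corn oil content, compared with the full-spectrum PLS model, RMSEP fell from 0.0702 to 0.0248, an improvement of 64.7%.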
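The variable-elimination schedule can be sketched with the exponential decay function in the form popularized by CARS, where the fraction of variables kept at run i is r_i = a·exp(-k·i), decreasing from all p variables at the first run to 2 at the last. That AWVCPA uses exactly this form, and the linear weighting of normalized Fre and |Reg| with a fixed `alpha`, are assumptions; the abstract does not state the self-weighting rule, and binary matrix sampling is not shown.

```python
import math


def edf_ratio(i, n_runs, n_vars):
    """Fraction of variables kept at run i (1-based): 1 at i = 1, 2/n_vars at i = n_runs."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return a * math.exp(-k * i)


def contributions(freq, reg, alpha):
    """Assumed weighting: alpha * normalized frequency + (1 - alpha) * normalized |coefficient|."""
    f_max = max(freq)
    r_max = max(abs(r) for r in reg)
    return [alpha * f / f_max + (1 - alpha) * abs(r) / r_max
            for f, r in zip(freq, reg)]


def keep_top(scores, n_keep):
    """Sorted indices of the n_keep largest contributions."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])
```

At run i, `round(edf_ratio(i, n_runs, p) * p)` variables with the largest contributions would be retained.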

10.
Application of ensemble partial least squares regression to quantitative analysis of near-infrared spectra
Cheng Zhong, Zhu Aishi, Chen Dezhao. Chinese Journal of Analytical Chemistry (分析化学), 2007, 35(7): 978-982
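Near-infrared spectral data show pronounced local effects and contain many variables, which are often severely multicollinear and frequently related nonlinearly to the component contents of the samples. To address this, an ensemble nonlinear partial least squares regression method (E-S-QPLSR) was constructed. It uses sampling without replacement (subagging) to generate several subsamples from the training samples; for each subsample, a sub-model is built by quadratic polynomial partial least squares regression (QPLSR) and used to predict the response of the training samples; these predictions are then passed to a linear PLS algorithm, which computes the combination weights of the sub-models. Applied to modeling the quantitative relationship between the water content of 80 corn samples and their near-infrared spectra, the method performed well and showed strong learning ability, and the resulting model also predicted better than other methods.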
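Two ingredients of E-S-QPLSR can be sketched: subagging, which draws subsamples without replacement, and the quadratic inner relation that distinguishes QPLSR from linear PLS, in which the y-score u of a latent component is regressed on a second-order polynomial of the X-score t. The subsample size m is a free parameter assumed here; the weight updating of full QPLSR and the linear-PLS step that computes the combination weights are omitted.

```python
import numpy as np


def quadratic_inner_relation(t, u):
    """Fit u ~ c0 + c1*t + c2*t**2 between X-scores t and y-scores u."""
    c2, c1, c0 = np.polyfit(t, u, deg=2)  # polyfit returns the highest power first
    return c0, c1, c2


def subag_indices(n, m, rng):
    """Subagging: draw m of n training samples without replacement."""
    return np.sort(rng.choice(n, size=m, replace=False))
```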

11.
Analytical Letters, 2012, 45(16): 2640-2651
An ensemble multivariate calibration algorithm, termed MISEPLS, is proposed. In MISEPLS, when a member model is constructed, variables whose mutual information (MI) with the response is below a threshold are eliminated; modeling is thus performed on a subset of the original variables, and some problems arising from multicollinearity can be avoided. In experiments on three near-infrared (NIR) spectroscopic datasets from the food industry, MISEPLS proved superior to single-model full-spectrum PLS and to MIPLS (PLS combined with MI-induced variable selection). MISEPLS can improve the accuracy and robustness of a calibration model without increasing its complexity.
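The MI-based elimination applied when building each MISEPLS member can be sketched with a simple histogram estimate of mutual information. The estimator, bin count and threshold are assumptions (the paper may use a different MI estimator), and since the abstract does not say how the members differ from one another, the ensemble loop itself is not shown.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram estimate of mutual information (in nats) between two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # skip empty cells, where p log p -> 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def mi_select(X, y, threshold, bins=10):
    """Keep the columns of X whose MI with y is at least the threshold."""
    mi = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.flatnonzero(mi >= threshold), mi
```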

12.
A new ensemble learning algorithm is presented for quantitative analysis of near-infrared spectra. The algorithm combines two steps of stacked regression with Partial Least Squares (PLS) and is termed the Dual Stacked Partial Least Squares (DSPLS) algorithm. First, several sub-models are generated from the whole calibration set; the inner-stack step is implemented on sub-intervals of the spectrum. Then the outer-stack step combines these sub-models. Several combination rules for the outer-stack step were analyzed for DSPLS, and a novel selective weighting rule was also introduced to select a subset of all available sub-models. Experiments on two public near-infrared datasets demonstrate that DSPLS with the selective weighting rule provided superior prediction performance and outperformed the conventional PLS algorithm. Compared with a single model, the new ensemble model gives more robust predictions and can be considered an alternative for quantitative analytical applications.
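One possible combination rule for the outer stack is an unconstrained least-squares fit of the measured responses on the sub-model predictions. This rule is an assumption for illustration; the abstract compares several rules plus a selective weighting rule whose details are not given.

```python
import numpy as np


def stack_weights(sub_preds, y):
    """Least-squares combination weights; each column of sub_preds is one sub-model's predictions."""
    w, *_ = np.linalg.lstsq(sub_preds, y, rcond=None)
    return w


def stacked_predict(sub_preds, w):
    """Weighted combination of sub-model predictions."""
    return sub_preds @ w
```

In practice the predictions passed to `stack_weights` should be cross-validated rather than fitted values, otherwise the weights favour overfitted sub-models; constraining the weights to be non-negative is a common further safeguard.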

13.
This paper investigates the effects of the ratio of positive-to-negative samples on the sensitivity, specificity, and concordance. When the class sizes in the training samples are not equal, the classification rule derived will favor the majority class and result in a low sensitivity on the minority class prediction. We propose an ensemble classification approach to adjust for differential class sizes in a binary classifier system. An ensemble classifier consists of a set of base classifiers; its prediction rule is based on a summary measure of individual classifications by the base classifiers. Two re-sampling methods, augmentation and abatement, are proposed to generate different bootstrap samples of equal class size to build the base classifiers. The augmentation method balances the two class sizes by bootstrapping additional samples from the minority class, whereas the abatement method balances the two class sizes by sampling only a subset of samples from the majority class. The proposed procedure is applied to a data set to predict estrogen receptor binding activity and to a data set to predict animal liver carcinogenicity using SAR (structure-activity relationship) models as base classifiers. The abatement method appears to perform well in balancing sensitivity and specificity.
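The two re-sampling schemes and the ensemble vote can be sketched as follows. Drawing abatement samples without replacement, keeping the original minority samples in augmentation, and using a simple majority vote (ties counted as positive) as the summary measure are assumptions; the abstract leaves these details open.

```python
import random


def augment(minority, majority, rng):
    """Augmentation: add bootstrapped minority samples until both classes are equal in size."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(minority) + extra, list(majority)


def abate(minority, majority, rng):
    """Abatement: subsample the majority class down to the minority class size."""
    return list(minority), rng.sample(majority, len(minority))


def majority_vote(votes):
    """Ensemble prediction from 0/1 base-classifier votes (ties -> 1)."""
    return 1 if sum(votes) * 2 >= len(votes) else 0
```

Calling `abate` repeatedly with the same generator yields different balanced training sets, one per base classifier.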

15.
Based on a so-called ensemble strategy, an algorithm is proposed for near-infrared (NIR) spectral calibration of complex beverage samples. This algorithm is a combination of a novel training set/test set sample-selection procedure based on a Kohonen self-organizing map (SOM) with a simple procedure to calculate an average partial least-squares (PLS) calibration model, which is therefore named SOMEPLS. In order to verify the proposed SOMEPLS, two NIR beverage datasets involving the determination of sugar content are considered, and three kinds of reference algorithm, i.e., conventional PLS (CPLS), the Kennard-Stone (KS) algorithm in combination with PLS (KSPLS), and sample set partitioning based on the joint x-y distance (SPXY) algorithm in combination with PLS (SPXYPLS), are used. Of these, both KS and SPXY are well-known representative sample-selection algorithms. By comparison, it was found that when there is a training set of appropriate size, SOMEPLS can achieve better prediction accuracy than the three reference algorithms, but without increasing the complexity of the corresponding calibration model for the future application, indicating that SOMEPLS can serve as a promising tool for NIR spectral calibration.
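Kennard-Stone, one of the reference sample-selection algorithms above, is compact enough to sketch. SOMEPLS itself replaces this step with Kohonen SOM-based selection and averages several PLS models, which is not reproduced here; SPXY differs from Kennard-Stone by summing normalized distances in both x and y.

```python
import math


def kennard_stone(points, n_select):
    """Kennard-Stone selection (n_select >= 2): start from the two most distant points, then
    repeatedly add the point whose distance to its nearest selected point is largest."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    while len(selected) < n_select:
        remaining = [k for k in range(n) if k not in selected]
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
    return selected
```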

16.
The effectiveness of a regression method strongly depends on the characteristics of the regression problem at hand, which makes it difficult to choose a priori the most appropriate algorithm for a given dataset. This work addresses the issue with a novel regression approach based on the fusion of an ensemble of different regressors. To implement the proposed robust multiple system (RMS), four different fusion strategies are explored. In this context, we propose a novel fusion strategy named selection-based strategy (SBS), which outputs the estimate obtained by the regression algorithm (within the ensemble) with the highest expected accuracy in the region of the feature space associated with the considered model. SBS is based not on a direct combination of the estimates yielded by all the regressors but on a selection mechanism that identifies the best expected estimate available. For this purpose, it exploits the accuracies of the regressors in the ensemble in different portions of the input feature space. The RMS was assessed experimentally on three datasets: wine, orange juice, and apple. The results suggest that, in general, fusing an ensemble of different regression algorithms yields a regression process that is more robust and sometimes also more accurate than traditional regression methods. In particular, the proposed SBS method is an effective way to carry out the fusion. Copyright © 2012 John Wiley & Sons, Ltd.
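A minimal sketch of the selection-based idea, assuming that local accuracy is measured as each regressor's mean absolute error on the k training samples nearest to the query point. The abstract only states that accuracies in different portions of the feature space are used, so the neighbourhood definition, k and the error measure are assumptions; in practice the local errors should come from held-out predictions.

```python
import math


def select_by_local_accuracy(query, train_X, train_y, regressors, k=3):
    """Return the estimate of the regressor with the lowest mean absolute error
    on the k training samples nearest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(query, train_X[i]))
    neighbours = order[:k]

    def local_error(f):
        return sum(abs(f(train_X[i]) - train_y[i]) for i in neighbours) / k

    best = min(regressors, key=local_error)
    return best(query)
```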

17.
In this work, different approaches to variable selection are studied in the context of near-infrared (NIR) multivariate calibration of textiles. First, a model-based regression method is proposed: genetic algorithm optimisation combined with partial least squares regression (GA-PLS). The second approach is a relevance measure of spectral variables based on mutual information (MI), which can be computed independently of any given regression model. Because MI makes no assumption about the relationship between X and Y, non-linear methods such as feed-forward artificial neural networks (ANN) are therefore encouraged for predictive modelling (MI-ANN). GA-PLS and MI-ANN models are developed for NIR quantitative prediction of cotton content in cotton-viscose textile samples. The results are compared with a full-spectrum PLS model (FS-PLS, 480 variables), which requires 11 latent variables and yields a 3.74% RMS prediction error over the range 0-100%. GA-PLS provides a more robust model based on 120 variables and slightly better prediction performance (3.44% RMS error). With the MI variable selection procedure, a large improvement is obtained, as only 12 variables are retained. A 12-input ANN model trained on these variables gives a prediction error of 3.43% RMS.

18.
The accurate and reliable real‐time prediction of melt index (MI) is indispensable in quality control of the industrial propylene polymerization (PP) processes. This paper presents a real‐time soft sensor based on optimized least squares support vector machine (LSSVM) for MI prediction. First, the hybrid continuous ant colony differential evolution algorithm (HACDE) is proposed to optimize the parameters of LSSVM. Then, considering the complexity and nondeterminacy of PP plant, an online correcting strategy (OCS) is adopted to update the modeling data and to revise the model's parameters adaptively. Thus, the real‐time prediction model, HACDE‐OCS‐LSSVM, is obtained. Based on the data from a real PP plant, the models of HACDE‐LSSVM, DE‐LSSVM and LSSVM are also developed for comparison. The research results show that the proposed real‐time model achieves a good performance in the practical industrial MI prediction process. Copyright © 2016 John Wiley & Sons, Ltd.
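The LSSVM model underlying the soft sensor reduces to one linear system in the bias b and the dual weights α. A minimal numpy sketch with an RBF kernel (an assumption; the abstract does not name the kernel). The regularization `gamma` and kernel width `sigma` are the kind of parameters an optimizer such as HACDE would tune; the optimizer and the online correction strategy are not shown.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for LSSVM regression."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Prediction f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

Unlike a standard SVM, training needs no quadratic program: the equality constraints turn it into the single linear solve above, which is also what makes frequent retraining in an online correction scheme cheap for moderate data sizes.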

19.
The results of a series of constant pressure and temperature molecular-dynamics (MD) simulation studies based on the rigorous shell particle formulation of the isothermal-isobaric (NpT) ensemble are presented. These MD simulations validate the newly proposed constant pressure equations of motion in which a "shell" particle is used to define uniquely the volume of the system [M. J. Uline and D. S. Corti, J. Chem. Phys. (to be published), preceding paper]. Ensemble averages obtained with the new MD NpT algorithm match the ensemble averages obtained using the previously derived shell particle Monte Carlo NpT method [D. S. Corti, Mol. Phys. 100, 1887 (2002)]. In addition, we also verify that the Hoover NpT MD algorithm [W. G. Hoover, Phys. Rev. A 31, 1695 (1985); 34, 2499 (1986)] generates the correct ensemble averages, though only when periodic boundary conditions are employed. The extension of the shell particle MD algorithm to multicomponent systems is also discussed, in which we show for equilibrium properties that the identity of the shell particle is completely arbitrary when periodic boundary conditions are applied. Self-diffusion coefficients determined with the shell particle equations of motion are also identical to those obtained in other ensembles. Finally, since the mass of the shell particle is known, the system itself, and not a piston of arbitrary mass, controls the time scales for internal pressure and volume fluctuations. We therefore consider the effects of the shell particle on the dynamics of the system. Overall, the shell particle MD algorithm is an effective simulation method for studying systems exposed to a constant external pressure and may provide an advantage over other existing constant pressure approaches when developing nonequilibrium MD methods.

20.
A newly developed approach for predicting the structure of segments that connect known elements of secondary structure in proteins has been applied to some of the longer loops in the G-protein coupled receptors (GPCRs) rhodopsin and the dopamine receptor D2R. The algorithm uses Monte Carlo (MC) simulation in a temperature annealing protocol combined with a scaled collective variables (SCV) technique to search conformation space for loop structures that could belong to the native ensemble. Except for rhodopsin, structural information is available only for the transmembrane helices (TMHs), so the usual approach of finding a single conformation of lowest energy has to be abandoned. Instead, the MC search aims to find the ensemble located at the absolute minimum of free energy, i.e., the native ensemble. It is assumed that structures in the native ensemble can be found by an MC search starting from any conformation in the native funnel. The hypothesis is that native structures are trapped in this part of conformational space because of the high-energy barriers that surround the native funnel. This work shows that the crystal structure of the second extracellular loop (e2) of rhodopsin is a member of this loop's native ensemble. In contrast, the structure of the third intracellular loop differs considerably among the reported crystal structures. Our calculations indicate that, of three crystal structures examined, two show features characteristic of native ensembles while the third does not. Finally, the protocol is used to calculate the structure of the e2 loop in D2R. Here the crystal structure is not known, but several side chains involved in interactions with a class of substituted benzamides are shown to adopt conformations that point into the active site; they are thus poised to interact with the incoming ligand.
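The Metropolis acceptance rule and a temperature-annealing schedule, the two generic ingredients of the MC search above, can be sketched on a toy one-dimensional energy. The geometric cooling schedule, step size and energy function are assumptions for illustration; the scaled collective variables and the protein loop model are not reproduced.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, x0, step, t_start, t_end, n_steps, rng):
    """Simulated annealing on one coordinate with geometric cooling; returns the final state."""
    x, e = x0, energy(x0)
    cooling = (t_end / t_start) ** (1 / (n_steps - 1))
    t = t_start
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, t, rng):
            x, e = x_new, e_new
        t *= cooling
    return x, e
```

At high temperature the walk crosses barriers freely; as T falls, uphill moves become rare and the state settles into a low-energy basin, which is the behaviour the annealing protocol relies on to reach the native funnel.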
