Similar literature
1.
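This paper combines Monte Carlo uninformative variable elimination (MC-UVE) with variable importance in projection (VIP) to pick out important, informative wavelength variables, giving a two-step MC-UVE-VIP wavelength selection method. MC-UVE first selects the set of informative wavelengths, U_UVE, whose stability parameter exceeds a threshold (M_threshold). The VIP algorithm then selects from U_UVE the wavelengths whose VIP value exceeds the mean VIP of all wavelengths in U_UVE, giving the set of important, informative wavelengths U_UVE-VIP. A partial least squares regression (PLSR) near-infrared prediction model for protein content in corn is built on U_UVE-VIP, with the number of latent variables chosen so that the cumulative contribution rate exceeds 99.9%. The model uses few variables, is robust and interpretable, and runs quickly. Its mean relative errors (MARE) for protein in samples measured on two secondary (slave) instruments are 1.64% and 1.88%, both lower than the secondary-instrument MARE of the MC-UVE model (5.40% and 5.19%) and of the VIP model (6.23% and 7.16%). Therefore, the near-infrared model for corn protein content built with the two-step MC-UVE-VIP method can be transferred directly to the secondary instruments, ...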
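
The two-step selection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the NIPALS PLS1 routine, the function names (`pls1`, `mc_uve_stability`, `vip_scores`, `mc_uve_vip`), the subsampling fraction, the number of Monte Carlo runs, and the synthetic data are all assumptions. Stability is taken here as |mean| / standard deviation of each regression coefficient over the Monte Carlo runs, which is one common definition.

```python
import numpy as np

def pls1(X, y, n_comp):
    """PLS1 by NIPALS on centered data; returns coefficients, weights W, scores T, y-loadings q."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # unit-norm weight vector
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        c = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T, q = np.array(W).T, np.array(P).T, np.array(T).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, W, T, q

def mc_uve_stability(X, y, n_comp, n_runs=200, frac=0.8, seed=0):
    """Stability of each coefficient over random subsamples (assumed definition)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = int(frac * n)
    coefs = np.empty((n_runs, X.shape[1]))
    for i in range(n_runs):
        idx = rng.choice(n, size=k, replace=False)
        coefs[i] = pls1(X[idx], y[idx], n_comp)[0]
    return np.abs(coefs.mean(axis=0)) / coefs.std(axis=0)

def vip_scores(X, y, n_comp):
    """Variable importance in projection for a PLS1 model."""
    _, W, T, q = pls1(X, y, n_comp)
    ss = q ** 2 * np.sum(T ** 2, axis=0)   # y-variance explained per component
    return np.sqrt(X.shape[1] * (W ** 2 @ ss) / ss.sum())

def mc_uve_vip(X, y, n_comp, m_threshold):
    """Step 1: keep stability > m_threshold. Step 2: keep VIP > mean VIP within that set."""
    stab = mc_uve_stability(X, y, n_comp)
    u_uve = np.flatnonzero(stab > m_threshold)
    vip = vip_scores(X[:, u_uve], y, n_comp)
    return u_uve, u_uve[vip > vip.mean()]

# Synthetic example: only columns 3 and 7 carry information (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=60)
stab = mc_uve_stability(X, y, n_comp=3)
u_uve, u_uve_vip = mc_uve_vip(X, y, n_comp=3, m_threshold=np.median(stab))
print(u_uve, u_uve_vip)
```

The median stability is used as the threshold only to keep the example self-contained; the abstract does not say how M_threshold is chosen.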

2.
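Modeling with the competitive adaptive reweighted sampling (CARS) variable selection method significantly improved the prediction accuracy of near-infrared models for protein and fat in liquid milk. Outlier samples were first removed by Monte Carlo sampling, and the spectra were then mean-centered and denoised with Karl Norris filtering. CARS was used to select the variables closely related to the sample properties, and partial least squares (PLS) calibration models for protein and fat content were built and compared with PLS models without variable selection. The calibration-set correlation coefficient (r2), the root mean square error of cross-validation (RMSECV), and the root mean square error of prediction (RMSEP) were used as criteria to determine the best modeling conditions for protein and fat. For the protein and fat calibration models, the correlation coefficients were 0.9750 and 0.9951, the RMSECV values 0.1948 and 0.1363, and the RMSEP values 0.1133 and 0.1401, respectively. The predictions were better than those of the PLS model without variable selection and of other variable selection methods. The approach effectively simplifies the model and is suited to rapid, nondestructive determination of fat and protein in liquid milk.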

3.
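Building on the swarm-intelligence whale optimization algorithm (WOA), this paper proposes an improved whale optimization algorithm (iWOA) for wavelength selection in near-infrared spectroscopy. First, a chaotic strategy is used to initialize the population so that the algorithm does not fall into a local optimum too early. Second, a nonlinear time-varying sigmoid transfer function and a greedy strategy are introduced to strengthen the search, giving the model better prediction accuracy. To validate the algorithm, partial least squares (PLS) models were built on near-infrared spectra of corn for four properties (fat, protein, starch, and water) and compared with other algorithms. The results show that iWOA selects wavelength variables effectively in the shortest time, reduces model complexity, and improves prediction accuracy. Compared with the full spectrum, the root mean square error of prediction (RMSEP) for corn fat, protein, starch, and water fell from 0.0772, 0.1224, 0.3344, and 0.0595 to 0.0332, 0.0507, 0.1392, and 0.0044, improvements of 57.0%, 58.6%, 58.3%, and 92.6%, respectively. The algorithm selected 84, 69, 87, and 66 wavelengths, respectively.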

4.
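CARSiPLS, a variable selection and modeling method that combines competitive adaptive reweighted sampling (CARS) with interval partial least squares regression (iPLS), is applied to the near-infrared determination of moisture and volatile matter in bituminous coal. CARS progressively selects, within each interval, the variables related to the quantity being measured, and partial least squares regression models are then built for moisture and volatile matter. The results show that, compared with PLS and iPLS, CARSiPLS greatly reduces the number of variables while improving predictive performance. The number of modeling variables falls from 1557 to 15 for volatile matter and from 1557 to 317 for moisture. The mean absolute percentage error of prediction falls from 0.0315 to 0.0184 for volatile matter and from 0.1884 to 0.0946 for moisture. The mean squared error of prediction falls from 0.0108 to 0.0067 for volatile matter and from 0.0050 to 0.0028 for moisture.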

6.
Tao Huanming, Gao Meifeng. 《分析测试学报》 (Journal of Instrumental Analysis), 2021, 40(10): 1482-1488
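Building on the immune genetic algorithm (IGA), this paper proposes an improved immune genetic algorithm (iIGA) for selecting wavelength variables in near-infrared spectroscopy. The algorithm replaces the fixed antibody-similarity threshold of the original with an adaptive threshold, and introduces an elitism strategy and a greedy strategy so that the local search moves in the right direction. The algorithm was tested on corn starch and protein datasets with partial least squares (PLS) models and compared with IGA, the genetic algorithm (GA), and the full-spectrum method. For corn starch, iIGA reduced the root mean square error of prediction (RMSEP) from 0.3120 (original IGA) to 0.2980, a 4.5% improvement in prediction-set accuracy. For corn protein, RMSEP fell from 0.1244 to 0.1103, an 11.3% improvement. Significance tests on the RMSEP values of the starch and protein models gave F values of 165.22 and 182.05 and P values of 9.5 × 10^-23 and 4.5 × 10^-24, both below 0.05, so iIGA significantly improves prediction accuracy.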
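
The improvement percentages quoted in this abstract are consistent with a relative reduction in RMSEP. The sketch below shows that arithmetic; the formula is an assumption (the abstract does not state it), chosen because it reproduces the reported 4.5% and 11.3%. The function names are illustrative.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def improvement_percent(rmsep_before, rmsep_after):
    """Relative reduction in RMSEP, in percent (assumed definition)."""
    return 100.0 * (rmsep_before - rmsep_after) / rmsep_before

# RMSEP values quoted in the abstract: IGA versus iIGA.
starch = improvement_percent(0.3120, 0.2980)
protein = improvement_percent(0.1244, 0.1103)
print(f"{starch:.1f}% {protein:.1f}%")   # prints "4.5% 11.3%"
```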

7.
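To address the poor predictive performance of near-infrared calibration models caused by redundant variables, this paper proposes an iterative shrinking-window bootstrapping soft shrinkage (ISWBOSS) algorithm. The method partitions the variables into windows, randomly draws windows and builds sub-models from the variables they contain, then normalizes the regression coefficients of the variables in each window and uses them as weights for further weighted sampling, gradually achieving a soft shrinkage of the variable space. The window size is reduced during the iterations so that the search for characteristic variables becomes more precise. The method was validated on the corn dataset and compared with partial least squares models built with the full spectrum, the genetic algorithm, competitive adaptive reweighted sampling, and bootstrapping soft shrinkage. The new method showed clear advantages in both accuracy and stability. For corn protein content, for example, ISWBOSS reduced the root mean square error of prediction from 0.0418 (bootstrapping soft shrinkage) to 0.0103, needed fewer iterations to reach the optimal model, and was computationally more efficient. The method offers some guidance for improving the performance of near-infrared calibration models.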

8.
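Variable selection is an important step in spectral modeling. This study proposes a new variable selection method, automatic weighting variable combination population analysis (AWVCPA). First, binary matrix sampling (BMS) is used to sample the variable space. Second, two kinds of information vectors (IVs), the variable frequency (Fre) and the partial least squares regression coefficient (Reg), are weighted to obtain a contribution value for each spectral variable, so that the influence of both IVs on the spectral model is taken into account. Finally, an exponentially decreasing function (EDF) removes the wavelengths with low contribution, which selects the characteristic variables. Using near-infrared spectra of beer and corn as examples, partial least squares (PLS) models were built to predict yeast concentration in beer and oil concentration in corn, and compared with other variable selection methods. Under the same conditions, the models built with AWVCPA achieved the best prediction accuracy in every case. For yeast concentration in beer, RMSEP fell from 0.5348 (full-spectrum PLS) to 0.1457, a 72.7% improvement. For corn oil content, the root mean square error of prediction (RMSEP) fell from 0.0702 (full-spectrum PLS) to 0.0248, a 64.7% improvement.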
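
Two of the building blocks named in this abstract, binary matrix sampling and the exponentially decreasing function, can be sketched as below. This is an illustration under stated assumptions, not the AWVCPA implementation: the abstract gives no formulas, so the BMS construction (every column holds the same number of ones, shuffled independently), the EDF form (variables kept after run i equal to P·exp(-θ·i), with θ chosen so that L variables remain after N runs), and all names and parameter values are assumed. The weighting of Fre and Reg is not shown because the abstract does not describe it.

```python
import numpy as np

def binary_matrix_sampling(n_subsets, n_vars, ratio, rng):
    """Binary matrix: row = variable subset, column = variable.
    Every column gets the same number of ones, so each variable is
    sampled equally often (assumed construction)."""
    n_ones = int(round(n_subsets * ratio))
    col = np.zeros(n_subsets, dtype=int)
    col[:n_ones] = 1
    return np.column_stack([rng.permutation(col) for _ in range(n_vars)])

def edf_keep_counts(n_vars, n_final, n_runs):
    """Number of variables kept after each EDF run (assumed form)."""
    theta = np.log(n_vars / n_final) / n_runs
    return [int(round(n_vars * np.exp(-theta * i))) for i in range(1, n_runs + 1)]

rng = np.random.default_rng(0)
M = binary_matrix_sampling(n_subsets=100, n_vars=30, ratio=0.2, rng=rng)
counts = edf_keep_counts(n_vars=700, n_final=14, n_runs=50)
print(M.shape, counts[:3], counts[-1])
```

Each row of `M` marks the variables used by one sub-model; the EDF schedule then says how many of the highest-contribution variables survive each round.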

9.
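Spectroscopic analysis is simple, fast, and nondestructive, and is widely used for qualitative and quantitative analysis of complex systems. However, spectra often contain hundreds or thousands of wavelength points, some of which are unrelated to the target property. These increase the computational load and reduce prediction accuracy, so variable selection is needed before modeling. The least absolute shrinkage and selection operator (LASSO) can shrink regression coefficients to 0 and thereby select variables. This study applies LASSO to variable selection for near-infrared spectra of ternary blended oil samples and Raman spectra of biological samples, and quantifies sesame oil and sarcosine content, respectively, with partial least squares (PLS) and multiple linear regression (MLR) models. LASSO is compared with three variable selection methods: uninformative variable elimination-PLS (UVE-PLS), Monte Carlo UVE-PLS (MCUVE-PLS), and randomization test-PLS (RT-PLS). The results show that LASSO-based selection retains the fewest variables and runs fastest. For the ternary blended oil samples, LASSO-PLS gives the most accurate predictions; for the biological samples, LASSO-MLR does. LASSO-based variable selection is therefore promising for spectral analysis.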
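
How LASSO drives coefficients to exactly zero can be seen in a small coordinate-descent sketch. This is a generic textbook formulation, minimizing (1/2n)·||y − Xb||² + λ·||b||₁ on centered data, and not the code used in the study; the function names, the penalty value, and the synthetic data are assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent on centered X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    b = np.zeros(p)
    col_ss = (Xc ** 2).sum(axis=0) / n
    r = yc.copy()                       # residual yc - Xc @ b
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * b[j]        # remove variable j from the fit
            rho = Xc[:, j] @ r / n      # its correlation with the partial residual
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            r -= Xc[:, j] * b[j]        # put the updated contribution back
    return b

# Synthetic "spectrum" with 30 variables, of which only 0 and 5 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.05 * rng.normal(size=50)
b = lasso_cd(X, y, lam=0.1)
kept = np.flatnonzero(b)                # indices of the selected variables
print(kept)
```

The variables in `kept` would then be passed to a PLS or MLR model, as in the LASSO-PLS and LASSO-MLR combinations the abstract mentions.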

10.
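Partial least squares (PLS) handles collinearity among variables well and is widely used in spectral analysis, especially in quantitative analysis of near-/mid-infrared and Raman spectra. To address the problems of extracting useful information and suppressing noise in PLS, a PLS algorithm with variable clustering and reweighting is proposed. The wavenumber variables of the spectrum are clustered and modeled separately, and the sub-models are then integrated into a full-spectrum model. Each cluster is assigned a computed weight, and the variables are reweighted according to their contribution to the model, which improves prediction accuracy. Validation on two near-infrared datasets, octane number in gasoline and nicotine content in tobacco, shows that the proposed algorithm outperforms classical PLS, reducing RMSEP by 32% and 22%, respectively, and has potential advantages for quantitative analysis of spectral data.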

11.
Analytical Letters, 2012, 45(9): 2073-2083
A consensus regression approach based on partial least squares (PLS) regression, named cPLS, was investigated for calibrating NIR data. In this approach, multiple independent PLS models are developed and integrated into a single consensus model. The utility and merits of cPLS were demonstrated by comparing its results with those of regular PLS in predicting the moisture, oil, protein, and starch contents of corn samples from NIR spectra. cPLS was found to be superior to regular PLS in both prediction accuracy and robustness.

12.
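Using ordinary corn kernels as the test material, characteristic wavelengths were selected from near-infrared spectra with a genetic algorithm combined with partial least squares regression, and a partial least squares calibration model was built on those wavelengths to determine starch content in corn kernels. The calibration model based on 11 characteristic wavelengths had a calibration error (RMSEC), cross-validation error (RMSECV), and prediction error (RMSEP) of 0.30%, 0.35%, and 0.27%, respectively. The correlation coefficients between predicted and measured values were 0.9279 for the calibration set and 0.9390 for the independent validation set. Prediction accuracy improved relative to the model built on the full spectrum, showing that feature selection with a genetic algorithm and PLS yields simpler and better models and offers a new approach to the near-infrared determination of starch in corn kernels and to the processing of infrared spectral data.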

13.
In this work, different approaches to variable selection are studied in the context of near-infrared (NIR) multivariate calibration of textiles. First, a model-based regression method is proposed. It consists of genetic algorithm optimisation combined with partial least squares regression (GA-PLS). The second approach is a relevance measure of spectral variables based on mutual information (MI), which can be computed independently of any given regression model. As MI makes no assumption about the relationship between X and Y, non-linear methods such as feed-forward artificial neural networks (ANN) are encouraged for modelling in a prediction context (MI-ANN). GA-PLS and MI-ANN models are developed for NIR quantitative prediction of cotton content in cotton-viscose textile samples. The results are compared to a full-spectrum (480 variables) PLS model (FS-PLS). The FS-PLS model requires 11 latent variables and yields a 3.74% RMS prediction error in the range 0-100%. GA-PLS provides a more robust model based on 120 variables and slightly better prediction performance (3.44% RMS error). With the MI variable selection procedure, a large reduction is obtained, as only 12 variables are retained. On the basis of these variables, a 12-input ANN model is trained, and the corresponding prediction error is 3.43% RMS.

14.
Near-infrared reflectance spectroscopy (NIRS) is often applied when rapid quantification of major components in feed is required. This technique is preferred over other analytical techniques because it requires relatively little sample preparation and offers high efficiency and low analysis costs. In this study, NIRS was used to control the content of crude protein, fat and fibre in extracted rapeseed meal produced in a local industrial crushing plant. The NIR data were modelled with the partial least squares (PLS) approach. The prediction errors were satisfactory: 1.12, 0.13 and 0.45 (expressed as percentages of dry mass) for crude protein, fat and fibre content, respectively. To point out the key spectral regions that are important for modelling, uninformative variable elimination PLS, PLS with jackknife-based variable elimination, PLS with bootstrap-based variable elimination and the orthogonal partial least squares approach were compared on the data studied. They enabled an easier interpretation of the calibration models in terms of absorption bands and led to predictions for test samples similar to those of the initial models.

15.
This paper reports the results of a rapid method to determine sucrose in chocolate mass using near infrared spectroscopy (NIRS). We applied a broad-based calibration approach, which consists of combining samples of various types of chocolate mass in a single calibration. This approach increases the concentration range for one or more compositional parameters, improves the model performance and requires just one calibration model for several recipes. The data were modelled using partial least squares (PLS) and multiple linear regression (MLR). The MLR models were developed using variable selection based on the regression coefficients of PLS and on a genetic algorithm (GA). High correlation coefficients (0.998, 0.997, 0.998 for PLS, MLR and GA-MLR, respectively) and low prediction errors confirm the good predictive ability of the models. The results show that NIR can be used as a rapid method to determine sucrose in chocolate mass in chocolate factories.

16.
Analytical Letters, 2012, 45(12): 1910-1921
Multiblock partial least squares (MB-PLS) is applied to the determination of corn and tobacco samples by near-infrared diffuse reflectance spectroscopy. In the model, the spectra are separated into several sub-blocks along the wavenumber axis, and a different number of latent variables is used for each sub-block. Compared with ordinary PLS, the importance and contribution of each sub-block can be balanced through super-weights and through the use of different numbers of latent variables. Therefore, the prediction obtained by the MB-PLS model is superior to that of ordinary PLS, especially for the large tobacco data sets with many variables.

17.
In this paper, we propose a wavelength selection method based on random decision particle swarm optimization with attractor for quantitative analysis of near-infrared (NIR) spectra. The proposed method is combined with partial least squares (PLS) to construct a prediction model. It chooses either the particle's own current optimum or the current global optimum to calculate the attractor. The particle then updates its flight velocity using the attractor, and the particle state is updated by a random decision based on the new velocity. The root-mean-square error of cross-validation is adopted as the fitness function. To demonstrate the usefulness of the proposed method, PLS with all wavelengths, uninformative variable elimination by PLS, elastic net, genetic algorithm combined with PLS, discrete particle swarm optimization combined with PLS, modified particle swarm optimization combined with PLS, neighboring particle swarm optimization combined with PLS, and the proposed method are used to build quantitative component analysis models on NIR spectral datasets, and the effectiveness of these models is compared. Two application studies are presented, which involve NIR data obtained from a meat content determination experiment and from a combustion procedure. The results verify that the proposed method has higher predictive ability for NIR spectral data and selects fewer wavelengths. It also converges faster and can overcome the premature convergence problem. Furthermore, although improving the prediction precision may sacrifice model complexity to a certain extent, the proposed method overfits only slightly. Copyright © 2015 John Wiley & Sons, Ltd.

18.
Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same idea is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into clusters by a clustering algorithm, and the spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of variable clustering on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the spectrum from the redundant parts, so that a stable model can be built on the informative clusters.

19.
Near-infrared spectroscopy (NIR) is widely used in quantitative and qualitative food analysis. With the development of chemometrics, variable selection has become a critical step in spectral modeling. In this study, a novel variable selection strategy, automatic weighting variable combination population analysis (AWVCPA), is proposed. First, the binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then two kinds of information vectors (IVs), the variable frequency (Fre) and the partial least squares regression coefficient (Reg), are weighted to obtain the contribution of each spectral variable, so that the influence of both Fre and Reg on each spectral variable is considered. Finally, an exponentially decreasing function (EDF) removes the low-contribution wavelengths so as to select the characteristic variables. For near-infrared spectra of beer and corn, partial least squares (PLS) prediction models of yeast and oil concentration are established. Compared with other variable selection methods, AWVCPA is the best variable selection strategy under the same conditions. On the beer dataset, AWVCPA-PLS improves on PLS by 72.7%, with the root mean square error of prediction (RMSEP) decreasing from 0.5348 to 0.1457. On the corn dataset, AWVCPA-PLS improves on PLS by 64.7%, with RMSEP decreasing from 0.0702 to 0.0248.

20.
This study proposes an analytical method for the simultaneous near infrared (NIR) spectrometric determination of palmitic, oleic, linoleic and linolenic acids in sea buckthorn seed oil. For this purpose, four combinations of multivariate calibration methods and variable selection were evaluated: partial least squares (PLS) with the full spectrum; PLS with uninformative variable elimination (UVE); PLS with competitive adaptive reweighted sampling (CARS); and multiple linear regression (MLR) with uninformative variable elimination combined with the successive projections algorithm (UVE-SPA). An independent set of samples was used to evaluate the performance of the resulting models. The UVE-SPA-MLR model, developed with only a few spectral variables, provided the best results for each parameter. The relative errors of prediction (REP) of the UVE-SPA-MLR model for palmitic, oleic, linoleic and linolenic acids are 1.77%, 1.20%, 1.02% and 1.40%, respectively. These results indicate that the method is feasible and fast for determining the fatty acid content of sea buckthorn seed oil.
