Similar articles
1.
Multivariate spectral analysis has been widely applied in chemistry and other fields. Spectral data consisting of measurements at hundreds or even thousands of analytical channels can now be obtained in a few seconds. It is widely accepted that well-performed variable selection before a multivariate regression model is built can improve the predictive ability of the model. In this paper, the concept of traditional wavelength variable selection is extended, and the idea of variable weighting is incorporated into the least-squares support vector machine (LS-SVM). A recently proposed global optimization method, the particle swarm optimization (PSO) algorithm, is used to search for the variable weights and the LS-SVM hyper-parameters by optimizing both the training on a calibration set and the prediction of an independent validation set. The entire computation is automatic. Two real data sets are investigated, and the results are compared with those of PLS, uninformative variable elimination-PLS (UVE-PLS) and LS-SVM models to demonstrate the advantages of the proposed method.
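The PSO search described above can be illustrated with a generic global-best particle swarm. This is a minimal sketch, not the authors' implementation: the weighted LS-SVM is not implemented, the objective passed in is a hypothetical stand-in for the calibration/validation error, and the swarm size, inertia and acceleration coefficients are assumed common defaults.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimal global-best particle swarm optimization inside a box."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [objective(p) for p in pos]          # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]      # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clip to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

# Hypothetical objective standing in for a validation error of a weighted model.
best_w, best_err = pso_minimize(lambda w: sum((wi - 0.5) ** 2 for wi in w),
                                dim=3, bounds=(0.0, 1.0))
```

In the paper's setting each particle would hold the variable weights plus the LS-SVM hyper-parameters; here the toy objective simply has its minimum at 0.5 in every dimension.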

2.
Traditionally the partial least-squares (PLS) algorithm, commonly used in chemistry for ill-conditioned multivariate linear regression, has been derived (motivated) and presented in terms of data matrices. In this work the PLS algorithm is derived probabilistically in terms of stochastic variables where sample estimates calculated using data matrices are employed at the end. The derivation, which offers a probabilistic motivation to each step of the PLS algorithm, is performed for the general multiresponse case and without reference to any latent variable model of the response variable and also without any so-called "inner relation". On the basis of the derivation, some theoretical issues of the PLS algorithm are briefly considered: the complexity of the original motivation of PLS regression which involves an "inner relation"; the original motivation behind the prediction stage of the PLS algorithm; the relationship between uncorrelated and orthogonal latent variables; the limited possibilities to make natural interpretations of the latent variables extracted.
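For comparison with the probabilistic derivation, the conventional data-matrix form of single-response PLS (PLS1) can be sketched with NIPALS-style deflation in pure Python. This is an illustrative sketch under stated assumptions (one response, mean-centred data, no scaling), not the paper's derivation; `pls1_fit` and `pls1_predict` are hypothetical names.

```python
def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS deflation; X is a list of sample rows."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # X residuals
    f = [yi - y_mean for yi in y]                              # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector along the covariance direction, w proportional to E^T f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:                     # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt                    # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict one sample by replaying the same deflation sequence."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
    return y_hat

X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 6]]
y = [8, 7, 21, 17, 28]            # exactly y = 2*x1 + 3*x2
model = pls1_fit(X, y, n_components=2)
```

With as many components as the rank of the centred X, PLS1 reproduces ordinary least squares, so the exactly linear toy data are fitted without error.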

3.
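To improve the accuracy and stability of near-infrared (NIR) spectroscopic models for quantifying the active ingredient in chlorpyrifos emulsifiable concentrate, synergy interval partial least squares (siPLS) combined with a genetic algorithm (GA) was used to select characteristic variables, and cross-validation was used to determine the optimal number of principal component factors and the number of selected variables. The results show that the best-performing model was obtained with 81 variables selected from the full spectral region and 11 principal component factors; the coefficient of determination of the prediction set, R_p^2, was 0.972 and the root mean square error of prediction (RMSEP) was 0.353%. The study shows that selecting characteristic variables with siPLS combined with GA can substantially remove redundant and irrelevant information among the spectral variables of the pesticide emulsifiable concentrate, reduce model complexity, and improve the accuracy and stability of the prediction model for the active ingredient.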

4.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models of different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust, in the sense that the quality of prediction depends less on a good choice of the number of components. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, but with an entirely different weighting scheme.
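The averaging idea can be sketched as a combination of predictions from PLS models of increasing complexity. The models themselves are assumed to be fitted elsewhere; equal weights correspond to averaged PLS, while the 1/RMSECV² weights are an assumed scheme chosen for illustration, since the abstract does not state the weighting used in weighted average PLS. `average_predictions` is a hypothetical name.

```python
def average_predictions(preds_by_complexity, rmsecv=None):
    """Combine predictions from PLS models with 1..A components.

    preds_by_complexity[a][i] is the prediction for sample i from the model
    with a + 1 components. Without rmsecv all models get equal weight;
    with rmsecv each model is weighted by 1 / RMSECV**2 (assumed scheme).
    """
    if rmsecv is None:
        weights = [1.0] * len(preds_by_complexity)
    else:
        weights = [1.0 / (r * r) for r in rmsecv]
    total = sum(weights)
    n_samples = len(preds_by_complexity[0])
    return [sum(w * preds[i] for w, preds in zip(weights, preds_by_complexity)) / total
            for i in range(n_samples)]

# Two models (1 and 2 components) predicting two samples.
plain = average_predictions([[1.0, 2.0], [3.0, 4.0]])
weighted = average_predictions([[1.0, 2.0], [3.0, 4.0]], rmsecv=[1.0, 0.5])
```

Because every complexity contributes, a poor choice of a single component count matters less, which is the robustness argument made in the abstract.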

5.
Two-dimensional correlation spectroscopy (2DCOS) and near-infrared spectroscopy (NIRS) were used to determine the polyphenol content of oat grain. A partial least squares (PLS) algorithm was used to perform the calibration. A total of 116 representative oat samples from four locations in China were prepared, and their near-infrared spectra were measured. 2DCOS was employed to select wavelength bands for the PLS regression model for polyphenol determination. The numbers of PLS components and intervals were optimized according to the coefficient of determination (R²) and the root mean square error of cross-validation (RMSECV) in the calibration set. The performance of the final model was evaluated using the correlation coefficient (R) and the root mean square error of validation (RMSEV) in the prediction set. The results showed that the band corresponding to the optimal calibration model lay between 1350 and 1848 nm and that the optimal spectral preprocessing combination was second derivative with second smoothing. The optimal regression model gave an R² of 0.8954 and an RMSECV of 0.06651 in the calibration set, and an R of 0.9614 and an RMSEV of 0.04573 in the prediction set. These results show that the calibration model had adequate predictive accuracy and that 2DCOS combined with PLS is a simple and rapid method for the quantitative determination of polyphenols in oats.

6.
Multivariate calibration is tested as an alternative way to model chromium(III) concentration against chemiluminescence signals obtained from the luminol-hydrogen peroxide reaction. The multivariate calibration approaches included were: conventional linear methods (principal component regression (PCR) and partial least squares (PLS)), nonlinear methods (nonlinear variants of PCR and PLS, and variants of locally weighted regression), and linear methods combined with variable selection performed on the original or the transformed data (a stepwise multiple linear regression procedure). Both the direct and the inverse univariate approaches were also tested.

The use of a double logarithmic transformation prior to linear regression was also evaluated, and a new double logarithmic transformation prior to linear regression is proposed to avoid the effect of noise on the calibration model. The pre-processing, optimization and prediction ability of the multivariate calibration models were studied under nine different experimental conditions, including batch and FIA measurements. Box plots, PCA and cluster analysis were employed to compare the prediction ability of the different models tested. Nonlinear PCR and nonlinear PLS provide the best results. Real samples were analyzed and the results compared with those of the reference method, confirming the successful use of the proposed methodology.


7.
To increase the predictive ability of the PLS (partial least squares) model, we have developed a new algorithm that eliminates uninformative samples, which contribute little to the model, from a calibration data set. In the proposed algorithm, uninformative wavelength (or independent) variables are first eliminated using the modified UVE (uninformative variable elimination)-PLS method that we reported previously. Then, if the prediction error of the i-th sample (1 ≤ i ≤ n) is larger than 3σ, that sample is eliminated as uninformative, where n is the total number of calibration samples and σ is the standard deviation calculated from the other n - 1 samples. Calculating σ in this leave-one-out manner enhances the ability to identify uninformative samples. Because both uninformative wavelength variables and uninformative samples are eliminated, the final PLS model is more accurate. To demonstrate the usefulness of the algorithm, we applied it to two kinds of mid-infrared spectral data sets.
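The leave-one-out 3σ rule for flagging samples can be sketched as follows, given each calibration sample's prediction error. Two details are assumptions not fixed by the abstract: the error is compared in absolute value, and σ is the sample standard deviation of the other n - 1 errors. `flag_uninformative` is a hypothetical name.

```python
import statistics

def flag_uninformative(errors, k=3.0):
    """Return indices whose |error| exceeds k times the SD of the other errors."""
    flagged = []
    for i, e in enumerate(errors):
        others = errors[:i] + errors[i + 1:]       # leave sample i out
        sigma = statistics.stdev(others)          # sample SD of the other n - 1 errors
        if abs(e) > k * sigma:
            flagged.append(i)
    return flagged

errors = [0.1, -0.2, 0.15, -0.1, 0.05, 0.12, -0.08, 5.0]
outliers = flag_uninformative(errors)
```

Leaving sample i out matters here: if the large error were included in its own σ, it would inflate σ and could mask itself.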

8.
Partial least squares (PLS) is a widely used algorithm in the field of chemometrics. In calibration studies, a PLS variant called orthogonal projection to latent structures (O‐PLS) has been shown to successfully reduce the number of model components while maintaining good prediction accuracy, although no theoretical analysis exists demonstrating its applicability in this context. Using a discrete formulation of the linear mixture model known as Beer's law, we explicitly analyze O‐PLS solution properties for calibration data. We find that, in the absence of noise and for large n, O‐PLS solutions are simpler than, but just as accurate as, PLS solutions for systems in which analyte and background concentrations are uncorrelated. However, the same is not true for the most general chemometric data, in which correlations between the analyte and background concentrations are nonzero and pure profiles overlap. On the contrary, forcing the removal of orthogonal components may actually degrade the interpretability of the model. This situation can also arise when the data are noisy and n is small, because O‐PLS may identify and model the noise as orthogonal when it is statistically uncorrelated with the analytes. For the types of data arising from systems biology studies, in which the number of response variables may be much greater than the number of observations, we show that O‐PLS is unlikely to discover orthogonal variation whether or not it exists. In this case, O‐PLS and PLS solutions are the same. Copyright © 2011 John Wiley & Sons, Ltd.

9.
Run-to-run (R2R) optimization based on unfolded Partial Least Squares (u‐PLS) is a promising approach for improving the performance of batch and fed‐batch processes, as it is able to continuously adapt to changing processing conditions. Using this technique, the regression coefficients of PLS are used to modify the input profile of the process in order to optimize the yield. When this approach was initially proposed, it was observed that the optimization performed better when PLS was combined with a smoothing technique, in particular sliding window filtering, which constrained the regression coefficients to be smooth. In the present paper, this result is further investigated and some modifications to the original approach are proposed. Also, the suitability of different smoothing techniques in combination with PLS is studied for both end‐of‐batch quality prediction and R2R optimization. The smoothing techniques considered in this paper include the original filtering approach, the introduction of smoothing constraints in the PLS calibration (Penalized PLS), and the use of functional analysis (Functional PLS). Two fed‐batch process simulators are used to illustrate the results. Copyright © 2015 John Wiley & Sons, Ltd.
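A sliding-window filter of the kind mentioned above can be sketched as a centred moving average over the PLS regression coefficients ordered along the batch time axis. The window length and the shrinking window at the ends of the profile are assumptions for illustration, not the settings used in the paper; `sliding_window_smooth` is a hypothetical name.

```python
def sliding_window_smooth(coefs, window=5):
    """Centred moving average; the window shrinks at both ends of the profile."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(coefs)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = coefs[lo:hi]                   # neighbours of coefficient i
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# A single spike in the coefficient profile is spread over its neighbours.
smooth = sliding_window_smooth([0, 0, 3, 0, 0], window=3)
```

Smoothing the coefficients before they are used to adjust the input profile keeps the suggested input changes from oscillating sharply between neighbouring time points.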

10.
A new variable selection algorithm based on ant colony optimization (ACO) is described. The aim of the algorithm is to choose, from a large number of available spectral wavelengths, those relevant to the estimation of analyte concentrations or sample properties when spectroscopic analysis is combined with multivariate calibration techniques such as partial least-squares (PLS) regression. The new algorithm employs the concept of cooperative pheromone accumulation, which is typical of ACO selection methods, and optimizes PLS models using a predefined number of variables, employing a Monte Carlo approach to discard irrelevant sensors. On a simulated system, it shows significant superiority over other commonly employed selection methods, such as genetic algorithms. When the ACO algorithm was applied to several experimental near-infrared spectroscopic data sets, PLS gave improved analytical figures of merit after wavelength selection. The method could also be helpful in other chemometric tasks such as classification or quantitative structure-activity relationship (QSAR) problems.

11.
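Genetic algorithm for variable selection in partial least squares modeling   Total citations: 19 | Self-citations: 0 | Citations by others: 19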
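A global search method, the genetic algorithm (GA), was used to select wavelength variables in near-infrared (NIR) spectral analysis, and partial least squares (PLS) was then used to build the calibration model. Application examples involving NIR analysis of two types of samples show that this variable-selection calibration approach not only simplifies and optimizes the model but also improves its predictive ability; it is especially suited to systems that are difficult to calibrate with PLS alone.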
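GA-based wavelength selection can be sketched with binary masks, one bit per wavelength. In practice the fitness would come from a PLS model built on the selected wavelengths (for example, negative RMSECV); here a hypothetical toy fitness rewards three assumed informative wavelengths and penalizes mask size. Tournament selection, one-point crossover, bit-flip mutation and elitism are common choices assumed for the sketch, not operators stated in the abstract; `ga_select` and `toy_fitness` are hypothetical names.

```python
import random

def ga_select(fitness, n_vars, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """Evolve binary wavelength masks; return best mask and best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]

    def tournament(population):
        a, b = rng.sample(population, 2)        # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(n_gen):
        children = [best[:]]                    # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_vars)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

INFORMATIVE = {2, 5, 7}                         # hypothetical useful wavelengths

def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits - 0.1 * sum(mask)               # reward useful bits, penalize size

best_mask, history = ga_select(toy_fitness, n_vars=12)
```

Because the best mask is always carried into the next generation, the best fitness can never decrease from one generation to the next.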

12.
Multivariate calibration is gaining popularity for assaying food matrices. Partial least squares (PLS) is a powerful multivariate calibration method used to build a quantitative relationship between measured variables and a property of interest (such as concentration) of the system under study. PLS calibration with UV/vis spectral data efficiently accounted for indirect food-matrix effects and for direct interference effects resulting from overlapping food dyes. PLS was able to quantify tartrazine (TAT), allura red (AR), sunset yellow (SY) and brilliant black (BB) added to a wide selection of sugar-based candies. The results indicated that 70% of the samples contained a single dye, 8% contained a TAT-SY mixture, and certain samples contained TAT + SY + AR + BB. Lollipops were found to contain high levels of AR (77–120 mg/kg) and TAT (56–166 mg/kg). The maximum adulteration, 50%, was observed in lollipops. PLS calibration predicted the colorants with prediction errors of 7%. Using PLS, dyes were detected down to 0.1 mg/L with acceptable accuracy and precision. PLS showed performance comparable to liquid chromatography for dye quantification and can replace laborious chromatography for the rapid detection of coloring agents in candies.

13.
An algorithm is proposed for extracting relevant information from near-infrared (NIR) spectra for multivariate calibration of routine components in complex plant samples. The algorithm is a combination of wavelet transform (WT) data compression and a procedure for uninformative variable elimination (UVE). After compression of the NIR spectra by WT, the UVE approach is used to eliminate the irrelevant wavelet coefficients. Finally, a calibration model is built from the retained wavelet coefficients to enable prediction. Because irrelevant information can be removed from the spectra used for multivariate calibration, the model based on the extracted relevant features is better than those obtained with full-spectrum data. Both prediction precision and calculation speed are improved.
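The WT compression step can be illustrated with the simplest orthonormal wavelet, the Haar wavelet. The abstract does not name the wavelet or the decomposition depth, so both are assumptions here; the UVE step that would then discard irrelevant coefficients is not shown. `haar_dwt` is a hypothetical name.

```python
import math

def haar_dwt(signal, levels):
    """Multilevel orthonormal Haar DWT.

    Returns (approximation, details), where details[0] holds the finest-level
    coefficients. len(signal) must be divisible by 2**levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    scale = 1.0 / math.sqrt(2.0)
    approx = list(signal)
    details = []
    for _ in range(levels):
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) * scale for a, b in zip(evens, odds)])  # differences
        approx = [(a + b) * scale for a, b in zip(evens, odds)]         # averages
    return approx, details

approx, details = haar_dwt([4.0, 2.0, 5.0, 5.0], levels=2)
```

Because the transform is orthonormal, the sum of squared coefficients equals the signal energy (70 in this example), so discarding small coefficients removes little of the spectrum's energy.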

14.
The wavelet packet transform (WPT) is a variant of the standard wavelet transform that offers greater flexibility in the decomposition of instrumental signals. Although encouraging results have been published concerning the use of WPT for signal compression and denoising, its application in multivariate calibration problems has received comparatively little attention, with very few contributions reported in the literature. This paper presents an investigation concerning the use of WPT as a feature extraction tool to improve the prediction ability of PLS models. The optimization of the wavelet packet tree is accomplished by using the classic dynamic programming algorithm and an entropy cost function modified to take into account the variance explained by the WPT coefficients. The selection of WPT coefficients for inclusion in the PLS model is carried out on the basis of correlation with the dependent variable, in order to exploit the joint statistics of the instrumental response and the parameter of interest. This WPT-PLS strategy is applied in a case study involving FT-IR spectrometric determination of four gasoline parameters, namely specific mass (SM) and the distillation temperatures at which 10%, 50% and 90% of the sample has evaporated. The dataset comprises 103 gasoline samples collected from gas stations and 6144 wavelengths in the range 2500-15000 nm. By applying WPT to the FT-IR spectra, considerable compression with respect to the original wavelength domain is achieved. The effect of varying the wavelet and the threshold level on the prediction ability of the resulting models is investigated. The results show that WPT-PLS outperforms standard PLS in most wavelet-threshold combinations for all determined parameters.
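The correlation-based selection of coefficients can be sketched as ranking each feature (for example, each WPT coefficient across samples) by its absolute Pearson correlation with the dependent variable and keeping the top ones. Keeping a fixed number `n_keep` is an assumed rule; the paper's threshold-based choice may differ. `correlation_select` is a hypothetical name.

```python
def correlation_select(features, y, n_keep):
    """Return sorted indices of the n_keep features with largest |Pearson r| to y.

    features is a list of samples, each a list of coefficient values.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_norm = sum(d * d for d in y_dev) ** 0.5
    scores = []
    for j in range(len(features[0])):
        col = [row[j] for row in features]
        m = sum(col) / n
        dev = [c - m for c in col]
        norm = sum(d * d for d in dev) ** 0.5
        # A constant column carries no information; give it zero correlation.
        r = 0.0 if norm == 0 or y_norm == 0 else sum(a * b for a, b in zip(dev, y_dev)) / (norm * y_norm)
        scores.append((abs(r), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:n_keep])

coeffs = [[1, 5, 0], [2, 3, 1], [3, 4, 0], [4, 1, 1]]   # 4 samples, 3 coefficients
kept = correlation_select(coeffs, [10, 20, 30, 40], n_keep=2)
```

Ranking by the absolute value keeps strongly negative correlations (column 1 here, r ≈ -0.83) as well as positive ones.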

15.
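Using ordinary maize kernels as the test material, characteristic wavelengths were first selected from near-infrared (NIR) spectral data with a genetic algorithm combined with partial least squares (PLS) regression, and a PLS calibration model for determining the starch content of maize kernels was then built on the selected wavelengths. For the model based on 11 characteristic wavelengths, the root mean square errors of calibration (RMSEC), cross-validation (RMSECV) and prediction (RMSEP) were 0.30%, 0.35% and 0.27%, respectively, and the correlation coefficients between predicted and measured values reached 0.9279 for the calibration set and 0.9390 for the independent validation set. Compared with the model built on full-spectrum data, prediction accuracy improved in every case. This shows that spectral feature selection with a genetic algorithm and PLS yields simpler and better models, and it offers a new approach to the NIR determination of starch content in maize kernels and to the processing of infrared spectral data.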

16.
A method for calibration and validation subset partitioning   Total citations: 13 | Self-citations: 0 | Citations by others: 13
This paper proposes a new method to divide a pool of samples into calibration and validation subsets for multivariate modelling. The proposed method is of value for analytical applications involving complex matrices, in which the composition variability of real samples cannot be easily reproduced by optimized experimental designs. A stepwise procedure is employed to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. The proposed technique is illustrated in a case study involving the prediction of three quality parameters (specific mass and distillation temperatures at which 10 and 90% of the sample has evaporated) of diesel by NIR spectrometry and PLS modelling. For comparison, PLS models are also constructed by full cross-validation, as well as by using the Kennard-Stone and random sampling methods for calibration and validation subset partitioning. The obtained models are compared in terms of prediction performance by employing an independent set of samples not used for calibration or validation. The results of F-tests at the 95% confidence level reveal that the proposed technique may be an advantageous alternative to the other three strategies.
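A stepwise selection driven by distances in both the x and y spaces can be sketched in the style of the Kennard-Stone algorithm. The specific rule below (Euclidean distances in each space, each divided by its maximum and then summed) is an assumption made for illustration, not a statement of the paper's exact criterion; `xy_kennard_stone` is a hypothetical name.

```python
import math

def xy_kennard_stone(X, y, n_cal):
    """Pick n_cal calibration samples using joint, max-normalized x and y distances."""
    n = len(X)
    dx = [[math.dist(a, b) for b in X] for a in X]
    dy = [[abs(a - b) for b in y] for a in y]
    mx = max(max(row) for row in dx) or 1.0        # avoid division by zero
    my = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / mx + dy[i][j] / my for j in range(n)] for i in range(n)]
    # Start with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: d[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_cal and remaining:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining                      # calibration, validation

X = [[0, 0], [1, 0], [0, 1], [10, 10], [5, 5]]
y = [0, 1, 1, 20, 10]
cal, val = xy_kennard_stone(X, y, n_cal=3)
```

Including y in the distance means two samples with similar spectra but different property values are still treated as different, which is the motivation the abstract gives for using both spaces.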

17.
This paper shows the possibility of using near-infrared (NIR) spectroscopy as a rapid method to quantitatively predict the caffeine and total polyphenol contents of green tea. A partial least squares (PLS) algorithm is used to perform the calibration. The number of PLS factors included in the model is chosen as the one giving the lowest root mean square error of cross-validation (RMSECV) in training. The correlation coefficient R between the NIR-predicted and reference results for the test set is used to evaluate the models. The results showed correlation coefficients of R = 0.9688 for caffeine and R = 0.9299 for total polyphenols. The study demonstrates that NIR spectroscopy combined with multivariate calibration can be successfully applied as a rapid method to determine the active ingredients of tea for industrial process control.

18.
The present paper focuses on determining the number of PLS components using resampling methods such as cross-validation (CV), Monte Carlo cross-validation (MCCV) and bootstrapping (BS). To resample the training data, random non‐negative weights are assigned to the original training samples, and a sample‐weighted PLS model is developed without much additional computational burden. Random weighting is a generalization of the traditional resampling methods and is expected to carry a lower risk of producing an insufficient training set. For prediction, only the training samples with random weights below a threshold value are selected, to ensure that the prediction samples have little influence on training. For complicated data, the optimal number of PLS components is often not unique or readily distinguished, and there may be an optimal region of model complexity; the distribution of prediction errors can therefore be more useful than a single root mean squared error of prediction (RMSEP). Accordingly, the distribution of prediction errors is estimated by repeated random sample weighting (RSW) and used to determine model complexity. RSW is compared with its traditional counterparts, such as CV, MCCV, BS and a recently proposed randomization test method, to demonstrate its usefulness. Copyright © 2010 John Wiley & Sons, Ltd.
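The random-weighting step can be sketched as drawing non-negative weights for the training samples and reserving the low-weight samples for prediction. Drawing exponential weights rescaled to mean 1 and the threshold of 0.5 are assumptions; the sample-weighted PLS fit that would use these weights is not shown. `random_sample_weights` is a hypothetical name.

```python
import random

def random_sample_weights(n, threshold=0.5, seed=None):
    """Draw non-negative sample weights (mean 1) and pick low-weight samples for prediction."""
    rng = random.Random(seed)
    raw = [rng.expovariate(1.0) for _ in range(n)]
    scale = n / sum(raw)
    weights = [w * scale for w in raw]          # non-negative, rescaled to mean 1
    # Samples with small weights barely influence training, so they act as test samples.
    pred_idx = [i for i, w in enumerate(weights) if w < threshold]
    return weights, pred_idx

weights, pred_idx = random_sample_weights(50, threshold=0.5, seed=1)
```

Repeating this draw many times, fitting a weighted model each time and recording the errors on the low-weight samples, yields the distribution of prediction errors used to choose the number of components.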

19.
Glycerol monolaurate (GML) products contain many impurities, such as lauric acid and glycerol. The GML content is an important quality indicator in GML production. A hybrid variable selection algorithm combining wavelet transform (WT) technology with a modified uninformative variable elimination (MUVE) method was proposed to extract useful information from Fourier transform infrared (FT-IR) transmission spectra for the determination of GML content. The FT-IR spectral data were first compressed by WT, and the irrelevant variables among the compressed wavelet coefficients were then eliminated by MUVE. In the MUVE process, a simulated annealing (SA) algorithm was employed to search for the optimal cutoff threshold. After the WT-MUVE process, the number of variables for the calibration model was reduced from 7366 to 163. Finally, the retained variables were used as inputs to a partial least squares (PLS) model to build the calibration model. For the prediction set, a correlation coefficient (r) of 0.9910 and a root mean square error of prediction (RMSEP) of 4.8617 were obtained, a better result than that of the PLS model built on full-spectrum data. This indicates that the proposed WT-MUVE method not only makes the prediction more accurate but also makes the calibration model more parsimonious. Furthermore, the reconstructed spectra represent the projection of the selected wavelet coefficients back into the original domain, allowing chemical interpretation of the predicted results. It is concluded that FT-IR transmission spectroscopy with the proposed method is promising for the fast detection of GML content.
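The elimination step can be illustrated with the reliability criterion commonly used in UVE: the mean of each variable's regression coefficient across resampled models divided by its standard deviation. Here the coefficient vectors are assumed to come from models fitted elsewhere (for example, leave-one-out PLS models), and the fixed cutoff is a hypothetical stand-in for the threshold that MUVE searches for with simulated annealing. `uve_stability` and `select_by_cutoff` are hypothetical names.

```python
import math
import statistics

def uve_stability(coef_runs):
    """coef_runs[r][j] is the coefficient of variable j in resampled model r."""
    stability = []
    for j in range(len(coef_runs[0])):
        values = [run[j] for run in coef_runs]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            # Identical in every run: infinitely reliable unless it is zero.
            stability.append(0.0 if mean == 0 else math.copysign(math.inf, mean))
        else:
            stability.append(mean / sd)
    return stability

def select_by_cutoff(stability, cutoff):
    """Keep variables whose |mean/sd| exceeds the cutoff."""
    return [j for j, c in enumerate(stability) if abs(c) > cutoff]

runs = [[1.0, 0.1], [1.1, -0.1], [0.9, 0.05]]    # 3 resampled models, 2 variables
stab = uve_stability(runs)
kept = select_by_cutoff(stab, cutoff=5.0)
```

Variable 0 has a large, consistent coefficient (stability 10), while variable 1 flips sign between runs (stability near 0.16), so only variable 0 survives the cutoff.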

20.
In this paper, two spectral data sets are used to illustrate the importance of maintaining chemical information whilst generating predictive multivariate calibration models. The first data set is based on 26 duplicate UV/VIS spectra for four metal ions (Fe, Ni, Co, Cu) present at varying concentrations in aqueous solution. Spectra were collected across the range 180–800 nm at a resolution of 3.5 nm, generating 211 data points for each sample. Calibration was carried out using multiple linear regression (MLR) and a K-matrix approach to demonstrate the advantages the latter method has in describing real spectral features. The limitations of MLR in accommodating noise and spectral overlap in the data are also illustrated. The second data set, based on NIR spectroscopy, was generated using a four-level, two-factor factorial design and consisted of two additives present at a range of concentrations in an aqueous caustic system, with the spectra collected over the range 10,000–3000 cm−1. Although a conventional partial least squares (PLS) model was applied to the data, it was the use of variable selection (VS) prior to PLS and the application of weighted ridge regression (WRR) that demonstrated the need for chemometric methodology that intuitively reflects chemical information. The results also illustrate how a poorly designed experimental protocol and missing data can limit the performance of the resulting calibration models. The aim of this paper is not to prescribe an ideal calibration methodology but rather to demonstrate the relevance of selecting multivariate calibration methods that relate more to the "chem" than just the "metrics" in chemometrics.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  Beijing ICP registration No. 09084417