Similar articles
 20 similar articles found (search time: 31 ms)
1.
I. Stanimirova. Talanta, 2007, 72(1): 172-178
An efficient methodology for dealing with missing values and outlying observations simultaneously in principal component analysis (PCA) is proposed. The concept described in the paper consists of using a robust technique to obtain robust principal components, combined with the expectation-maximization approach to process data with missing elements. It is shown that the proposed strategy works well for highly contaminated data containing different amounts of missing elements. The authors come to this conclusion on the basis of the results obtained from a simulation study and from analysis of a real environmental data set.
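
The expectation-maximization idea for missing elements can be sketched as an iterative imputation loop. The following is only a minimal illustration under stated assumptions: it uses ordinary (non-robust) rank-1 PCA, so the robust estimation step described in the abstract is deliberately left out, and the function name `em_pca_impute` and its parameters are hypothetical rather than taken from the paper.

```python
import math

def em_pca_impute(rows, n_iter=200, tol=1e-10):
    """Fill None cells by repeating: impute -> fit rank-1 PCA -> re-impute.

    Assumption: plain least-squares PCA with one component; a robust PCA
    would replace the fitting step in a method like the one described above.
    """
    n, m = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n) for j in range(m) if rows[i][j] is None]
    # Initial estimates: column means of the observed values.
    col_mean = []
    for j in range(m):
        obs = [rows[i][j] for i in range(n) if rows[i][j] is not None]
        col_mean.append(sum(obs) / len(obs))
    x = [[col_mean[j] if rows[i][j] is None else rows[i][j] for j in range(m)]
         for i in range(n)]

    for _ in range(n_iter):
        # Center the currently completed data.
        mu = [sum(x[i][j] for i in range(n)) / n for j in range(m)]
        xc = [[x[i][j] - mu[j] for j in range(m)] for i in range(n)]
        # Leading loading vector p by power iteration on Xc^T Xc.
        p = [1.0 / math.sqrt(m)] * m
        for _ in range(100):
            t = [sum(xc[i][j] * p[j] for j in range(m)) for i in range(n)]
            q = [sum(xc[i][j] * t[i] for i in range(n)) for j in range(m)]
            norm = math.sqrt(sum(v * v for v in q))
            if norm == 0.0:
                break
            p = [v / norm for v in q]
        # Scores for the final loading vector.
        t = [sum(xc[i][j] * p[j] for j in range(m)) for i in range(n)]
        # Overwrite only the missing cells with the model reconstruction.
        change = 0.0
        for i, j in missing:
            new = mu[j] + t[i] * p[j]
            change += (new - x[i][j]) ** 2
            x[i][j] = new
        if change < tol:
            break
    return x
```

Observed cells are never modified; only the missing cells are updated until the imputed values stop changing.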

2.
Advances in sensory systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With these correlated data sets, it becomes important to consider advanced modeling approaches built to deal with correlated inputs in order to understand the underlying sources of variability and how this variability will affect the final quality of the product. In addition to the correlated nature of the data sets, it is also common to find missing elements and noise in these data matrices. Latent variable regression methods such as partial least squares or projection to latent structures (PLS) have gained much attention in industry for their ability to handle ill‐conditioned matrices with missing elements. This feature of the PLS method is accomplished through the nonlinear iterative PLS (NIPALS) algorithm, with a simple modification to consider the missing data. Moreover, in expectation maximization PLS (EM‐PLS), imputed values are provided for missing data elements as initial estimates, conventional PLS is then applied to update these elements, and the process iterates to convergence. This study extends previous work on principal component analysis (PCA), where we introduced nonlinear programming (NLP) as a means to estimate the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM‐PLS, this paper presents an efficient NLP‐based technique to find model parameters for PLS, where the desired properties of the parameters can be explicitly posed as constraints in the optimization problem of the proposed algorithm. We also present a number of simulation studies, where we compare the effectiveness of the proposed algorithm with competing algorithms. Copyright © 2014 John Wiley & Sons, Ltd.

3.
Maximum likelihood principal component analysis (MLPCA) was originally proposed to incorporate measurement error variance information in principal component analysis (PCA) models. MLPCA can be used to fit PCA models in the presence of missing data, simply by assigning very large variances to the non‐measured values. An assessment of maximum likelihood missing data imputation is performed in this paper, analysing the algorithm of MLPCA and adapting several methods for PCA model building with missing data to its maximum likelihood version. In this way, known data regression (KDR), KDR with principal component regression (PCR), KDR with partial least squares regression (PLS) and trimmed scores regression (TSR) methods are implemented within the MLPCA method to work as different imputation steps. Six data sets are analysed using several percentages of missing data, comparing the performance of the original algorithm, and its adapted regression‐based methods, with other state‐of‐the‐art methods. Copyright © 2016 John Wiley & Sons, Ltd.
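
Why a very large variance acts like a missing value can be seen from the weighted objective. The expression below is a sketch for the simplest case, assuming independent measurement errors with known standard deviations; the symbols x_ij (measured value), x̂_ij (PCA model estimate) and σ_ij are notation introduced here, not taken from the abstract.

```latex
% Assumption: uncorrelated errors with known standard deviations \sigma_{ij}.
S^2 = \sum_{i=1}^{n} \sum_{j=1}^{m}
      \frac{\left(x_{ij} - \hat{x}_{ij}\right)^2}{\sigma_{ij}^2},
\qquad
\lim_{\sigma_{ij} \to \infty}
      \frac{\left(x_{ij} - \hat{x}_{ij}\right)^2}{\sigma_{ij}^2} = 0 .
```

A cell with a very large σ_ij therefore contributes almost nothing to the fit, whatever placeholder value it holds, which is the effect of treating it as missing.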

4.
This paper presents a Bayesian approach to the development of spectroscopic calibration models. By formulating the linear regression in a probabilistic framework, a Bayesian linear regression model is derived, and a specific optimization method, i.e. Bayesian evidence approximation, is utilized to estimate the model “hyper-parameters”. The relation of the proposed approach to the calibration models in the literature is discussed, including ridge regression and Gaussian process models. The Bayesian model may be modified for the calibration of multivariate response variables. Furthermore, a variable selection strategy is implemented within the Bayesian framework, the motivation being that the predictive performance may be improved by selecting a subset of the most informative spectral variables. The Bayesian calibration models are applied to two spectroscopic data sets, and they demonstrate improved prediction results in comparison with the benchmark method of partial least squares.
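
The link to ridge regression mentioned above can be made concrete with the standard textbook form of Bayesian linear regression. This is a sketch under assumptions the abstract does not state: a zero-mean isotropic Gaussian prior on the weights with precision α and Gaussian noise with precision β (both symbols introduced here).

```latex
% Assumptions: w ~ N(0, \alpha^{-1} I),  y | X, w ~ N(Xw, \beta^{-1} I).
\mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \mathbf{X}^{\top}\mathbf{X},
\qquad
\mathbf{m}_N = \beta\, \mathbf{S}_N \mathbf{X}^{\top}\mathbf{y}
            = \left(\mathbf{X}^{\top}\mathbf{X}
              + \tfrac{\alpha}{\beta}\,\mathbf{I}\right)^{-1}
              \mathbf{X}^{\top}\mathbf{y} .
```

Under these assumptions the posterior mean equals the ridge regression estimate with penalty λ = α/β. The evidence approximation then selects α and β by maximizing the marginal likelihood p(y | X, α, β) instead of tuning λ by cross-validation.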

5.
Stanimirova I, Walczak B. Talanta, 2008, 76(3): 602-609
Missing elements and outliers can often occur in experimental data. The presence of outliers makes the evaluation of any least squares model parameters difficult, while the missing values influence the adequate identification of outliers. Therefore, approaches that can handle incomplete data containing outliers are highly valued. In this paper, we present the expectation-maximization robust soft independent modeling of class analogy approach (EM-S-SIMCA) based on the recently introduced spherical SIMCA method. Several important issues like the possibility of choosing the complexity of the model with the leverage correction procedure, the selection of training and test sets using methods of uniform design for incomplete data and prediction of new samples containing missing elements are discussed. The results of a comparison study showed that EM-S-SIMCA outperforms the classic expectation-maximization SIMCA method. The performance of the method was illustrated on simulated and real data sets and led to satisfactory results.

6.
In this paper, we deal with multivariate measurement error models for replicated data under heavy‐tailed distributions, providing appealing robust and adaptable alternatives to the usual Gaussian assumptions. The models contain both error‐prone covariates and predictors measured without errors. The surrogates of the response and the multiple error‐prone covariates are replicated, and unpaired and/or unequal cases are allowed. Under the scale mixtures of normal distribution class, we provide an explicit iterative formula for maximum likelihood estimation via an expectation‐maximization‐type algorithm. Closed forms of the asymptotic variances of the estimators are also given. The effectiveness and robustness of the approach are confirmed by simulation studies. Two real data sets are analyzed with the proposed models. Copyright © 2015 John Wiley & Sons, Ltd.

7.
8.
Multicollinearity in regression data is common in real-life examples. Instead of applying ordinary regression methods, biased regression techniques such as principal component regression and ridge regression have been developed to cope with such datasets. In this paper, we consider partial least squares (PLS) regression by means of the SIMPLS algorithm. Because the SIMPLS algorithm is based on the empirical variance-covariance matrix of the data and on least squares regression, outliers have a damaging effect on the estimates. To reduce this pernicious effect of outliers, we propose to replace the empirical variance-covariance matrix in SIMPLS by a robust covariance estimator. We derive the influence function of the resulting PLS weight vectors and the regression estimates, and conclude that they will be bounded if the robust covariance estimator has a bounded influence function. The breakdown value is also inherited from the robust estimator. We illustrate the results using the MCD estimator and the reweighted MCD estimator (RMCD) for low-dimensional datasets. Some empirical properties are also provided for a high-dimensional dataset.

9.
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied for Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large databases.

10.
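Multi-model consensus partial least squares for quantitative analysis of near-infrared spectra   Total citations: 6, self-citations: 0, citations by others: 6
A multi-model consensus partial least squares (cPLS) modeling method was established and applied to modeling the relationship between the near-infrared (NIR) spectra of tobacco samples and the content of chlorine, a routine constituent. The effect of the modeling parameters on the prediction results was investigated. The results show that, compared with the conventional partial least squares (PLS) algorithm, the models built with cPLS are more stable and reliable, and the prediction results are markedly improved.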

11.
12.
13.
The UV spectrophotometric analysis of a multicomponent mixture containing paracetamol, caffeine, tripelennamine and salicylamide using multivariate calibration methods, such as principal component regression (PCR) and partial least-squares regression (PLS), is described. The calibration set was based on 47 reference samples, consisting of quaternary, ternary, binary and single-component mixtures, with the aim of developing models able to predict the concentrations of unknown samples containing one to four components. The calibration models were optimized by an appropriate selection of the number of factors as well as the wavelength ranges used to build the data matrix, excluding any information about the interfering excipients included in pharmaceutical formulations. The PCR and PLS models were compared and their predictive performance was assessed by successful application to the assays of synthetic mixtures and pharmaceutical formulations.

14.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components to be included. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, albeit with an entirely different weighting scheme.
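
Averaging over model complexity and scoring by RMSEP can be illustrated as follows. This is a minimal sketch, assuming the per-complexity predictions (one list per number of components) have already been produced by some PLS implementation; the function names and the `weights` argument are hypothetical, and the abstract does not specify the actual weighting scheme.

```python
import math

def average_predictions(preds_by_complexity, weights=None):
    """Combine predictions of models with different numbers of components.

    preds_by_complexity: one list of predictions per model complexity.
    weights: optional non-negative weights, one per model (assumed here);
             None gives the plain, unweighted average.
    """
    k = len(preds_by_complexity)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    n = len(preds_by_complexity[0])
    return [
        sum(w * p[i] for w, p in zip(weights, preds_by_complexity)) / total
        for i in range(n)
    ]

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    sq = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))
```

Because every complexity contributes to the final prediction, no single number of components has to be selected, which matches the robustness argument made in the abstract.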

15.
Gadolinium can be difficult to determine by ICP-MS. In a normal geological sample there are risks of spectroscopic interferences on all of its isotopes. In this study this problem has been solved by using partial least squares (PLS) regression. Two PLS models are investigated: the first is based on aqueous standards, and the second on reference materials. Both models are capable of determining Gd with good results in reference materials containing interfering elements. It was not necessary to correct for nonspectroscopic matrix interferences. PLS is compared to principal components regression (PCR), another multivariate calibration method. For the aqueous standards PLS leads to a simpler model, while similar results are obtained for the two methods in the model based on reference materials.

16.
The selectivity and robustness of near-infrared (near-IR) calibration models based on short-scan Fourier transform (FT) infrared interferogram data are explored. The calibration methodology used in this work employs bandpass digital filters to reduce the frequency content of the interferogram data, followed by the use of partial least-squares (PLS) regression to build calibration models with the filtered interferogram signals. Combination region near-IR interferogram data are employed corresponding to physiological levels of glucose in an aqueous matrix containing variable levels of alanine, sodium ascorbate, sodium lactate, urea, and triacetin. A randomized design procedure is used to minimize correlations between the component concentrations and between the concentration of glucose and water. Because of the severe spectral overlap of the components, this sample matrix provides an excellent test of the ability of the calibration methodology to extract the glucose signature from the interferogram data. The robustness of the analysis is also studied by applying the calibration models to data collected outside of the time span of the data used to compute the models. A calibration model based on 52 samples collected over 4 days and employing two digital filters produces a standard error of calibration (SEC) of 0.36 mM glucose. The corresponding standard errors of prediction (SEP) for data collected on the 5th (18 samples) and 7th (10 samples) day are 0.42 and 0.48 mM, respectively. The interferogram segment used for the analysis contained only 155 points. These results are compatible with those obtained in a conventional analysis of absorbance spectra and serve to validate the viability of the interferogram-based calibration.

17.
18.
Variable scaling alters the covariance structure of data, affecting the outcome of multivariate analysis and calibration. Here we present a new method, variable stability (VAST) scaling, which weights each variable according to a metric of its stability. The beneficial effect of VAST scaling is demonstrated for a data set of 1H NMR spectra of urine acquired as part of a metabonomic study into the effects of unilateral nephrectomy in an animal model. The application of VAST scaling improved the class distinction and predictive power of partial least squares discriminant analysis (PLS-DA) models. The effects of other data scaling and pre-processing methods, such as orthogonal signal correction (OSC), were also tested. VAST scaling produced the most robust models in terms of class prediction, outperforming OSC in this aspect. As a result the subtle, but consistent, metabolic perturbation caused by unilateral nephrectomy could be accurately characterised despite the presence of much greater biological differences caused by normal physiological variation. VAST scaling presents itself as an interpretable, robust and easily implemented data treatment for the enhancement of multivariate data analysis.
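
Stability-weighted scaling can be sketched in a few lines. The abstract does not state the stability metric, so this example assumes the commonly cited form of VAST scaling, in which each autoscaled variable is multiplied by the inverse of its coefficient of variation (mean divided by standard deviation). The function name `vast_scale` is hypothetical, and columns are assumed to have non-zero standard deviation.

```python
import math

def vast_scale(rows):
    """Autoscale each column, then weight it by mean/sd (assumed metric).

    rows: list of samples, each a list of variable values.
    Assumption: every column has a non-zero standard deviation.
    """
    n, m = len(rows), len(rows[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        weight = mean / sd  # larger for variables with small relative spread
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd * weight
    return out
```

With this choice, variables whose values vary little relative to their mean receive more weight than noisy ones, whereas plain autoscaling treats all variables equally.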

19.
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser-induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

20.
Wu D, He Y, Feng S. Analytica Chimica Acta, 2008, 610(2): 232-242
In this study, short-wave near-infrared (NIR) spectroscopy in the 800–1050 nm region was investigated for the analysis of the main compounds in milk powder. Quantitative analysis further demonstrates the feasibility of simultaneously measuring fat, proteins and carbohydrate in milk powder. Two models, partial least-squares and least-squares support vector machine, were compared and used to obtain regression coefficients and loading weights. The effect of standard normal variate spectral pretreatment on model performance was evaluated. Based on the resulting coefficients and loading weights, wavelength regions of interest for the nutrients in milk powder are screened, and an assignment of all specific wavelengths, with details of the associated chemical basis, is proposed for the first time. Instead of the whole short-wave NIR spectral data, these assigned wavelengths, which can be reliably exploited, were used for the content determination. Compared with other spectroscopic techniques, the assigned short-wave NIR spectral wavelengths performed well. The determination coefficients for prediction are 0.981, 0.984, and 0.982 for the three components, respectively. The proposed wavelength assignment in the short-wave NIR region could be used to determine the component contents of milk powder and could serve as guidance for interpreting the spectra of milk powder.
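
Standard normal variate (SNV) pretreatment, mentioned above, is a per-spectrum transformation: each spectrum is centered on its own mean and divided by its own standard deviation. A minimal sketch follows; the function name `snv` is illustrative and the spectrum is assumed to be non-constant.

```python
import math

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by itself.

    Assumption: the spectrum has at least two points and is not constant.
    """
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]
```

Unlike column-wise autoscaling, SNV works row by row, so it needs no reference set and can be applied to a new spectrum on its own.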
