Similar articles
 20 similar articles found (search time: 15 ms)
1.
An outlier detection method is proposed for near-infrared spectral analysis. The underlying philosophy of the method is that, in random test (Monte Carlo) cross-validation, the probability of outliers appearing in good models with a smaller prediction residual error sum of squares (PRESS) or in bad models with a larger PRESS should differ clearly from that of normal samples. The method first builds a large number of PLS models by random test cross-validation, then sorts the models by PRESS, and finally recognizes the outliers according to the cumulative probability of each sample in the sorted models. To validate the proposed method, four data sets, including three published data sets and a large data set of tobacco lamina, were investigated. The proposed method proved highly efficient and accurate compared with the conventional leave-one-out (LOO) cross-validation method.
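
The procedure described in entry 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for PLS, a sample's "presence" in a model is taken to mean membership of that model's test set, and the score is the share of a sample's test-set appearances that fall in the worse (higher-PRESS) half of the sorted models. The function name `mccv_outlier_scores`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def mccv_outlier_scores(X, y, n_models=500, test_size=6, seed=0):
    """Score each sample by how often it sits in the test set of high-PRESS models.

    Assumption: ordinary least squares is used here in place of PLS.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    press = np.empty(n_models)
    in_test = np.zeros((n_models, n), dtype=bool)
    for m in range(n_models):
        # Random (Monte Carlo) split into training and test samples
        test = rng.choice(n, size=test_size, replace=False)
        train = np.setdiff1d(np.arange(n), test)
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        press[m] = np.sum((y[test] - A[test] @ coef) ** 2)
        in_test[m, test] = True
    order = np.argsort(press)                     # models sorted from best to worst
    worst_half = in_test[order[n_models // 2:]]   # test-set membership in worse half
    return worst_half.sum(axis=0) / in_test.sum(axis=0)

# Synthetic example (hypothetical data): 30 samples, sample 5 is a gross outlier
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(30, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + 0.1 * data_rng.standard_normal(30)
y[5] += 10.0
scores = mccv_outlier_scores(X, y)
print(int(np.argmax(scores)), np.round(scores[5], 2))
```

Under these assumptions a normal sample should score near 0.5, while a sample whose presence in the test set inflates PRESS should score close to 1.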

2.
The aim of this study is to show the usefulness of robust multiple regression techniques implemented in the expectation maximization framework for successfully modeling data that contain missing elements and outlying objects. In particular, results from a comparative study of partial least squares and partial robust M-regression models implemented in the expectation maximization algorithm are presented. The performance of the proposed approaches is illustrated on simulated data with and without outliers, containing different percentages of missing elements, and on a real data set. The results suggest that the proposed methodology can be used to construct satisfactory regression models in terms of their trimmed root mean squared errors.

3.
Five algorithms for data analysis are evaluated for their ability to discriminate against outliers in small data sets (4–10 points). The methods are least-squares regression, the least absolute deviation method, the least median of squares method, and two techniques based on an adaptive Kalman filter. For data sets consisting of 4–9 points with one outlier, the average errors in the estimation of the slope were found to be 18.9% by least squares, 17.7% by the least absolute deviation method, 0.5% by the least median of squares algorithm, 9.1% by an adaptive Kalman filter algorithm, and 0.9% by a zero-lag adaptive Kalman filter algorithm. Based on these results, the conclusion is that the zero-lag adaptive Kalman filter and the least median of squares approaches are best suited for the detection of outliers in small calibration data sets.
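
The contrast between least squares and least median of squares reported in entry 3 can be illustrated with a small sketch. It assumes a straight-line calibration and approximates the least median of squares fit by trying the line through every pair of points and keeping the one with the smallest median squared residual, a simple elemental-subset search rather than an exact solver. The function name `lms_line` and the eight-point data set are hypothetical and are not taken from the paper.

```python
from itertools import combinations
import numpy as np

def lms_line(x, y):
    """Approximate least median of squares line via an exhaustive two-point search."""
    best = (np.inf, 0.0, 0.0)                     # (median sq. residual, slope, intercept)
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                              # vertical line, skip
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if med < best[0]:
            best = (med, slope, intercept)
    return best[1], best[2]

# Hypothetical calibration data: y = 2x + 1 with one gross outlier at the last point
x = np.arange(8, dtype=float)
y = 2.0 * x + 1.0
y[7] = 40.0

ls_slope, ls_intercept = np.polyfit(x, y, 1)      # ordinary least squares fit
lms_slope, lms_intercept = lms_line(x, y)
print("least squares slope:", ls_slope)
print("least median of squares slope:", lms_slope)
```

Because the median ignores up to half of the residuals, the single outlier does not influence the selected line, whereas the least-squares slope is pulled toward it.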

4.
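A partial least squares method combined with the wavelet transform (WPLS) is proposed: the spectral signal is first wavelet-transformed to remove noise, and partial least squares is then used to determine multiple components simultaneously. The method was applied to a simulated system and to a compound metronidazole injection system, and the results show that it outperforms partial least squares alone.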

5.
This paper is the third part of the work on robust partial least squares (RPLS) regression. The paper focuses on implementation issues for outlier detection and diagnosis. Furthermore, the paper introduces a numerically more efficient algorithm for determining the Stahel–Donoho estimator (SDE). This has been identified as a potential drawback of the newly proposed RPLS algorithm, detailed in Part II of this work. Finally, three application studies are presented, which involve data recorded from (i) a calibration experiment (similar number of variables and observations), (ii) a distillation process for purifying benzene (considerably more observations than variables) and (iii) a multi‐component concentration determination experiment using Raman spectroscopy (considerably more variables than observations). Copyright © 2008 John Wiley & Sons, Ltd.

6.
To explore multi-way data, different methods have been proposed. Here, we study the popular PARAFAC (parallel factor analysis) model, which expresses multi-way data in a more compact way without ignoring the underlying complex structure. To estimate the score and loading matrices, an alternating least squares procedure is typically used. It is, however, well known that least squares techniques are sensitive to outlying observations, which makes the models useless when outliers are present in the data. In this paper, we present a robust PARAFAC method. Essentially, it searches for an outlier-free subset of the data, on which the classical PARAFAC algorithm can then be performed. An outlier map is constructed to identify outliers. Simulations and examples show the robustness of our approach.

7.
Ortiz MC, Sarabia LA, Herrero A. Talanta, 2006, 70(3): 499-512
The validation of an analytical procedure means the evaluation of performance criteria such as accuracy, sensitivity, linear range, capability of detection, selectivity, calibration curve, etc. This implies the use of different statistical methodologies, some of them related to statistical regression techniques, which may or may not be robust. The presence of outlier data has a significant effect on the determination of sensitivity, linear range or capability of detection, amongst others, when these figures of merit are evaluated with non-robust methodologies. In this paper some of the robust methods used for calibration in analytical chemistry are reviewed: the Huber M-estimator; the Andrews, Tukey and Welsh GM-estimators; the fuzzy estimators; the constrained M-estimators (CM); and the least trimmed squares (LTS). The paper also shows that the mathematical properties of the least median of squares (LMS) regression can be of great interest in the detection of outlier data in chemical analysis. A comparative analysis is made of the results obtained by applying these regression methods to synthetic and real data. There is also a review of some applications where this robust regression works in a suitable and simple way that proves very useful for securing an objective detection of outliers. The use of robust regression is recommended in ISO 5725-5.

8.
Two novel algorithms that employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS), are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum, combining the parallel models in a way that emphasizes intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods, which combine all subsets or intervals of the whole spectrum, is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.

9.
10.
Comprehensive two‐dimensional gas chromatography with flame ionization detection, combined with unfolded partial least squares, is proposed as a simple, fast and reliable method to assess the quality of gasoline and to detect its potential adulterants. The data for the calibration set are first baseline corrected using a two‐dimensional asymmetric least squares algorithm. The number of significant partial least squares components used to build the model is determined from the minimum value of the root‐mean‐square error of leave‐one‐out cross‐validation, which was reached at 4 components. Blends of gasoline with kerosene, white spirit and paint thinner, which are frequently used adulterants, are used as calibration samples. Regression coefficients of 0.996–0.998, root‐mean‐square errors of prediction of 0.005–0.010 and relative errors of prediction of 1.54–3.82% for the calibration set show the reliability of the developed method. In addition, the developed method is externally validated with three samples in a validation set (with a relative error of prediction below 10.0%). Finally, to test the applicability of the proposed strategy to real samples, five real gasoline samples collected from gas stations were analyzed, and the gasoline proportions were in the range of 70–85%. The relative standard deviations were below 8.5% for the different samples in the prediction set.

11.
Two‐way and three‐way calibration models were applied to ultra high performance liquid chromatography with photodiode array data with coeluted peaks in the same wavelength and time regions for the simultaneous quantitation of ciprofloxacin and ornidazole in tablets. The chromatographic data cube (tensor) was obtained by recording chromatographic spectra of the standard and sample solutions containing ciprofloxacin and ornidazole, with sulfadiazine as an internal standard, as a function of time and wavelength. Parallel factor analysis and trilinear partial least squares were used as three‐way calibrations for the decomposition of the tensor, whereas three‐way unfolded partial least squares was applied as a two‐way calibration to the unfolded dataset obtained from the data array of ultra high performance liquid chromatography with photodiode array detection. The validity and ability of the two‐way and three‐way analysis methods were tested by analyzing validation samples: synthetic mixtures, interday and intraday samples, and standard addition samples. Results obtained from the two‐way and three‐way calibrations were compared to those provided by traditional ultra high performance liquid chromatography. The proposed methods (parallel factor analysis, trilinear partial least squares and unfolded partial least squares) and traditional ultra high performance liquid chromatography were successfully applied to the quantitative estimation of the solid dosage form containing ciprofloxacin and ornidazole.

12.
13.
The presence of multicollinearity in regression data is no exception in real-life examples. Instead of applying ordinary regression methods, biased regression techniques such as principal component regression and ridge regression have been developed to cope with such datasets. In this paper, we consider partial least squares (PLS) regression by means of the SIMPLS algorithm. Because the SIMPLS algorithm is based on the empirical variance-covariance matrix of the data and on least squares regression, outliers have a damaging effect on the estimates. To reduce this pernicious effect of outliers, we propose to replace the empirical variance-covariance matrix in SIMPLS by a robust covariance estimator. We derive the influence function of the resulting PLS weight vectors and the regression estimates, and conclude that they will be bounded if the robust covariance estimator has a bounded influence function. The breakdown value is also inherited from the robust estimator. We illustrate the results using the MCD estimator and the reweighted MCD estimator (RMCD) for low-dimensional datasets. Some empirical properties are also provided for a high-dimensional dataset.

14.
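To address the difficulties that soft independent modeling of class analogy (SIMCA) encounters in determining the number of principal components and the decision interval, a PLSR-based class-modeling method, the PLS class model (PLSCM), is proposed. By converting the class-description problem into an ordinary PLSR problem, the well-established Monte Carlo cross-validation method can be used to determine the number of latent variables and the decision interval of the model. The method was applied to near-infrared spectral data (range 4000–9000 cm-1) of different bezoar (牛黄) samples and successfully distinguished genuine samples from counterfeit ones. Both the ease of use and the identification accuracy of the method are better than those of the classical SIMCA method. For the raw spectral data, the training and prediction accuracies of PLSCM were both 100%; for data processed by SNV, the training and prediction accuracies were 99% and 100%, respectively.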

15.
In this study, complex substances such as Mint (Mentha haplocalyx Briq.) samples from different growing regions in China were analyzed for phenolic compounds by high‐performance liquid chromatography with diode array detection and for volatile aroma compounds by gas chromatography with mass spectrometry. Chemometrics methods, e.g. principal component analysis, back‐propagation artificial neural networks, and partial least squares discriminant analysis, were applied to resolve the complex chromatographic profiles of the Mint samples. A total of 49 aroma components and 23 phenolic compounds were identified in 79 Mint samples. Principal component analysis score plots from the gas chromatography with mass spectrometry and high‐performance liquid chromatography with diode array detection data sets showed a clear distinction among Mint from three different regions in China. Classification results showed satisfactory prediction ability for back‐propagation artificial neural networks and partial least squares discriminant analysis. According to the regression coefficients of the partial least squares discriminant analysis model, the major compounds that contributed to the discrimination were chlorogenic acid, unknown 3, kaempferol 7‐O‐rutinoside, salvianolic acid L, hesperidin, diosmetin, unknown 6 and pebrellin. This study indicated that the proposed strategy could provide a simple and rapid technique for clearly distinguishing complex profiles from samples such as Mint.

16.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectral data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which one can improve, often significantly, the performance of the corresponding PLS model in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. The regions that make up the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.

17.
Wu Xiaoli, Li Yanjun, Wu Tiejun. 分析化学 (Chinese Journal of Analytical Chemistry), 2006, 34(8): 1091-1095
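To improve the prediction accuracy of ultraviolet absorption spectral analysis of the water-quality parameter total organic carbon (TOC), an iterative regression modeling algorithm based on Boosting theory is proposed, together with a new iteration stopping criterion derived from statistical learning theory, which effectively prevents overfitting and markedly improves model prediction accuracy. To evaluate the performance of the proposed algorithm, it and three commonly used spectral analysis methods (partial least squares, principal component regression and artificial neural networks) were each used to model and predict a set of data measured with a self-developed ultraviolet spectral water-quality analyzer. The results show that, relative to the other three methods, the proposed algorithm has the marked advantage of producing models with higher prediction accuracy.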

18.
The work summarized in this paper presents the first part of a three‐paper series on robust partial least squares (RPLS) regression. Motivated by recent research activities in this area, this part provides a detailed algorithmic analysis of associated techniques, showing that existing work (i) may not represent a true robust formulation of partial least squares (PLS), (ii) may lead to convergence problems or (iii) may be insensitive to a certain type of outlier. On the basis of this analysis, Part I introduces a new conceptual RPLS algorithm that overcomes the deficiencies of existing work. The second part of this work details this new RPLS technique, compares its performance with existing RPLS methods and provides an analysis of the computational efficiency and sensitivity of these algorithms. Whilst the first two parts of this work discuss algorithmic developments of RPLS, the final part concentrates on practical issues of RPLS implementations. This third part is aimed at practitioners of chemistry and chemical engineering and covers a wide range of applications involving a calibration experiment, the analysis of recorded data from an industrial debutanizer process and data from a number of Raman spectroscopy experiments. Copyright © 2007 John Wiley & Sons, Ltd.

19.
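Nearly 300 heavy-oil samples from various processes were analyzed. Characteristic variables for chromatogram recognition were extracted from the thin-layer chromatograms of the samples, making it possible to identify the sample type from TLC/FID chromatograms. Partial least squares was used to correlate the features of the chromatogram data with the eluting chromatography (EC) method [2], so that the approach can be used for hydrocarbon group composition analysis of heavy-oil samples; the predicted results agree with those of the EC method.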

20.
A fast, simple and inexpensive methodology without sample pre-treatment is proposed for the discrimination of beers. It is based on cyclic voltammetry (CV) using commercial carbon screen-printed electrodes (SPCE) and includes a correction of the signals measured with different SPCE units. Data are submitted to partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA), which allow a reasonable classification of the beers. CV data from beers can also be used to predict their alcoholic degree by partial least squares (PLS) and artificial neural networks (ANN). In general, non-linear methods provide better results than linear ones.
