Similar literature
1.
Outlier detection is crucial in building a highly predictive model. In this study, we propose an enhanced Monte Carlo outlier detection method that establishes cross-prediction models based on determinate normal samples and analyzes the distribution of prediction errors individually for dubious samples. One simulated and three real datasets were used to illustrate and validate the performance of the method, and the results indicate that it outperforms Monte Carlo outlier detection in outlier diagnosis. After these outliers were removed, the value of validation by Kovats retention indices and the root mean square error of prediction decreased from 3.195 to 1.655, and the average cross-validation prediction error decreased from 2.0341 to 1.2780. This method helps establish a good model by eliminating outliers. © 2015 Wiley Periodicals, Inc.
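The basic Monte Carlo idea that this abstract builds on can be sketched as follows. This is a minimal illustration only, not the enhanced method of the paper: the function name `mc_prediction_errors`, its parameters, and the use of ordinary least squares in place of the authors' model are all assumptions made for the example.

```python
import numpy as np

def mc_prediction_errors(X, y, n_runs=500, train_frac=0.7, seed=0):
    """Per-sample mean and std of prediction errors over random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    Xa = np.column_stack([np.ones(n), X])      # add an intercept column
    errors = [[] for _ in range(n)]
    for _ in range(n_runs):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        # Assumption: ordinary least squares stands in for the calibration model.
        coef = np.linalg.lstsq(Xa[train], y[train], rcond=None)[0]
        for i, e in zip(test, y[test] - Xa[test] @ coef):
            errors[i].append(e)
    # A sample that never landed in a test set gets NaN instead of a statistic.
    mean = np.array([np.mean(e) if e else np.nan for e in errors])
    std = np.array([np.std(e) if e else np.nan for e in errors])
    return mean, std
```

Samples whose error distribution has an unusually large absolute mean or spread are the candidates for closer inspection; the cut-off is left to the user here.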

2.
A novel outlier detection method for partial least squares, based on random sample consensus, is proposed. The algorithm repeatedly generates partial least squares solutions estimated from random samples and then tests each solution for support from the complete dataset. A comparative study of the proposed method and leave-one-out cross validation in outlier detection on simulated data and near-infrared data of pharmaceutical tablets is presented. In addition, a comparison between the proposed method and PLS, RSIMPLS, and PRM is provided. The results demonstrate that the proposed method is highly efficient.
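The random sample consensus loop described in this abstract can be sketched roughly as below. The function name `ransac_outliers`, the residual threshold, the subset size, and the least-squares fit (swapped in for PLS to keep the sketch short) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def ransac_outliers(X, y, n_trials=200, subset_size=10, threshold=1.0, seed=0):
    """Flag samples that fall outside the largest consensus set found."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])      # add an intercept column
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_trials):
        # Fit a candidate model on a small random subset.
        subset = rng.choice(n, size=subset_size, replace=False)
        # Assumption: least squares stands in for the PLS solution.
        coef = np.linalg.lstsq(Xa[subset], y[subset], rcond=None)[0]
        # Support = samples of the complete dataset that the candidate fits well.
        inliers = np.abs(y - Xa @ coef) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return ~best_inliers
```

The threshold should be chosen on the scale of the expected measurement noise; a value that is too small flags normal samples, and one that is too large hides outliers.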

3.
The number of latent variables (LVs), or the factor number, is a key parameter in PLS modeling for obtaining a correct prediction. Although much work has been done on this issue, determining a suitable LV number remains difficult in practice. A method named independent factor diagnostics (IFD) is proposed to investigate the contribution of each LV to the predicted results, based on a discussion of how the LV number is determined in PLS modeling of near infrared (NIR) spectra of complex samples. The NIR spectra of three data sets of complex samples, including a public data set and two tobacco lamina data sets, are investigated. It is shown that several high order LVs constitute main contributions to the predicted results, although the contribution of the low order LVs should not be neglected in the PLS models. Therefore, in practical uses of PLS for NIR spectral analysis of complex samples, it may be better to use a slightly larger LV number. Supported by the National Natural Science Foundation of China (Grant Nos. 20775036 & 20835002)

4.
Two-dimensional correlation spectroscopy (2DCOS) and near-infrared spectroscopy (NIRS) were used to determine the polyphenol content in oat grain. A partial least squares (PLS) algorithm was used to perform the calibration. A total of 116 representative oat samples from four locations in China were prepared and the corresponding near-infrared spectra were measured. Two-dimensional correlation spectroscopy was employed to select wavelength bands for the PLS regression model for the polyphenol determination. The number of PLS components and intervals was optimized according to the coefficient of determination (R2) and the root mean square error of cross validation (RMSECV) in the calibration set. The performance of the final model was evaluated using the correlation coefficient (R) and the root mean square error of validation (RMSEV) in the prediction set. The results showed that the band corresponding to the optimal calibration model was between 1350 and 1848 nm and that the optimal spectral preprocessing combination was second derivative with second smoothing. The optimal regression model was obtained with an R2 of 0.8954 and an RMSECV of 0.06651 in the calibration set and an R of 0.9614 and an RMSEV of 0.04573 in the prediction set. These measurements reveal that the calibration model had qualified predictive accuracy. The results demonstrated that 2DCOS with PLS was a simple and rapid method for the quantitative determination of polyphenols in oats.

5.
A novel ensemble-based feature selection method, designated ensemble partial least squares regression coefficients (EPRC), was developed. It is composed of two steps: generating a series of different single feature selectors and aggregating them to reach a consensus. Specifically, the bootstrap resampling approach was used to generate a diversity of single feature selectors, and the absolute values of the regression coefficients of the partial least squares (PLS) model were used to rank the features. Next, the feature rankings from the single feature selectors were aggregated by the weighted-sum approach. Finally, coupled with the regression model, the features selected by EPRC were evaluated through cross validation and an independent test set. Experiments constructing spectroscopy analysis models on three near infrared spectroscopy (NIRS) datasets showed that EPRC located key wavelengths, improved regression performance, and was more stable and more interpretable to domain experts.

6.
The fluorescence spectrum, as well as its first and second derivative spectra in the region of 220-900 nm, was utilized to determine the concentration of triglyceride in human serum. Nonlinear partial least squares regression with a cubic B-spline-function-based nonlinear transformation was employed as the chemometric method. Window genetic algorithms partial least squares (WGAPLS) was proposed as a new wavelength selection method to find the optimized combination of spectral wavelengths. The study shows that when WGAPLS is applied within the optimized regions ascertained by changeable size moving window partial least squares (CSMWPLS) or searching combination moving window partial least squares (SCMWPLS), the calibration and prediction performance of the model can be further improved at a reasonable latent variable number. SCMWPLS should start from the sub-region found by CSMWPLS with the smallest root mean square error of calibration (RMSEC). In addition, WGAPLS should be utilized within the region of smallest RMSEC, whether it is the sub-region found by CSMWPLS or the region combination found by SCMWPLS. Moreover, the prediction ability of the nonlinear models was significantly better than that of the linear models. The prediction performance of the three spectra was in the following order: second derivative spectrum < original spectrum < first derivative spectrum. Wavelengths within the regions of 300-367 nm and 386-392 nm in the first derivative of the original fluorescence spectrum were the optimized wavelength combination for the prediction model. Copyright © 2012 John Wiley & Sons, Ltd.

7.
Hui Chen  Zan Lin  Tong Wu 《Analytical letters》2018,51(17):2695-2707
Textile products must be marked with fabric type and composition on the label. Cotton is by far the most important fiber in the industry and often needs fast quantitative analysis, but the corresponding standard methods are very time-consuming and labor-intensive. This work explores the feasibility of combining near-infrared (NIR) spectroscopy and interval-based partial least squares (iPLS) for determining cotton content in textiles. Three types of partial least squares (PLS)-based algorithms were used for experimental measurements. A total of 91 cloth samples with cotton content ranging from 0 to 100% (w/w) were collected; all compositions are commercially available on the market in China. In all cases, the original spectrum axis was split into 20 subintervals. As a result, three final models, i.e., the iPLS model on a single subinterval, the backward interval partial least squares (biPLS) model on the region formed by the six remaining subintervals, and the moving window partial least squares (mwPLS) model with a window of 75 variables, achieved better results than the full-spectrum PLS model. No obvious differences in performance were observed among the three models. Thus, either iPLS or mwPLS was preferred for its simplicity, which suggests that iPLS and mwPLS combined with the NIR technique may have potential for the rapid determination of the cotton content of textile products with accuracy comparable to standard procedures. In addition, this approach may have commercial and regulatory advantages because it avoids labor-intensive and time-consuming chemical analysis.
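The interval scan described above (split the spectral axis into equal subintervals, fit a local model on each, compare cross-validation errors) can be sketched as follows. The helper names `cv_rmse` and `interval_scan`, the interleaved fold assignment, and the least-squares model used in place of PLS are assumptions for illustration only.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5):
    """Root mean square error of cross validation for a least-squares model."""
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])      # add an intercept column
    folds = np.arange(n) % n_folds             # deterministic interleaved folds
    residuals = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        # Assumption: least squares stands in for the local PLS model.
        coef = np.linalg.lstsq(Xa[~test], y[~test], rcond=None)[0]
        residuals[test] = y[test] - Xa[test] @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))

def interval_scan(X, y, n_intervals):
    """Cross-validation error of a local model on each spectral subinterval."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    return [cv_rmse(X[:, idx], y) for idx in intervals]
```

The subinterval with the lowest error is the single-interval candidate; a backward variant would instead start from all subintervals and drop the least useful one at each step.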

8.
Han QJ  Wu HL  Cai CB  Xu L  Yu RQ 《Analytica chimica acta》2008,612(2):121-125
An improved method based on an ensemble of Monte Carlo uninformative variable elimination (EMCUVE) is presented for wavelength selection in multivariate calibration of spectral data. The proposed algorithm introduces a Monte Carlo (MC) strategy into uninformative variable elimination-PLS (UVE-PLS), instead of the leave-one-out strategy, for estimating the contribution of each wavelength variable to the PLS model. In EMCUVE, wavelength variables are evaluated by different Monte Carlo uninformative variable elimination (MCUVE) models. Moreover, a fusion of MCUVE and the vote rule can obtain an improvement over the original uninformative variable elimination method. Results obtained from simulated data and real data sets demonstrate that EMCUVE can properly carry out wavelength selection in the course of data analysis and improve the predictive ability of the multivariate calibration model.
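A single Monte Carlo elimination model of the kind this abstract combines can be sketched as below: each variable is scored by the mean of its regression coefficient divided by its standard deviation over many random subsets. The function name `mc_stability`, its parameters, and the least-squares fit used in place of PLS are assumptions for the example; the ensemble and vote rule of EMCUVE are not reproduced.

```python
import numpy as np

def mc_stability(X, y, n_runs=300, train_frac=0.8, seed=0):
    """Stability of each variable: mean / std of its coefficient over MC subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(train_frac * n)
    coefs = np.empty((n_runs, p))
    for r in range(n_runs):
        idx = rng.choice(n, size=n_train, replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)      # center the subset
        yc = y[idx] - y[idx].mean()
        # Assumption: least squares stands in for the PLS regression vector.
        coefs[r] = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    return coefs.mean(axis=0) / coefs.std(axis=0)
```

Variables with a small absolute stability are the candidates for elimination; how the cut-off is set is left open in this sketch.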

9.
An algorithm is proposed for extracting relevant information from near-infrared (NIR) spectra for multivariate calibration of routine components in complex plant samples. The algorithm is a combination of wavelet transform (WT) data compression and a procedure for uninformative variable elimination (UVE). After compression of the NIR spectra by WT, the UVE approach is used to eliminate the irrelevant wavelet coefficients. Finally, a calibration model is built from the retained wavelet coefficients to enable prediction. Because irrelevant information can be removed from the spectra used for multivariate calibration, the model based on the extracted relevant features is better than those obtained with full-spectrum data. Both prediction precision and calculation speed are improved.

10.
The hydroxyl (OH) number of polyol was measured using near-infrared (NIR) spectroscopy with a disposable glass vial as the sample container. Polyols are viscous, so disposable vials are advantageous when spectroscopic methods are employed. Due to the curvature of the vial walls, a narrow aperture was used to minimize the spectroscopic deviations. The narrow aperture attenuated the NIR radiation and increased the spectral noise in the collected polyol spectra. Wavelet transformation (WT) was employed to reduce this noise, and a partial least squares (PLS) calibration model was developed. The overall prediction results compare well with those from conventional wet analysis, which requires time (1-3 h) and large amounts of chemical reagents. NIR spectroscopy with disposable vials can be utilized for simple and fast quality assurance of polyol in actual industrial settings.

11.
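Quantitative analysis of SFE extraction products of Fructus Cnidii (蛇床子) by short-wave near-infrared spectroscopy   Cited by: 1 (self-citations: 0, citations by others: 1)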
郭晔  曲楠  王彬  任玉林 《分析试验室》2007,26(11):49-52
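Short-wave near-infrared diffuse reflectance spectra (800-1100 nm) were recorded for extracts obtained by supercritical CO2 fluid extraction (SFE) of the traditional Chinese medicine Fructus Cnidii. With HPLC values as the reference, partial least squares (PLS) was used to build quantitative models relating the short-wave NIR diffuse reflectance spectra to the main constituents of the SFE extracts, osthole and imperatorin. This achieved rapid, non-destructive determination of the two active constituents of the medicine. The effects of the spectral preprocessing method and the number of principal components on the ability of PLS to predict the osthole and imperatorin contents of the extracts are discussed, and the samples in the prediction set were predicted.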

12.
This paper presents a modified version of the NIPALS algorithm for PLS regression with a single response variable. This version, denoted CF-PLS, provides significant advantages over standard PLS. First, it strongly reduces the over-fit of the regression. Second, R2 under the null hypothesis follows a Beta distribution that is a function only of the number of observations, which allows the use of a probabilistic framework to test the validity of a component. Third, the models generated with CF-PLS have comparable if not better prediction ability than the models fitted with NIPALS. Finally, the scores and loadings of CF-PLS are directly related to R2, which makes the model and its interpretation more reliable. Copyright © 2011 John Wiley & Sons, Ltd.

13.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS but with an entirely different weighting scheme.
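Averaging PLS models of different complexity can be sketched as below for the unweighted case. The sketch uses a textbook PLS1 (single-response, NIPALS-style) fit and equal weights; the function names `pls1_coefs` and `averaged_pls_predict` are hypothetical, and the weighting schemes of the paper are not reproduced.

```python
import numpy as np

def pls1_coefs(X, y, max_components):
    """Coefficient vectors of PLS1 models with 1..max_components components."""
    Xk, yk = X - X.mean(axis=0), y - y.mean()
    W, P, q, coefs = [], [], [], []
    for _ in range(max_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)              # weight vector
        t = Xk @ w                             # scores
        p = Xk.T @ t / (t @ t)                 # X loadings
        c = yk @ t / (t @ t)                   # y loading
        Xk, yk = Xk - np.outer(t, p), yk - c * t   # deflation
        W.append(w); P.append(p); q.append(c)
        Wm, Pm = np.array(W).T, np.array(P).T
        # Regression vector for the model with the components found so far.
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return np.array(coefs)

def averaged_pls_predict(X_train, y_train, X_new, max_components):
    """Equal-weight average of the predictions of all 1..max_components models."""
    coefs = pls1_coefs(X_train, y_train, max_components)
    mean_coef = coefs.mean(axis=0)   # averaging linear models = averaging coefficients
    return y_train.mean() + (X_new - X_train.mean(axis=0)) @ mean_coef
```

Because every model is linear and shares the same centering, averaging the predictions is the same as predicting with the averaged coefficient vector, which is why no component number has to be picked.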

14.
M.T. Bona 《Talanta》2007,72(4):1423-1431
An extensive study was carried out on coal samples from several origins to establish a relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding near-infrared spectral data. The research applied both quantitative (partial least squares regression, PLS) and qualitative multivariate analysis techniques (hierarchical cluster analysis, HCA; linear discriminant analysis, LDA) to determine a methodology able to estimate property values for a new coal sample. This required two steps: defining homogeneous clusters whose calibration equations could be obtained with accuracy and precision levels comparable to those provided by commercial online analysers, and studying the level of discrimination between these groups of samples based only on the instrumental variables. These two steps were performed in three different situations depending on the variables used for the pattern recognition: property values, spectral data (principal component analysis, PCA) or a combination of both. The results indicated that the last situation offered the best results in both steps, with the added benefit of outlier detection and removal.

15.
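Because the quality of the calibration samples determines the quality of the calibration model, detecting outliers in the calibration set is of great importance in multivariate calibration modeling. This study establishes a method for detecting outliers in the calibration set when building multivariate calibration models from near-infrared spectra. Based on the definition of an outlier and the principle of partial least squares, the method examines the contribution of each calibration sample to the model in each factor (or principal component) and identifies samples that behave differently from the majority as outliers. Near-infrared spectral data from 218 orange juice samples were analyzed. The results show that the calibration set contained 6 outliers; after they were removed, the root mean square error of cross validation of the calibration set decreased from 16.870 to 4.809, and the root mean square error of the prediction set decreased from 3.688 to 3.332.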

16.
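A method was established for the rapid determination, by near-infrared (NIR) spectroscopy, of benzene-series compounds (toluene, ethylbenzene, p-xylene, m-xylene and o-xylene) in thinners for solvent-based wood coatings. Coating thinner samples were collected, their benzene-series contents were determined by gas chromatography (GC), and their NIR spectra were acquired. Partial least squares (PLS) was used to build a linear model relating the NIR spectra to the benzene-series contents. The root mean square error of calibration (RMSEC) ranged from 0.47% to 1.40% with correlation coefficients (R2) between 0.956 and 0.988; the root mean square error of prediction (RMSEP) ranged from 0.73% to 2.32% with correlation coefficients (R2) between 0.951 and 0.986. The NIR model predicts well, and the quantitative method is rapid, simple and accurate, so it can be extended to the detection of toxic and hazardous substances in coatings.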

17.
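Using common maize kernels as the test material, characteristic wavelengths were selected from near-infrared spectral data with a genetic algorithm combined with partial least squares regression, and a calibration model for determining the starch content of maize kernels from these characteristic wavelengths was then built by partial least squares regression. For the calibration model based on 11 characteristic wavelengths, the calibration error (RMSEC), cross-validation error (RMSECV) and prediction error (RMSEP) were 0.30%, 0.35% and 0.27%, respectively, and the correlation coefficients between predicted and measured values reached 0.9279 for the calibration data set and 0.9390 for the independent test data set. Compared with the prediction model built on the full-spectrum data, prediction accuracy improved in every case, showing that spectral feature selection with a genetic algorithm and PLS yields simpler and better models. This provides a new method and approach for the near-infrared determination of starch content in maize kernels and for the processing of infrared spectral data.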

18.
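Based on near-infrared diffuse reflectance spectroscopy, a rapid method was established for the simultaneous determination of the two active ingredients of compound sulfamethoxazole tablets, sulfamethoxazole (SMZ) and trimethoprim (TMP), using partial least squares multivariate calibration. For the SMZ and TMP quantitative models, the correlation coefficients were 99.969% and 99.938%, the calibration set residuals were 0.217 and 0.159, and the root mean square errors of prediction were 0.310 and 0.418, respectively. The method is simple and fast, determines both components simultaneously and accurately, and requires no sample pretreatment.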

19.
Global sensitivity analysis with the Monte Carlo method is applied to the Boltzmann equation for the electron energy distribution function (eedf). The results show the sensitivity of the eedf and related quantities to global variation of the cross-section set. A new indicator of global sensitivity is used, which appears to be easy to apply.

20.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from the informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. Informative regions are used to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. Application to an open-path (OP)/FT-IR spectral data set shows that the proposed methods, especially SCMWPLS, can find an optimized combination that improves, often significantly, the performance of the corresponding PLS model. The improvement is measured as a lower root mean square error of prediction (RMSEP) at a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. The regions making up the selected combinations can easily be explained by the existence of IR absorption bands in those spectral regions.
