20 similar articles found (search time: 0 ms)
1.
Ling Gao, Shouxin Ren 《Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy》2009, 73(5): 960-965
A novel method named OSC-WPT-PLS, based on partial least squares (PLS) regression with orthogonal signal correction (OSC) and wavelet packet transform (WPT) as pre-processing tools, was proposed for the simultaneous spectrophotometric determination of Al(III), Mn(II) and Co(II). This method combines the ideas of OSC and WPT with PLS regression to enhance the extraction of characteristic information and the quality of regression. OSC removes information from the response matrix D by subtracting the structured noise that is orthogonal to the concentration matrix C. The wavelet packet transform was applied to perform data compression, to extract relevant information, and to eliminate noise and collinearity. PLS was applied for multivariate calibration and noise reduction by eliminating the less important latent variables. By trial, the wavelet function, the decomposition level, the number of OSC components and the number of PLS factors for the OSC-WPT-PLS method were selected as Daubechies 4, 3, 2 and 3, respectively. A program (POSCWPTPLS) was designed to perform the simultaneous spectrophotometric determination of Al(III), Mn(II) and Co(II). The relative standard errors of prediction (RSEP) obtained for total elements using OSC-WPT-PLS, WPT-PLS and PLS were compared. Experimental results demonstrated that the OSC-WPT-PLS method had the best performance among the three methods and was successful even when there was severe overlap of spectra.
2.
Yiming Bi, Qiong Xie, Silong Peng, Liang Tang, Yong Hu, Jie Tan, Yuhui Zhao, Changwen Li 《Analytica chimica acta》2013
A new ensemble learning algorithm is presented for quantitative analysis of near-infrared spectra. The algorithm combines two steps of stacked regression with Partial Least Squares (PLS) and is termed the Dual Stacked Partial Least Squares (DSPLS) algorithm. First, several sub-models were generated from the whole calibration set. The inner-stack step was implemented on sub-intervals of the spectrum. Then the outer-stack step was used to combine these sub-models. Several combination rules for the outer-stack step were analyzed for the proposed DSPLS algorithm. In addition, a novel selective weighting rule was introduced to select a subset of all available sub-models. Experiments on two public near-infrared datasets demonstrate that the proposed DSPLS with the selective weighting rule provided superior prediction performance and outperformed the conventional PLS algorithm. Compared with the single model, the new ensemble model can provide more robust prediction results and can be considered an alternative choice for quantitative analytical applications.
3.
Spectrophotometric simultaneous determination of nitroaniline isomers by orthogonal signal correction-partial least squares (total citations: 1; self-citations: 0; citations by others: 1)
The simultaneous determination of nitroaniline isomer mixtures by spectrophotometric methods is a difficult problem in analytical chemistry because of spectral interferences. With multivariate calibration methods such as partial least squares (PLS), it is possible to obtain a model adjusted to the concentration values of the mixtures used in the calibration range. Orthogonal signal correction (OSC) is a preprocessing technique, based on constrained principal component analysis, that removes information unrelated to the target variables. OSC is a suitable preprocessing method for partial least squares calibration of mixtures, without loss of prediction capacity, using the spectrophotometric method. In this study, the calibration model is based on absorption spectra in the 200–500 nm range for 21 different mixtures of nitroaniline isomers. The calibration matrices contained 1–21, 1–15 and 1–18 μg ml−1 of m-nitroaniline, o-nitroaniline and p-nitroaniline, respectively. The RMSEP values for m-nitroaniline, o-nitroaniline and p-nitroaniline were 0.6567, 0.2692 and 0.3134 with OSC, and 1.3818, 1.2181 and 0.3953 without OSC, respectively. This procedure allows the simultaneous determination of nitroaniline isomers in real matrix samples, and good reliability of the determination was demonstrated.
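For reference, the RMSEP figures quoted in these abstracts are usually computed as shown below. This is the conventional definition, not one taken from the abstract itself; the symbols (n prediction samples, reference concentration c_i, predicted concentration \hat{c}_i) are assumptions introduced here.

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{c}_i - c_i\right)^{2}}
```

RMSEP carries the same units as the concentrations (here μg ml−1), so values are directly comparable only between models evaluated on the same analyte and prediction set.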
4.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components to be included. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, but with an entirely different weighting scheme.
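The idea of averaging PLS models of different complexity can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a textbook NIPALS PLS1 for a single response, averages the models with 1 to `max_components` components using equal weights, and the function names `pls1_coefficients` and `averaged_pls_predict` are hypothetical. The weighting scheme of the weighted variant is not given in the abstract and is not reproduced here.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression vectors of PLS1 models with 1..n_components components.

    Row a-1 of the result is the coefficient vector (for mean-centered data)
    of the model that uses a components. Textbook NIPALS PLS1.
    """
    E = X - X.mean(axis=0)          # centered predictors, deflated in the loop
    f = y - y.mean()                # centered response, deflated in the loop
    W, P, q, betas = [], [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # scores
        tt = t @ t
        p = E.T @ t / tt            # X loadings
        c = f @ t / tt              # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
        Wm, Pm = np.array(W).T, np.array(P).T
        # beta = W (P^T W)^{-1} q for the model with the components so far
        betas.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return np.array(betas)

def averaged_pls_predict(X_train, y_train, X_new, max_components):
    """Equal-weight average of the predictions of all 1..max_components models."""
    betas = pls1_coefficients(X_train, y_train, max_components)
    beta_avg = betas.mean(axis=0)   # averaging predictions == averaging coefficients
    return y_train.mean() + (X_new - X_train.mean(axis=0)) @ beta_avg
```

Because each PLS model is linear, averaging the coefficient vectors gives the same result as averaging the individual predictions, which is why no single number of components has to be chosen.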
5.
Two novel algorithms which employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS) are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum by incorporating parallel models in a way to emphasize intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods combining all subsets or intervals of the whole spectrum is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.
6.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectra data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which one can improve, often significantly, the performance of the corresponding PLS model, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.
7.
Near-infrared (NIR) spectroscopy, which is generally used for online monitoring in food analysis and production processes, was applied to determine the internal quality of toothpaste samples. It is acknowledged that the spectra can be significantly influenced by non-linearities introduced by light scatter; therefore, four data preprocessing methods, including offset correction, 1st-derivative, standard normal variate (SNV) and multiplicative scatter correction (MSC), were employed before the data analysis. A partial least squares (PLS) multivariate calibration model was established and then used to predict the pH values of toothpaste samples of different brands. The results showed that the spectral data processed by MSC gave the best prediction of the pH value of the toothpaste samples.
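Two of the scatter corrections named above, SNV and MSC, can be sketched in a few lines. This is a minimal illustration of the standard textbook forms, assuming spectra are stored one per row; the function names `snv` and `msc` and the choice of the mean spectrum as the MSC reference are assumptions, since the abstract gives no implementation details.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is regressed on the reference, x ~ intercept + slope * ref,
    and corrected as (x - intercept) / slope. The mean spectrum is used as the
    reference when none is supplied.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, x in enumerate(spectra):
        slope, intercept = np.polyfit(ref, x, 1)   # degree-1 fit: [slope, intercept]
        corrected[i] = (x - intercept) / slope
    return corrected
```

Both functions remove additive offsets and multiplicative scaling between spectra; SNV does so using only the spectrum itself, whereas MSC depends on the reference, so the same reference should be reused when correcting new samples.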
8.
The performance of back-propagation artificial neural networks (NN) and partial least squares (PLS) regression for the calibration of linear and nonlinear systems has been investigated by using six types of synthetic data. Three PLS methods, conventional linear-PLS and two nonlinear-PLS methods, have been used in the study. In all but one of the synthetic data types, the band intensities varied nonlinearly with concentration. These five data types were designed to represent the effect of band shifts with increasing concentration, a nonlinear relationship between peak height and concentration, or a combination of both types of nonlinearities. The results showed that NNs perform better than PLS for all the nonlinear datasets. When a band shift is the major reason for the nonlinearity, the relative performance of NNs and PLS depends on the overlap of the absorption bands. If there is no band overlap, neither NN nor PLS can calibrate the data accurately but the results could be improved by convolving the spectral features with a Gaussian broadening function. The results indicate that a combination of peak position shift and peak height change is the most difficult nonlinearity to calibrate. NN and PLS were also used to determine the concentration of CHCl3 in pure component and mixtures of CHCl3 and CH2Cl2 using their Fourier transform infrared (FT-IR) spectra, a dataset that has been proved nonlinear in high concentrations due to the nonlinear response of the detector. The best results for the experimental data were obtained by applying one hidden layer NN to the mean-centered absorbance spectra.
9.
Selecting the correct dimensionality is critical for obtaining partial least squares (PLS) regression models with good predictive ability. Although calibration and validation sets are best established using experimental designs, industrial laboratories cannot afford such an approach. Typically, samples are collected in a (formally) undesigned way, spread over time, and their measurements are included in routine measurement processes. This makes it hard to evaluate PLS model dimensionality. In this paper, classical criteria (leave-one-out cross-validation and adjusted Wold's criterion) are compared to recently proposed alternatives (smoothed PLS-PoLiSh and a randomization test) to seek out the optimum dimensionality of PLS models. Kerosene (jet fuel) samples were measured by attenuated total reflectance-mid-IR spectrometry and their spectra were used to predict eight important properties determined using reference methods that are time-consuming and prone to analytical errors. The alternative methods were shown to give reliable dimensionality predictions when compared to external validation. By contrast, the simpler methods seemed to be largely affected by the largest changes in the modeling capabilities of the first components.
10.
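Multi-model consensus partial least squares (PLS) was applied to the quantitative analysis of near-infrared spectra. A series of PLS models was built from randomly drawn training subsets, the better-performing models among them were selected as member models, and these member models were used to predict unknown samples. The method was applied to model the relationship between the near-infrared spectra of a set of biological samples and their contents of human serum albumin, γ-globulin and glucose, and was compared with single-model PLS. In 50 repeated predictions of the three components in an independent test set, the mean RMSEP values for PLS were 0.1066, 0.0853 and 0.1338, with RMSEP standard deviations of 0.0174, 0.0144 and 0.0416; for the proposed method, the mean RMSEP values were 0.0715, 0.0750 and 0.0781, with RMSEP standard deviations of 0.0033, 0.2729×10⁻⁴ and 0.0025.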
11.
The work summarized in this paper presents the first part of a three‐paper series on robust partial least squares (RPLS) regression. Motivated by recent research activities in this area, this part provides a detailed algorithmic analysis of associated techniques, showing that existing work (i) may not represent a true robust formulation of partial least squares (PLS), (ii) may lead to convergence problems or (iii) may be insensitive to a certain type of outlier. On the basis of this analysis, Part I introduces a new conceptual RPLS algorithm that overcomes the deficiencies of existing work. The second part of this work details this new RPLS technique, compares its performance with existing RPLS methods and provides an analysis of the computational efficiency and sensitivity of these algorithms. Whilst the first two parts of this work discuss algorithmic developments of RPLS, the final part concentrates on practical issues of RPLS implementations. This third part is devoted to practitioners of chemistry and chemical engineering, covering a wide range of applications involving a calibration experiment, the analysis of recorded data from an industrial debutanizer process and data from a number of Raman spectroscopy experiments. Copyright © 2007 John Wiley & Sons, Ltd.
12.
13.
Support vector machine (SVM) algorithms are a popular class of techniques for classification. However, outliers in the data can result in poor overall misclassification percentages. In this paper, we propose a method to identify such outliers in the SVM framework. A specific robust classification algorithm is proposed by adjusting the least squares SVM (LS‐SVM). This yields better classification performance for heavy-tailed data and data containing outliers. Copyright © 2009 John Wiley & Sons, Ltd.
14.
With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to obtain a more reliable and powerful analysis of a given dataset. We do not assume that subjects in the current study and other relevant studies are drawn from the same population, as is assumed in meta-analysis. In particular, the set of parameters in the current study may be different from that of the other studies. We consider sample classification based on gene expression profiles in this context. We propose two new methods, a weighted partial least squares (WPLS) method and a weighted penalized partial least squares (WPPLS) method, to build a classifier by a combined use of multiple datasets. The methods can weight the individual datasets depending on their relevance to the current study. A more standard approach is first to build a classifier using each of the individual datasets, then to combine the outputs of the multiple classifiers using weighted voting. Using two quite different datasets on human heart failure, we show first that WPLS/WPPLS, by borrowing information from the other dataset, can improve the performance of PLS/PPLS built on only a single dataset. Second, WPLS/WPPLS performs better than the standard approach of combining multiple classifiers. Third, WPPLS can improve over WPLS, just as PPLS does over PLS for a single dataset.
15.
An error analysis of predicted values using a spectral correction matrix and partial least squares (PLS) modeling is applied to the determination of Zn2+ and Pb2+ with methylthymol blue (MTB) as a metallochromic indicator. The concentration ranges for Pb2+ and Zn2+ in the standard solution sets are 0.5-5.2 and 0.1-2.5 μg ml−1, respectively. The experimental calibration set was composed of 20 sample solutions using a random design for two-component mixtures. The absorption spectra were recorded from 400 to 700 nm. The two wavelengths that give the minimum error in the prediction of the two metal ion concentrations are chosen according to an error analysis of different pairs of wavelengths. The effect of pH on the sensitivity of the determination of Zn2+ and Pb2+ using MTB was studied in order to choose the optimum pH (pH=6) for the determination. The values of root mean square difference (RMSD) for lead and zinc using β-correction partial least squares were 0.0977 and 0.1266, respectively. The effects of diverse ions and several experimental parameters were studied. The method was used for the determination of lead and zinc in alloy samples.
16.
17.
Rosilene S. Nascimento, Nilton O.C. e Silva, Denise B.C. Mendes, José Bento B. Silva 《Talanta》2010, 80(3): 1102-1109
In this study we compared the use of ordinary least squares and weighted least squares in the calibration of a method for analyzing essential and toxic metals present in human milk by ICP-OES, in order to avoid systematic errors in the measurements. Human milk samples were provided by the Odete Valadares maternity clinic and digested in a high-performance microwave (MW) oven. Plasma short- and long-term stability was evaluated using a solution of digested milk (1:50) with 2.0 mg L−1 Mg in HNO3 2% (v/v). The detection power was at or below the μg L−1 level, whilst the precision, expressed as relative standard deviation (R.S.D.), was almost always equal to or better than 3.3%. Certified reference material Infant Formula (NIST SRM 1846) was used to assess the accuracy of the proposed method, which proved to be accurate and precise. Recovery rates were in the range of 83-117%. Aqueous calibration was carried out for each element under study.
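The difference between ordinary and weighted least squares for a straight-line calibration can be sketched as follows. This is a generic illustration, not the procedure of the paper: the function name `fit_line` is hypothetical, and the abstract does not state which weights were used (a common choice, assumed in the comment below, is the inverse variance of the replicate signals at each concentration level).

```python
import numpy as np

def fit_line(x, y, weights=None):
    """Fit y = b0 + b1 * x by minimizing sum_i w_i * (y_i - b0 - b1 * x_i)**2.

    With weights=None every point gets weight 1 (ordinary least squares).
    For weighted least squares, a typical but assumed choice is w_i = 1 / s_i**2,
    where s_i is the standard deviation of the signal at level i.
    Returns the array (intercept, slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    A = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef
```

When the signal variance grows with concentration, weighting keeps the high standards from dominating the fit, which mainly affects the intercept and therefore results near the low end of the calibration range.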
18.
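Partial least squares (PLS) was used to resolve the spectra of two-component mixtures of doxorubicin (DOX) and daunorubicin (DNR), whose synchronous fluorescence spectra overlap severely, and a new method for the simultaneous determination of both components in the mixture was established. In B-R buffer solution at pH 3.45 and with a wavelength difference Δλ = 55 nm, models were built from the raw and first-derivative synchronous fluorescence spectra measured for 25 mixed standard samples. DOX and DNR showed good linearity over the mass concentration range 0.05-3.0 μg/mL. The correlation coefficients of the models for the two analytes were 0.9897 and 0.9909, the mean recoveries were 101.0% and 101.4%, the root mean square errors of prediction (RMSEP) were 0.1400 and 0.1395, and the relative standard errors of prediction (SEP) were 0.1541 and 0.1525, respectively. The method can be applied to the analysis of urine samples.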
19.
Multi-way partial least squares modeling of water quality data (total citations: 1; self-citations: 0; citations by others: 1)
A 10-year surface water quality data set pertaining to a polluted river was analyzed using partial least squares (PLS) regression models. Both the unfold-PLS and N-PLS (tri-PLS and quadri-PLS) models were calibrated using the leave-one-out cross-validation method. These were applied to the multivariate, multi-way data array in order to assess and compare their predictive capabilities for the biochemical oxygen demand (BOD) of river water in terms of their relative mean square error of cross-validation and prediction and the variance captured. The sums of squares of residuals and the leverages were computed and analyzed to identify the sites, variables, years and months that may influence the constructed model. Both the tri- and quadri-PLS models yielded relatively low validation error compared to unfold-PLS and captured high variance in the model. Moreover, both of these methods produced acceptable model precision and accuracy. For tri-PLS the root mean square errors were 1.65 and 2.17 for calibration and prediction, respectively, whereas these were 2.58 and 1.09 for quadri-PLS. At a preliminary level it seems that BOD can be predicted, but a different data arrangement is needed. Moreover, analysis of the scores and loadings plots of the N-PLS models could provide information on the time evolution of the river water quality.
20.
A new chemometrics algorithm named region orthogonal signal correction (ROSC) has been introduced to improve the predictive ability of PLS models for biomedical components in blood serum developed from their NIR spectra in the 1280-1849 nm region. First, a moving window partial least squares regression (MWPLSR) method was employed to locate the region due to water as a region of interference signals and to find the informative regions of glucose, albumin, cholesterol and triglyceride in the NIR spectra of bovine serum samples. Next, a novel chemometrics method named searching combination moving window partial least squares (SCMWPLS) was used to optimize those informative regions. The specific regions that contained the information of water, glucose, albumin, cholesterol and triglyceride were thus obtained. When one component of interest in the bovine serum solution (glucose, albumin, cholesterol or triglyceride) is the analyte, the other three components of interest and water are considered interference factors. ROSC was therefore applied to each specific region of interference signal to calculate the components orthogonal to the analyte concentrations; these components were removed specifically from the NIR spectra of bovine serum in the 1280-1849 nm region, and the highest interference signal for the analyte model was revealed. PLS results for glucose, albumin, cholesterol and triglyceride are reported and compared for models built using the whole region of the original spectra and for models developed using the optimized regions suggested by SCMWPLS of the original spectra, of spectra treated with OSC using 1-3 orthogonal components, and of spectra treated with ROSC, with the highest interference signals selectively removed, using 1-3 orthogonal components. The ROSC approach of removing the highest interference signal located by SCMWPLS was found to improve the performance of PLS modeling, yielding a lower RMSECV and a smaller number of PLS factors.