1.
Two novel algorithms that employ the idea of stacked generalization (stacked regression), stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS), are reported in the present paper. The new algorithms build parallel, conventional PLS models on all intervals of a set of spectra. They exploit the information in the whole spectrum by combining the parallel models in a way that emphasizes intervals highly related to the target property. It is shown theoretically and experimentally that the predictive ability of these two stacked methods, which combine all subsets or intervals of the whole spectrum, is never poorer than that of a PLS model based only on the best interval. The two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS. They perform best when the spectral information is neither isolated in a single, small region nor spread uniformly over the response. A simulated data set is used not only to demonstrate this improvement, but also to show that stacked regression can potentially predict property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples were successfully predicted by the new algorithms. So was the hydroxyl number of terpolymer samples measured on different instruments, with and without an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.
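The stacking idea can be illustrated with a short sketch. The spectrum is split into contiguous intervals, a PLS model is fitted on each interval, and the interval predictions are combined with non-negative weights. The sketch makes two assumptions. First, the PLS1 routine is a plain NIPALS implementation. Second, each interval is weighted by the inverse of its squared RMSECV, normalized to sum to one. That weighting is an illustrative choice and is not necessarily the rule used in the paper.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Plain NIPALS PLS1 on mean-centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)                  # weight vector
        t = Xc @ w                                 # scores
        tt = t @ t
        p, q = Xc.T @ t / tt, (yc @ t) / tt        # X loadings, y loading
        Xc, yc = Xc - np.outer(t, p), yc - q * t   # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))  # b = W (P'W)^-1 q
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def rmsecv(X, y, n_comp, n_folds=5):
    """K-fold RMSECV with contiguous folds."""
    press = 0.0
    for test in np.array_split(np.arange(len(y)), n_folds):
        train = np.setdiff1d(np.arange(len(y)), test)
        model = pls1_fit(X[train], y[train], n_comp)
        press += np.sum((pls1_predict(model, X[test]) - y[test]) ** 2)
    return np.sqrt(press / len(y))

def stacked_pls(X, y, n_intervals, n_comp):
    """One PLS model per interval; weights proportional to 1/RMSECV^2 (assumed rule)."""
    models, weights = [], []
    for cols in np.array_split(np.arange(X.shape[1]), n_intervals):
        models.append((cols, pls1_fit(X[:, cols], y, n_comp)))
        weights.append(1.0 / rmsecv(X[:, cols], y, n_comp) ** 2)
    return models, np.array(weights) / np.sum(weights)

def stacked_predict(stack, X):
    """Weighted sum of the interval-model predictions."""
    models, weights = stack
    preds = np.column_stack([pls1_predict(m, X[:, cols]) for cols, m in models])
    return preds @ weights

# Synthetic example: only the first 10 of 40 variables carry information.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, :10] @ rng.normal(size=10) + 0.05 * rng.normal(size=60)
stack = stacked_pls(X, y, n_intervals=4, n_comp=3)
print(np.round(stack[1], 3))
```

With this weighting, an interval that predicts poorly in cross-validation contributes little to the stacked prediction but is never discarded outright.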
2.
In developing partial least squares calibration models, selecting the number of latent variables so as to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade‐offs, but the effects of model parsimony and potential underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user‐defined parameter. This parameter is formulated to constrain the fractional increase in root mean square error of cross‐validation (RMSECV) when a parsimonious model is selected instead of the conventional minimum‐RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid‐infrared spectroscopy. Parametric variation of this penalty demonstrates that, for samples similar to the laboratory standards used for model training and validation, the increase in prediction errors due to underfitting is bounded by the magnitude of the penalty. We imposed an ensemble of penalties, corresponding to a 0–30% allowable increase in RMSECV, through sum of ranking differences. This leads to the selection of a model that increases the actual RMSECV by up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
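The parsimony trade-off can be made concrete with a simpler rule than the paper's penalty metric, which is not reproduced here. The rule picks the smallest number of latent variables whose RMSECV is within a user-set fraction of the minimum RMSECV. The `tolerance` argument plays a role loosely analogous to the paper's bound on the fractional increase in RMSECV. The RMSECV curve below is invented for illustration.

```python
def select_n_lv(rmsecv, tolerance):
    """Smallest number of LVs (1-based) whose RMSECV is at most
    (1 + tolerance) times the minimum RMSECV. This is a simplified
    stand-in, not the penalty metric proposed in the paper."""
    limit = (1.0 + tolerance) * min(rmsecv)   # min() raises on an empty list
    for n_lv, err in enumerate(rmsecv, start=1):
        if err <= limit:                      # the minimum itself always qualifies
            return n_lv

# Hypothetical RMSECV curve for 1..8 latent variables.
curve = [1.00, 0.62, 0.45, 0.40, 0.38, 0.37, 0.37, 0.38]
for tol in (0.0, 0.1, 0.3):
    print(tol, select_n_lv(curve, tol))
```

Raising the tolerance from 0 to 30% moves the choice from 6 to 3 latent variables, trading a bounded loss in RMSECV for a more parsimonious model.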
3.
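Multi-model consensus partial least squares was applied to quantitative near-infrared spectral analysis. A series of partial least squares models was built from randomly drawn training subsets, and the better-performing models were selected as member models, which were then used to predict unknown samples. The method was applied to model the relationship between the near-infrared spectra of a set of biological samples and their contents of human serum albumin, γ-globulin and glucose, and was compared with single-model partial least squares. Results: for 50 repeated predictions of the three components in an independent test set, PLS gave mean RMSEP values of 0.1066, 0.0853 and 0.1338, with RMSEP standard deviations of 0.0174, 0.0144 and 0.0416. The proposed method gave mean RMSEP values of 0.0715, 0.0750 and 0.0781, with RMSEP standard deviations of 0.0033, 0.2729 × 10⁻⁴ and 0.0025.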
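The consensus procedure can be sketched in four steps:

1. Draw random training subsets.
2. Fit a PLS model on each subset and score it on the samples left out.
3. Keep the best-scoring fraction as member models.
4. Average the member predictions.

The abstract does not state the subset size, the number of models, the fraction kept or the way member predictions are combined. The values used here, and the equal-weight averaging, are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Plain NIPALS PLS1 on mean-centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, q = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q)), x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def consensus_pls(X, y, n_comp, n_models=50, subset_frac=0.7, keep_frac=0.3, seed=0):
    """Fit PLS on random subsets; keep the models with the lowest held-out RMSE."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(subset_frac * n)
    scored = []
    for _ in range(n_models):
        perm = rng.permutation(n)
        train, held = perm[:n_train], perm[n_train:]
        model = pls1_fit(X[train], y[train], n_comp)
        err = np.sqrt(np.mean((pls1_predict(model, X[held]) - y[held]) ** 2))
        scored.append((err, model))
    scored.sort(key=lambda item: item[0])          # best models first
    n_keep = max(1, round(keep_frac * n_models))
    return [model for _, model in scored[:n_keep]]

def consensus_predict(members, X):
    """Equal-weight average of the member-model predictions (assumed rule)."""
    return np.mean([pls1_predict(m, X) for m in members], axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 30))
y = X[:, :5] @ np.array([1.0, -1.0, 0.5, 2.0, 1.5]) + 0.1 * rng.normal(size=80)
members = consensus_pls(X[:60], y[:60], n_comp=4)
pred = consensus_predict(members, X[60:])
print(len(members), round(float(np.sqrt(np.mean((pred - y[60:]) ** 2))), 3))
```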
4.
Study on the determination of fatty acid composition of lean meat by near-infrared spectroscopy with partial least squares
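Partial least squares was used to build calibration models relating the near-infrared spectra of lean meat to its contents of palmitic, palmitoleic, stearic, oleic and linoleic acids. Cross-validation and external validation were used to assess the reliability of the models. For the individual fatty acid models, the calibration correlation coefficients were 0.9998, 0.9844, 0.9963, 0.9754 and 0.9969. The root mean square errors of calibration (RMSEC) were 0.0231, 0.0485, 0.111, 0.373 and 0.311. The root mean square errors of cross-validation (RMSECV) were 0.509, 0.115, 0.225, 0.848 and 0.649. The near-infrared models were used to predict the fatty acid composition of lean meat. Paired t-tests between the predicted values and the values measured by gas chromatography showed no significant difference for any fatty acid (p > 0.05).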
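The paired t-test used to compare NIR predictions with gas chromatography values can be computed with the standard library alone. The statistic is the mean of the paired differences divided by its standard error, with n − 1 degrees of freedom. The values below are invented for illustration. The p-value would then be read from a t distribution table or a statistics package, which this sketch does not do.

```python
import math
import statistics

def paired_t(predicted, reference):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [p - r for p, r in zip(predicted, reference)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical NIR-predicted vs. GC-measured fatty acid contents (same units).
nir = [2.10, 2.35, 1.98, 2.50, 2.22]
gc = [2.05, 2.40, 2.00, 2.45, 2.20]
t, df = paired_t(nir, gc)
print(round(t, 3), df)
```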
5.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions. Both start from the informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares regression (MWPLSR) [Anal. Chem. 74 (2002) 3555]. Using informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The methods were applied to an open-path (OP)/FT-IR spectral data set. The proposed methods, especially SCMWPLS, find an optimized combination that improves the performance of the corresponding PLS model, often significantly. The improvement is a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. The regions in the combinations obtained can easily be explained by IR absorption bands in those spectral regions.
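Moving window PLS, on which CSMWPLS and SCMWPLS build, can be sketched as follows. A fixed-width window is slid across the spectrum, a small PLS model is fitted on each window, and a cross-validation error is recorded. Windows with low error mark informative regions. The window width, number of components and 5-fold contiguous cross-validation are assumptions of this sketch. The interval-resizing and combination searches of CSMWPLS and SCMWPLS are not reproduced.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Plain NIPALS PLS1 on mean-centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, q = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q)), x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def rmsecv(X, y, n_comp, n_folds=5):
    """K-fold RMSECV with contiguous folds."""
    press = 0.0
    for test in np.array_split(np.arange(len(y)), n_folds):
        train = np.setdiff1d(np.arange(len(y)), test)
        model = pls1_fit(X[train], y[train], n_comp)
        press += np.sum((pls1_predict(model, X[test]) - y[test]) ** 2)
    return np.sqrt(press / len(y))

def moving_window_pls(X, y, width, n_comp):
    """(start index, RMSECV) for every window position, moving one variable at a time."""
    return [(start, rmsecv(X[:, start:start + width], y, n_comp))
            for start in range(X.shape[1] - width + 1)]

# Synthetic spectra: only variables 12..17 are related to y.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 30))
y = X[:, 12:18] @ np.ones(6) + 0.05 * rng.normal(size=50)
profile = moving_window_pls(X, y, width=6, n_comp=3)
best_start, best_err = min(profile, key=lambda item: item[1])
print(best_start, round(float(best_err), 3))
```

Plotting the error profile against `start` gives the kind of curve from which informative regions are read off before any combination search.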
6.
An outlier detection method is proposed for near-infrared spectral analysis. The underlying idea is that, in random test (Monte Carlo) cross-validation, outliers and normal samples should differ clearly in how likely they are to appear in good models, with a smaller prediction residual error sum of squares (PRESS), or in bad models, with a larger PRESS. The method first builds a large number of PLS models by random test cross-validation, then sorts the models by PRESS, and finally recognizes outliers from the accumulative probability of each sample in the sorted models. To validate the method, four data sets were investigated: three published data sets and a large data set of tobacco lamina. The proposed method was shown to be highly efficient and accurate compared with the conventional leave-one-out (LOO) cross-validation method.
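The Monte Carlo idea can be sketched as follows. The data are split at random many times, a PLS model is fitted on each training part, and PRESS is computed on the test part. The models are then sorted by PRESS. For each sample, the sketch compares how often it sits in the test set of the worst models with how often it sits in the test set of the best ones. Several details are assumptions rather than the paper's exact rule: the split ratio, the number of models, the top and bottom 10% cut-offs, and the use of test-set membership with a frequency difference in place of the paper's accumulative probability.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Plain NIPALS PLS1 on mean-centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, q = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q)), x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def mc_outlier_scores(X, y, n_comp, n_models=300, test_frac=0.2, tail=0.1, seed=0):
    """Frequency in the test sets of the worst models minus frequency in the
    test sets of the best models; large positive values suggest outliers."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    in_test = np.zeros((n_models, n), dtype=bool)
    press = np.empty(n_models)
    for k in range(n_models):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        model = pls1_fit(X[train], y[train], n_comp)
        press[k] = np.sum((pls1_predict(model, X[test]) - y[test]) ** 2)
        in_test[k, test] = True
    order = np.argsort(press)                      # smallest PRESS (best) first
    n_tail = max(1, int(round(tail * n_models)))
    freq_good = in_test[order[:n_tail]].mean(axis=0)
    freq_bad = in_test[order[-n_tail:]].mean(axis=0)
    return freq_bad - freq_good

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 20))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=40)
y[7] += 5.0                                        # planted y-outlier
scores = mc_outlier_scores(X, y, n_comp=3)
print(int(np.argmax(scores)), round(float(scores.max()), 2))
```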
7.
Olivier Cloarec. Journal of Chemometrics, 2011, 25(4): 208–215
This paper presents a modified version of the NIPALS algorithm for PLS regression with a single response variable. This version, denoted CF‐PLS, provides significant advantages over standard PLS. First, it strongly reduces overfitting of the regression. Second, under the null hypothesis R² follows a Beta distribution that is a function only of the number of observations, which allows a probabilistic framework to be used to test the validity of a component. Third, models generated with CF‐PLS have comparable, if not better, prediction ability than models fitted with NIPALS. Finally, the scores and loadings of CF‐PLS are directly related to R², which makes the model and its interpretation more reliable. Copyright © 2011 John Wiley & Sons, Ltd.
8.
Feasibility study on qualitative and quantitative analysis in tea by near infrared spectroscopy with multivariate calibration
This study assessed the feasibility of using near-infrared (NIR) spectroscopy as a rapid method for the qualitative and quantitative assessment of tea quality. NIR spectroscopy with the soft independent modeling of class analogy (SIMCA) method was proposed for rapid identification of tea varieties. Four tea varieties, Longjing, Biluochun, Qihong and Tieguanyin, were studied. The identification rate was 90% for Longjing in the training set and 80% for Biluochun in the test set; all other identification rates were 100%. A partial least squares (PLS) algorithm was used to predict the content of caffeine and total polyphenols in tea. The models were calibrated by cross-validation, and the optimal number of PLS factors was the one giving the lowest root mean square error of cross-validation (RMSECV). The correlation coefficient (R) and the root mean square error of prediction (RMSEP) in the test set were used to evaluate the models: R = 0.9688, RMSEP = 0.0836% for caffeine; R = 0.9299, RMSEP = 1.1138% for total polyphenols. The overall results demonstrate that NIR spectroscopy with multivariate calibration can be applied successfully as a rapid method, both to identify tea varieties and to determine several chemical constituents of tea simultaneously.
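The two test-set figures of merit quoted above can be computed with the standard library. RMSEP is the root of the mean squared prediction error. R is taken here as the Pearson correlation between reference and predicted values, which is the usual reading but is an assumption about this paper. The numbers are invented for illustration.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical reference and predicted contents (same units, e.g. %).
ref = [3.1, 2.8, 3.6, 3.0, 2.5]
pred = [3.0, 2.9, 3.5, 3.2, 2.6]
print(round(rmsep(ref, pred), 4), round(pearson_r(ref, pred), 4))
```

RMSEP is in the units of the property (percent in the tea study), whereas R is dimensionless, so the two are usually reported together.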
9.
Terezinha Ferreira de Oliveira, Eugenio Kahn Epprecht, Roberta Lourenço Ziolli, Augusto Cesar Fonseca Saraiva, Roberto Ribeiro de Avillez. Journal of Chemometrics, 2008, 22(2): 141–148
The present work used partial least squares (PLS) multivariate calibration to obtain a net analyte signal (NAS). The NAS was used to establish the independent influence of each phase in quantitative phase analysis by the Rietveld method, with respect to three potential sources of error: preferred orientation, linear absorption and counting statistics. Ternary mixtures of Al₂O₃, MgO and NiO were employed and organized in three groups with different degrees of variation in the weight fractions of the three constituents. An analysis of variance indicated that the partial selectivity of the group with the least variation differed significantly from that of the other groups. Among the phases, the partial selectivity of MgO was significantly different. This is due to a strong correlation between linear absorption and counting statistics in the region of the (2 0 0) reflection of the MgO phase. This reflection is strongly affected by preferred orientation and is also the strongest reflection for both MgO and NiO. Overall, similarity matrices showed close agreement between the nominal weight fractions of the phases and those obtained by the Rietveld method. However, this agreement diminishes as the weight fractions of the phases in the mixture become closer to each other. In the group of mixtures with the least variation in weight fractions, the method cannot quantify the small differences between the phases, even though these errors may be considered small relative to the weight fractions themselves. Copyright © 2008 John Wiley & Sons, Ltd.
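The net analyte signal is commonly defined, following Lorber, as the part of a component's signal that is orthogonal to the space spanned by the other components' signals. Selectivity is then the ratio of the NAS norm to the norm of the full signal. The sketch below assumes the pure-component spectra are known, which is a simplification: in the paper, the NAS is obtained through the PLS calibration. The Gaussian profiles are invented for illustration.

```python
import numpy as np

def net_analyte_signal(pure_spectra, k):
    """NAS of component k: its pure spectrum with the part lying in the span
    of the other components' spectra removed. Rows of pure_spectra are components."""
    s_k = pure_spectra[k]
    others = np.delete(pure_spectra, k, axis=0).T      # (n_points, n_others)
    proj = others @ np.linalg.pinv(others)             # projector onto the interferent space
    nas = s_k - proj @ s_k
    selectivity = np.linalg.norm(nas) / np.linalg.norm(s_k)
    return nas, selectivity

# Three hypothetical "pure" profiles; components 0 and 1 overlap strongly.
x = np.linspace(0, 10, 200)
S = np.array([np.exp(-(x - c) ** 2) for c in (4.0, 4.5, 8.0)])
for k in range(3):
    _, sel = net_analyte_signal(S, k)
    print(k, round(float(sel), 3))
```

Strongly overlapping components receive low selectivity, while an isolated component keeps nearly all of its signal. The same reasoning underlies the reduced partial selectivity reported for overlapping reflections.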
10.
Xiangzhen Kong, Weihua Zhu, Zhimin Zhao, Xiangyan Li, Hui Wang, Ran Chen, Chuchu Chen, Feng Zhu, Xiaoying Guo. Journal of Chemometrics, 2012, 26(1–2): 25–33
The fluorescence spectrum in the region of 220–900 nm, together with its first and second derivative spectra, was used to determine the concentration of triglyceride in human serum. Nonlinear partial least squares regression with a cubic B‐spline‐function‐based nonlinear transformation was employed as the chemometric method. Window genetic algorithm partial least squares (WGAPLS) was proposed as a new wavelength selection method to find the optimal combination of spectral wavelengths. WGAPLS can be applied within the optimized regions identified by changeable size moving window partial least squares (CSMWPLS) or searching combination moving window partial least squares (SCMWPLS). The study shows that this further improves the calibration and prediction performance of the model with a reasonable number of latent variables. SCMWPLS should start from the sub‐region found by CSMWPLS with the smallest root mean square error of calibration (RMSEC). In addition, WGAPLS should be applied within the region with the smallest RMSEC, whether it is the sub‐region found by CSMWPLS or the region combination found by SCMWPLS. Moreover, the nonlinear models predicted significantly better than the linear models. The prediction performance of the three spectra followed the order: second derivative spectrum < original spectrum < first derivative spectrum. Wavelengths within 300–367 nm and 386–392 nm of the first derivative of the original fluorescence spectrum formed the optimal wavelength combination for the prediction model. Copyright © 2012 John Wiley & Sons, Ltd.
11.
A novel near-infrared (NIR) modeling method, Laplacian regularized least squares regression (LapRLSR), is presented. It can exploit many unlabeled spectra to improve the prediction performance of the model even when only a few calibration samples are available. Using LapRLSR modeling, NIR spectral analysis was applied to online monitoring of the concentration of salvianolic acid B during column separation of Salvianolate. The results demonstrated that LapRLSR significantly outperformed partial least squares (PLS) and that online NIR analysis is feasible.
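The way unlabeled spectra enter the model can be sketched with a linear form of Laplacian regularized least squares. The weight vector minimizes three terms: the squared error on the labeled spectra, a ridge penalty, and a graph penalty. The graph penalty is built from a k-nearest-neighbour graph over labeled and unlabeled spectra together, which gives the closed form w = (X_lᵀX_l + γ_A I + γ_I XᵀLX)⁻¹ X_lᵀ y. The linear kernel, the heat-kernel kNN graph, the penalty scaling and all parameter values are assumptions; the abstract does not specify the authors' formulation.

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Unnormalised Laplacian L = D - W of a symmetrised kNN graph with heat-kernel weights."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]              # skip the point itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                              # symmetrise
    return np.diag(W.sum(axis=1)) - W

def lap_rls_fit(X_lab, y_lab, X_unlab, gamma_a=1e-2, gamma_i=1e-2, k=5, sigma=1.0):
    """Linear Laplacian-regularised least squares (assumed formulation).
    Centring with the labeled means is safe because L @ ones = 0."""
    x_mean, y_mean = X_lab.mean(axis=0), y_lab.mean()
    X_all = np.vstack([X_lab, X_unlab])
    L = knn_laplacian(X_all, k, sigma)
    Xl = X_lab - x_mean
    A = Xl.T @ Xl + gamma_a * np.eye(X_lab.shape[1]) + gamma_i * X_all.T @ L @ X_all
    w = np.linalg.solve(A, Xl.T @ (y_lab - y_mean))
    return w, x_mean, y_mean

# Synthetic spectra driven by one latent concentration z; only 8 samples are labeled.
rng = np.random.default_rng(3)
v = rng.normal(size=20)
z = rng.normal(size=120)
X = np.outer(z, v) + 0.05 * rng.normal(size=(120, 20))
y = z
lab, unlab, test = slice(0, 8), slice(8, 100), slice(100, 120)
w, xm, ym = lap_rls_fit(X[lab], y[lab], X[unlab])
pred = (X[test] - xm) @ w + ym
print(round(float(np.sqrt(np.mean((pred - y[test]) ** 2))), 3))
```

The graph term penalizes weight vectors whose predictions vary sharply between neighbouring spectra. This is how the unlabeled samples shape the model even though they carry no reference values.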
12.
In this paper, a methodology is proposed to evaluate the probabilities of false non-compliance and false compliance for screening methods that give first- or second-order multivariate signals. For this task, 120 samples of 6 different kinds of milk were measured by excitation-emission fluorescence. The samples were spiked with different amounts of three sulfonamides (sulfadiazine, sulfamerazine and sulfamethazine). These substances are classified in group B1 (veterinary medicines and contaminants) of annex I of Directive 96/23/EC. The European Union (Commission Regulation EC no. 281/96) has set the maximum residue level (MRL) of total sulfonamides at 100 μg kg⁻¹ in muscle, liver, kidney and milk. The work shows that excitation-emission fluorescence together with the partial least squares class modeling (PLS-CM) procedure may be a suitable and cheap screening method for the total amount of sulfonamides in milk. Three PLS-CM models were built: one each for the emission and excitation spectra (first-order signals) and one for the excitation-emission matrices (second-order signals). In all cases the probability of false compliance was below 5%, as required by Decision 2002/657/EC. With the same fluorescence signals, the total quantity of sulfonamides was calibrated using 2-PLS, 3-PLS and PARAFAC regressions. Using this quantitative approach, the capability of detection, CCβ, around the MRL was estimated at between 114.3 and 115.1 μg kg⁻¹ for probabilities of false non-compliance and false compliance both equal to 5%.
13.
Pure component selectivity analysis (PCSA) was successfully used to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component with respect to the other components. The samples used in this study were composed of NH₄OH, H₂O₂ and H₂O, a popular etchant solution in the electronics industry. The corresponding near-infrared (NIR) spectra (9000–7500 cm⁻¹) were used to build PLS models. The key issue was determining H₂O₂ selectively, without interference from NH₄OH and H₂O. The molecular structure of H₂O₂ is similar to that of H₂O, and NH₄OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH₄OH and H₂O₂ were found with moving window PLS (MW-PLS), and the corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH₄OH was free from interference from the other components owing to its unique NH absorption bands. The spectral variation from H₂O₂ overlapped broadly with the other components and was much less distinct than that from NH₄OH. As a result, the selectivity and prediction performance of the H₂O₂ calibration were sensitive to the spectral range and number of factors used. PCSA compares the regression vectors from PLS with the net analyte signal (NAS), and was an effective way to prevent over-fitting of the H₂O₂ calibration. A robust H₂O₂ calibration model with minimal interference from other components was developed. PCSA should be included as a standard method in PLS calibrations, where prediction error alone is usually the measure of performance.
14.
Kernel partial least squares (KPLS) is a nonlinear extension of linear PLS: training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in that feature space. It has become a popular technique for regression and classification of complex data sets. In the present study, we develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel from decision tree ensembles. The constructed tree kernel can effectively capture similarities between samples and select informative features through variable importance ranking while the kernel is being built. This kernel also allows TKPLS to handle nonlinear relationships in structure–activity relationship data. Finally, three data sets related to different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that TKPLS can be regarded as an alternative and promising classification technique. Copyright © 2013 John Wiley & Sons, Ltd.
15.
Accurate model prediction is fundamental to the successful analysis of complex samples. To exploit the abundant information embedded in the frequency and time domains, a novel regression model is presented for quantitative analysis of hydrocarbon contents in fuel oil samples. The proposed method, named high and low frequency unfolded PLSR (HLUPLSR), integrates empirical mode decomposition (EMD) and an unfolding strategy with partial least squares regression (PLSR). First, EMD decomposes the original signals into a finite number of intrinsic mode functions (IMFs) and a residue. Second, the first, high-frequency IMFs are summed into a high-frequency matrix, and the remaining IMFs and the residue are summed into a low-frequency matrix. Finally, the two matrices are unfolded into an extended matrix along the variable dimension, and a PLSR model is built between the extended matrix and the target values. Coupled with ultraviolet (UV) spectroscopy, HLUPLSR was applied to determine the hydrocarbon contents of light gas oil and diesel fuel samples. Compared with single PLSR and other signal processing techniques, the proposed method shows superior prediction ability and better model interpretability. HLUPLSR therefore provides a promising tool for quantitative analysis of complex samples.
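The summing-and-unfolding step can be sketched as array bookkeeping, assuming the IMFs and residue of each spectrum have already been computed by an EMD routine. EMD itself is not implemented here. Two further assumptions apply: the split point `n_high` is treated as a user choice, and the random arrays stand in for real IMFs. Any PLSR routine can then be fitted on the returned matrix.

```python
import numpy as np

def unfold_high_low(imfs, residue, n_high):
    """imfs: (n_imfs, n_samples, n_vars), ordered from highest to lowest frequency;
    residue: (n_samples, n_vars). Returns the extended matrix [high | low]
    of shape (n_samples, 2 * n_vars)."""
    high = imfs[:n_high].sum(axis=0)                  # first IMFs: high-frequency part
    low = imfs[n_high:].sum(axis=0) + residue         # remaining IMFs plus residue
    return np.hstack([high, low])                     # unfold along the variable axis

rng = np.random.default_rng(4)
imfs = rng.normal(size=(5, 10, 50))      # placeholder IMFs, not real EMD output
residue = rng.normal(size=(10, 50))
X_ext = unfold_high_low(imfs, residue, n_high=2)
print(X_ext.shape)  # (10, 100)
```

Because EMD is additive, the two halves of the extended matrix sum back to the original spectra. The unfolding therefore rearranges the information without discarding any of it.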
16.
Near-infrared (NIR) spectra in the region of 5000–4000 cm⁻¹ were combined with a chemometric method called searching combination moving window partial least squares (SCMWPLS). They were used to determine the concentrations of human serum albumin (HSA), γ-globulin and glucose in control serum IIB (CS IIB) solutions of various concentrations. SCMWPLS is proposed to search for optimized combinations of informative regions, that is, spectral intervals considered to contain useful information for building partial least squares (PLS) models. The informative regions can easily be found by the moving window partial least squares regression (MWPLSR) method. PLS calibration models using the regions obtained by SCMWPLS were developed for HSA, γ-globulin and glucose. Compared with the results obtained using the whole region and MWPLSR, these models gave the smallest root mean square errors of prediction (RMSEP), relatively few PLS factors and the highest correlation coefficients. The RMSEP values for HSA, γ-globulin and glucose obtained by SCMWPLS were 0.0303, 0.0327 and 0.0195 g/dl, respectively. These results show that SCMWPLS can be applied successfully to the simultaneous determination of HSA, γ-globulin and glucose by NIR spectroscopy in complex biological fluids such as CS IIB solutions.
17.
A green analytical method was developed for the analysis of sugar-based depilatories. Three independent partial least squares (PLS) regression models were built for the direct determination of glucose, fructose and maltose, without any sample pretreatment, from their attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra. The models showed adequate prediction capabilities, with root-mean-square errors of prediction ranging from 7.04 to 12.55 mg sugar g⁻¹ sample. Gradient liquid chromatography with on-line infrared detection, employing background correction based on cubic smoothing splines, was used as the reference procedure. Compared with the ingredient information provided by the manufacturers, the analysis revealed changes in sugar concentration due to the formulation process. Although fructose, glucose and sucrose were declared as ingredients of the depilatories, only fructose, glucose and maltose were determined in the final products. This was attributed to the pH and temperature conditions of the production process and to the use of glucose syrup instead of crystalline glucose. The present ATR-FTIR-PLS method enables accurate, cheap and fast determination without solvent consumption or toxic waste generation. It therefore offers a green screening alternative to chromatographic methods.
18.
Near-infrared spectroscopy and multivariate calibration for the quantitative determination of certain properties in the petrochemical industry
Near-infrared (NIR) spectroscopy in conjunction with chemometric techniques allows real-time on-line monitoring, which can be of considerable use in industry. For it to be used correctly in industrial applications, some basic considerations generally need to be taken into account, although this is not always done. This study discusses some of the considerations that help evaluate whether multivariate calibration combined with NIR can be applied to properties of industrial interest. Examples include whether there is a relation between the NIR spectrum and the property of interest, what the calibration constraints are, and how a sample-specific error of prediction can be quantified. Various strategies for maintaining a multivariate model after it has been installed are also presented and discussed.
19.
Chen-Bo Cai, Hong-Wei Yang, Bo Wang, Yong-Yuan Tao, Mei-Qiong Wen, Lu Xu. Vibrational Spectroscopy, 2011, 56(2): 202–209
This paper demonstrates the use of near-infrared (NIR) process analysis to study gas–solid adsorption non-invasively. It covers the experimental setup, the data treatment, and the potential of NIR as a convenient tool for investigating gas–solid adsorption processes. The experimental setup includes a differential adsorption bed (DAB) monitored by an NIR spectrometer through an optical fiber probe, which makes it convenient and reliable to construct adsorption mass-transfer models. Concentration prediction from these spectra is clearly nonlinear. A chemometric strategy based on a back-propagation artificial neural network (BP-ANN) and partial least squares (PLS) was therefore developed to treat the NIR spectra collected during adsorption. The nonlinearity has three likely causes. The concentration of adsorbate on the adsorbent varies widely over the whole adsorption process. The concentration of adsorbed adsorbate is extremely low at the beginning of the process. NIR signals probably also differ between adsorbate in the first adsorption layer, formed at the beginning, and adsorbate in subsequent layers. In this strategy, PLS is first used to compress the NIR spectra and reduce noise, and a BP-ANN is then built as the nonlinear calibration model. Compared with a linear calibration algorithm, the strategy gives higher prediction ability over the whole adsorption process, even with fewer calibration samples. Finally, as an example, the kinetics of aniline adsorption on silica gel was studied with this experimental setup and chemometric strategy.
20.
Beatriz Álvarez‐Sánchez, Feliciano Priego‐Capote, Juan García‐Olmo, María C. Ortiz‐Fernández, Luis A. Sarabia‐Peinador, María D. Luque de Castro. Journal of Chemometrics, 2013, 27(9): 221–232
Near‐infrared spectroscopy has been used in nutritional metabolomics fingerprinting to assess the intake of intervention breakfasts. The breakfasts were prepared with four different vegetable oils that had previously been subjected to a deep-frying process of 20 cycles of 5 min at 180 °C. The target oils were an extra virgin olive oil and three varieties of refined sunflower oil. Of the latter three, one was used as such and another was spiked with a synthetic oxidation inhibitor (dimethylsiloxane). The third was enriched with an extract of phenolic compounds from olive pomace, whose antioxidant properties are well known. Urine sampled from individuals before intake and 2 and 4 h after intake was analyzed directly by NIRS to obtain fingerprints of the metabolome composition. The resulting urinary patterns were combined for statistical analysis by unsupervised and supervised approaches. Partial least squares class modeling made it possible to develop class models for each intervention breakfast, thus discriminating the urinary fingerprints of individuals after breakfast intake. The models were statistically characterized by estimating sensitivity and specificity for the training and evaluation (validation) steps. The variable importance in projection (VIP) algorithm was applied to detect the spectral regions most significant in explaining the variability observed in the partial least squares class models. Quantitative differences in VIP scores discriminated among the classes under study. Copyright © 2013 John Wiley & Sons, Ltd.
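The variable importance in projection score can be computed from a fitted PLS model. The standard definition is VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where SS_a = q_a² t_aᵀt_a is the response variance explained by component a. Because the NIPALS weight vectors have unit norm, the squared VIP scores average to exactly 1, which is the basis of the common "VIP > 1" rule of thumb. The sketch below uses a single-response PLS1 regression on synthetic data rather than the PLS class models of the study, so it illustrates the formula only.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns weights W, scores T, y-loadings q."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p, qa = Xc.T @ t / tt, (yc @ t) / tt
        Xc, yc = Xc - np.outer(t, p), yc - qa * t
        W.append(w); T.append(t); q.append(qa)
    return np.array(W).T, np.array(T).T, np.array(q)

def vip_scores(X, y, n_comp):
    """Variable importance in projection for each column of X."""
    W, T, q = pls1_nipals(X, y, n_comp)
    ss = q ** 2 * np.sum(T ** 2, axis=0)    # y-variance explained per component
    # columns of W have unit norm, so (w_ja / ||w_a||)^2 is simply W**2
    return np.sqrt(X.shape[1] * (W ** 2 @ ss) / ss.sum())

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 15))
y = 3.0 * X[:, 3] - 2.0 * X[:, 9] + 0.1 * rng.normal(size=100)
vip = vip_scores(X, y, n_comp=3)
print(np.flatnonzero(vip > 1.0))
```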