Similar literature
 20 similar documents found (search time: 31 ms)
1.
The paper reports a direct method for the determination of pyridine in water and wastewater samples based on ultraviolet spectrophotometric measurements using multi-way modeling techniques. Parallel factor analysis (PARAFAC) and multi-way partial least squares (N-PLS) regression methods were employed for the decomposition of spectra and quantification of pyridine. The study was carried out in the pH range of 1.0-12.0 and the concentration range of 0.67-51.7 μg mL−1 of pyridine. Both the three-way PARAFAC and tri-PLS1 models successfully predicted the concentration of pyridine in synthetic (spiked) river water and field wastewater samples. The mean recoveries obtained from the PARAFAC regression model were 97.39% for the spiked and 99.84% for the field wastewater samples. The sensitivity and precision of the method for pyridine determination were 0.58% and 5.95%, respectively. The N-PLS regression model yielded mean recoveries of 99.29% and 100.18% for the spiked and field wastewater samples, respectively. The prediction accuracy of the methods was evaluated through the root mean square error of prediction (RMSEP). For PARAFAC, it was 0.65 and 0.82 μg mL−1 for spiked river water and field wastewater samples, respectively, while for N-PLS, it was 0.25 and 0.37 μg mL−1, respectively. Both the PARAFAC and N-PLS methods thus yielded satisfactory results for the prediction of pyridine concentration in water and wastewater samples.
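
The two figures of merit quoted in this abstract, RMSEP and mean recovery, can be illustrated with a minimal sketch. The function names and the sample concentrations below are hypothetical and are not taken from the paper; mean recovery is assumed here to be the average of the per-sample ratio of found to spiked concentration.

```python
import math

def rmsep(reference, predicted):
    """Root mean square error of prediction over paired values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mean_recovery(reference, predicted):
    """Mean of predicted/reference per sample, in percent (assumed definition)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [100.0 * p / r for r, p in zip(reference, predicted)]
    return sum(ratios) / len(ratios)

# Hypothetical spiked concentrations (ug/mL) and model predictions.
spiked = [5.0, 10.0, 20.0, 40.0]
found = [4.8, 10.3, 19.6, 40.5]
print(rmsep(spiked, found), mean_recovery(spiked, found))
```

RMSEP carries the units of the analyte concentration, which is why the abstract reports it in μg mL−1 while recovery is a percentage.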

2.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components to be included. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, albeit with an entirely different weighting scheme.
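
The averaging idea in this abstract can be sketched as follows, assuming the predictions of PLS models with 1, 2, ... components have already been computed by some PLS implementation. The function name is hypothetical, and the weights are left to the caller because the abstract does not state the weighting scheme.

```python
def average_predictions(predictions, weights=None):
    """Combine predictions from models of different complexity.

    predictions[k][i] is the prediction for sample i from the model with
    k + 1 components. With weights=None a plain average is returned;
    otherwise a weighted average using caller-supplied weights.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]

# Hypothetical predictions from a 1-component and a 2-component model.
print(average_predictions([[1.0, 2.0], [3.0, 4.0]]))
```

Because every complexity contributes to the result, no single number of components has to be chosen, which is the robustness argument made in the abstract.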

3.
An outlier detection method is proposed for near-infrared spectral analysis. The underlying philosophy of the method is that, in random test (Monte Carlo) cross-validation, the probability of outliers appearing in good models with a smaller prediction residual error sum of squares (PRESS) or in bad models with a larger PRESS should differ clearly from that of normal samples. The method first builds a large number of PLS models using random test cross-validation, then sorts the models by PRESS, and finally recognizes the outliers according to the cumulative probability of each sample in the sorted models. To validate the proposed method, four data sets, including three published data sets and a large data set of tobacco lamina, were investigated. The proposed method proved to be highly efficient and accurate compared with the conventional leave-one-out (LOO) cross-validation method.
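
The procedure described above (many random splits, sort by PRESS, inspect how often each sample occurs in the sorted models) can be sketched under strong simplifying assumptions. A univariate least-squares line stands in for the PLS model, only membership in the validation subset of the worst models is counted, and all names and default values are hypothetical rather than the authors' settings.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the PLS model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mccv_outlier_frequency(xs, ys, n_models=200, test_fraction=0.3,
                           worst_fraction=0.2, seed=0):
    """Fraction of the worst models (largest PRESS) whose validation
    subset contains each sample."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(test_fraction * n)))
    runs = []
    for _ in range(n_models):
        test = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        press = sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test)
        runs.append((press, test))
    runs.sort(key=lambda run: run[0], reverse=True)  # worst models first
    worst = runs[:max(1, int(worst_fraction * n_models))]
    return [sum(1 for _, test in worst if i in test) / len(worst)
            for i in range(n)]

# Hypothetical data: a straight line with one gross error at index 10.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
ys[10] += 50.0
freq = mccv_outlier_frequency(xs, ys)
print(max(range(len(freq)), key=lambda i: freq[i]))
```

A normal sample appears in the worst models about as often as in any random split, whereas a sample with a gross error is over-represented there.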

4.
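Determination of the fatty acid composition of lean meat by partial least squares near-infrared spectroscopy   Total citations: 2, self-citations: 0, citations by others: 2  
Partial least squares was used to build calibration models relating the near-infrared spectral data of lean meat to its palmitic acid, palmitoleic acid, stearic acid, oleic acid, and linoleic acid contents, and the reliability of the models was examined by cross-validation and external validation. The calibration correlation coefficients of the fatty acid models were 0.9998, 0.9844, 0.9963, 0.9754, and 0.9969, respectively; the root mean square errors of calibration (RMSEC) were 0.0231, 0.0485, 0.111, 0.373, and 0.311, respectively; and the root mean square errors of cross-validation (RMSECV) were 0.509, 0.115, 0.225, 0.848, and 0.649, respectively. The near-infrared models were applied to predict the fatty acid composition of lean meat, and paired t-tests were performed between the predicted values and the values determined by gas chromatography. The results showed that the differences were not significant in any case (p > 0.05).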
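
The paired t-test used to compare NIR predictions with gas chromatography values can be sketched as below. The function name and the numbers are hypothetical; the sketch returns only the t statistic and the degrees of freedom, since turning them into a p-value needs a t-distribution table or a statistics library.

```python
import math

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0.0:
        raise ValueError("differences have zero variance")
    return mean_d / math.sqrt(var_d / n), n - 1

# Hypothetical fatty acid contents: NIR prediction vs. GC reference.
nir = [24.1, 25.3, 23.8, 26.0, 24.9]
gc = [24.0, 25.6, 23.5, 26.2, 24.7]
print(paired_t_statistic(nir, gc))
```

If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns the p-value.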

5.
A fast, non-destructive and eco-friendly method was developed to simultaneously determine the oil and water contents of soybean based on low field nuclear magnetic resonance (LF-NMR) relaxometry combined with chemometrics, such as partial least squares regression (PLSR). The Carr-Purcell-Meiboom-Gill (CPMG) magnetization decay data of ten soybean samples were acquired by LF-NMR and directly applied to the PLSR analysis. Calibration models were established via PLSR with full cross-validation based on the reference values obtained by the Soxhlet extraction method for oil and the oven-drying method for water. The results indicate that the calibration models are satisfactory for both oil and water determinations; the root mean squared errors of cross-validation (RMSECV) for oil and water are 0.2285% and 0.0178%, respectively. Furthermore, the oil and water contents in unknown soybean samples were predicted by the PLSR models and the results were compared with the reference values. The relative errors of the predicted oil and water contents were in the ranges of 1.25%-4.96% and 0.44%-2.49%, respectively. These results demonstrate that the combination of LF-NMR relaxometry with chemometrics shows great potential for the simultaneous determination of oil and water contents in soybean with high accuracy.

6.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The use of informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectral data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which one can improve, often significantly, the performance of the corresponding PLS model, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.

7.
A large data set pertaining to the water quality of an alluvial river was analyzed using multi-way data analysis methods with a view to extracting the hidden information and the spatial and temporal variation trends in the river water quality. Four-way data (8 monitoring sites × 22 water quality variables × 10 monitoring years × 12 sampling months) analysis was performed using PARAFAC and Tucker3 models. Although a two-component PARAFAC model explained 35.1% of the data variance, it could not fit the data set. A Tucker3 model of optimum complexity (2,3,1,3), explaining 39.7% of the data variance, allowed interpretation of the data information in four modes. The model explained spatial and temporal variation trends in terms of water quality variables during the study period and revealed that sampling sites in the mid-stretch of the river were dominated mainly by variables of anthropogenic origin. The results delineated the mid-stretch of the river as critical from a pollution point of view and also identified the summer months as having a high influence on river water quality in this stretch. The information on spatial and temporal variations in water quality generated by the four-way modeling of the data would be useful in developing long-term water resources management strategies in the river basin.

8.
The fluorescence spectrum, as well as its first and second derivative spectra, in the region of 220–900 nm was used to determine the concentration of triglyceride in human serum. Nonlinear partial least squares regression with a cubic B‐spline‐function‐based nonlinear transformation was employed as the chemometric method. Window genetic algorithms partial least squares (WGAPLS) was proposed as a new wavelength selection method to find the optimized combination of spectral wavelengths. The study shows that when WGAPLS is applied within the optimized regions ascertained by changeable size moving window partial least squares (CSMWPLS) or searching combination moving window partial least squares (SCMWPLS), the calibration and prediction performance of the model can be further improved at a reasonable number of latent variables. SCMWPLS should start from the sub‐region found by CSMWPLS with the smallest root mean square error of calibration (RMSEC). In addition, WGAPLS should be applied within the region of smallest RMSEC, whether it is the sub‐region found by CSMWPLS or the region combination found by SCMWPLS. Moreover, the prediction ability of the nonlinear models was significantly better than that of the linear models. The prediction performance of the three spectra was in the following order: second derivative spectrum < original spectrum < first derivative spectrum. Wavelengths within the regions of 300–367 nm and 386–392 nm in the first derivative of the original fluorescence spectrum were the optimized wavelength combination for the prediction model. Copyright © 2012 John Wiley & Sons, Ltd.

9.
The paper describes linear and nonlinear modeling of wastewater data for the performance evaluation of a wastewater treatment plant (WWTP) based on an up-flow anaerobic sludge blanket (UASB) reactor. Partial least squares regression (PLSR), multivariate polynomial regression (MPR) and artificial neural network (ANN) modeling methods were applied to predict the levels of biochemical oxygen demand (BOD) and chemical oxygen demand (COD) in the UASB reactor effluents using four input variables measured weekly in the influent wastewater during the peak (morning and evening) and non-peak (noon) hours over a period of 48 weeks. The performance of the models was assessed through the root mean squared error (RMSE), the relative error of prediction in percentage (REP), the bias, the standard error of prediction (SEP), the coefficient of determination (R2), the Nash-Sutcliffe coefficient of efficiency (Ef), and the accuracy factor (Af), computed from the measured and model-predicted values of the dependent variables (BOD, COD) in the WWTP effluents. The goodness of the model fit to the data was also evaluated through the relationship between the residuals and the model-predicted values of BOD and COD. Although the values of BOD and COD predicted by all three modeling approaches (PLSR, MPR, ANN) were in good agreement with their respective measured values in the WWTP effluents, the nonlinear models (MPR, ANNs) performed relatively better than the linear one. These models can be used as a tool for the performance evaluation of WWTPs.
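
Several of the performance criteria listed in this abstract can be computed from paired measured and predicted values. The sketch below uses commonly cited definitions, which are assumptions here because the abstract does not give the formulas: REP is taken as RMSE relative to the mean measured value, and SEP as the bias-corrected standard deviation of the errors. The function name and data are hypothetical.

```python
import math

def regression_merits(observed, predicted):
    """RMSE, bias, SEP, REP (%) and Nash-Sutcliffe efficiency Ef."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired values")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    bias = sum(errors) / n
    return {
        "rmse": rmse,
        "bias": bias,
        # Standard error of prediction: spread of errors around the bias.
        "sep": math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1)),
        # Relative error of prediction, in percent of the mean observation.
        "rep": 100.0 * rmse / mean_obs,
        # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean.
        "ef": 1.0 - ss_res / ss_tot,
    }

# Hypothetical effluent BOD values (mg/L): measured vs. model-predicted.
print(regression_merits([30.0, 42.0, 35.0, 50.0], [32.0, 40.0, 36.0, 47.0]))
```

Reporting bias and SEP alongside RMSE separates a systematic offset from random scatter, which a single RMSE value cannot do.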

10.
In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade‐offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user‐defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross‐validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid‐infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0–30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
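
The constraint described above, a bounded fractional increase in RMSECV in exchange for a more parsimonious model, can be illustrated with a much simpler rule than the penalty metric proposed in the paper: pick the smallest number of latent variables whose RMSECV stays within a tolerance of the minimum. The function name, the tolerance parameter and the RMSECV curve are hypothetical.

```python
def parsimonious_lv(rmsecv, max_increase=0.1):
    """Smallest number of latent variables whose RMSECV is at most
    (1 + max_increase) times the minimum RMSECV.

    rmsecv[i] is the RMSECV of the model with i + 1 latent variables.
    This is a simple threshold rule, not the paper's penalty metric.
    """
    if not rmsecv:
        raise ValueError("rmsecv must not be empty")
    limit = min(rmsecv) * (1.0 + max_increase)
    for i, value in enumerate(rmsecv):
        if value <= limit:
            return i + 1

# Hypothetical RMSECV curve for models with 1 to 5 latent variables.
curve = [1.0, 0.6, 0.52, 0.5, 0.51]
print(parsimonious_lv(curve, max_increase=0.1))
```

With `max_increase=0.0` the rule reduces to the conventional minimum-RMSECV choice, so the tolerance plays a role comparable to the user-defined parameter mentioned in the abstract.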

11.
This study examined the feasibility of using near infrared (NIR) spectroscopy as a rapid analysis method for the qualitative and quantitative assessment of tea quality. NIR spectroscopy combined with the soft independent modeling of class analogy (SIMCA) method was proposed to rapidly identify tea varieties. In the experiment, four tea varieties, Longjing, Biluochun, Qihong and Tieguanyin, were studied. The identification rate was 90% for Longjing in the training set and 80% for Biluochun in the test set, while all remaining identification rates were 100%. A partial least squares (PLS) algorithm was used to predict the content of caffeine and total polyphenols in tea. The models were calibrated by cross-validation, and the best number of PLS factors was chosen according to the lowest root mean square error of cross-validation (RMSECV). The correlation coefficient (R) and the root mean square error of prediction (RMSEP) in the test set were used as the evaluation parameters for the models: R = 0.9688 and RMSEP = 0.0836% for caffeine; R = 0.9299 and RMSEP = 1.1138% for total polyphenols. The overall results demonstrate that NIR spectroscopy with multivariate calibration could be successfully applied as a rapid method not only to identify tea varieties but also to simultaneously determine the contents of some chemical components in tea.

12.
Most multivariate calibration methods, such as partial least squares (PLS) or the Tikhonov regularization variant ridge regression (RR), require selection of tuning parameters. Tuning parameter values determine the direction and magnitude of the respective model vectors, thereby setting the resultant prediction abilities of the model vectors. Simultaneously, tuning parameter values establish the corresponding bias/variance and the underlying selectivity/sensitivity tradeoffs. Selection of the final tuning parameter is often accomplished through some form of cross-validation, and the resultant root mean square error of cross-validation (RMSECV) values are evaluated. However, selection of a “good” tuning parameter with this one model evaluation merit is almost impossible. Including additional model merits assists tuning parameter selection to provide better balanced models and allows a reasonable comparison between calibration methods. Using multiple merits requires decisions to be made on how to combine and weight the merits into an information criterion. An abundance of options is possible. Presented in this paper is the sum of ranking differences (SRD) to ensemble a collection of model evaluation merits varying across tuning parameters. It is shown that the SRD consensus ranking of model tuning parameters allows automatic selection of the final model, or a collection of models if so desired. Essentially, the user’s preference for the degree of balance between bias and variance ultimately decides the merits used in SRD and hence the tuning parameter values ranked lowest by SRD for automatic selection. The SRD process is also shown to allow simultaneous comparison of different calibration methods for a particular data set in conjunction with tuning parameter selection. Because SRD evaluates consistency across multiple merits, decisions on how to combine and weight merits are avoided.
To demonstrate the utility of SRD, a near infrared spectral data set and a quantitative structure activity relationship (QSAR) data set are evaluated using PLS and RR.
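
A minimal sketch of the sum of ranking differences may help make the idea concrete. It assumes a matrix whose rows are the ranked objects (here, model merits already scaled so that they are comparable) and whose columns are the candidates being compared (here, tuning parameter values), with the row mean used as the reference. The layout, the choice of reference and all names are assumptions, not details taken from the paper.

```python
def ranks(values):
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result

def srd(matrix):
    """Sum of ranking differences for each column of the matrix.

    matrix[i][j] is the value of row object i for column candidate j.
    The reference ranking is taken from the row means (an assumption).
    """
    n_cols = len(matrix[0])
    reference_ranks = ranks([sum(row) / n_cols for row in matrix])
    result = []
    for j in range(n_cols):
        column_ranks = ranks([row[j] for row in matrix])
        result.append(sum(abs(c - r)
                          for c, r in zip(column_ranks, reference_ranks)))
    return result

# Hypothetical scaled merits (rows) for three tuning parameter values (columns).
print(srd([[1.0, 3.0, 1.0], [2.0, 2.0, 2.0], [3.0, 1.0, 3.0]]))
```

A column with a small SRD orders the rows almost like the reference does, which corresponds to the abstract's statement that the tuning parameter values ranked lowest by SRD are the ones selected.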

13.
Automotive fuel adulteration is an old and significant problem. One common type of fuel adulteration is the addition of diesel to gasoline. Unsupervised models were developed through hierarchical cluster analysis and principal component analysis. Supervised models based on partial least squares discriminant analysis, using 1H nuclear magnetic resonance spectra as the input, were used to classify samples as adulterated or unadulterated. Quantitative models were developed using partial least squares to determine the gasoline and diesel concentrations in the samples. The sample set contained samples composed of pure gasoline and anhydrous ethanol, reproducing commercial gasoline, and other samples treated with diesel. Hierarchical cluster analysis and principal component analysis did not distinguish between adulterated and unadulterated samples except for the most adulterated materials. However, partial least squares discriminant analysis classified 100% of the samples correctly. The partial least squares algorithm provided excellent regression models for the gasoline and diesel content. The determination coefficient was 0.9920 for both models, whereas the root mean square error of cross-validation and root mean square error of prediction were 2.32 and 1.42%, respectively, for the diesel model and 2.40 and 1.38% for the gasoline model.

14.
Support vector machines in water quality management   Total citations: 1, self-citations: 0, citations by others: 1  
Support vector classification (SVC) and regression (SVR) models were constructed and applied to surface water quality data to optimize the monitoring program. The data set comprised 1500 water samples representing 10 different sites monitored for 15 years. The objectives of the study were to classify the sampling sites (spatial) and months (temporal) so as to group similar ones in terms of water quality with a view to reducing their number, and to develop a suitable SVR model for predicting the biochemical oxygen demand (BOD) of water using a set of variables. The spatial and temporal SVC models grouped the 10 monitoring sites and the 12 sampling months into 3 clusters each, with misclassification rates of 12.39% and 17.61% in the training, 17.70% and 26.38% in the validation, and 14.86% and 31.41% in the test sets, respectively. The SVR model predicted water BOD values in the training, validation, and test sets with reasonably high correlation (0.952, 0.909, and 0.907) with the measured values, and low root mean squared errors of 1.53, 1.44, and 1.32, respectively. The values of the performance criteria suggested the adequacy of the constructed models and their good predictive capabilities. The SVC model achieved a data reduction of 92.5% for redesigning the future monitoring program, and the SVR model provided a tool for the prediction of water BOD using a set of a few measurable variables. The performance of the nonlinear models (SVM, KDA, KPLS) was comparable, and these performed relatively better than the corresponding linear methods (DA, PLS) of classification and regression modeling.

15.
《Analytical letters》2012,45(15):2388-2399
There is a high demand for rapid determination of fipronil in pesticide preparations because it has been restricted and even prohibited in many countries. An infrared-based methodology was developed for this analyte in acetamiprid formulations by attenuated total reflectance mid-infrared spectroscopy. The quantitative calibration models of fipronil were established by partial least squares regression. The determination coefficients (R2) of the model were above 0.99 while both the root mean square error of prediction and root mean square error of calibration were below 0.0011, which showed the partial least squares model accurately predicted fipronil concentrations in acetamiprid. The accuracy was further demonstrated by comparison with another two models' results of low (<1.0%, w/w) and high concentration sample sets (1.0%–4.5%, w/w). These results demonstrate the potential of infrared spectroscopy to quickly detect fipronil in acetamiprid.

16.
Simultaneous anodic stripping voltammetric determination of Pb and Cd is restricted on gold electrodes as a result of the overlapping of their two peaks. This work describes the quantitative determination of a binary mixture system of Pb and Cd at low concentration levels (up to 15.0 and 10.0 µg L−1 for Pb and Cd, respectively) by differential pulse anodic stripping voltammetry (DPASV; deposition time of 30 s), using a green electrode (vibrating gold microwire electrode) without purging in a chloride medium (0.5 M NaCl) under moderately acidic conditions (HCl 1.0 mM), assisted by chemometric tools. The application of multivariate curve resolution alternating least squares (MCR‐ALS) for the resolution and quantification of both metals is shown. The optimized MCR‐ALS models showed good prediction ability, with concentration prediction errors of 12.4 and 11.4 % for Pb and Cd, respectively. The quantitative results obtained by MCR‐ALS were compared with those obtained with partial least squares (PLS) and classical least squares (CLS) regression methods. For both metals, PLS and MCR‐ALS results are comparable and superior to CLS. For Cd, as a result of the peak shift problem, the application of CLS was unsuitable. MCR‐ALS provides an additional advantage over PLS since it estimates the pure response of the analyte signal. Finally, the multivariate calibration models built, based on either MCR‐ALS or PLS regression, allowed the quantification of Pb and Cd in surface river water samples with satisfactory results.

17.
Models such as ordinary least squares, independent component analysis, principal component analysis, partial least squares, and artificial neural networks can be found in the calibration literature. Linear or nonlinear methods can be used to explain the structure of the same phenomenon. Each type of model has its own advantages with respect to the others. These methods are usually grouped taxonomically, but different models can sometimes be applied to the same data set. Taxonomically, ordinary least squares and artificial neural networks use completely different analytical procedures but are occasionally applied to the same data set. The aim of the study of methodological superiority is to compare the residuals of models, because the model with the minimum error is preferred in real analyses. Calibration models, in general, are based on deterministic and stochastic parts; in other words, the data are equal to the model plus the error. Explaining a model solely using statistics such as the coefficient of determination or its related significance values is sometimes inadequate. The errors of a model, also called its residuals, must have minimum variance compared to its alternatives. Additionally, the residuals must be unpredictable, uncorrelated, and symmetric. Under these conditions, the model can be considered adequate. In this study, calibration methods were applied to the raw materials of a drug, hydrochlorothiazide and amiloride hydrochloride, as well as to a sample of the drug tablet. The applied chemical procedure was fast, simple, and reproducible. The various linear and nonlinear calibration methods mentioned above were applied, and the adequacy of the calibration methods was compared according to their residuals.

18.
Near infrared (NIR) reflectance and Raman spectrometry were compared for determination of the oil and water content of olive pomace, a by-product in olive oil production. To enable comparison of the spectral techniques the same sample sets were used for calibration (1.74–3.93% oil, 48.3–67.0% water) and for validation (1.77–3.74% oil, 50.0–64.5% water). Several partial least squares (PLS) regression models were optimized by cross-validation with cancellation groups, including different spectral pretreatments for each technique. Best models were achieved with first-derivative spectra for both oil and water content. Prediction results for an independent validation set were similar for both techniques. The values of root mean square error of prediction (RMSEP) were 0.19 and 0.20–0.21 for oil content and 2.0 and 1.8 for water content, using Raman and NIR, respectively. The possibility of improving these results by combining the information of both techniques was also tested. The best models constructed using the appended spectra resulted in slightly better performance for oil content (RMSEP 0.17) but no improvement for water content.

19.
20.
We introduce a new nonlinear partial least squares algorithm, ‘Quadratic Fuzzy PLS (QFPLS)’, that combines the outer linear Partial Least Squares (PLS) framework and the Takagi–Sugeno–Kang (TSK) fuzzy inference system. The inner relation between the input and the output PLS score vectors is modeled by a quadratic TSK fuzzy inference system. The performance of the proposed QFPLS method is tested and compared against four other well‐known partial least squares methods (Linear PLS (LPLS), Quadratic PLS (QPLS), Linear Fuzzy PLS (LFPLS), and Neural Network PLS (NNPLS)) on various types of randomly generated test data. QFPLS outperformed its competitors on two comparison measures: the cumulative per cent variance of the output variables captured by the PLS latent variables and the root mean‐square error of prediction (RMSEP). Copyright © 2009 John Wiley & Sons, Ltd.
