Similar literature
Found 20 similar documents; search time 15 ms
1.
Glycerol monolaurate (GML) products contain many impurities, such as lauric acid and glycerol. The GML content is an important quality indicator for GML production. A hybrid variable selection algorithm, which combines wavelet transform (WT) technology with a modified uninformative variable elimination (MUVE) method, was proposed to extract useful information from Fourier transform infrared (FT-IR) transmission spectra for the determination of GML content. The FT-IR spectral data were first compressed by WT; the irrelevant variables in the compressed wavelet coefficients were then eliminated by MUVE. In the MUVE process, a simulated annealing (SA) algorithm was employed to search for the optimal cutoff threshold. After the WT-MUVE process, the variables for the calibration model were reduced from 7366 to 163. Finally, the retained variables were used as inputs to a partial least squares (PLS) model to build the calibration model. For the prediction set, a correlation coefficient (r) of 0.9910 and a root mean square error of prediction (RMSEP) of 4.8617 were obtained. The prediction result was better than that of the PLS model with full-spectrum data. The results indicated that the proposed WT-MUVE method not only made the prediction more accurate but also made the calibration model more parsimonious. Furthermore, the reconstructed spectra represented the projection of the selected wavelet coefficients into the original domain, affording a chemical interpretation of the predicted results. It is concluded that FT-IR transmission spectroscopy with the proposed method is promising for the fast detection of GML content.
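
The two figures of merit quoted in the abstract above, the correlation coefficient (r) and the RMSEP, can be computed from reference and predicted values as in the following minimal sketch. The function names and the sample values are hypothetical and are not taken from the paper; the sketch uses the standard textbook definitions, which the abstract does not spell out.

```python
import math

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over a prediction set."""
    n = len(y_ref)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_ref, y_pred)) / n)

def pearson_r(y_ref, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    mean_pred = sum(y_pred) / n
    cov = sum((t - mean_ref) * (p - mean_pred) for t, p in zip(y_ref, y_pred))
    var_ref = sum((t - mean_ref) ** 2 for t in y_ref)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_ref * var_pred)

# Hypothetical reference and predicted contents, for illustration only.
y_ref = [10.0, 20.0, 30.0, 40.0]
y_hat = [12.0, 18.0, 33.0, 39.0]
print(rmsep(y_ref, y_hat), pearson_r(y_ref, y_hat))
```

RMSEP is expressed in the units of the measured property, so it is only comparable between models evaluated on the same prediction set.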

2.
A new cut-off criterion has been proposed for the selection of uninformative variables prior to chemometric partial least squares (PLS) modelling. After variable elimination, PLS regressions were built and assessed by comparing the results with those obtained by PLS models based on the full spectral range. To assess the prediction capabilities, uninformative variable elimination (UVE)-PLS and PLS were applied to diffuse reflectance near-infrared spectra of heroin samples. The proposed new cut-off criterion, based on Student's t-distribution, gave PLS models with predictive capabilities similar to those obtained using the original criterion based on a quantile value. However, the repeatability of the number of selected variables was improved significantly.

3.
Han QJ  Wu HL  Cai CB  Xu L  Yu RQ 《Analytica chimica acta》2008,612(2):121-125
An improved method based on an ensemble of Monte Carlo uninformative variable elimination (EMCUVE) is presented for wavelength selection in multivariate calibration of spectral data. The proposed algorithm introduces a Monte Carlo (MC) strategy into uninformative variable elimination-PLS (UVE-PLS), in place of the leave-one-out strategy, for estimating the contribution of each wavelength variable to the PLS model. In EMCUVE, wavelength variables are evaluated by different Monte Carlo uninformative variable elimination (MCUVE) models. Moreover, a fusion of MCUVE and the vote rule can obtain an improvement over the original uninformative variable elimination method. Results obtained from simulated data and real data sets demonstrate that EMCUVE can properly carry out wavelength selection in the course of data analysis and improve the predictive ability of the multivariate calibration model.
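
As a rough illustration of the stability idea behind UVE-type methods, the sketch below takes regression coefficient vectors from repeated resampling runs and scores each variable by the mean of its coefficient divided by its standard deviation. Variables are kept when their absolute stability exceeds the largest value reached by appended artificial noise variables, which is the classical UVE cutoff. This is not the EMCUVE algorithm of the abstract above: the ensemble and vote rule are omitted, the PLS fitting step is assumed to have been done elsewhere, and all names and numbers are hypothetical.

```python
import statistics

def stability(coef_runs):
    """Stability per variable: mean coefficient / standard deviation over runs.

    coef_runs is a list of coefficient vectors, one per resampling run.
    """
    n_vars = len(coef_runs[0])
    scores = []
    for j in range(n_vars):
        column = [run[j] for run in coef_runs]
        scores.append(statistics.mean(column) / statistics.stdev(column))
    return scores

def select_informative(coef_runs, n_real):
    """Keep real variables whose |stability| beats every noise variable.

    The first n_real columns are real variables; the remaining columns are
    artificial noise variables appended before model fitting (an assumption
    of this sketch).
    """
    scores = stability(coef_runs)
    cutoff = max(abs(s) for s in scores[n_real:])
    return [j for j in range(n_real) if abs(scores[j]) > cutoff]

# Hypothetical coefficients from three runs: two real variables, two noise variables.
runs = [
    [1.0, 0.1, 0.02, 0.03],
    [1.1, -0.2, -0.03, 0.01],
    [0.9, 0.3, 0.01, 0.02],
]
print(select_informative(runs, n_real=2))
```

In this toy input the first variable has a large, consistent coefficient and is kept, while the second changes sign between runs and falls below the noise cutoff.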

4.
The paper focuses on solving a common and important problem of NIR quantitative analysis in multi-component systems: how to significantly reduce the size of the calibration set without impairing the predictive precision. To cope with the problem, the orthogonal discrete wavelet packet transform (WPT), the least correlation design and the correlation coefficient test (r-test) have been combined. As three examples, a two-component carbon tetrachloride system with 21 calibration samples, a two-component aqueous system with 21 calibration samples, and a two-component aqueous system with 41 calibration samples have been treated with the proposed strategy. In comparison with some previous methods based on many more calibration samples, the results of the strategy showed that the predictive ability was not obviously decreased for the first system while being clearly strengthened for the second one, and the predictive precision for the third one was satisfactory for most cases of quantitative analysis. In addition, all important factors and parameters related to the strategy are discussed in detail.

5.
Chen-Bo Cai 《Talanta》2008,77(2):822-826
By randomly arranging the samples of a calibration set, treating their NIR spectra with the orthogonal discrete wavelet transform, and selecting suitable variables by means of the correlation coefficient test (r-test), it is possible to extract the features of each component in a multi-component system separately, and partial least squares (PLS) models based on these features are capable of predicting the concentration of every component. Perhaps more importantly, with the proposed strategy the predictive ability of the model is not impaired while the size of the calibration set can be clearly reduced. Therefore, it provides a more economical, rapid and convenient approach to NIR quantitative analysis of multi-component systems. In addition, all important factors and parameters related to the proposed strategy are discussed in detail.

6.
This paper proposes an analytical method for the simultaneous near-infrared (NIR) spectrometric determination of α-linolenic and linoleic acid in eight types of edible vegetable oils and their blends. For this purpose, a combination of spectral wavelength selection by wavelet transform (WT) and uninformative variable elimination (UVE) was proposed to obtain simple partial least squares (PLS) models based on a small subset of wavelengths. WT was first used to compress the full NIR spectra, which contain 1413 redundant variables, and 42 wavelet approximation coefficients were obtained. UVE was then carried out to further select the informative variables. Finally, 27 and 19 wavelet approximation coefficients were selected by UVE for α-linolenic and linoleic acid, respectively. The selected variables were used as inputs of the PLS model. Because the original spectra were compressed and irrelevant variables were eliminated, a more parsimonious and efficient model based on WT-UVE was obtained compared with the conventional PLS model with full-spectrum data. The coefficient of determination (r²) and root mean square error of prediction (RMSEP) for the prediction set were 0.9345 and 0.0123 for α-linolenic acid with the WT-UVE-PLS model. The r² and RMSEP were 0.9054 and 0.0437 for linoleic acid. The good performance showed the potential of WT-UVE for selecting effective NIR variables. WT-UVE can both speed up the calculation and improve the predicted results. The results indicated that it was feasible to rapidly determine the α-linolenic acid and linoleic acid content of edible oils using NIR spectroscopy.

7.
Sample selection is often used to improve the cost-effectiveness of near-infrared (NIR) spectral analysis. When raw NIR spectra are used, however, it is not easy to select appropriate samples, because of background interference and noise. In this paper, a novel adaptive strategy based on selection of representative NIR spectra in the continuous wavelet transform (CWT) domain is described. After pretreatment with the CWT, an extension of the Kennard–Stone (EKS) algorithm was used to adaptively select the most representative NIR spectra, which were then submitted to expensive chemical measurement and multivariate calibration. With the samples selected, a PLS model was finally built for prediction. It is of great interest to find that selection of representative samples in the CWT domain, rather than raw spectra, not only effectively eliminates background interference and noise but also further reduces the number of samples required for a good calibration, resulting in a high-quality regression model that is similar to the model obtained by use of all the samples. The results indicate that the proposed method can effectively enhance the cost-effectiveness of NIR spectral analysis. The strategy proposed here can also be applied to different analytical data for multivariate calibration.
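
For orientation, the classic Kennard-Stone selection that the abstract above extends can be sketched as follows. This is the standard algorithm, not the EKS extension described in the paper, and it operates on whatever feature vectors it is given (here hypothetical two-dimensional points rather than CWT-transformed spectra). The function name and data are assumptions made for illustration.

```python
import math

def kennard_stone(X, k):
    """Select k representative row indices from X (classic Kennard-Stone).

    Starts with the two mutually most distant samples, then repeatedly adds
    the sample whose distance to its nearest selected sample is largest.
    Assumes 2 <= k <= len(X).
    """
    n = len(X)
    best_d, best_i, best_j = -1.0, 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])  # Euclidean distance (Python 3.8+)
            if d > best_d:
                best_d, best_i, best_j = d, i, j
    selected = [best_i, best_j]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(
            remaining,
            key=lambda i: min(math.dist(X[i], X[s]) for s in selected),
        )
        selected.append(nxt)
    return selected

# Hypothetical feature vectors, for illustration only.
X = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0], [9.0, 0.0]]
print(kennard_stone(X, 3))
```

The pairwise distance search is quadratic in the number of samples, which is acceptable for typical calibration sets but would need a vectorized implementation for large ones.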

8.
A new hybrid algorithm is proposed for construction of a high-quality calibration model for near-infrared (NIR) spectra that is robust against both spectral interference (including background and noise) and multiple outliers. The algorithm is a combination of continuous wavelet transform (CWT) and a modified iterative reweighted PLS (mIRPLS) procedure. In the proposed algorithm, the spectral interference is filtered by CWT in the first stage, and mIRPLS is then used to detect the multiple outliers in the CWT domain. Compared with the original IRPLS method, mIRPLS does not need to adjust variable parameters to achieve optimum calibration results, which makes it very convenient to perform in practice. The final PLS model is constructed robustly because both the spectral interference and multiple outliers are eliminated. In order to validate the effectiveness and universality of the algorithm, it was applied to two different sets of NIR spectra. The results indicate that the proposed strategy can greatly enhance the robustness and predictive ability of NIR spectral analysis.

9.
The hydroxyl (OH) number of polyol was measured using near-infrared (NIR) spectroscopy with a disposable glass vial as the sample container. Polyols are viscous, so disposable vials are advantageous when spectroscopic methods are employed. Due to the curvature of the vial walls, a narrow aperture was used to minimize the spectroscopic deviations. The narrow aperture attenuated the NIR radiation and increased the spectral noise in the collected polyol spectra. Wavelet transformation (WT) was employed to reduce this noise, and a partial least squares (PLS) calibration model was developed. The overall prediction results compare well with those from conventional wet analysis, which requires time (1–3 h) and large amounts of chemical reagents. NIR spectroscopy with disposable vials can be used for simple and fast quality assurance of polyol in actual industrial settings.

10.
《Analytical letters》2012,45(11):1707-1719
A method based on piecewise direct standardization was developed to directly predict leaf chlorophyll concentrations by correction of near-infrared spectra to construct a robust calibration model. Chinar, camphor, and gingko leaves collected from two growth intervals were evaluated. Spectral pretreatment methods and wavelength selection were investigated. The first derivative combined with stability competitive adaptive reweighted sampling before piecewise direct standardization provided the best performance. Under the optimized parameters, the root mean square error of prediction was significantly reduced by using piecewise direct standardization. This study demonstrates that the calibration model may be used to rapidly characterize chlorophyll concentrations across species and growth intervals.

11.
Based on a so-called ensemble strategy, an algorithm is proposed for near-infrared (NIR) spectral calibration of complex beverage samples. This algorithm is a combination of a novel training set/test set sample-selection procedure based on a Kohonen self-organizing map (SOM) with a simple procedure to calculate an average partial least-squares (PLS) calibration model, which is therefore named SOMEPLS. In order to verify the proposed SOMEPLS, two NIR beverage datasets involving the determination of sugar content are considered, and three kinds of reference algorithm, i.e., conventional PLS (CPLS), the Kennard-Stone (KS) algorithm in combination with PLS (KSPLS), and sample set partitioning based on the joint x-y distance (SPXY) algorithm in combination with PLS (SPXYPLS), are used. Of these, both KS and SPXY are well-known representative sample-selection algorithms. By comparison, it was found that when there is a training set of appropriate size, SOMEPLS can achieve better prediction accuracy than the three reference algorithms, but without increasing the complexity of the corresponding calibration model for the future application, indicating that SOMEPLS can serve as a promising tool for NIR spectral calibration.

12.
Understanding the thermal stability of the proteins in human serum is essential, since human serum is an important source of pharmaceutical proteins. Near-infrared (NIR) spectroscopy was applied to the investigation of thermal changes in the secondary structure and hydration of human serum proteins. However, because serum is a multicomponent system, the overlap of the broad NIR bands makes structural analysis directly from the spectra of serum samples very difficult. Therefore, continuous wavelet transform (CWT) was used to improve the resolution of the NIR spectra, and the Monte Carlo-uninformative variable elimination (MC-UVE) method was applied to select the variables associated with the proteins for the structural analysis. The variables (5956, 5867, 5815, 5747, 4525, 4401, 4359 and 4328 cm⁻¹) related to protein secondary structures and those (7074, 6951, 6827 and 6700 cm⁻¹) connected with water species were selected. Then, the thermal stability was analyzed through the intensity variations of the selected variables with temperature from 30 ℃ to 80 ℃. It was found that the variation of the spectral variables related to both α-helix and β-sheet changes markedly around 60 ℃, indicating the beginning of thermal denaturation and the transition from α-helix to β-sheet. Moreover, an obvious change was found around 60 ℃ in the content of the water species S3, i.e., the water cluster containing three hydrogen bonds. The result demonstrates that MC-UVE can identify the protein-related NIR spectral variables, and that the water species may be a marker for investigating the structural change of proteins in biochemical systems.

13.
《Analytical letters》2012,45(12):1910-1921
Multiblock partial least squares (MB-PLS) is applied to the analysis of corn and tobacco samples by near-infrared diffuse reflection spectroscopy. In the model, the spectra are separated into several sub-blocks along the wavenumber axis, and a different number of latent variables is used for each sub-block. Compared with ordinary PLS, the importance and the contribution of each sub-block can be balanced by super-weights and by the use of different numbers of latent variables. Therefore, the prediction obtained by the MB-PLS model is superior to that of ordinary PLS, especially for the large data sets of tobacco samples with a large number of variables.

14.
Near-infrared spectroscopy (NIR) models built on a particular instrument are often invalid on other instruments due to spectral inconsistencies between the instruments. In the present work, global and robust NIR calibration models were constructed by partial least squares (PLS) regression based on hybrid calibration sets, which are composed of both primary and secondary spectra. Three datasets were used as case studies. The first consisted of 72 radix scutellaria samples measured on two NIR spectrometers with known baicalin content. The second was composed of 80 corn samples measured on two instruments with known moisture, oil, and protein concentrations. The third dataset included 279 primary samples of tobacco with known nicotine content and 78 secondary samples of tobacco with known nicotine concentrations. The effects of the number of secondary spectra in the hybrid calibration sets and of the methods for selecting secondary spectra on the PLS model performance were investigated by comparing the results obtained from different calibration sets. This study shows that the global and robust calibration models accurately predicted both primary and secondary samples as long as the ratio of the number of primary spectra to the number of secondary spectra was less than 22. The models' performance was not influenced by the selection method of the secondary spectra. The hybrid calibration sets included both the primary and the secondary spectral information, rendering the constructed global and robust models applicable to both primary and secondary instruments.

15.
M.T. Bona 《Talanta》2007,72(4):1423-1431
An extensive study was carried out on coal samples from several origins to establish a relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding near-infrared spectral data. This research applied both quantitative (partial least squares regression, PLS) and qualitative multivariate analysis techniques (hierarchical cluster analysis, HCA; linear discriminant analysis, LDA) to determine a methodology able to estimate property values for a new coal sample. For that, it was necessary to define homogeneous clusters whose calibration equations could be obtained with accuracy and precision levels comparable to those provided by commercial online analysers, and to study the discrimination level between these groups of samples using only the instrumental variables. These two steps were performed in three different situations depending on the variables used for the pattern recognition: property values, spectral data (principal component analysis, PCA) or a combination of both. The results indicated that the last situation offered the best results in both steps, with the added benefit of outlier detection and removal.

16.
In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade‐offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user‐defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross‐validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid‐infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0–30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
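
The abstract above does not give the form of its penalty metric, so the following is only a simplified stand-in for the underlying idea: accept a bounded fractional increase in RMSECV in exchange for fewer latent variables. The rule, the function name and the RMSECV values are all assumptions for illustration and should not be read as the authors' method.

```python
def parsimonious_n_components(rmsecv, alpha):
    """Smallest number of latent variables within a tolerance of the minimum.

    rmsecv[k - 1] is the cross-validated error with k latent variables.
    alpha is the allowed fractional increase over the minimum RMSECV
    (for example 0.10 for 10%). alpha = 0 recovers the conventional
    minimum-RMSECV choice.
    """
    limit = (1.0 + alpha) * min(rmsecv)
    for k, err in enumerate(rmsecv, start=1):
        if err <= limit:
            return k

# Hypothetical RMSECV curve for 1 to 5 latent variables.
curve = [0.90, 0.55, 0.42, 0.40, 0.41]
print(parsimonious_n_components(curve, alpha=0.10))
print(parsimonious_n_components(curve, alpha=0.0))
```

With the hypothetical curve shown, a 10% tolerance selects three latent variables instead of the four that minimize RMSECV.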

17.
A novel method, named OSC-WPT-PLS, based on partial least squares (PLS) regression with orthogonal signal correction (OSC) and wavelet packet transform (WPT) as pre-processing tools was proposed for the simultaneous spectrophotometric determination of Al(III), Mn(II) and Co(II). This method combines the ideas of OSC and WPT with PLS regression to enhance the ability to extract characteristic information and the quality of regression. OSC is used to remove information in the response matrix D by subtracting the structured noise that is orthogonal to the concentration matrix C. Wavelet packet transform was applied to perform data compression, to extract relevant information, and to eliminate noise and collinearity. PLS was applied for multivariate calibration and noise reduction by eliminating the less important latent variables. In this case, by trial, the wavelet function, the decomposition level, the number of OSC components and the number of PLS factors for the OSC-WPT-PLS method were selected as Daubechies 4, 3, 2 and 3, respectively. A program (POSCWPTPLS) was designed to perform the simultaneous spectrophotometric determination of Al(III), Mn(II) and Co(II). The relative standard errors of prediction (RSEP) obtained for total elements using OSC-WPT-PLS, WPT-PLS and PLS were compared. Experimental results demonstrated that the OSC-WPT-PLS method had the best performance among the three methods and was successful even when there was severe overlap of spectra.

18.
In multivariate calibration with spectral datasets, variable selection is often applied to identify a relevant subset of variables, leading to improved prediction accuracy and easier interpretation of the selected fingerprint regions. Numerous variable selection methods have been proposed, but choosing among them is not trivial. Furthermore, in many cases the set of variables found by those methods might not be robust because of irreproducibility and uncertainty, which poses a great challenge to improving the reliability of variable selection. In this study, the reproducibility of five variable selection methods was investigated quantitatively to evaluate their performance. The reproducibility of variable selection was quantified by using Monte-Carlo sub-sampling (MCS) techniques together with a quantitative similarity measure designed for highly collinear spectral datasets. The investigation of the reproducibility and prediction accuracy of the variable selection algorithms with two different near-infrared (NIR) datasets illustrated that the different variable selection methods exhibited wide variability in their performance, especially in their ability to identify a consistent subset of variables from the spectral datasets. Thus, a thorough assessment of the reproducibility together with the predictive accuracy of the identified variables improved the statistical validity and confidence of the selection outcome, which cannot be addressed by conventional evaluation schemes.

19.
Yankun Li 《Talanta》2007,72(1):217-222
Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise; the consensus LS-SVR technique was then used to build the calibration model. With optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
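
The consensus principle stated at the start of the abstract above, combining several member models into one prediction, can be illustrated with an unweighted average. How the paper builds and combines its LS-SVR member models is not stated in the abstract, so the simple averaging, the function name and the toy member models below are all assumptions.

```python
def consensus_predict(models, x):
    """Average the predictions of several member models for one input.

    models is a list of callables; each maps an input to a numeric prediction.
    """
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical member models: the same trend with different offsets,
# standing in for calibration models trained on different subsets.
members = [
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
    lambda x: 2.0 * x,
]
print(consensus_predict(members, 3.0))
```

Averaging cancels the opposing offsets of the toy members, which is the intuition behind the reduced instability claimed for consensus models.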

20.
Two novel algorithms which employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS) are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum by incorporating parallel models in a way to emphasize intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods combining all subsets or intervals of the whole spectrum is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.
