Similar literature
 20 similar articles found (search time: 15 ms)
1.
This paper presents a Bayesian approach to the development of spectroscopic calibration models. By formulating linear regression in a probabilistic framework, a Bayesian linear regression model is derived, and a specific optimization method, Bayesian evidence approximation, is used to estimate the model “hyper-parameters”. The relation of the proposed approach to calibration models in the literature, including ridge regression and Gaussian process models, is discussed. The Bayesian model may be modified for the calibration of multivariate response variables. Furthermore, a variable selection strategy is implemented within the Bayesian framework, motivated by the expectation that predictive performance may be improved by selecting a subset of the most informative spectral variables. The Bayesian calibration models are applied to two spectroscopic data sets, where they give improved prediction results in comparison with the benchmark method of partial least squares.
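
For orientation, the evidence approximation for Bayesian linear regression is usually written as the fixed-point updates below. This is the standard textbook form, not taken from the paper itself; the symbols are assumptions: \Phi is the N x M matrix of spectral variables, t the response vector, \alpha the prior precision and \beta the noise precision.

```latex
\begin{aligned}
\mathbf{S}_N^{-1} &= \alpha\,\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\mathsf T}\boldsymbol{\Phi},
\qquad
\mathbf{m}_N = \beta\,\mathbf{S}_N\,\boldsymbol{\Phi}^{\mathsf T}\mathbf{t}, \\
\gamma &= \sum_{i=1}^{M} \frac{\lambda_i}{\alpha + \lambda_i},
\qquad \lambda_i \ \text{the eigenvalues of}\ \beta\,\boldsymbol{\Phi}^{\mathsf T}\boldsymbol{\Phi}, \\
\alpha &\leftarrow \frac{\gamma}{\mathbf{m}_N^{\mathsf T}\mathbf{m}_N},
\qquad
\frac{1}{\beta} \leftarrow \frac{1}{N-\gamma}\sum_{n=1}^{N}\bigl(t_n - \mathbf{m}_N^{\mathsf T}\boldsymbol{\phi}_n\bigr)^{2}.
\end{aligned}
```

Under these assumptions the posterior mean can be rewritten as m_N = (\Phi^T \Phi + (\alpha/\beta) I)^{-1} \Phi^T t, which is the ridge regression estimate with penalty \alpha/\beta. This is one way to read the link to ridge regression mentioned in the abstract.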

2.
Determination of benzo[a]pyrene (BaP) in cigarette smoke is important for tobacco quality control and for assessing its harm to human health. In this study, mid-infrared spectroscopy (MIR) coupled with a chemometric algorithm (DPSO-WPT-PLS), based on the wavelet packet transform (WPT), the discrete particle swarm optimization algorithm (DPSO) and partial least squares regression (PLS), was used to quantify the harmful ingredient benzo[a]pyrene in cigarette mainstream smoke, with promising results. Furthermore, the proposed method performed better than several other chemometric models, i.e., PLS, radial basis function-based PLS (RBF-PLS), PLS with stepwise regression variable selection (Stepwise-PLS), and WPT-PLS with informative wavelet coefficients selected by a correlation coefficient test (rtest-WPT-PLS). The proposed strategy could become a new, effective and rapid quantitative technique for analyzing the harmful ingredient BaP in cigarette mainstream smoke.

3.
《Analytical letters》2012,45(1):171-183
Based on wavelet transformation (WT) and mutual information (MI), a simple and effective procedure is proposed for multivariate calibration of near-infrared spectroscopy. In this procedure, the original spectra of the training set are first transformed into a set of wavelet representations by the wavelet prism transform. Then, the MI value between each wavelet coefficient variable and the dependent variable is calculated, resulting in an MI spectrum; by retaining a subset of coefficients with higher MI, an updated training set consisting of wavelet coefficients is obtained and reconstructed (converted back) to the original domain. Based on this, a partial least squares (PLS) model can be constructed and optimized. The optimal wavelet and decomposition level are determined by experiment. An NIR quantitative problem involving the determination of total sugar in tobacco is used to demonstrate the overall performance of the proposed procedure, named RPLS, meaning PLS in the reconstructed original domain coupled with MI-induced variable selection in the wavelet domain. Three procedures are used as references: conventional full-spectrum PLS in the original domain (FPLS), PLS in the original domain coupled with MI-induced variable selection (OPLS), and direct PLS on MI-selected wavelet coefficients (WPLS). The results confirm that RPLS can build more accurate and robust calibration models without increasing the complexity.

4.
5.
Molecular factor computing (MFC) is a new strategy that employs chemometric methods in an optical instrument to obtain analytical results directly using an appropriate filter without data processing. In the present contribution, a method for designing an MFC filter using wavelet functions was proposed for spectroscopic analysis. In this method, the MFC filter is designed as a linear combination of a set of wavelet functions. A multiple linear regression model relating the concentration to the wavelet coefficients is constructed, so that the wavelet coefficients are obtained by projecting the spectra onto the selected wavelet functions. These wavelet functions are selected by optimizing the model using a genetic algorithm (GA). Once the MFC filter is obtained, the concentration of a sample can be calculated directly by projecting the spectrum onto the filter. With three NIR datasets of corn, wheat and blood, it was shown that the performance of the designed filter is better than that of the optimized partial least squares models, and commonly used signal processing methods, such as background correction and variable selection, were not needed. More importantly, the designed filter can be used as an MFC filter in designing MFC-based instruments.
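
The projection idea described above can be illustrated with a minimal Python sketch. It only shows why regressing on wavelet coefficients and projecting onto a single combined filter give the same prediction; the wavelet functions, regression coefficients, intercept and spectrum are made-up illustrative values, and the GA selection step is not reproduced.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def build_filter(wavelets, coefs):
    """Combine wavelet functions into one filter: f = sum_j b_j * w_j."""
    n = len(wavelets[0])
    return [sum(b * w[i] for b, w in zip(coefs, wavelets)) for i in range(n)]


# Hypothetical values: two Haar-like functions on four spectral channels.
wavelets = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
coefs = [2.0, -1.0]      # assumed regression coefficients
intercept = 0.1          # assumed regression intercept
spectrum = [1.0, 2.0, 3.0, 4.0]

# Two-step route: project onto each wavelet, then apply the regression.
two_step = intercept + sum(b * dot(spectrum, w) for b, w in zip(coefs, wavelets))

# One-step route: project the spectrum onto the combined filter.
mfc_filter = build_filter(wavelets, coefs)
one_step = intercept + dot(spectrum, mfc_filter)
```

Both routes give the same value because the projection is linear, which is what allows the regression model to be folded into a single filter.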

6.
The wavelet packet transform (WPT) is a variant of the standard wavelet transform that offers greater flexibility in the decomposition of instrumental signals. Although encouraging results have been published concerning the use of WPT for signal compression and denoising, its application in multivariate calibration problems has received comparatively little attention, with very few contributions reported in the literature. This paper presents an investigation concerning the use of WPT as a feature extraction tool to improve the prediction ability of PLS models. The optimization of the wavelet packet tree is accomplished by using the classic dynamic programming algorithm and an entropy cost function modified to take into account the variance explained by the WPT coefficients. The selection of WPT coefficients for inclusion in the PLS model is carried out on the basis of correlation with the dependent variable, in order to exploit the joint statistics of the instrumental response and the parameter of interest. This WPT-PLS strategy is applied in a case study involving FT-IR spectrometric determination of four gasoline parameters, namely specific mass (SM) and the distillation temperatures at which 10%, 50% and 90% of the sample has evaporated. The dataset comprises 103 gasoline samples collected from gas stations and 6144 wavelengths in the range 2500-15000 nm. By applying WPT to the FT-IR spectra, considerable compression with respect to the original wavelength domain is achieved. The effect of varying the wavelet and the threshold level on the prediction ability of the resulting models is investigated. The results show that WPT-PLS outperforms standard PLS in most wavelet-threshold combinations for all determined parameters.
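
The correlation-based selection of WPT coefficients can be sketched as follows. This is a simplified illustration under several assumptions: it uses the Haar wavelet and a full packet tree to a fixed level (the paper optimizes the tree with dynamic programming and an entropy cost, which is not reproduced here), and the function names and the threshold are hypothetical.

```python
import math


def haar_step(x):
    """One Haar analysis step; len(x) must be even."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_wpt(x, level):
    """Full Haar wavelet packet tree: every node is split at every level.

    len(x) must be divisible by 2**level. Returns the leaf coefficients
    concatenated into one flat list.
    """
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes += [approx, detail]
        nodes = next_nodes
    return [c for node in nodes for c in node]


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return 0.0 if suu == 0 or svv == 0 else suv / math.sqrt(suu * svv)


def select_by_correlation(spectra, y, level, threshold):
    """Indices of WPT coefficients whose |correlation| with y >= threshold."""
    coeffs = [haar_wpt(s, level) for s in spectra]
    keep = []
    for j in range(len(coeffs[0])):
        r = pearson([row[j] for row in coeffs], y)
        if abs(r) >= threshold:
            keep.append(j)
    return keep
```

The retained coefficient columns would then serve as inputs to a PLS model; in practice a wavelet library would replace the hand-written Haar transform.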

7.
Multivariate calibration is tested as an alternative for modeling chromium(III) concentration versus the chemiluminescence signals obtained from the luminol-hydrogen peroxide reaction. The multivariate calibration approaches tested were: conventional linear methods (principal component regression (PCR) and partial least squares (PLS)), nonlinear methods (nonlinear variants and variants of locally weighted regression) and linear methods combined with variable selection performed on the original or the transformed data (stepwise multiple linear regression procedure). Both the direct and inverse univariate approaches were also tested.

The use of a double logarithmic transformation prior to the linear regression was also evaluated. A new double logarithmic transformation prior to the linear regression is proposed in order to avoid the effect of noise on the calibration model. Pre-processing, optimization and prediction ability of the multivariate calibration models were studied under nine different experimental conditions, including batch and FIA measurements. Box-plots, PCA and cluster analysis were employed to compare the prediction ability of the different models. Nonlinear PCR and nonlinear PLS provide the best results. Real samples were analyzed and the results compared with the reference method. The results confirm the successful use of the proposed methodology.


8.
Sample selection is often used to improve the cost-effectiveness of near-infrared (NIR) spectral analysis. When raw NIR spectra are used, however, it is not easy to select appropriate samples, because of background interference and noise. In this paper, a novel adaptive strategy based on selection of representative NIR spectra in the continuous wavelet transform (CWT) domain is described. After pretreatment with the CWT, an extension of the Kennard–Stone (EKS) algorithm was used to adaptively select the most representative NIR spectra, which were then submitted to expensive chemical measurement and multivariate calibration. With the samples selected, a PLS model was finally built for prediction. It is of great interest to find that selection of representative samples in the CWT domain, rather than raw spectra, not only effectively eliminates background interference and noise but also further reduces the number of samples required for a good calibration, resulting in a high-quality regression model that is similar to the model obtained by use of all the samples. The results indicate that the proposed method can effectively enhance the cost-effectiveness of NIR spectral analysis. The strategy proposed here can also be applied to different analytical data for multivariate calibration.
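
As background for the sample selection step, the classical Kennard–Stone algorithm can be sketched in a few lines of Python. Note that this is the plain algorithm, not the extended (EKS) variant used in the paper, whose details the abstract does not give; the function name and the use of Euclidean distance are assumptions.

```python
import math


def kennard_stone(samples, k):
    """Pick k representative sample indices (classical Kennard-Stone).

    Starts with the two most distant samples, then repeatedly adds the
    sample whose nearest already-selected neighbour is farthest away.
    Assumes 2 <= k <= len(samples).
    """
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed with the most distant pair.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]

    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Max-min criterion: farthest from its closest selected sample.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
    return selected


# Toy example: five one-feature "spectra" on a line.
chosen = kennard_stone([[0, 0], [1, 0], [10, 0], [5, 0], [9, 0]], 3)
```

In the strategy described above, the rows passed in would be CWT-transformed spectra rather than raw spectra, so that background and noise do not dominate the distances.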

9.
Near-infrared (NIR) spectrometry is now widely used in various fields, and great attention is paid to its application to complex problems, which creates the need to calibrate systems that do not exhibit a satisfactory linear relationship between input and output data. In this work we present a novel method to build a multivariate calibration model for NIR spectra, i.e. a genetic algorithm-radial basis function network in the wavelet domain (WT-GA-RBFN), which combines the advantages of the wavelet transform and the genetic algorithm. The variable selection is accomplished in two stages in the wavelet domain: in the first stage, the variables are pre-selected (compressed) by variance, and in the second stage the variables are further reduced by a specially designed GA. The proposed method is illustrated by its application to three NIR data sets from different fields and by comparison with the PLS model.

10.
Orthogonal WAVElet correction (OWAVEC) is a pre-processing method aimed at simultaneously accomplishing two essential needs in multivariate calibration, signal correction and data compression, by combining the application of an orthogonal signal correction algorithm to remove information unrelated to a certain response with the great potential that wavelet analysis has shown for signal processing. In the previous version of the OWAVEC method, once the wavelet coefficients matrix had been computed from NIR spectra and deflated of irrelevant information in the orthogonalization step, effective data compression was achieved by selecting the wavelet coefficients with the largest correlation/variance to serve as the basis for the development of a reliable regression model. This paper presents an evolution of the OWAVEC method, maintaining the first two stages in its application procedure (wavelet signal decomposition and direct orthogonalization) intact but incorporating genetic algorithms as a wavelet coefficient selection method to perform data compression and to improve the quality of the regression models developed later. Several specific applications dealing with diverse NIR regression problems are analyzed to evaluate the actual performance of the new OWAVEC method. Results provided by OWAVEC are also compared with those obtained with original data and with other orthogonal signal correction methods.

11.
Near-infrared (NIR) spectrometry will be a more promising tool for quantitative measurement if the robustness and predictive ability of the partial least squares (PLS) model are improved. To this end, we present a new algorithm for simultaneous wavelength selection and outlier detection; at the same time, the problems of background and noise in multivariate calibration are also addressed. The strategy is a combination of the continuous wavelet transform (CWT) and modified iterative predictors and objects weighting PLS (mIPOW-PLS). CWT is performed as a pretreatment tool for eliminating background and noise simultaneously; then, mIPOW-PLS is used to remove both the useless wavelengths and the multiple outliers in the CWT domain. After pretreatment with CWT-mIPOW-PLS, a PLS model is finally built for prediction. The results indicate that the combination of CWT and mIPOW-PLS produces robust and parsimonious regression models with very few wavelengths.

12.
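Li Zhenghua, Cheng Fansheng, Xia Zhining 《色谱》 (Chinese Journal of Chromatography) 2011,29(1):63-69
The molecular electronegativity distance vector (MEDV) was used to characterize the structures of 114 polycyclic aromatic sulfur heterocycles (PASHs), and a quantitative structure-retention relationship model between the gas chromatographic retention indices of the PASHs and the MEDV parameters was established by multiple linear regression. Stepwise regression analysis was used for variable selection, and the predictive ability of the resulting optimized model was then evaluated by leave-one-out cross-validation. The correlation coefficient of the model was 0.9947 and the cross-validation correlation coefficient was 0.9940, indicating that the optimized model has good stability and predictive ability. In addition, the sample set was split 2:1 into a calibration set and a test set for prediction, and the statistical results showed that the model has good correlation and stability. The quantitative structure-retention relationship (QSRR) model established here provides a convenient and effective new method for predicting the gas chromatographic retention indices of PASHs.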
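
Leave-one-out cross-validation, as used above to assess predictive ability, can be sketched as follows. This is a minimal illustration with a single descriptor and ordinary least squares, whereas the paper uses multiple linear regression on MEDV parameters; the statistic returned here is the cross-validated q² (1 - PRESS/SS), which may differ from the exact coefficient reported in the abstract, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    total = sum((b - my) ** 2 for b in y)
    return 1 - press / total
```

A q² close to the fitted correlation statistic, as with the 0.9947 and 0.9940 reported above, is usually read as a sign that the model is not overfitted.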

13.
Ant colony optimization (ACO) is a meta-heuristic algorithm derived from the observation of real ants. In this paper, the ACO algorithm is proposed for feature selection in quantitative structure-property relationship (QSPR) modeling and used to predict λmax of 1,4-naphthoquinone derivatives. Feature selection is the most important step in classification and regression systems. The performance of the proposed algorithm (ACO) is compared with that of stepwise regression, genetic algorithm and simulated annealing methods. The average absolute relative deviation in this QSPR study using ACO, stepwise regression, genetic algorithm and simulated annealing using multiple linear regression method for calibration and prediction sets were 5.0%, 3.4% and 6.8%, 6.1% and 5.1%, 8.6% and 6.0%, 5.7%, respectively. It has been demonstrated that ACO is a useful tool for feature selection with good performance.
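
The error measure quoted above, the average absolute relative deviation, can be computed as in the short sketch below. The usual definition (mean of |predicted - observed| / |observed|, expressed in percent) is assumed, since the abstract does not spell it out; the numbers in the example are invented.

```python
def aard_percent(observed, predicted):
    """Average absolute relative deviation, in percent.

    Assumes no observed value is zero.
    """
    deviations = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(deviations) / len(deviations)


# Invented example: two observed values and their model predictions.
example = aard_percent([100.0, 200.0], [95.0, 210.0])
```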

14.
This paper introduces the ant colony algorithm, a novel swarm intelligence based optimization method, to select appropriate wavelet coefficients from mass spectral data as a new feature selection method for ovarian cancer diagnostics. After determining the proper parameters for the ant colony algorithm (ACA) based search, we run the feature search 100 times with the number of selected features fixed at 5. The results of this study show: (1) the classification accuracy based on the five selected wavelet coefficients can reach up to 100% for all the training, validating and independent testing sets; (2) the eight most frequently selected wavelet coefficients of the 100 runs can provide 100% accuracy for the training set, 100% accuracy for the validating set, and 98.8% accuracy for the independent testing set, which suggests the robustness and accuracy of the proposed feature selection method; and (3) the mass spectral data corresponding to the eight most frequently selected wavelet coefficients can be located by reverse wavelet transformation, and these located mass spectral data still maintain high classification accuracies (100% for the training set, 97.6% for the validating set, and 98.8% for the testing set) and also provide sufficient physical and medical meaning for future ovarian cancer mechanism studies. Furthermore, the corresponding mass spectral data (potential biomarkers) are in good agreement with other studies which have used the same sample set. Together these results suggest this feature extraction strategy will benefit the development of intelligent and real-time spectroscopy instrumentation based diagnosis and monitoring systems.

15.
An algorithm is proposed for extracting relevant information from near-infrared (NIR) spectra for multivariate calibration of routine components in complex plant samples. The algorithm is a combination of wavelet transform (WT) data compression and a procedure for uninformative variable elimination (UVE). After compression of the NIR spectra by WT, the UVE approach is used to eliminate the irrelevant wavelet coefficients. Finally, a calibration model is built from the retained wavelet coefficients to enable prediction. Because irrelevant information can be removed from the spectra used for multivariate calibration, the model based on the extracted relevant features is better than those obtained with full-spectrum data. Both prediction precision and calculation speed are improved.

16.
Glycerol monolaurate (GML) products contain many impurities, such as lauric acid and glycerol. The GML content is an important quality indicator for GML production. A hybrid variable selection algorithm, which combines wavelet transform (WT) technology with a modified uninformative variable elimination (MUVE) method, was proposed to extract useful information from Fourier transform infrared (FT-IR) transmission spectroscopy for the determination of GML content. FT-IR spectral data were first compressed by WT; the irrelevant variables among the compressed wavelet coefficients were then eliminated by MUVE. In the MUVE process, a simulated annealing (SA) algorithm was employed to search for the optimal cutoff threshold. After the WT-MUVE process, the variables for the calibration model were reduced from 7366 to 163. Finally, the retained variables were used as inputs of a partial least squares (PLS) model to build the calibration model. For the prediction set, a correlation coefficient (r) of 0.9910 and a root mean square error of prediction (RMSEP) of 4.8617 were obtained. The prediction result was better than that of the PLS model with full-spectrum data. This indicates that the proposed WT-MUVE method could not only make the prediction more accurate, but also make the calibration model more parsimonious. Furthermore, the reconstructed spectra represented the projection of the selected wavelet coefficients into the original domain, affording a chemical interpretation of the predicted results. It is concluded that the FT-IR transmission spectroscopy technique with the proposed method is promising for the fast detection of GML content.

17.
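The wavelet transform and multi-way partial least squares were combined to build quantitative calibration models for near-infrared spectroscopy. The original spectra are first decomposed by the wavelet transform to obtain a series of wavelet detail coefficients; a group of wavelet coefficients that are little affected by external factors and carry strong information is selected to form a three-way spectral array, and multi-way partial least squares is then used to build the calibration model. The experimental results show that the near-infrared calibration models built by this method have stronger predictive ability and are more robust.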

18.
By employing the simple but effective principle ‘survival of the fittest’ on which Darwin's Evolution Theory is based, a novel strategy for selecting an optimal combination of key wavelengths of multi-component spectral data, named competitive adaptive reweighted sampling (CARS), is developed. Key wavelengths are defined as the wavelengths with large absolute coefficients in a multivariate linear regression model, such as partial least squares (PLS). In the present work, the absolute values of regression coefficients of PLS model are used as an index for evaluating the importance of each wavelength. Then, based on the importance level of each wavelength, CARS sequentially selects N subsets of wavelengths from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (e.g. 80%) of samples is first randomly selected to establish a calibration model. Next, based on the regression coefficients, a two-step procedure including exponentially decreasing function (EDF) based enforced wavelength selection and adaptive reweighted sampling (ARS) based competitive wavelength selection is adopted to select the key wavelengths. Finally, cross validation (CV) is applied to choose the subset with the lowest root mean square error of CV (RMSECV). The performance of the proposed procedure is evaluated using one simulated dataset together with one near infrared dataset of two properties. The results reveal an outstanding characteristic of CARS that it can usually locate an optimal combination of some key wavelengths which are interpretable to the chemical property of interest. Additionally, our study shows that better prediction is obtained by CARS when compared to full spectrum PLS modeling, Monte Carlo uninformative variable elimination (MC-UVE) and moving window partial least squares regression (MWPLSR).
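
The EDF-based enforced selection step can be illustrated with a small Python sketch. The abstract does not give the function itself; the form below, r_i = a·exp(-k·i) with constants chosen so that all wavelengths are kept in the first run and only two in the last, is the formulation commonly cited for CARS and is stated here as an assumption. The function name is hypothetical.

```python
import math


def edf_ratios(n_wavelengths, n_runs):
    """Fraction of wavelengths retained in each of n_runs sampling runs.

    Assumed form: r_i = a * exp(-k * i), i = 1..n_runs, with
    r_1 = 1 (keep everything) and r_N = 2 / n_wavelengths (keep two).
    Requires n_runs >= 2 and n_wavelengths > 2.
    """
    half = n_wavelengths / 2
    a = half ** (1 / (n_runs - 1))
    k = math.log(half) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


# Example: 100 wavelengths, 50 Monte Carlo sampling runs.
ratios = edf_ratios(100, 50)
kept = [round(r * 100) for r in ratios]  # wavelengths kept per run
```

The steep early decay removes many low-weight wavelengths quickly, while the slow tail leaves the competitive ARS step to refine the few remaining candidates.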

19.
Net analyte signal (NAS)-based multivariate calibration methods were employed for the simultaneous determination of antazoline and naphazoline. The NAS vectors calculated from the absorbance data of the drug mixture were used as input for classical least squares (CLS), principal component regression (PCR) and partial least squares (PLS) regression methods. A wavelength selection strategy was used to find the best wavelength region for each drug separately. As a new procedure, we proposed an experimental design-neural network strategy for wavelength region optimization. Using a full factorial design, several different wavelength regions were selected by taking into account different spectral parameters, including the starting wavelength, the ending wavelength and the wavelength interval. The performance of all the multivariate calibration methods, in all selected wavelength regions for both drugs, was evaluated by calculating a fitness function based on the root mean square errors of calibration and validation. A three-layered feed-forward artificial neural network (ANN) model with a back-propagation learning algorithm was employed to model the nonlinear relationship between the spectral parameters and the fitness of each regression method. From the resulting ANN models, the spectral regions in which the lowest fitness could be obtained were chosen. Comparison of the results revealed that NAS-PLS gave lower prediction error than the other models. The proposed NAS-based calibration method was successfully applied to the simultaneous analysis of antazoline and naphazoline in a commercial eye drop sample.

20.
Multivariate calibration problems often involve the identification of a meaningful subset of variables from a vast number of variables, for better prediction of output variables. A new graph theoretic method based on partial correlations (variable interaction network, VIN) is proposed. Several well-studied representative calibration datasets spanning different application domains are selected for investigating the performance. Partial least squares (PLS) regression models combined with variable selection techniques are employed for benchmarking the performance. Subsets containing different numbers of variables are retained for the final analysis after VIN selection, and progressive prediction accuracies are used for comparison. VIN-PLS results show significant improvement in prediction efficiencies and variable subset optimization. Improvement of up to 45% over existing methods with significantly fewer variables is achieved using the new method. Advantages of VIN based variable selection are highlighted.
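
The building block of such a network, the partial correlation, can be shown in a few lines. The sketch below gives only the first-order case (correlation between x and y after controlling for a single variable z); how the paper assembles partial correlations into the VIN graph is not described in the abstract and is not reproduced. The function name is hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z.

    Inputs are the pairwise Pearson correlations; |r_xz| and |r_yz|
    must be strictly less than 1.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# x and y correlate at 0.81 only because both follow z (0.9 each):
# once z is controlled for, almost nothing remains.
spurious = partial_corr(0.81, 0.9, 0.9)
```

In variable selection this is useful because two spectral variables that merely share a common driver do not both need to be retained.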
