Similar literature
 20 similar documents found (search time: 102 ms)
1.
This paper proposes a regression method, ROSCAS, which regularizes smart contrasts and sums of regression coefficients by an L1 penalty. The contrasts and sums are based on the sample correlation matrix of the predictors and are suggested by a latent variable regression model. The contrasts express the idea that a priori correlated predictors should have similar coefficients. The method has excellent predictive performance in situations where there are groups of predictors, with each group representing an independent feature that influences the response. In particular, when the groups differ in size, ROSCAS can outperform LASSO, elastic net, partial least squares (PLS) and ridge regression by a factor of two or three in terms of mean squared error. In other simulation setups and on real data, ROSCAS performs competitively. Copyright © 2009 John Wiley & Sons, Ltd.

2.
Near infrared (NIR) reflectance and Raman spectrometry were compared for determination of the oil and water content of olive pomace, a by-product of olive oil production. To enable comparison of the spectral techniques, the same sample sets were used for calibration (1.74–3.93% oil, 48.3–67.0% water) and for validation (1.77–3.74% oil, 50.0–64.5% water). Several partial least squares (PLS) regression models were optimized by cross-validation with cancellation groups, including different spectral pretreatments for each technique. The best models were achieved with first-derivative spectra for both oil and water content. Prediction results for an independent validation set were similar for both techniques. The values of root mean square error of prediction (RMSEP) were 0.19 and 0.20–0.21 for oil content and 2.0 and 1.8 for water content, using Raman and NIR, respectively. The possibility of improving these results by combining the information from both techniques was also tested. The best models constructed using the appended spectra resulted in slightly better performance for oil content (RMSEP 0.17) but no improvement for water content.
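
RMSEP recurs throughout these abstracts as the figure of merit. As a minimal sketch, assuming the usual definition (square root of the mean squared difference between reference and predicted values over an independent validation set), it can be computed as follows. The function name and the sample values are hypothetical and are not taken from any of the listed papers.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a validation set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference and predicted oil contents (%), for illustration only.
reference = [1.80, 2.50, 3.10, 3.70]
predicted = [1.90, 2.40, 3.30, 3.60]
print(round(rmsep(reference, predicted), 4))
```

Because RMSEP carries the units of the property being predicted, values for different properties (for example oil versus water content) are not directly comparable.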

3.
Determination of edible oil parameters by near infrared spectrometry   Total citations: 6, self-citations: 0, citations by others: 6
A chemometric method has been developed for the determination of acidity and peroxide index in edible oils of different types and origins by using near infrared spectroscopy (NIR) measurements. Different methods for selecting the calibration set, after a hierarchical cluster analysis, were applied. After discrimination of olive oils from maize, seed and sunflower oils, the prediction capabilities of partial least squares (PLS) multivariate calibration of NIR data were evaluated. Several preprocessing alternatives (first derivative, multiplicative scatter correction, vector normalization, constant offset elimination, mean centering and standard normal variate) were investigated, using the root mean square error of validation (RMSEV) and of prediction (RMSEP) as control parameters. Under the best conditions studied, the validation set provides RMSEP values of 0.034 and 0.037% (w/w) for acidity in (I) the olive oil group and (II) the sunflower, seed and maize oils group. RMSEP values for peroxide in the two sample groups, expressed as mequiv. O2 kg−1, were 1.87 and 0.79, respectively. The limit of detection of the methodology developed was 0.03% for acidity in both groups of edible oils (I and II), and 0.9 and 0.8 mequiv. O2 kg−1 for peroxide in the olive oil and other edible oils groups, respectively. The methodology developed is proposed for direct acidity quantification and for the screening of peroxide index in edible oils, requiring less than 30 s per sample without any previous treatment.

4.
This paper evaluates analytical methods based on near infrared (NIR) and middle infrared (MIR) spectroscopy and multivariate calibration to monitor the stability of biodiesel. The focus was on three parameters: oxidative stability index, acid number and water content. Ethylic and methylic biodiesel from different feedstocks were used in accelerated aging experiments, in order to take into account the wide variety of oilseeds and feedstocks available in Brazil. Partial least squares (PLS) and multiple linear regression (MLR) models were developed. Different pre-processing techniques and algorithms for selecting spectral variables or regions were evaluated. For MLR models, the successive projection algorithm (SPA) was employed. For PLS models, interval PLS (iPLS) and variable selection based on the significant regression coefficients were used. Results showed that both the near and middle infrared regions, and all variable selection methods tested, were efficient for predicting these three important quality parameters of B100, with root mean square error of prediction (RMSEP) values comparable to the reproducibility of the corresponding standard method for each property investigated.

5.
Near-infrared (NIR) spectra in the region of 5000-4000 cm−1, together with a chemometric method called searching combination moving window partial least squares (SCMWPLS), were employed to determine the concentrations of human serum albumin (HSA), γ-globulin, and glucose contained in control serum IIB (CS IIB) solutions of various concentrations. SCMWPLS is proposed to search for optimized combinations of informative regions, which are spectral intervals considered to contain useful information for building partial least squares (PLS) models. The informative regions can easily be found by the moving window partial least squares regression (MWPLSR) method. PLS calibration models using the regions obtained by SCMWPLS were developed for HSA, γ-globulin, and glucose. Compared with the results achieved using the whole region and MWPLSR, these models showed good prediction, with the smallest root mean square errors of prediction (RMSEP), a relatively small number of PLS factors, and the highest correlation coefficients. The RMSEP values of HSA, γ-globulin, and glucose yielded by SCMWPLS were 0.0303, 0.0327, and 0.0195 g/dl, respectively. These results show that SCMWPLS can be successfully applied to determine simultaneously the concentrations of HSA, γ-globulin, and glucose in complicated biological fluids such as CS IIB solutions by using NIR spectroscopy.

6.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on the whole set of spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectra data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which the performance of the corresponding PLS model improves, often significantly, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or the direct combination of informative regions for a compound. The regions making up the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.
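
The interval search described above can be sketched as an exhaustive scan over window sizes and positions inside one informative region. This is only an illustration of the search loop, assuming that every contiguous sub-window is tried; the PLS model itself is not implemented here and is replaced by a caller-supplied `score` function (hypothetical name), which in practice might return a cross-validated RMSEP.

```python
def best_subinterval(region_start, region_end, score):
    """Try every contiguous window inside [region_start, region_end) and
    return (lowest_score, start, end) for the best one.

    `score(start, end)` is supplied by the caller, for example the error of
    a model built on spectral points start..end-1.
    """
    best = None
    region_len = region_end - region_start
    for width in range(1, region_len + 1):  # the window size changes
        for start in range(region_start, region_end - width + 1):  # the window moves
            end = start + width
            err = score(start, end)
            if best is None or err < best[0]:
                best = (err, start, end)
    return best


# Toy stand-in for a model error: pretend points 12..14 carry the information.
def toy_score(start, end):
    informative = {12, 13, 14}
    window = set(range(start, end))
    missed = len(informative - window)  # informative points left out
    extra = len(window - informative)   # uninformative points included
    return missed + 0.1 * extra


print(best_subinterval(10, 20, toy_score))
```

The scan costs on the order of the square of the region length in model fits, which is one reason to restrict it to regions already flagged as informative rather than the whole spectrum.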

7.
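Near-infrared spectroscopy was used for rapid quantitative determination of the trans fatty acid (TFA) content of edible vegetable oils, and the TFA prediction model was optimized through waveband selection, preprocessing methods, variable selection and modeling methods. Near-infrared transmission spectra of 98 edible vegetable oil samples were collected with an Antaris II Fourier transform near-infrared spectrometer over the 4000–10000 cm−1 range, and the true TFA content was then determined by gas chromatography. First, the waveband and preprocessing method were optimized on the raw sample spectra. On this basis, competitive adaptive reweighted sampling (CARS) was used to select the important TFA-related variables. Finally, principal component regression, partial least squares and least squares support vector machine methods were each used to build prediction models for TFA content in edible vegetable oils. The results show that determining the TFA content of edible vegetable oils by near-infrared spectroscopy is feasible: for the best optimized prediction model, R2 was 0.992 for the calibration set and 0.989 for the prediction set, and RMSEC and RMSEP were 0.071% and 0.075%, respectively. The best prediction model used only 26 variables, 0.854% of the full-band variables. Compared with the full-band partial least squares prediction model, the prediction-set R2 rose from 0.904 to 0.989 and RMSEP fell from 0.230% to 0.075%. This indicates that model optimization is necessary and that CARS can effectively select the important TFA-related variables, greatly reducing the number of modeling variables, thereby simplifying the prediction model and considerably improving its accuracy and stability.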

8.
Owing to spectral variations from sources other than the component of interest, large investments in NIR model development may be required to obtain satisfactory and robust prediction performance. To make NIR model development for routine active pharmaceutical ingredient (API) prediction in tablets more cost-effective, alternative modelling strategies were proposed. They used a massive amount of prior spectral information on intra- and inter-batch variation and the pure component spectra to define a clutter, i.e., the detrimental spectral information. This was subsequently used for artificial data augmentation and/or orthogonal projections. The model performance improved statistically significantly, with a 34–40% reduction in RMSEP while needing fewer model latent variables, by applying the following procedure before PLS regression: (1) augmentation of the calibration spectra with the spectral shapes from the clutter, and (2) net analyte pre-processing (NAP). The improved prediction performance was not compromised when reducing the variability in the calibration set, making exhaustive calibration unnecessary. Strong water content variations in the tablets caused frequency shifts of the API absorption signals that could not be included in the clutter. Updating the model for this kind of variation demonstrated that the completeness of the clutter is critical for the performance of these models and that the model will only be more robust for spectral variation that is not co-linear with the one from the property of interest.

9.
Analytica Chimica Acta, 2004, 509(2): 217-227
In near-infrared (NIR) measurements, some physical features of the sample can be responsible for effects like light scattering, which lead to systematic variations unrelated to the studied responses. These errors can disturb the robustness and reliability of multivariate calibration models. Several mathematical treatments are usually applied to remove systematic noise in data, the most common being derivation, standard normal variate (SNV) and multiplicative scatter correction (MSC). New mathematical treatments, such as orthogonal signal correction (OSC) and direct orthogonal signal correction (DOSC), have been developed to minimize the variability unrelated to the response in spectral data. In this work, these two new pre-processing methods were applied to a set of roasted coffee NIR spectra. A separate calibration model was developed to quantify the ash content and lipids in roasted coffee samples by PLS regression. The results provided by these correction methods were compared to those obtained with the original data and the data corrected by derivation, SNV and MSC. For both responses, OSC and DOSC treatments gave PLS calibration models with improved prediction abilities (4.9 and 3.3% RMSEP with corrected data versus 7.1 and 8.3% RMSEP with original data, respectively).
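
Of the scatter corrections named above, SNV is the simplest to illustrate: each spectrum is centered on its own mean and scaled by its own standard deviation. The sketch below assumes the sample standard deviation (n − 1 in the denominator), which is a common convention but is not stated in the abstract; the function name and the two toy spectra are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation (n - 1 denominator)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two spectral points")
    mean = sum(spectrum) / n
    variance = sum((x - mean) ** 2 for x in spectrum) / (n - 1)
    if variance == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    std = math.sqrt(variance)
    return [(x - mean) / std for x in spectrum]


# Two toy spectra that differ only by a multiplicative and an additive effect.
a = [0.10, 0.20, 0.40, 0.30]
b = [2 * x + 0.5 for x in a]
print(snv(a))
print(snv(b))
```

Both printed lists agree up to floating-point rounding, which is the point of the treatment: offset and scaling differences between spectra of the same underlying shape are removed row by row, without reference to any other sample.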

10.
This article studies calibration maintenance and transfer to build a statistical model that is able to predict analyte concentrations from a set of spectra. Noticing that the wavelength atoms are naturally ordered in a meaningful way, we propose a novel robust fused LASSO (RFL) based on high‐dimensional sparsity techniques and a recent Θ‐IPOD technique for robustification. This new approach can attain simultaneous wavelength selection and grouping as well as outlier identification, without any human intervention. An efficient and scalable algorithm is developed on the basis of the alternating direction method of multipliers. The obtained RFL model is sparse and shows improved prediction performance over the LASSO and ridge regression. Our results reveal that wavelengths can be combined into blocks, in a smart manner, to enhance the interpretability and reliability for super‐resolution spectral analysis. Copyright © 2013 John Wiley & Sons, Ltd.

11.
Piecewise direct standardization (PDS) is applied to multivariate standardization of fluorescence signals using partial least squares (PLS) and principal component regression (PCR) as the calibration models. The multivariate standardization was used to transfer spectra obtained after a solid phase extraction (SPE) step to spectra registered in pure solvent in the determination of carbendazim, fuberidazole and thiabendazole in water samples. The influential parameters, such as tolerance, window size and the number of samples in the standardization subset, were optimized by means of the root mean squared error of prediction (RMSEP). Similar RMSEP values were obtained by PLS and PCR using the optimized influential parameters in the standardization. However, better predictions of the compounds were obtained in the test set by the PLS model.

12.
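A local modeling method for near-infrared spectra based on wavelet coefficients and its application   Total citations: 2, self-citations: 0, citations by others: 2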
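Local modeling methods build the model from samples similar to the prediction sample, which can address nonlinearity between spectral response and concentration, widen the range of applicability of the model and improve prediction accuracy. Using the wavelet transform for data compression and the Euclidean distance between wavelet coefficients as the criterion of spectral similarity, a local modeling method for quantitative near-infrared analysis was implemented that avoids dependence among samples. The method was applied to the determination of chlorine content in tobacco samples. Over 100 repeated calculations, the mean root mean square error of prediction (RMSEP) was 0.0665 with a standard deviation (σ) of 0.0045, better than global modeling and principal-component-based local modeling.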
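
The sample selection step of such a local model can be sketched as follows: compress each spectrum to wavelet coefficients, then pick the calibration samples closest to the query in Euclidean distance. The abstract does not say which wavelet or decomposition level was used, so the Haar approximation coefficients, the `levels` parameter, the function names and the toy spectra here are all assumptions made for illustration.

```python
import math


def haar_approx(signal, levels=1):
    """Compress a signal by keeping only its Haar approximation coefficients."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # pad odd lengths by repeating the last point
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


def nearest_samples(query, calibration, k, levels=1):
    """Indices of the k calibration spectra closest to the query, measured by
    Euclidean distance between compressed (wavelet) representations."""
    q = haar_approx(query, levels)
    distances = []
    for index, spectrum in enumerate(calibration):
        c = haar_approx(spectrum, levels)
        d = math.sqrt(sum((u - v) ** 2 for u, v in zip(q, c)))
        distances.append((d, index))
    distances.sort()
    return [index for _, index in distances[:k]]


# Toy calibration spectra and a query spectrum, for illustration only.
calibration = [[1, 1, 1, 1], [1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5]]
query = [1, 2, 3, 4.2]
print(nearest_samples(query, calibration, k=2))
```

A regression model would then be fitted on the selected subset only and used to predict the query sample; that step is omitted here.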

13.
Adulteration of foods has been known to exist for a long time and various analytical tests have been reported to address this problem. Among them, the authenticity of sesame oil has attracted much attention. Near-infrared (NIR) spectral quantitative detection models of sesame oil adulterated with other oils are constructed by chemometric methods, i.e., competitive adaptive reweighted sampling (CARS), elastic component regression (ECR) and partial least squares (PLS). Sixty samples adulterated with different proportions of five kinds of cheaper oils were scanned by a Fourier-transform NIR spectrometer and the NIR spectra were collected in the 4500–10000 cm−1 region in transmission mode. All samples were divided into a training set and an independent test set. Model population analysis has also been carried out and confirms the importance of selecting representative samples. The experimental results indicate that the PLS model using only 10 variables from CARS and the ECR model show similar performance and both are superior to the full-spectrum PLS model. CARS focuses on selecting variables and ECR focuses on optimizing the parameters, implying that both roads lead to the same destination. It seems that the NIR technique combined with CARS or ECR is feasible for rapidly detecting sesame oil adulterated with other vegetable oils.

14.
Near infrared (NIR) spectroscopy was employed for simultaneous determination of methanol and ethanol contents in gasoline. Spectra were collected in the range from 714 to 2500 nm and were used to construct quantitative models based on partial least squares (PLS) regression. Samples were prepared in the laboratory and the PLS regression models were developed using the spectral range from 1105 to 1682 nm, showing a root mean square error of prediction (RMSEP) of 0.28% (v/v) for ethanol for both PLS-1 and PLS-2 models and of 0.31 and 0.32% (v/v) for methanol for the PLS-1 and PLS-2 models, respectively. An RMSEP of 0.83% (v/v) was obtained for commercial samples. The effect of the gasoline composition was investigated: some solvents, such as toluene and o-xylene, were found to interfere in ethanol content prediction, while isooctane, o-xylene, m-xylene and p-xylene interfere in the methanol content prediction. Other spectral ranges were investigated and the range 1449-1611 nm showed the best results.

15.
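Color correction of color cameras is a necessary means of ensuring color consistency in imaging. In traditional camera color correction, polynomial regression analysis is mostly applied to the measurement data to determine the color calibration coefficients, which suffers from limited accuracy. This paper therefore proposes a LASSO-based high-order polynomial regression fitting method for the measurement data. By exploiting the coefficient shrinkage property of LASSO, it effectively improves the correction accuracy of the regression model while keeping the computational complexity in check. Imaging experiments were carried out on the ColorChecker 24-patch chart under the D65 standard illuminant, and the correction quality was characterized with the CIELAB color difference formula. The experimental results show that the new method corrects clearly better than traditional linear regression and quadratic polynomial regression, and that the mean color difference can reach 5 CIELAB color-difference units.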
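
The coefficient shrinkage mentioned above comes from the L1 penalty, which sets small coefficients exactly to zero. The following is a minimal sketch of LASSO fitted by cyclic coordinate descent on polynomial terms, assuming the objective (1/(2n))·||y − Xw||² + λ·||w||₁ with no intercept; it is not the paper's algorithm, and the function names, the single-channel toy data and the value of `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def polynomial_features(x, degree):
    """Powers x, x^2, ..., x^degree of a single input value."""
    return [x ** d for d in range(1, degree + 1)]


# Toy single-channel data (hypothetical): the response is linear in the input.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 * x for x in xs]
X = [polynomial_features(x, 3) for x in xs]
print(lasso_coordinate_descent(X, ys, lam=0.05))
```

A real color correction would map three camera channels, including cross terms, to three reference channels and would normally choose λ by validation; the sketch only shows how the penalty lets a high-order polynomial be fitted without keeping every term.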

16.
The performances of three multivariate analysis methods, partial least squares (PLS) regression, secured principal component regression (sPCR) and modified secured principal component regression (msPCR), are compared and tested for the determination of human serum albumin (HSA), γ-globulin, and glucose in phosphate buffer solutions and for blood glucose quantification by near-infrared (NIR) spectroscopy. Results from the application of PLS, sPCR and msPCR are presented, showing that the three methods can determine the concentrations of HSA, γ-globulin and glucose in phosphate buffer solutions almost equally well provided that the prediction samples contain the same spectral information as the calibration samples. On the other hand, when some potential spectral features appear in new measurements, sPCR and msPCR outperform PLS significantly. The reason for this is that such spectral features are not included during calibration, which leads to a degradation in PLS prediction performance, while sPCR and msPCR can improve their predictions for the concentrations of the analytes by removing the uncalibrated features from the original spectra. This point is demonstrated by successfully applying sPCR and msPCR to in vivo blood glucose measurements. This work therefore shows that sPCR and msPCR may provide possible alternatives to PLS in cases where some uncalibrated spectral features are present in measurements used for concentration prediction.

17.
Edible oils are used in the preparation of foods as part of the recipe or for frying. To ensure food safety, checking the quality of oils before and after use is therefore an important task in food control laboratories. In this study, edible oils from four different sources (canola, corn, sunflower and frying) were heated for 36 h at 170 °C and sampled every 6 h. The free fatty acid content, peroxide value and content of some fatty acids (C16:0, C18:0, C18:1, C18:2, C18:3) of the oil samples were determined by standard methods. Then, the ATR-FTIR spectra of the samples were collected. Partial least squares (PLS) regression combined with a genetic algorithm was performed on the spectroscopic data to obtain appropriate predictive models for the simultaneous estimation of acid value, peroxide value and the percentage of five kinds of fatty acids. The effect of some preprocessing methods on these models was also investigated. Preprocessing of data by orthogonal signal correction (OSC) resulted in the best predictive models for all oil properties. The correlation coefficients of the calibration set (>0.99) and validation set (>0.86, and in most cases >0.94) of the OSC–PLS model suggested suitable predictive modeling for all studied parameters in the oil samples. This method could be suggested as a rapid, economical and environmentally friendly technique for simultaneous determination of the seven noted parameters in edible oils.

18.
Fourier transform (FT) Raman spectrometry in combination with partial least squares (PLS) regression was used for direct, reagent-free determination of free fatty acid (FFA) content in olive oils and olives. Oils were directly investigated in a simple flow cell. Milled olives were measured in a dedicated sample cup, which was rotated eccentrically to the horizontal laser beam during spectrum acquisition in order to compensate for sample heterogeneity. Both external and internal (leave-one-out) validation were used to assess the predictive ability of the PLS calibration models for FFA content (in terms of oleic acid) in oil and olives in the range 0.20-6.14 and 0.15-3.79%, respectively. The root mean square error of prediction (RMSEP) was 0.29% for oil and 0.28% for olives. The predicted FFA contents were used to classify oils and olives in different categories according to the European Union regulations. Ninety percent of the oil samples and 80% of the olives were correctly classified. These results demonstrate that the proposed procedures can be used for screening of good quality olives before processing, as well as for the on-line control of the produced oil.

19.
Near-infrared spectroscopy (NIR) is widely used in quantitative and qualitative food analysis. With the development of chemometrics, variable selection has become a critical step in spectral modeling. In this study, a novel variable selection strategy, automatic weighting variable combination population analysis (AWVCPA), is proposed. Firstly, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then, two kinds of information vectors (IVs), the variable frequency (Fre) and the partial least squares regression coefficients (Reg), are weighted to obtain the contribution of each spectral variable, so that the influence of both Fre and Reg is taken into account for each variable. Finally, an exponentially decreasing function (EDF) is used to remove the low-contribution wavelengths so as to select the characteristic variables. For near-infrared spectra of beer and corn, partial least squares (PLS) prediction models of yeast and oil concentration are established. Compared with other variable selection methods, AWVCPA is shown to be the best variable selection strategy under the same conditions. On the beer dataset, AWVCPA-PLS improves on PLS by 72.7%, with the root mean square error of prediction (RMSEP) decreasing from 0.5348 to 0.1457. On the corn dataset, AWVCPA-PLS improves on PLS by 64.7%, with the RMSEP decreasing from 0.0702 to 0.0248.
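
The binary matrix sampling step can be sketched as below. The sketch assumes one common way of giving every variable the same chance of selection: each column of the binary matrix holds the same number of ones, shuffled independently, and each row is then read as one variable subset. The function name, the `ratio` parameter and the sizes are hypothetical, and the weighting of Fre and Reg and the EDF step are not shown.

```python
import random


def binary_matrix_sampling(n_subsets, n_variables, ratio=0.5, seed=0):
    """Return an n_subsets x n_variables 0/1 matrix in which every column
    contains the same number of ones, so every variable is sampled equally
    often across the population of subsets."""
    rng = random.Random(seed)
    ones_per_column = round(n_subsets * ratio)
    columns = []
    for _ in range(n_variables):
        column = [1] * ones_per_column + [0] * (n_subsets - ones_per_column)
        rng.shuffle(column)  # independent permutation for each variable
        columns.append(column)
    # Transpose so that each row describes one variable subset.
    return [[columns[j][i] for j in range(n_variables)] for i in range(n_subsets)]


matrix = binary_matrix_sampling(n_subsets=10, n_variables=6)
subsets = [[j for j, bit in enumerate(row) if bit] for row in matrix]
for subset in subsets:
    print(subset)
```

Each printed list holds the indices of the variables used by one sub-model. A row can come out empty by chance, so a caller building sub-models would need to skip or resample such rows.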

20.
This study uses Raman and IR spectroscopic methods for the detection of adulterants in marine oils. These techniques are used individually and as low-level fused spectroscopic data sets. We used cod liver oil (CLO) and salmon oil (SO) as the valuable marine oils mixed with common adulterants, such as palm oil (PO), omega-3 concentrates in ethyl ester form (O3C), and generic fish oil (FO). We showed that support vector machines (SVM) can classify the adulterant present in both CLO and SO samples. Furthermore, partial least squares regression (PLSR) may be used to quantify the adulterants present. For example, PO and O3C adulterated samples could be detected with an RMSEP value of less than 4%. However, the FO adulterant was more difficult to quantify because of its compositional similarity to CLO and SO. In general, data fusion improved the RMSEP for PO and O3C detection. This shows that Raman and IR spectroscopy can be used in concert to provide a useful analytical test for common adulterants in CLO and SO.
