Similar articles
1.
A new cut-off criterion is proposed for eliminating uninformative variables prior to chemometric partial least squares (PLS) modelling. After variable elimination, PLS regressions were built and assessed by comparing the results with those of PLS models based on the full spectral range. To assess prediction capability, uninformative variable elimination (UVE)-PLS and PLS were applied to diffuse reflectance near-infrared spectra of heroin samples. The proposed cut-off criterion, based on Student's t-distribution, gave PLS models with predictive capabilities similar to those obtained with the original criterion based on a quantile value. However, the repeatability of the number of selected variables improved significantly.
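As a rough illustration of the kind of criterion discussed here, the sketch below computes a UVE-style stability value for each variable (mean divided by standard deviation of its regression coefficient across resampled PLS models) and applies a quantile-based cut-off derived from appended noise variables. The coefficient matrix, the function names, and the 0.99 quantile are assumptions made for illustration; the abstract gives no details of either criterion, and the proposed t-distribution cut-off is not reproduced here.

```python
import math
from statistics import mean, stdev

def uve_stability(coef_rows):
    """coef_rows: one list of regression coefficients per resampled model.
    Returns mean/std of each coefficient across the models."""
    stab = []
    for col in zip(*coef_rows):
        m, s = mean(col), stdev(col)
        if s > 0:
            stab.append(m / s)
        else:
            # constant coefficient: perfectly stable unless it is exactly zero
            stab.append(math.copysign(math.inf, m) if m else 0.0)
    return stab

def uve_select(coef_rows, n_noise, quantile=0.99):
    """Keep real variables whose |stability| exceeds the noise-based cut-off.
    Assumption: the last n_noise columns are artificial noise variables."""
    stab = uve_stability(coef_rows)
    n_real = len(stab) - n_noise
    noise = sorted(abs(s) for s in stab[n_real:])
    # simple nearest-rank empirical quantile (hypothetical choice)
    cutoff = noise[min(len(noise) - 1, int(quantile * len(noise)))]
    kept = [j for j in range(n_real) if abs(stab[j]) > cutoff]
    return kept, cutoff
```

In practice the rows of `coef_rows` would come from leave-one-out or other resampled PLS fits on the spectra augmented with random noise columns.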

2.
This paper proposes a methodology for the classification and determination of total protein in milk powder using near infrared reflectance spectrometry (NIRS) and variable selection. Two brands of milk powder were acquired from three Brazilian cities (Natal-RN, Salvador-BA and Rio de Janeiro-RJ). The protein content of 38 samples was determined by the Kjeldahl method and NIRS analysis. Principal component regression (PCR) and partial least squares (PLS) multivariate calibrations were used to predict the total protein. Soft independent modeling of class analogy (SIMCA) was also used for full-spectrum classification, resulting in almost 100% classification accuracy, regardless of the significance level adopted for the F-test. Using this strategy, it was feasible to classify milk powder rapidly and nondestructively without the need for various analytical determinations. Concerning the multivariate calibration models, the results show that PCR, PLS and MLR-SPA models are good for predicting total protein in milk powder; the respective root mean square errors of prediction (RMSEP) were 0.28 (PCR), 0.25 (PLS) and 0.11 wt% (MLR-SPA), with an average sample protein content of 8.1 wt%. The results obtained in this investigation suggest that the proposed methodology is a promising alternative for the determination of total protein in milk powder.

3.
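Partial least squares (PLS) was combined with generalized regression neural networks (GRNN) to build calibration models predicting the crude fiber, starch and protein contents of potato samples. PLS was used to compress the raw data into principal components, and 12 characteristic absorption peaks from the first three principal components were fed to the GRNN, with a network smoothing factor σ_i of 0.1. The coefficients of determination (R2) of the PLS-GRNN model for the three components were 0.945, 0.992 and 0.938, respectively. The results show that near-infrared spectroscopy can determine crude fiber, starch and protein in potatoes simultaneously, rapidly and accurately; the method can be applied to quality management and control in the fruit and vegetable industry.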

4.
Ni Xin  Qinghua Meng  Yizhen Li  Yuzhu Hu 《中国化学》2011,29(11):2533-2540
This paper explores the possibility of using near infrared (NIR) spectral similarity as a rapid method to estimate the quality of Flos Lonicerae. Variable selection together with modelling techniques is used to select representative variables, from which the similarity is calculated. NIR is used to build calibration models to predict the bacteriostatic activity of Flos Lonicerae, which is determined by an in vitro experiment. Models are built for Gram-positive bacteria and for Gram-negative bacteria. A genetic algorithm combined with partial least squares regression (GA-PLS) is used to perform the calibration. The results of the GA-PLS models are compared to interval partial least squares (iPLS) models, full-spectrum PLS and full-spectrum principal component regression (PCR) models. The variables in the two GA-PLS models are then combined and used to calculate the NIR spectral similarity of samples. The similarity based on the characteristic variables and the similarity based on the full spectrum are each used for evaluating the fingerprints of Flos Lonicerae. The results show that the combination of variable selection, modelling techniques and similarity analysis might be a powerful tool for quality control of traditional Chinese medicine (TCM).

5.
The pharmaceutical industry faces increasing regulatory pressure to optimize quality control. Content uniformity is a basic release test for solid dosage forms. To accelerate test throughput and comply with the Food and Drug Administration's process analytical technology initiative, attention is increasingly turning to nondestructive spectroscopic techniques, notably near-infrared (NIR) spectroscopy (NIRS). However, validation of NIRS using requisite linearity and standard error of prediction (SEP) criteria remains a challenge. This study applied wavelet transformation of the NIR spectra of a commercial tablet to build a model using conventional partial least squares (PLS) regression and an artificial neural network (ANN). Wavelet coefficients in the PLS and ANN models reduced SEP by up to 60% compared to PLS models using mathematical spectral pretreatment. ANN modeling yielded high-linearity calibration and a correlation coefficient exceeding 0.996.

6.
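Using common maize kernels as the test material, characteristic wavelengths were selected from near-infrared spectral data by a genetic algorithm combined with partial least squares regression, and a PLS calibration model based on those wavelengths was then built to determine the starch content of maize kernels. For the calibration model built on 11 characteristic wavelengths, the root mean square errors of calibration (RMSEC), cross-validation (RMSECV) and prediction (RMSEP) were 0.30%, 0.35% and 0.27%, respectively, and the correlation coefficients between predicted and measured values reached 0.9279 for the calibration set and 0.9390 for the independent test set. Prediction accuracy improved in both cases relative to the model built on the full spectrum. This shows that spectral feature selection with a genetic algorithm and PLS yields a simpler and better model, and it offers a new approach to the near-infrared determination of starch in maize kernels and to the processing of infrared spectral data.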

7.
This study proposes an analytical method for the simultaneous near infrared (NIR) spectrometric determination of palmitic, oleic, linoleic and linolenic acids in sea buckthorn seed oil. For this purpose, four different combinations of multivariate calibration methods and variable selection were evaluated: partial least squares (PLS) with full spectrum; PLS with uninformative variable elimination (UVE); PLS with competitive adaptive reweighted sampling (CARS); and multiple linear regression (MLR) with uninformative variable elimination combined with the successive projections algorithm (UVE-SPA). An independent set of samples was employed to evaluate the performance of the resulting models. The UVE-SPA-MLR model, developed with a few spectral variables, provided the best results for each parameter. The relative errors of prediction (REP) of the UVE-SPA-MLR model for palmitic, oleic, linoleic and linolenic acids are 1.77%, 1.20%, 1.02% and 1.40%, respectively. These results indicate that the method is feasible and fast for determining the fatty acid content of sea buckthorn seed oil.

8.
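Constructing a support vector machine-partial least squares method for modelling drug structure-activity relationships   Cited by: 6 (self-citations: 0; citations by others: 6)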
李剑  陈德钊  成忠  叶子青 《分析化学》2006,34(2):263-266
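While sample data are still being accumulated in studies of drug structure-activity relationships, models must be built from small samples. Overfitting then arises easily and degrades the predictive performance and stability of the model. To address this, partial least squares (PLS) can be used to extract optimal components pairwise from the sample data, eliminating multicollinearity among the independent variables and reducing dimensionality effectively. A least squares support vector machine is then applied for nonlinear regression on the paired components and adjusted with an error-correction-based strategy, so that it expresses the nonlinear relationship between the independent and dependent variables more effectively. The resulting algorithm is termed EB-LSSVM-PLS; the models it builds have high prediction accuracy and good stability. Applied to QSAR modelling of novel flavanone derivatives, it gave satisfactory results, with generalization performance better than that of other methods.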

9.
The calibration performance of partial least squares regression for one response (PLS1) can be improved by eliminating uninformative variables. Many variable-reduction methods are based on so-called predictor-variable properties or predictive properties, which are functions of various PLS-model parameters, and which may change during the steps of the variable-reduction process. Recently, a new predictive-property-ranked variable reduction method with final complexity adapted models, denoted as PPRVR-FCAM or simply FCAM, was introduced. It is a backward variable elimination method applied to the predictive-property-ranked variables. The variable number is first reduced, with constant PLS1 model complexity A, until A variables remain, followed by a further decrease in PLS complexity, allowing the final selection of small numbers of variables.

10.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared sources (NIR) and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test.
The three newly developed PPRVR-CAM methods were able to retain significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods without loss of prediction ability. Contrary to UVE-GA-PLS and UVE-iPLS, there is no variability in the number of retained variables in each PPRV(R) method. Renewed variable ranking, after deletion of a variable, followed by remodelling, combined with the possibility to decrease the PLS model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.
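The stepwise elimination loop described in this abstract can be sketched roughly as follows, using the absolute PLS1 regression coefficient as the predictive property. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: `pls1_coefficients` is a plain NIPALS PLS1 on mean-centred data, `svr_pprv` is a hypothetical name, and the only complexity adaptation shown is capping the number of components at the number of remaining variables, which is much cruder than the PPRVR-CAM rules the paper proposes.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_coefficients(X, y, n_comp):
    """Regression coefficients of a PLS1 model (NIPALS, mean-centred data)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    R, P, b = [], [], [0.0] * m
    for _ in range(n_comp):
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # residual carries no further covariance with X
        w = [v / norm for v in w]
        t = [_dot(E[i], w) for i in range(n)]
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # express the weight vector in terms of the undeflated X
        r = w[:]
        for pj, rj in zip(P, R):
            c = _dot(pj, w)
            r = [rv - c * rjv for rv, rjv in zip(r, rj)]
        R.append(r)
        P.append(p)
        b = [bv + q * rv for bv, rv in zip(b, r)]
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
    return b

def svr_pprv(X, y, n_comp, n_keep):
    """Backward elimination: refit, re-rank, drop the smallest |b| each round."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        Xs = [[row[j] for j in kept] for row in X]
        a = min(n_comp, len(kept))  # assumption: cap complexity at variable count
        b = pls1_coefficients(Xs, y, a)
        worst = min(range(len(kept)), key=lambda k: abs(b[k]))
        del kept[worst]
    return kept
```

Ranking by raw coefficient magnitude presumes that the variables share a comparable scale, as spectral absorbances usually do; otherwise the data would need scaling first.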

11.
In this paper, a fast strategy for determining the total antioxidant capacity of Chinese green tea extracts is developed. This strategy includes the use of experimental techniques, such as fast high-performance liquid chromatography (HPLC) on monolithic columns and a spectrophotometric approach to determine the total antioxidant capacity of the extracts. To extract the chemically relevant information from the obtained data, chemometric approaches are used. These include correlation optimized warping (COW) to align the chromatograms, robust principal component analysis (robust PCA) to detect outliers, and partial least squares (PLS) and uninformative variable elimination partial least squares (UVE-PLS) to construct a reliable multivariate regression model to predict the total antioxidant capacity from the fast chromatograms.

12.
This paper reports the results of a rapid method to determine sucrose in chocolate mass using near infrared spectroscopy (NIRS). We applied a broad-based calibration approach, which consists of combining samples of various types of chocolate mass in a single calibration. This approach increases the concentration range for one or more compositional parameters, improves the model performance and requires just one calibration model for several recipes. The data were modelled using partial least squares (PLS) and multiple linear regression (MLR). The MLR models were developed using variable selection based on the PLS regression coefficients and on a genetic algorithm (GA). High correlation coefficients (0.998, 0.997 and 0.998 for PLS, MLR and GA-MLR, respectively) and low prediction errors confirm the good predictive ability of the models. The results show that NIR can be used as a rapid method to determine sucrose in chocolate mass in chocolate factories.

13.
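A rapid, non-destructive method for determining multiple marker components in Fuzi (processed aconite root) was established using near-infrared spectroscopy (NIRS) combined with partial least squares (PLS) and least squares support vector machine (LS-SVM) modelling. A high-performance liquid chromatography (HPLC) method for the simultaneous determination of six components was established on 38 batches of samples. NIR spectra of the samples were then collected, and quantitative calibration models relating the HPLC values of each component to the NIR spectra were built with PLS and with LS-SVM. For the LS-SVM models of benzoylmesaconine, benzoylaconine, benzoylhypaconine, mesaconitine, hypaconitine, aconitine, total monoester alkaloids and total diester alkaloids, the relative prediction deviations (RPD) were 3.3, 3.2, 4.1, 7.7, 8.8, 7.6, 4.0 and 8.6, and the validation-set correlation coefficients (rpre) were 0.9486, 0.9475, 0.9668, 0.9909, 0.9946, 0.9969, 0.9669 and 0.9927, respectively. The LS-SVM models outperformed the PLS models, indicating a good nonlinear relationship between the NIRS validation-set predictions and the HPLC values, and good predictive performance. NIRS combined with LS-SVM modelling can rapidly determine the six alkaloids above, together with the total monoester and total diester alkaloid contents, in Fuzi; the method is simple to operate and offers some guidance for controlling the alkaloid content of Fuzi.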
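For readers unfamiliar with the figures of merit quoted in this entry, the sketch below shows how RMSEP and RPD are commonly computed. It assumes the widespread definition of RPD as the standard deviation of the reference values divided by RMSEP; the abstract does not state which definition was used, and the function names are illustrative.

```python
from statistics import stdev

def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    n = len(reference)
    return (sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n) ** 0.5

def rpd(reference, predicted):
    """Assumed definition: sample standard deviation of the reference
    values divided by RMSEP (larger means better predictive ability)."""
    return stdev(reference) / rmsep(reference, predicted)
```

Here `reference` would hold the HPLC values and `predicted` the NIRS model outputs for the same validation samples.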

14.
《Analytical letters》2012,45(18):2914-2930
American Petroleum Institute (API) gravity is an important parameter in the crude oil industry, and nitrogen compounds are related to the toxic effects of the oil in refineries and the environment. In this paper, 194 crude oil samples with API gravities ranging from 11.4 to 57.5 were used to estimate three physicochemical properties: API gravity, total nitrogen content (TNC) and basic nitrogen content (BNC). Initially, infrared spectra in the mid and near regions (MIR and NIR) were collected; then full-spectrum partial least squares (PLS) and orthogonal projections to latent structures (OPLS) chemometric models were developed and validated, as well as models using interval PLS (iPLS), synergy interval PLS (siPLS) and competitive adaptive reweighted sampling PLS (CARSPLS) as variable selection tools. For API gravity and TNC, the best calibration technique is NIR CARSPLS, with root mean square error of prediction (RMSEP) values of 0.9 and 0.0275 wt%, respectively. For BNC, the best technique is MIR siPLS, with a prediction error of 0.0134 wt%. The results were validated based on the evaluation of the figures of merit, a statistical evaluation of the accuracy, characterization of the systematic error and measurement of errors in the residuals. The results were satisfactory considering the high variability of the data and the diversity of the samples, demonstrating suitable applicability for practical analysis.

15.
Near infrared (NIR) spectrometry was used for the rapid characterization of quality parameters in desi chickpea flour (besan). Partial least squares (PLS) regression, principal component regression (PCR), interval partial least squares (iPLS), and synergy interval partial least squares (siPLS) were used to determine the protein, carbohydrate, fat, and moisture concentrations of besan. Spectra were collected in reflectance mode using a lab-built predispersive filter-based instrument from 700 to 2500 nm. The quality parameters were also determined by standard methods. The root mean square error (RMSE) for the calibration and validation sets was used to evaluate the performance of the models. The correlation coefficients for moisture, fat, protein, and carbohydrates in chickpea flour exceeded 0.96 for PLS and PCR models using the full spectral range. Wavelengths from 2100 to 2345 nm had the lowest RMSE for quality parameters by iPLS. The error was further decreased by 0.41, 0.1, and 1.1% for carbohydrates, fats, and proteins by siPLS. The NIR spectral regions yielding the lowest RMSE of prediction were 1620–2345 nm for carbohydrates, 1180–1590 nm and 1860–2094 nm for fat, and 1700–2345 nm for proteins. The study shows that chickpea flour quality parameters were accurately determined using the optimized wavelengths.

16.
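Multi-model consensus partial least squares was applied to quantitative near-infrared spectral analysis. A series of PLS models was built on randomly drawn training subsets; the better-performing models among them were selected as member models and used to predict unknown samples. The method was applied to modelling the relationship between the NIR spectra of a set of biological samples and their contents of human serum albumin, γ-globulin and glucose, and was compared with single-model PLS. Over 50 repeated predictions of the three components in an independent test set, PLS gave mean RMSEP values of 0.1066, 0.0853 and 0.1338, with RMSEP standard deviations of 0.0174, 0.0144 and 0.0416; the proposed method gave mean RMSEP values of 0.0715, 0.0750 and 0.0781, with RMSEP standard deviations of 0.0033, 0.2729×10^-4 and 0.0025, respectively.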

17.
The on-line monitoring of batch processes based on principal component analysis (PCA) has been widely studied. Nonetheless, researchers have paid much less attention to the on-line application of partial least squares (PLS). In this paper, the influence of several issues on the predictive power of a PLS model for the on-line estimation of key variables in a batch process is studied. Some of the conclusions can help to better understand the capabilities of the proposals presented for on-line PCA-based monitoring. Issues such as the convenience of batch-wise or variable-wise unfolding, the method for the imputation of future measurements and the use of several sub-models are addressed. This is the first time that the adaptive hierarchical (or multi-block) approach is extended to PLS modelling. Also, the formulation of the so-called trimmed scores regression (TSR), a powerful imputation method defined for PCA, is extended for its application with PLS modelling. Data from two processes, one simulated and one real, are used to illustrate the results. Copyright © 2008 John Wiley & Sons, Ltd.
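The two unfolding schemes mentioned in this abstract can be illustrated with a small sketch. It assumes the batch data are stored as a nested list indexed `data[batch][time][variable]`; the function names and the column ordering within a batch-wise row (time-major) are illustrative conventions, not details taken from the paper.

```python
def unfold_batch_wise(data):
    """data[i][k][j] -> one row per batch, with K*J columns
    (all variables at time 0, then all variables at time 1, ...)."""
    return [[v for time_slice in batch for v in time_slice] for batch in data]

def unfold_variable_wise(data):
    """data[i][k][j] -> one row per (batch, time point), with J columns."""
    return [list(time_slice) for batch in data for time_slice in batch]
```

For I batches, K time points and J variables, batch-wise unfolding gives an I × (K·J) matrix, whereas variable-wise unfolding gives an (I·K) × J matrix.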

18.
Hui Chen  Zan Lin  Tong Wu 《Analytical letters》2018,51(17):2695-2707
Textile products must be labelled with fabric type and composition. Cotton is by far the most important fiber in the industry and often needs fast quantitative analysis, but the corresponding standard methods are very time-consuming and labor-intensive. This work explores the feasibility of combining near-infrared (NIR) spectroscopy and interval-based partial least squares (iPLS) for determining cotton content in textiles. Three types of partial least squares (PLS)-based algorithms were used for experimental measurements. A total of 91 cloth samples with cotton content ranging from 0 to 100% (w/w) were collected; all compositions are commercially available on the market in China. In all cases, the original spectrum axis was split into 20 subintervals. As a result, three final models, i.e., the iPLS model on a single subinterval, the backward interval partial least squares (biPLS) model on the region of the six remaining subintervals, and the moving window partial least squares (mwPLS) model with a window of 75 variables, achieved better results than the full-spectrum PLS model. No obvious differences in performance were observed among the three models. Thus, either iPLS or mwPLS was preferred considering their simplicity, which suggests that iPLS and mwPLS combined with the NIR technique may have potential for the rapid determination of the cotton content of textile products with accuracy comparable to standard procedures. In addition, this approach may have commercial and regulatory advantages because it avoids labor-intensive and time-consuming chemical analysis.
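The interval bookkeeping behind iPLS and mwPLS can be sketched as below. This is only an illustration of how a spectrum axis might be divided into subintervals and moving windows; the function names are hypothetical, the total number of variables is not given in the abstract, and `score` stands in for whatever cross-validated error a PLS model fitted on that interval would return.

```python
def split_intervals(n_vars, n_intervals):
    """Split variable indices 0..n_vars-1 into contiguous, near-equal
    intervals, returned as half-open (start, stop) pairs."""
    base, extra = divmod(n_vars, n_intervals)
    bounds, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def moving_windows(n_vars, width):
    """Every position of a window of `width` adjacent variables."""
    return [(s, s + width) for s in range(n_vars - width + 1)]

def best_interval(bounds, score):
    """Interval with the lowest score; `score` is a caller-supplied
    function mapping a (start, stop) pair to an error value."""
    return min(bounds, key=score)
```

With these helpers, iPLS would correspond to `best_interval(split_intervals(n_vars, 20), score)` and mwPLS to `best_interval(moving_windows(n_vars, 75), score)`.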

19.
Extension of standard regression to the case of multiple regressor arrays is given via the Kronecker product. The method is illustrated using ordinary least squares regression (OLS) as well as the latent variable (LV) methods principal component regression (PCR) and partial least squares regression (PLS). Denoting the method applied to PLS as mrPLS, the latter was shown to explain as much or more variance for the first LV relative to the comparable L-partial least squares regression (L-PLS) model. The same relationship holds when mrPLS is compared to PLS or n-way partial least squares (N-PLS) and the response array is 2-way or 3-way, respectively, where the regressor array corresponding to the first mode of the response array is 2-way and the second mode regressor array is an identity matrix. In a comparison with N-PLS using fragrance data, mrPLS proved superior in a validation sense when model selection was used. Though the focus is on 2-way regressor arrays, the method can be applied to n-way regressors via N-PLS. Copyright © 2007 John Wiley & Sons, Ltd.

20.
In this work, different approaches for variable selection are studied in the context of near-infrared (NIR) multivariate calibration of textiles. First, a model-based regression method is proposed. It consists of genetic algorithm optimisation combined with partial least squares regression (GA-PLS). The second approach is a relevance measure of spectral variables based on mutual information (MI), which can be computed independently of any given regression model. As MI makes no assumption about the relationship between X and Y, non-linear methods such as a feed-forward artificial neural network (ANN) are encouraged for modelling in a prediction context (MI-ANN). GA-PLS and MI-ANN models are developed for NIR quantitative prediction of cotton content in cotton-viscose textile samples. The results are compared to a full-spectrum (480 variables) PLS model (FS-PLS). That model requires 11 latent variables and yields a 3.74% RMS prediction error in the range 0-100%. GA-PLS provides a more robust model based on 120 variables and slightly enhanced prediction performance (3.44% RMS error). With the MI variable selection procedure, a great reduction is obtained, as only 12 variables are retained. On the basis of these variables, a 12-input ANN model is trained, and the corresponding prediction error is 3.43% RMS.
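To make the idea of a model-independent relevance measure concrete, the sketch below ranks variables by a simple histogram estimate of mutual information with the response. The equal-width binning, the default of four bins and the function names are assumptions chosen for brevity; the abstract does not say which MI estimator the authors used.

```python
from collections import Counter
from math import log

def _bin(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=4):
    """Histogram estimate of I(X;Y) in nats."""
    n = len(x)
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def rank_by_mi(X, y, n_bins=4):
    """Column indices of X sorted from most to least informative about y."""
    cols = list(zip(*X))
    scores = [mutual_information(col, y, n_bins) for col in cols]
    return sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
```

Because the score does not assume a linear relationship, a variable related to the response through a curve can still rank highly, which is the motivation given above for pairing MI selection with a non-linear model.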
