Similar literature
 Found 20 similar documents; search took 93 ms
1.
A novel method for underdetermined regression problems, multicomponent self-organizing regression (MCSOR), has recently been introduced. Here, its performance is compared with partial least-squares (PLS), which is perhaps the most widely adopted multivariate method in chemometrics. A potpourri of models is presented, and MCSOR appears to provide highly predictive models that are comparable with or better than the corresponding PLS models in large internal (leave-one-out, LOO) and pseudo-external (leave-many-out, LMO) validation tests. The “blind” external predictive ability of MCSOR and PLS is demonstrated employing large melting point, factor Xa, log P and log S data sets. In a nutshell, MCSOR is fast, conceptually simple (employing multiple linear regression, MLR, as a statistical tool), and applicable to all kinds of multivariate problems with a single Y-variable.
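The leave-one-out (LOO) validation named in this abstract can be illustrated with a minimal sketch. This is not MCSOR: it assumes a single predictor fitted by ordinary least squares, and the function names `fit_line` and `q2_loo` as well as the data are hypothetical. It also assumes the common convention of referencing the total sum of squares to the overall mean of y.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Cross-validated Q^2 = 1 - PRESS / SS, leaving out one sample at a time."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Refit the model without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / n  # assumption: overall mean, not the per-fold mean
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss


# Hypothetical, slightly noisy data for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1]
print(round(q2_loo(x_demo, y_demo), 3))
```

A leave-many-out (LMO) test follows the same pattern, holding out a group of samples in each round instead of a single one.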

2.
Self-Organizing Molecular Field Analysis (SOMFA) comes with a built-in regression methodology, the Self-Organizing Regression (SOR), instead of relying on external methods such as PLS. In this article, we present a proof of the equivalence between SOR and SIMPLS with one principal component. Thus, the modest performance of SOMFA on complex datasets can be primarily attributed to the low performance of the SOMFA regression methodology. A multi-component extension of the original SOR methodology (MCSOR) is introduced, and the performances of SOR, MCSOR and SIMPLS are compared using several datasets. The results indicate that in general the performance of SOMFA models is greatly improved if SOR is replaced with a more sophisticated regression method. The results obtained for the Cramer (CBG) dataset further underline the fact that it is a very poor benchmark dataset and should not be used to evaluate the performance of QSAR techniques.
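To make "PLS with one component" concrete, here is a minimal sketch for a single response variable, where the first weight vector is proportional to Xᵀy on mean-centered data. It does not reproduce the SOR algorithm or the paper's proof; the function names `one_component_pls` and `predict` and the data are hypothetical.

```python
import math


def one_component_pls(X, y):
    """One-component PLS for a single y; returns (coefficients, x_means, y_mean)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance direction X^T y, normalized to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress centered y on the single score vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [b * wj for wj in w], x_means, y_mean


def predict(model, row):
    """Predict y for one new sample from the fitted one-component model."""
    coef, x_means, y_mean = model
    return y_mean + sum(c * (v - m) for c, v, m in zip(coef, row, x_means))


# Hypothetical data with two perfectly collinear columns, where plain MLR would fail.
X_demo = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y_demo = [1.0, 2.0, 3.0]
model = one_component_pls(X_demo, y_demo)
print(predict(model, [4.0, 8.0]))
```

Because only one latent direction is extracted, the model stays well defined even when predictors are collinear or outnumber the samples, but it cannot capture structure that needs more than one component.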

3.
A new variable selection algorithm is described, based on ant colony optimization (ACO). The algorithm aims to choose, from a large number of available spectral wavelengths, those relevant to the estimation of analyte concentrations or sample properties when spectroscopic analysis is combined with multivariate calibration techniques such as partial least-squares (PLS) regression. The new algorithm employs the concept of cooperative pheromone accumulation, which is typical of ACO selection methods, and optimizes PLS models using a pre-defined number of variables, employing a Monte Carlo approach to discard irrelevant sensors. The performance has been tested on a simulated system, where it shows a significant superiority over other commonly employed selection methods, such as genetic algorithms. Several near-infrared spectroscopic experimental data sets have been subjected to the present ACO algorithm, with PLS leading to improved analytical figures of merit upon wavelength selection. The method could be helpful in other chemometric activities such as classification or quantitative structure-activity relationship (QSAR) problems.

5.
A method for sulfur determination in diesel fuel employing near-infrared spectroscopy, variable selection and multivariate calibration is described. The performances of the principal component regression (PCR) and partial least squares (PLS) chemometric methods were compared with those shown by multiple linear regression (MLR), performed after variable selection based on the genetic algorithm (GA) or the successive projection algorithm (SPA). Ninety-seven diesel samples were divided into three sets (41 for calibration, 30 for internal validation and 26 for external validation), each of them covering the full range of sulfur concentrations (from 0.07 to 0.33% w/w). Transflectance measurements were performed from 850 to 1800 nm. Although principal component analysis identified the presence of three groups, PLS, PCR and MLR provided models whose predictive capabilities were independent of the diesel type. Calibration with PLS and PCR employing all 454 wavelengths provided root mean square errors of prediction (RMSEP) of 0.036% and 0.043% for the validation set, respectively. The use of GA and SPA for variable selection provided calibration models based on 19 and 9 wavelengths, with an RMSEP of 0.031% (PLS-GA), 0.022% (MLR-SPA) and 0.034% (MLR-GA). As the ASTM 4294 method allows a reproducibility of 0.05%, it can be concluded that a method based on NIR spectroscopy and multivariate calibration can be employed for the determination of sulfur in diesel fuels. Furthermore, the selection of variables can provide more robust calibration models, and SPA provided more parsimonious models than GA.
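The RMSEP figure of merit quoted in this abstract is the square root of the mean squared difference between reference and predicted values over a validation set. A minimal sketch follows; the function name `rmsep` and the sample values are hypothetical and are not data from the study.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over paired reference/predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sulfur contents in % w/w, for illustration only.
reference_demo = [0.10, 0.20, 0.30]
predicted_demo = [0.12, 0.17, 0.31]
error = rmsep(reference_demo, predicted_demo)

# Compare against the 0.05% reproducibility figure cited in the abstract.
print(round(error, 4), error < 0.05)
```

The result carries the same units as the measured property, which is why it can be compared directly with a reference method's reproducibility.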

13.
A QSAR model for predicting the blood-brain barrier permeability (BBBP) of a large and heterogeneous set of 136 compounds has been developed using approximate similarity (AS) matrices as predictors and PLS as the multivariate regression technique. AS values fuse information on both the isomorphic similarity and the nonisomorphic dissimilarity, with the purpose of achieving an accurate predictive space. In addition to applying AS values to heterogeneous data sets, a new concept of graph isomorphism based on the extended maximum common subgraph (EMCS) is defined for building AS spaces; it considers the atoms and bonds that bridge the isomorphic and nonisomorphic substructures. This new isomorphism detection aims to take into account the position and nature of the nucleus substituents, thus allowing the development of accurate models for large and diverse sets of compounds. After an outlier study, the training and test stages were carried out, and the results obtained using several AS approaches were compared. Several validation processes were carried out employing several test sets, and high predictive ability was obtained in all cases (Q(2) = 0.81 and standard error in prediction, SEP = 0.29).

14.
Carbamazepine is a poorly soluble drug with known bioavailability problems related to its polymorphism. A form (C-monoclinic or form IV) less soluble than the pharmaceutically acceptable one (P-monoclinic or form III) can be formed under various conditions that may occur during drug formulation. Therefore, quantitative analysis of form IV in form III is important to drug formulators. In the present study, a fast and simple non-destructive method was developed for quantification of form IV in form III, using DRIFTS spectral data subjected to the standard normal variate transformation (row centering and scaling) and to the lazy learning algorithm. Fast principal component regression (fast PCR) and partial least squares (PLS) regression methods of multivariate calibration were also used and compared with lazy learning. The lazy learning algorithm performed better than the fast PCR and PLS methods (root mean squared error of cross-validation 1.318% versus 3.337 and 3.058%, respectively). Even with a small number of calibration samples it gave satisfactory predictive performance (root mean squared error of prediction <2.0% versus >3.3% for fast PCR and >2.6% for PLS) in the concentration range below 30% (w/w) of form IV. This is attributed to its capability of handling non-linearity in the relation between reflectance and concentration, as well as to local modeling using a pre-selected number of nearest neighbor concentrations.
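The standard normal variate transformation described in this abstract as row centering and scaling can be sketched as follows. Each spectrum is treated independently. The function name `snv` and the example rows are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not specify it.

```python
import math


def snv(spectrum):
    """Center one spectrum (row) on its own mean and scale by its own std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Assumption: sample standard deviation with n - 1 in the denominator.
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]


# Two hypothetical spectra that differ only by a multiplicative factor.
rows_demo = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
transformed = [snv(row) for row in rows_demo]
print(transformed)
```

After the transformation both rows coincide, which shows how SNV removes per-spectrum offset and scale differences before calibration.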

15.
Near-infrared (NIR) spectrometry will become a more promising tool for quantitative measurement if the robustness and predictive ability of the partial least squares (PLS) model are improved. To this end, we present a new algorithm for simultaneous wavelength selection and outlier detection; at the same time, the problems of background and noise in multivariate calibration are also solved. The strategy is a combination of continuous wavelet transform (CWT) and modified iterative predictors and objects weighting PLS (mIPOW-PLS). CWT is performed as a pretreatment tool for eliminating background and noise simultaneously; then, mIPOW-PLS is proposed to remove both the useless wavelengths and the multiple outliers in the CWT domain. After pretreatment with CWT-mIPOW-PLS, a PLS model is finally built for prediction. The results indicate that the combination of CWT and mIPOW-PLS produces robust and parsimonious regression models with very few wavelengths.

16.
Analytica Chimica Acta, 2003, 480(1): 23-37
Carrageenans are natural products obtained from seaweeds which are used as additives in many industrial fields, mainly in the food industry (labelled as E407). The three most widely used are the so-called κ-, ι- and λ-carrageenans. So far, their industrial characterisation has been based on physical measurements, which exhibit a rather large uncertainty. The aim of this work is to develop a new analytical methodology for the quantitative determination of carrageenans in industrial mixtures employing FTIR and multivariate regression (partial least squares, PLS), avoiding complex sample pre-treatment steps and reducing the turnaround time. The methodology allows liquid (dissolved) carrageenans to be handled at room temperature (to prevent degradation) and therefore measured directly by IR as thin films. The standard prediction errors for the different models selected range from 3.3 to 4.2%, which can be considered excellent.
