Similar literature
20 similar documents found (search time: 31 ms)
1.
Balabin RM  Lomakina EI 《The Analyst》2011,136(8):1703-1712
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

2.
In the present study, different multivariate regression techniques have been applied to two large near-infrared data sets of feed and feed ingredients in order to fulfil the regulations and laws that exist about the chemical composition of these products. The aim of this paper was to compare the performances of different linear and nonlinear multivariate calibration techniques: PLS, ANN and LS-SVM. The results obtained show that ANN and LS-SVM are very powerful methods for handling nonlinearity, but LS-SVM can also perform quite well in the case of linear models. Using LS-SVM, the RMS for independent test sets improves on average by 10% compared to ANN and by 24% compared to PLS.

3.
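Support vector machine classification and regression for QSAR studies of peptides   Total citations: 4, self-citations: 0, citations by others: 4
Zhou Peng  Zeng Hui  Li Bo  Zhou Yuan  Li Zhiliang 《化学通报 (Huaxue Tongbao)》2006,69(5):342-346
Support vector machine techniques were used for classification and regression studies of two classes of peptide compound systems, and were systematically compared with k-nearest neighbors, multiple linear regression, partial least squares, and artificial neural networks. The results show that, for small-sample, nonlinear problems, support vector machines have strong stability and generalization ability, and in most cases yield better models than traditional methods. For the classification problem, the support vector machine achieved 100% classification accuracy on both the training set and the test set; for the regression problem, although the support vector machine fitted the training samples slightly less well than the artificial neural network, it showed stronger predictive ability on the external test set.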

4.
5.
Balabin RM  Smirnov SV 《The Analyst》2012,137(7):1604-1610
Modern analytical chemistry of industrial products is in need of rapid, robust, and cheap analytical methods to continuously monitor product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products in an on-line/in-line regime. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR), is one of the best ways to obtain information about the chemical structures and the quality coefficients of multicomponent mixtures. Together with chemometric algorithms and multivariate data analysis (MDA) methods, which were especially created for the analysis of complicated, noisy, and overlapping signals, NIR spectroscopy shows great results in terms of its accuracy, including classical prediction error, RMSEP. However, it is unclear whether the combined NIR + MDA methods are capable of dealing with much more complex interpolation or extrapolation problems that are inevitably present in real-world applications. In the current study, we try to make a rather general comparison of linear, such as partial least squares or projection to latent structures (PLS); "quasi-nonlinear", such as the polynomial version of PLS (Poly-PLS); and intrinsically non-linear, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM), regression methods in terms of their robustness. As a measure of robustness, we will try to estimate their accuracy when solving interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative examples of real-world samples. Six very different chemical systems that differed in complexity, composition, structure, and properties were studied; these systems were gasoline, ethanol-gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total. 
General conclusions are made about the applicability of ANN- and SVM-based regression tools in modern analytical chemistry. The effectiveness of different multivariate algorithms is different when going from classical accuracy to robustness. Neural networks, which are capable of producing very accurate results with respect to classical RMSEP, are not able to solve interpolation problems or, especially, extrapolation problems. The chemometric methods that are based on the support vector machine (SVM) ideology are capable of solving both classical regression and interpolation/extrapolation tasks.

6.
A method for quantifying the density of Chinese Fir samples based on visible/near-infrared (vis–NIR) spectrometry and least-squares support vector machines (LS-SVM) was proposed. The sample set partitioning based on joint x-y distances (SPXY) algorithm was used to divide the samples into calibration and prediction sets; it is of value for predicting properties of samples with complex matrices. A stepwise procedure is employed to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. For comparison, the models were also constructed by the Kennard–Stone method, as well as by using the duplex and random sampling methods for subset partitioning. The results revealed that the SPXY algorithm may be an advantageous alternative to the other three strategies. To validate the reliability of LS-SVM, comparisons were made with other modeling methods such as support vector machine (SVM) and partial least squares (PLS) regression. Satisfactory models were built using LS-SVM, with lower prediction errors and superior performance in relation to SVM and PLS. These results showed the possibility of building robust models to quantify the density of Chinese Fir using near-infrared spectroscopy and LS-SVM combined with the SPXY algorithm as a nonlinear multivariate calibration procedure.
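The stepwise selection described in this abstract can be sketched as a max-min (Kennard–Stone style) search on a joint distance that adds the normalized x-distance and the normalized y-distance. This is a minimal illustration under that assumption, not the authors' implementation; the function name `spxy_select` and the toy data are hypothetical.

```python
import math


def spxy_select(X, y, n_select):
    """Return indices of n_select calibration samples chosen by max-min
    joint x-y distance (a sketch of SPXY-style partitioning)."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise distances in x (Euclidean) and in y (absolute difference).
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]

    # Normalize each by its maximum so x and y contribute equally.
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]

    # Start with the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_select:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): one spectral variable and one property value.
X_demo = [[0.0], [1.0], [2.0], [10.0]]
y_demo = [0.0, 1.0, 2.0, 10.0]
calibration_idx = spxy_select(X_demo, y_demo, 3)
```

Samples not returned by the function would form the prediction set. Replacing the joint distance with the x-distance alone gives the plain Kennard–Stone selection that the abstract uses for comparison.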

7.
Ternary mixtures of thiamin, riboflavin and pyridoxal have been simultaneously determined in synthetic and real samples by spectrophotometry combined with least-squares support vector machines (LS-SVM). The calibration graphs were linear in the ranges of 1.0 - 20.0, 1.0 - 10.0 and 1.0 - 20.0 µg ml⁻¹ with detection limits of 0.6, 0.5 and 0.7 µg ml⁻¹ for thiamin, riboflavin and pyridoxal, respectively. The experimental calibration matrix was designed with 21 mixtures of these chemicals. The concentrations were varied within the linear calibration ranges of the vitamins. The simultaneous determination of these vitamin mixtures by using spectrophotometric methods is a difficult problem, due to spectral interferences. Partial least squares (PLS) modeling and least-squares support vector machines were used for the multivariate calibration of the spectrophotometric data. An excellent model was built using LS-SVM, with low prediction errors and superior performance in relation to PLS. The root mean square errors of prediction (RMSEP) for thiamin, riboflavin and pyridoxal with PLS and LS-SVM were 0.6926, 0.3755, 0.4322 and 0.0421, 0.0318, 0.0457, respectively. The proposed method was satisfactorily applied to the rapid simultaneous determination of thiamin, riboflavin and pyridoxal in commercial pharmaceutical preparations and human plasma samples.
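RMSEP values such as those quoted in this abstract are conventionally the square root of the mean squared difference between reference and predicted values on the prediction set. The sketch below assumes that standard definition; the function name `rmsep` and the sample concentrations are illustrative and do not come from the paper.

```python
import math


def rmsep(y_reference, y_predicted):
    """Root mean square error of prediction over a prediction set."""
    if len(y_reference) != len(y_predicted) or not y_reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(r - p) ** 2 for r, p in zip(y_reference, y_predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference and predicted concentrations (same units).
reference = [2.0, 5.0, 10.0, 15.0]
predicted = [2.1, 4.8, 10.3, 14.9]
error = rmsep(reference, predicted)
```

The result carries the units of the measured property, so it can be compared directly between models fitted to the same prediction set.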

8.
9.
This tutorial provides a concise overview of support vector machines and different closely related techniques for pattern classification. The tutorial starts with the formulation of support vector machines for classification. The method of least squares support vector machines is explained. Approaches to retrieve a probabilistic interpretation are covered and it is explained how the binary classification techniques can be extended to multi-class methods. Kernel logistic regression, which is closely related to iteratively weighted least squares support vector machines, is discussed. Different practical aspects of these methods are addressed: the issue of feature selection, parameter tuning, unbalanced data sets, model evaluation and statistical comparison. The different concepts are illustrated on three real-life applications in the field of metabolomics, genetics and proteomics.

10.
《Analytical letters》2012,45(16):2398-2411
In this paper, three different types of biodiesel, which were synthesized from peanut, corn, and canola oils, were characterized by positive-ion electrospray ionization (ESI) and Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS). Different biodiesel/diesel blends containing 2–90% (V/V) of each biodiesel type were prepared and analyzed by near infrared spectroscopy (NIR). In the next step, the chemometric methods of hierarchical cluster analysis (HCA), principal component analysis (PCA), and support vector machines (SVM) were used for exploratory analysis of the different biodiesel samples, and the SVM was able to give the best classification results (correct classification of 50 peanut and 50 corn samples, and only one misclassification out of 49 canola samples). Then, partial least squares (PLS) and multivariate adaptive regression splines (MARS) models were evaluated for biodiesel quantification. Both methods were considered equivalent for quantification purposes based on values smaller than 5% for the root mean square error of calibration (RMSEC) and root mean square of validation (RMSEP), as well as Pearson correlation coefficients of at least 0.969. The combination of NIR with the chemometric techniques of SVM and PLS/MARS was proven to be appropriate to classify and quantify biodiesel from different origins.

11.
Over the past 10 years, like other disciplines influenced by the fast development of PC technology, chemometrics has been used in many analytical methods, especially in instrumental analysis. This article describes applications and a comparison of multivariate linear regression (MLR), principal component analysis (PCA), principal component regression (PCR), partial least squares (PLS), artificial neural networks (ANN), fuzzy methods and pattern recognition. A better calibration method can greatly improve the efficiency of routine analytical work.

12.
13.
Ren S  Gao L 《The Analyst》2011,136(6):1252-1261
This paper suggests a novel method named DF-LS-SVM, which is based on least squares support vector machines (LS-SVM) regression combined with data fusion (DF) to enhance the ability to extract characteristic information and improve the quality of the regression. Simultaneous multicomponent determination of Fe(III), Co(II) and Cu(II) was conducted for the first time by using the proposed method. Data fusion is a technique that integrates information from disparate sources to produce a single model or decision. The LS-SVM technique allows for learning a high-dimensional feature with fewer training data, and reduces the computational complexity by only requiring the solution of a set of linear equations instead of a quadratic programming problem. Experimental results showed that the DF-LS-SVM method was successful for simultaneous multicomponent determination even when severe overlap of spectra existed. The DF-LS-SVM method is an attractive and promising hybrid approach that combines the best properties of the two techniques. The results obtained from an additional test case, simultaneous differential pulse voltammetric determination of o-nitrophenol, m-nitrophenol and p-nitrophenol, also demonstrated that the DF-LS-SVM method performed somewhat better than LS-SVM and PLS methods.
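The remark that LS-SVM needs only a set of linear equations can be made concrete with the standard LS-SVM regression system: the bias b and the coefficients alpha solve [[0, 1ᵀ], [1, K + I/gamma]] · [b, alpha] = [0, y]. The sketch below assumes an RBF kernel and a small hand-written Gaussian elimination. All names (`rbf`, `solve`, `lssvm_fit`, `lssvm_predict`), the hyperparameter values and the toy data are illustrative; the data fusion step of DF-LS-SVM is not shown.

```python
import math


def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Return (b, alpha) from the LS-SVM linear system (no QP solver)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error loss
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """Predict y at x as b + sum_i alpha_i * k(x_i, x)."""
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))


# Toy one-dimensional data (hypothetical).
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 1.0, 4.0, 9.0]
b_toy, alpha_toy = lssvm_fit(X_toy, y_toy)
y_hat = lssvm_predict(X_toy, b_toy, alpha_toy, [1.5])
```

Because every training sample receives a coefficient, the solution is not sparse, which is the usual trade-off against the quadratic programming formulation of standard SVR.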

14.
Successful applications of multivariate calibration in the field of electrochemistry have been recently reported, using various approaches such as multilinear regression (MLR), continuum regression, partial least squares regression (PLS) and artificial neural networks (ANN). Despite the good performance of these methods, it is nowadays accepted that they can benefit from data transformations aiming at removing baseline effects, reducing noise and compressing the data. In this context the wavelet transform seems a very promising tool. Here, we propose a methodology, based on the fast wavelet transform, for feature selection prior to calibration. As a benchmark, a data set consisting of lead and thallium mixtures measured by differential pulse anodic stripping voltammetry and giving seriously overlapped responses has been used. Three regression techniques are compared: MLR, PLS and ANN. Good predictive and effective models are obtained. Through inspection of the reconstructed signals, identification and interpretation of significant regions in the voltammograms are possible.

15.
This paper proposes the use of the least-squares support vector machine (LS-SVM) as an alternative multivariate calibration method for the simultaneous quantification of some common adulterants (starch, whey or sucrose) found in powdered milk samples, using near-infrared spectroscopy with direct measurements by diffuse reflectance. Due to the spectral differences of the three adulterants, a nonlinear behavior is present when all groups of adulterants are in the same data set, making the use of linear methods such as partial least squares regression (PLSR) difficult. Excellent models were built using LS-SVM, with low prediction errors and superior performance in relation to PLSR. These results show that it is possible to build robust models to quantify some common adulterants in powdered milk using near-infrared spectroscopy and LS-SVM as a nonlinear multivariate calibration procedure.

16.
In this study, different approaches to the multivariate calibration of the vapors of two refrigerants are reported. As the relationships between the time-resolved sensor signals and the concentrations of the analytes are nonlinear, the widely used partial least-squares regression (PLS) fails. Therefore, different methods are used, which are known to be able to deal with nonlinearities present in data. First, the Box–Cox transformation, which transforms the dependent variables nonlinearly, was applied. The second approach, the implicit nonlinear PLS regression, tries to account for nonlinearities by introducing squared terms of the independent variables to the original independent variables. The third approach, quadratic PLS (QPLS), uses a nonlinear quadratic inner relationship for the model instead of a linear relationship such as PLS. Tree algorithms are also used, which split a nonlinear problem into smaller subproblems, which are modeled using linear methods or discrete values. Finally, neural networks are applied, which are able to model any relationship. Different special implementations, like genetic algorithms with neural networks and growing neural networks, are also used to prevent overfitting. Among the fast and simpler algorithms, QPLS shows good results. Different implementations of neural networks show excellent results. Among the different implementations, the most sophisticated and computing-intensive algorithms (growing neural networks) show the best results. Thus, the optimal method for the data set presented is a compromise between quality of calibration and complexity of the algorithm.
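The "implicit nonlinear PLS" idea in this abstract, adding squared terms of the independent variables to the original ones, amounts to a simple augmentation of the data matrix before an ordinary linear PLS fit. The sketch below shows only that augmentation step; the function name `add_squared_terms` and the sample values are hypothetical, and the PLS fit itself is left out.

```python
def add_squared_terms(X):
    """Append the square of every independent variable to each sample row."""
    return [list(row) + [value * value for value in row] for row in X]


# Two hypothetical samples with two sensor signals each.
X_raw = [[1.0, 2.0], [3.0, -1.0]]
X_augmented = add_squared_terms(X_raw)  # four columns per sample
```

A linear model fitted to the augmented matrix can then capture quadratic dependence on the original variables, at the cost of doubling the number of columns.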

17.
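Zhou Peng  Mei Hu  Tian Feifei  Li Zhiliang 《应用化学 (Yingyong Huaxue)》2006,23(12):1410-0
Support vector machine; quantitative structure-property relationship; polymers; refractive index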

18.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. Genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. The more advanced method for comparative molecular field analysis (CoMFA) modeling called GA-based region selection (GARGS) is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.

19.
20.
Partial least-squares (PLS) regression, singular value decomposition-based PLS, and an artificial neural network (ANN) were tested as calibration procedures for the simultaneous determination of promethazine, chlorpromazine, and perphenazine by both conventional and derivative spectrophotometry. Comparison of the results revealed that the application of the ANN to the derivative spectra is superior to the application of the 2 PLS methods used. Different binary and ternary synthetic mixtures of the phenothiazine drugs in pure form and in tablets were analyzed by the proposed method, and acceptable results were obtained.
