Similar literature
 20 similar documents found (search time: 15 ms)
1.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. The genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. A more advanced method for comparative molecular field analysis (CoMFA) modeling, called GA-based region selection (GARGS), is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.

2.
3.
Fuzzy adaptive least squares (FALS), a pattern recognition method designed to correlate molecular structure with activity rating, has been developed. A novel feature of FALS is that the degree to which each sample belongs to an activity class is given by a membership function. The algorithm involves an iterative modification of forcing factors to maximize the sum of the membership function values over all samples. This paper first describes the method and calculation procedure of FALS89 (the 1989 version of FALS), and then shows its application to the correlation of structure with a potency rating of anticarcinogenic mitomycin derivatives and arginine-vasopressin antagonists. FALS89 applied to these samples showed high reliability in both recognition and leave-one-out prediction.

4.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods, variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as the predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared (NIR) sources and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test.
The three newly developed PPRVR-CAM methods were able to retain significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods without loss of prediction ability. Contrary to UVE-GA-PLS and UVE-iPLS, there is no variability in the number of retained variables in each PPRV(R) method. Renewed variable ranking, after deletion of a variable, followed by remodelling, combined with the possibility to decrease the PLS model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.
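
The elimination loop described in this abstract (rank variables by a predictive property, drop the weakest, refit, re-rank) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a PLS1 model with a single latent variable held constant throughout (so it corresponds to the constant-complexity SVR-PPRV case, not PPRVR-CAM), mean-centered data, and the absolute regression coefficient as the ranking property. The function names and the toy data are hypothetical.

```python
def pls1_one_component(X, y):
    """Regression coefficients of a PLS1 model with one latent variable.

    X is a list of rows and y a list of responses; both are assumed to be
    mean-centered already (an assumption of this sketch).
    """
    p = len(X[0])
    # Weight vector: covariance direction X'y, normalized to unit length.
    w = [sum(row[k] * yi for row, yi in zip(X, y)) for k in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = Xw and the inner regression coefficient q of y on t.
    t = [sum(r * v for r, v in zip(row, w)) for row in X]
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return [v * q for v in w]


def stepwise_reduction(X, y, n_keep):
    """Repeatedly drop the variable with the smallest |coefficient|, then refit."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        Xk = [[row[k] for k in kept] for row in X]
        b = pls1_one_component(Xk, y)
        weakest = min(range(len(kept)), key=lambda k: abs(b[k]))
        del kept[weakest]  # the ranking is renewed on the next pass
    return kept


# Hypothetical toy data: columns 0 and 1 carry the signal, 2 and 3 do not.
X = [
    [-3, 1, 1, 1],
    [-2, -1, 1, 0],
    [-1, 1, -2, -1],
    [1, -1, -2, 0],
    [2, 1, 1, -1],
    [3, -1, 1, 1],
]
y = [2 * row[0] + 3 * row[1] for row in X]

kept = stepwise_reduction(X, y, n_keep=2)
print(kept)
```

With one latent variable the coefficients are proportional to the covariance between each variable and the response, so the sketch only shows the control flow of the reduction; a realistic version would refit a multi-component PLS model at every step and choose the complexity by cross-validation.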

5.
Kernel partial least squares (KPLS), a nonlinear extension of linear PLS in which training samples are transformed into a feature space via a nonlinear mapping, has become a popular technique for regression and classification of complex data sets. The PLS algorithm can then be carried out in the feature space. In the present study, we attempt to develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel on the basis of decision tree ensembles. The constructed tree kernel can effectively discover the similarities of samples and select informative features by variable importance ranking in the process of building the kernel. At the same time, TKPLS can handle nonlinear relationships in structure–activity relationship data through such a kernel. Finally, three data sets related to different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that the TKPLS algorithm can be regarded as an alternative and promising classification technique. Copyright © 2013 John Wiley & Sons, Ltd.

6.
The number of latent variables (LVs), or the factor number, is a key parameter in PLS modeling for obtaining a correct prediction. Although much work has been done on this issue, it is still difficult to determine a suitable LV number in practice. A method named independent factor diagnostics (IFD) is proposed for investigating the contribution of each LV to the predicted results, on the basis of a discussion of how the LV number is determined in PLS modeling of near infrared (NIR) spectra of complex samples. The NIR spectra of three data sets of complex samples, including a public data set and two tobacco lamina ones, are investigated. It is shown that several high order LVs constitute main contributions to the predicted results, albeit the contribution of the low order LVs should not be neglected in the PLS models. Therefore, in practical uses of PLS for NIR spectral analysis of complex samples, it may be better to use a slightly larger LV number. Supported by the National Natural Science Foundation of China (Grant Nos. 20775036 & 20835002)

7.
The present work used multivariate calibration by Partial Least Squares (PLS) to produce a Net Analyte Signal as a way of establishing the independent influence of each phase in the Quantitative Phase Analysis with the Rietveld method for three sources of potential error: preferred orientation, linear absorption and counting statistics. Ternary mixtures of Al2O3, MgO and NiO were employed and organized in three groups with different degrees of variation in the weight fractions of the three constituents. An analysis of variance indicated that the partial selectivity of the least variation group differed significantly from the other groups. As for the phases, MgO partial selectivity was significantly different. This is due to a strong correlation between the linear absorption and counting statistics in the region of the (2 0 0) reflection of the MgO phase that is strongly affected by preferred orientation and also corresponds to the strongest reflection for MgO as well as for NiO. On the whole, similarity matrices showed a close agreement between the nominal weight fractions of the phases and the weight fractions observed by means of the Rietveld method. However, such similarity diminishes as the weight fractions of the phases of the mixture become closer to each other and, in the group of mixtures with least variation of weight fractions, the method is unable to quantify the small differences between the phases, even if these errors may be considered small relative to the weight fractions themselves. Copyright © 2008 John Wiley & Sons, Ltd.

8.
9.
As the structural diversity in a quantitative structure-activity relationship (QSAR) model increases, constructing a good model becomes increasingly difficult, and simply performing variable selection might not be sufficient to improve the model quality enough to make it practically usable. To combat this difficulty, an approach based on piecewise hypersphere modeling by particle swarm optimization (PHMPSO) is developed in this paper. It treats the linear models describing the sought-for subsets as hyperspheres which have different radii in the data space. According to the attribute of each hypersphere, all compounds in the training set are allocated to hyperspheres to construct submodels, and particle swarm optimization (PSO) is applied to search for the optimal hyperspheres that give satisfactory piecewise linear models. A new objective function is formulated to determine the appropriate piecewise models. The performance is assessed using three QSAR data sets. Experimental results have shown the good performance of this technique in improving QSAR modeling.

10.
A curve fitting technique for optical spectra based on a robust estimator, least median squares (LMedS), is introduced in this study. For the effective calculation of LMedS, particle swarm optimization (PSO) is also introduced. Unlike a standard curve fitting method using the least squares (LS) estimator, the method based on the LMedS estimator is less influenced by outliers in experimental data. Two kinds of data sets, simulated data with outliers and temperature-dependent near-infrared (NIR) spectra of oleic acid (OA), are used to demonstrate the proposed method. The results clearly reveal that, compared with the LS estimator, the proposed method can effectively reduce undesirable effects of a low signal-to-noise (SN) ratio and can yield more accurate fitting results.
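
To make the contrast between the LS and LMedS estimators concrete, here is a small sketch for the simplest case of a straight line. It is an illustration under stated assumptions, not the method of the paper: the search over candidate lines is an exhaustive pass over all point pairs rather than PSO, the model is a line rather than a spectral band shape, and the data and function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def lmeds_line(xs, ys):
    """Fit y = a*x + b by minimizing the median of the squared residuals.

    Candidates are the lines through every pair of points (an exhaustive
    search used here instead of the PSO search named in the abstract).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not expressible as y = a*x + b
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        crit = median((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]


def ls_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Hypothetical data on the line y = 2x + 1 with two gross outliers.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[3] = 40.0
ys[8] = -20.0

a_lmeds, b_lmeds = lmeds_line(xs, ys)
a_ls, b_ls = ls_line(xs, ys)
print("LMedS:", a_lmeds, b_lmeds)
print("LS:   ", a_ls, b_ls)
```

Because the median ignores up to half of the residuals, the two outliers do not move the LMedS line, while they pull the LS slope far from 2. The pairwise search scales quadratically with the number of points, which is one reason a stochastic optimizer is attractive for real spectra.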

11.
12.
13.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectra data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which one can improve, often significantly, the performance of the corresponding PLS model, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.

14.
The combination of unfolded partial least-squares (U-PLS) with residual bilinearization (RBL) provides a second-order multivariate calibration method capable of achieving the second-order advantage. RBL is performed by varying the test sample scores in order to minimize the residuals of a combined U-PLS model for the calibrated components and a principal component model for the potential interferents. The sample scores are then employed to predict the analyte concentration, with regression coefficients taken from the calibration step. When the contribution of multiple potential interferents is severe, particle swarm optimization (PSO) helps prevent RBL from being trapped in false minima, restoring its predictive ability and making it comparable to the standard parallel factor (PARAFAC) analysis. Both simulated and experimental systems are analyzed in order to show the potential of the new technique. Copyright © 2007 John Wiley & Sons, Ltd.

15.
In this work we compare the analytical results obtained by traditional calibration curves (CC) and the multivariate Partial Least Squares (PLS) algorithm when applied to the LIBS spectra obtained from ten brass samples (nine standards of known composition and one ‘unknown’). Both major (Cu and Zn) and trace (Sn, Pb, Fe) elements in the sample matrix were analyzed. After the analysis, the composition of the ‘unknown’ sample, measured by the X-ray Fluorescence (XRF) technique, was revealed. The predicted concentrations of major elements obtained by rapid PLS algorithms are in very good agreement with the nominal concentrations, as well as with those obtained by the more time-consuming CC approach. The possible effects leading to discrepancies in the results are discussed. The results of this study open encouraging perspectives towards the development of cheap LIBS instrumentation capable, despite the limitations of the experimental apparatus, of performing fast and precise quantitative analysis on complex samples.

16.
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser-induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented and the influence of the variables on these results is discussed. We found that using the whole spectra as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectra are used, indicating this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

17.
Simultaneous multicomponent analysis is usually carried out by multivariate calibration models such as partial least squares (PLS) that utilize the full spectrum. It has been demonstrated by both experimental and theoretical considerations that better results can be obtained by a proper selection of the spectral range to be included in calculations. A genetic algorithm is one of the most popular methods for selecting variables for PLS calibration of mixtures with almost identical spectra without loss of prediction capacity. In this work, a simple and precise method is proposed for the rapid and accurate simultaneous determination of sulfide and sulfite ions, based on the addition reaction of these ions with new fuchsin at pH 8 and 25°C, using PLS regression with a genetic algorithm (GA) for variable selection. The concentrations of sulfide and sulfite ions varied between 0.05–2.50 and 0.15–2.00 μg/mL, respectively. A series of synthetic solutions containing different concentrations of sulfide and sulfite were used to check the prediction ability of GA-PLS models. The root mean square error of prediction with PLS on the whole data set was 0.19 μg/mL for sulfide and 0.09 μg/mL for sulfite. After the application of GA, these values were reduced to 0.04 and 0.03 μg/mL, respectively. The text was submitted by the authors in English.
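
For reference, the root mean square error of prediction quoted in this abstract is conventionally defined as below. The abstract itself does not spell out the formula, so the symbols are assumptions: $n$ is the number of prediction samples, $y_i$ the reference concentration and $\hat{y}_i$ the model prediction for sample $i$.

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
```

On that reading, the reported change from 0.19 to 0.04 μg/mL for sulfide is a reduction of about 79%, since $(0.19 - 0.04)/0.19 \approx 0.79$, and the change from 0.09 to 0.03 μg/mL for sulfite is a reduction of about 67%.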

18.
Near-infrared (NIR) imaging systems simultaneously record spectral and spatial information. In this study, near-infrared imaging was applied to the identification of (E,Z)-4-(3-(4-chlorophenyl)-3-(3,4-dimethoxyphenyl)acryloyl)morpholine (dimethomorph) in both mixed samples and a commercial formulation. The distributions of technical dimethomorph and additive in the heterogeneous counterfeit product were obtained by the relationship imaging (RI) mode. Furthermore, a series of samples containing different contents of uniformly distributed dimethomorph were prepared and three data cubes were generated for each content. The spectra extracted from these images were imported to establish the partial least squares model. The model's evaluation indicators were: coefficient of determination (R²) 99.42 %, root mean square error of calibration (RMSEC) 0.02612, root mean square error of cross-validation (RMSECV) 0.01693, RMSECVmean 0.03577, relative standard error of prediction (RSEP) 0.01999, and residual predictive deviation (RPD) 15.14. The relative error of prediction for the commercial formulation was 0.077, indicating that the predicted value correlated with the real content. The chemical value reconstruction image of dimethomorph formulation products was calculated by a MATLAB program. NIR microscopy imaging here shows its potential for identifying the active component in the counterfeit pesticide and quantifying the active component in its scanned image.

19.
20.
In quantitative structure-activity relationship (QSAR) modeling, when compounds in a training set are structurally very different from each other, in particular when chemicals of biological interest interact with the receptor through different mechanisms, it might be difficult to construct a single linear model with the desired residuals for the whole population of compounds of interest. Developing a piecewise linear local model can be an effective way to circumvent this problem. In this paper, the piecewise modeling by particle swarm optimization (PMPSO) approach is applied to QSAR study. The minimum spanning tree is used for clustering all compounds in the training set to form a tree, and a modified discrete PSO is applied to divide the tree to find satisfactory piecewise linear models. A new objective function is formulated for searching for the appropriate piecewise linear models. The proposed PMPSO algorithm was used to predict the antagonism of angiotensin II. The results demonstrated that PMPSO is useful for improving the performance of regression models.
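
The tree-based grouping step in this abstract can be sketched as follows. This is a simplified stand-in, not PMPSO: the minimum spanning tree is built with Prim's algorithm on Euclidean distances, and the tree is divided by cutting its longest edges instead of by the modified discrete PSO and objective function the authors describe. The descriptor values and function names are hypothetical.

```python
def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns edges as (dist, i, j)."""
    n = len(points)

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Shortest edge that connects the current tree to a new point.
        d, i, j = min(
            (dist(i, j), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((d, i, j))
    return edges


def split_tree(n, edges, n_groups):
    """Cut the (n_groups - 1) longest tree edges and label the subtrees."""
    kept = sorted(edges)[: len(edges) - (n_groups - 1)]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical two-descriptor compounds forming two well-separated groups.
descriptors = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
]
mst = minimum_spanning_tree(descriptors)
labels = split_tree(len(descriptors), mst, n_groups=2)
print(labels)
```

Each resulting group would then receive its own linear submodel. Cutting the longest edges only reflects distances in descriptor space, whereas the approach in the abstract chooses the cuts by how well the resulting piecewise models fit the activity data.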
