Similar literature
1.
Quantitative structure–activity relationship models for predicting the mode of toxic action (MOA) of 221 phenols to the ciliated protozoan Tetrahymena pyriformis using atom-based quadratic indices are reported. The phenols represent a variety of MOAs, including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles and soft electrophiles. Linear discriminant analysis (LDA) and four machine learning (ML) techniques, namely k-nearest neighbours (k-NN), support vector machine (SVM), classification trees (CTs) and artificial neural networks (ANNs), were used to develop several models with high accuracy and predictive capability for distinguishing between the four MOAs. Most of them showed a global accuracy of over 90%, and false alarm rates were below 2.9% for the training set. Cross-validation, complementary-subset and external test set validations were performed, with good behaviour in all cases. Our models compare favourably with previously published models, and in general the models obtained with ML techniques show better results than those developed with linear techniques. We also developed unsupervised and supervised consensus models, and their results were better than those of our ML models, of the rule-based approach and of other previously published ensemble models. This investigation highlights the merits of ML-based techniques as an alternative to more traditional methods for modelling MOA.

2.
Retention prediction models for a group of pyrazines chromatographed under reversed-phase mode were developed using multiple linear regression (MLR) and artificial neural networks (ANNs). Using MLR, the retention of the analytes was satisfactorily described by a two-predictor model based on the logarithm of the partition coefficient of the analytes (log P) and the percentage of the organic modifier in the mobile phase (ACN or MeOH). ANN prediction models were also derived using the predictors from the MLR model as inputs and log k as the output. The best network architecture was found to be 2-2-1 for both the ACN and MeOH data sets. The optimized ANNs showed better predictive properties than the MLR models, especially for the ACN data set. For the MeOH data set, the MLR and ANN models have comparable predictive performance.
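A two-predictor MLR model of the kind described above has the form log k = b0 + b1·log P + b2·(% organic modifier). The sketch below fits such a model by ordinary least squares using only the Python standard library. The function name, the data values and the coefficients used to generate them are illustrative assumptions; they are not taken from the study.

```python
def fit_retention_model(log_p, pct_modifier, log_k):
    """Least-squares fit of log k = b0 + b1*log P + b2*(% organic modifier).

    Returns [b0, b1, b2]. Hypothetical helper written for illustration only.
    """
    rows = [[1.0, lp, pm] for lp, pm in zip(log_p, pct_modifier)]
    n = 3
    # Build the normal equations (X^T X) b = X^T y as an augmented 3x4 matrix.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, log_k))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Synthetic, noise-free data generated from assumed coefficients (0.3, 0.8, -0.03).
LOG_P = [0.2, 0.2, 0.9, 0.9, 1.5, 1.5]
PCT = [20.0, 40.0, 20.0, 40.0, 30.0, 50.0]
LOG_K = [0.3 + 0.8 * lp - 0.03 * pm for lp, pm in zip(LOG_P, PCT)]

coefficients = fit_retention_model(LOG_P, PCT, LOG_K)
print(coefficients)
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients to within floating-point error; real retention data would of course leave residuals.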

3.
4.
5.

Interpreting the mode of action behind GABAA receptor modulator activity is an important task in medicinal chemistry, and computational elucidation of this activity is one way to address it. The so-called semi-correlation, which is based on the Monte Carlo method, is a tool for predicting GABAA receptor modulator activity. The approach builds categorical classification models with two classes: (i) active and (ii) inactive. The CORAL software (http://www.insilico.eu/coral) can be used to build the semi-correlations. For external validation sets, the Matthews correlation coefficient (MCC) of the semi-correlation models ranges from 0.72 to 1.00 over 30 random splits of all available data (n = 210) into training and validation sets. In contrast to existing approaches, the CORAL models make predictions using solely data on molecular architecture (represented by the simplified molecular input-line entry system, SMILES) and available experimental data on endpoints. The suggested models for predicting GABAA receptor modulator activity are built according to the OECD principles. Thus, the semi-correlation approach can be a useful tool for studying the activity of GABAA receptor modulators.
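The Matthews correlation coefficient used above to judge the active/inactive models can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula; the function name and the example counts are illustrative assumptions and do not come from the CORAL software or the study's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC for a binary (active/inactive) confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives. Returns 0.0 when the
    denominator is zero (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical validation-set counts, for illustration only.
print(matthews_corrcoef(tp=18, tn=20, fp=2, fn=2))
```

An MCC of 1.00 corresponds to a perfect classification of the validation set, 0 to a result no better than chance, and -1 to complete disagreement.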


6.
Although artificial neural networks (ANNs) have been shown to exhibit superior predictive power in the study of quantitative structure-activity relationships (QSARs), they have also been labeled a "black box" because they provide little explanatory insight into the relative influence of the independent variables in the predictive process, so that little information on how and why compounds work can be obtained. Here, we turn our attention to their explanatory capacity. A method is proposed for assessing the relative importance of the variables that describe molecular structure, based on axon connection weights and on the partial derivatives of the ANN output with respect to its inputs. The method identifies the variables that contribute significantly to network predictions and also provides a variable selection method for ANNs. We show that, by extending this approach to ANNs, the "black box" mechanics of ANNs can be greatly illuminated, which makes the approach very useful for understanding environmental chemical QSAR models.
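The partial-derivative part of this idea can be sketched for a network with one tanh hidden layer and a linear output, where ∂y/∂x_i = Σ_j w2_j (1 − h_j²) w1_ji. The architecture, the weights and the function names below are assumptions made for illustration; the abstract does not specify them, and the connection-weight part of the method is not shown.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Forward pass: tanh hidden layer, linear output. Returns (y, hidden)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    y = sum(w * h for w, h in zip(w2, hidden)) + b2
    return y, hidden


def input_gradients(x, w1, b1, w2, b2):
    """Analytic partial derivatives dy/dx_i of the network output."""
    _, hidden = forward(x, w1, b1, w2, b2)
    return [
        sum(w2[j] * (1.0 - hidden[j] ** 2) * w1[j][i] for j in range(len(w2)))
        for i in range(len(x))
    ]


# Hypothetical trained weights: 3 inputs, 2 hidden neurons, 1 output.
# The third input has zero weights, so its derivative must be zero.
W1 = [[0.8, -0.5, 0.0], [0.3, 0.9, 0.0]]
B1 = [0.1, -0.2]
W2 = [1.2, -0.7]
B2 = 0.05

x = [0.4, -0.3, 1.0]
grads = input_gradients(x, W1, B1, W2, B2)
print(grads)
```

Averaging the absolute values of such derivatives over all compounds in a data set is one plausible way to rank descriptors; whether the authors aggregate them this way is not stated in the abstract.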

7.
Support vector machines in water quality management   Total citations: 1, self-citations: 0, citations by others: 1
Support vector classification (SVC) and regression (SVR) models were constructed and applied to surface water quality data to optimize the monitoring program. The data set comprised 1500 water samples representing 10 different sites monitored for 15 years. The objectives of the study were to classify the sampling sites (spatial) and months (temporal) so as to group similar ones in terms of water quality, with a view to reducing their number, and to develop a suitable SVR model for predicting the biochemical oxygen demand (BOD) of water using a set of variables. The spatial and temporal SVC models grouped the 10 monitoring sites and 12 sampling months into 3 clusters each, with misclassification rates of 12.39% and 17.61% in the training sets, 17.70% and 26.38% in the validation sets, and 14.86% and 31.41% in the test sets, respectively. The SVR model predicted water BOD values in the training, validation, and test sets with reasonably high correlation with the measured values (0.952, 0.909, and 0.907) and low root mean squared errors of 1.53, 1.44, and 1.32, respectively. The values of the performance criteria indicated the adequacy of the constructed models and their good predictive capabilities. The SVC model achieved a data reduction of 92.5% for redesigning the future monitoring program, and the SVR model provided a tool for predicting water BOD using a set of a few measurable variables. The performance of the nonlinear models (SVM, KDA, KPLS) was comparable, and these performed relatively better than the corresponding linear methods (DA, PLS) of classification and regression modeling.

8.
Artificial neural networks (ANNs) are comparatively straightforward to understand and use in the analysis of scientific data. However, this relative transparency may encourage their use in an uncritical, and therefore possibly unproductive, fashion. The geometry of a network is among the most crucial factors in the successful deployment of network tools; in this review, we cover methods that can be used to determine optimum or near‐optimum geometries. These methods of determining neural network architecture include the following: (i) trial and error, in which architectures chosen semirandomly are tested and modified by the user; (ii) empirical or statistical methods, in which an ANN's internal parameters are adjusted based on the model's performance; (iii) hybrid methods, such as fuzzy inference; (iv) constructive and/or pruning algorithms, that add and/or remove neurons or weights from an initial architecture, respectively, based on a predefined link between architecture and ANN performance; (v) evolutionary strategies, which search the topology space using genetic operators to vary the neural network parameters. Several case studies illustrate the development of neural network models for applications in chemistry and chemical engineering. Copyright © 2011 John Wiley & Sons, Ltd.

9.
The suitability of an approach for extracting heuristic rules from trained artificial neural networks (ANNs) pruned by a regularization method and with architectures designed by evolutionary computation for quantifying highly overlapping chromatographic peaks is demonstrated. The ANN input data are estimated by the Levenberg-Marquardt method in the form of a four-parameter Weibull curve associated with the profile of the chromatographic band. To test this approach, two N-methylcarbamate pesticides, carbofuran and propoxur, were quantified using a classic peroxyoxalate chemiluminescence reaction as a detection system for chromatographic analysis. Straightforward network topologies (one- and two-output models) allow the analytes to be quantified in concentration ratios ranging from 1:7 to 5:1 with an average standard error of prediction for the generalization test of 2.7 and 2.3% for carbofuran and propoxur, respectively. The reduced dimensions of the selected ANN architectures, especially those obtained after using heuristic rules, allowed simple quantification equations to be developed that transform the input variables into output variables. These equations can be easily interpreted from a chemical point of view to obtain quantitative analytical information regarding the effect of both analytes on the characteristics of chromatographic bands, namely profile, dispersion, peak height, and residence time.

10.
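The applications of artificial neural networks in capillary electrophoresis and chromatographic analysis are reviewed, including prediction of migration (or retention) behaviour, separation optimization, pattern recognition and classification, quantitative resolution of overlapping peaks, modelling of nonlinear processes, and assessment of peak purity. Possible future applications of artificial neural networks in chromatography and capillary electrophoresis are also discussed. 52 references are cited.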

11.
Ni Xin  Qinghua Meng  Yizhen Li  Yuzhu Hu 《中国化学》2011,29(11):2533-2540
This paper demonstrates the possibility of using near infrared (NIR) spectral similarity as a rapid method to estimate the quality of Flos Lonicerae. Variable selection together with modelling techniques is used to select the representative variables from which the similarity is calculated. NIR is used to build calibration models to predict the bacteriostatic activity of Flos Lonicerae, which is determined by an in vitro experiment. Models are built for both Gram‐positive and Gram‐negative bacteria. A genetic algorithm combined with partial least squares regression (GA‐PLS) is used to perform the calibration. The results of the GA‐PLS models are compared with those of interval partial least squares (iPLS) models, full‐spectrum PLS models and full‐spectrum principal component regression (PCR) models. The variables in the two GA‐PLS models are then combined and used to calculate the NIR spectral similarity of the samples. The similarity based on the characteristic variables and the similarity based on the full spectrum are each used to evaluate the fingerprints of Flos Lonicerae. The results show that the combination of variable selection, modelling techniques and similarity analysis might be a powerful tool for the quality control of traditional Chinese medicine (TCM).

12.
13.
14.
15.
16.
17.
With the aim of obtaining a monitoring tool to assess the quality of water, a multivariate statistical procedure is proposed that couples cluster analysis (CA) with the soft independent modelling of class analogy (SIMCA) algorithm to provide an effective classification method. The experimental data set, collected throughout 2004, was composed of analytical parameters from 68 water sources in a vast area southwest of Paris. Nine variables carrying the most useful information were selected and investigated (nitrate, sulphate, chloride, turbidity, conductivity, hardness, alkalinity, coliforms and Escherichia coli). Principal component analysis provided considerable data reduction, with the first two principal components capturing about 92.2% of the total variance. CA grouped samples belonging to different sites, distinctly correlating them with chemical variables, and a classification model was built by SIMCA. This model was optimised and validated and then applied to a new data matrix, consisting of the parameters measured during 2005 for the same sources, providing a fast and accurate classification of all the samples. Most of the examined sources appeared unchanged during the 2-year period, but five sources were assigned to different classes owing to statistically significant changes in some characteristic analytical parameters.

18.
Both acute and chronic toxicity data on aquatic organisms are indispensable parameters in ecological risk assessment and in the screening of priority chemicals (e.g. persistent, bioaccumulative and toxic chemicals). However, most present modelling efforts focus on developing predictive models for the acute toxicity of chemicals to aquatic organisms, and considerable work is still needed on chronic aquatic toxicity. The major objective of the present study was to construct in silico models for predicting chronic toxicity data for Daphnia magna and Pseudokirchneriella subcapitata. For the modelling, chronic toxicity data sets were collected for D. magna (21-day no observed effect concentration (NOEC)) and P. subcapitata (72 h NOEC). Binary classification models were then developed for D. magna and P. subcapitata using the k-nearest neighbour (k-NN) method. The model assessment results indicated that the optimum models obtained had high accuracy, sensitivity and specificity. The application domain of the models was characterized by a Euclidean distance-based method. In the future, the models developed here could be used to fill the chronic toxicity data gap for D. magna and P. subcapitata for other chemicals within the application domain.
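A k-NN binary classifier with a Euclidean distance-based application domain check can be sketched as follows. The domain rule used here (mean distance to the k nearest training compounds must not exceed a threshold) is an assumed, commonly used variant; the abstract does not say which distance criterion the authors applied. The descriptor values, class labels and threshold are made up for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3, ad_threshold=None):
    """Majority-vote k-NN prediction plus an application domain flag.

    Returns (label, in_domain). in_domain is True when the mean distance
    to the k nearest neighbours is <= ad_threshold (or no threshold is set).
    """
    distances = [(euclidean(x, query), y) for x, y in zip(train_x, train_y)]
    neighbours = sorted(distances, key=lambda pair: pair[0])[:k]
    label = Counter(y for _, y in neighbours).most_common(1)[0][0]
    mean_distance = sum(d for d, _ in neighbours) / len(neighbours)
    in_domain = ad_threshold is None or mean_distance <= ad_threshold
    return label, in_domain


# Hypothetical two-descriptor training set with two toxicity classes.
TRAIN_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
TRAIN_Y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(TRAIN_X, TRAIN_Y, (0.2, 0.2), k=3, ad_threshold=2.0))
print(knn_predict(TRAIN_X, TRAIN_Y, (20.0, 20.0), k=3, ad_threshold=2.0))
```

The second query still receives a class label, but the flag marks it as lying outside the application domain, so its prediction should not be used to fill a data gap.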

19.
《Analytical letters》2012,45(8):920-932
Different ANN models [Multi-layer Perceptrons (MLPs) and Radial Basis Function (RBF) networks] were developed and evaluated for discriminating olive oils produced in four Greek regions according to their geographical origin. For this purpose, ninety-seven samples were analyzed for 10 rare earth elements (REE) by ICP-MS. Two additional supervised techniques, discriminant analysis (DA) and classification trees (CTs), were applied to the same data set for data pre-treatment and for comparison purposes. In addition, two approaches were used for training and evaluating the models: the classical random choice of samples for the learning data set, and an innovative one that used the two linear discriminant functions (LDFs) of the preceding DA to choose the most representative learning sample set. The results were very satisfactory for the new ANN classifiers. Over-fitting was overcome and the prediction ability was 73%, as evaluated with an independent test sample set. The results are encouraging for the efficiency of ANNs even with demanding data sets such as the one under consideration.

[Supplementary materials are available for this article. Go to the publisher's online edition of Analytical Letters for the following free supplemental resources: Additional figures and tables.]

20.
Because bitumens behave as non-Newtonian fluids, the relationships between their near-infrared (NIR) spectra and the physico-chemical properties that define their consistency (viz. penetration and viscosity) are non-linear. Determining such properties using linear calibration techniques [e.g. partial least-squares regression (PLSR)] entails first transforming the original variables with non-linear functions and then using the transformed variables to construct the models. Other properties of bitumens, such as density and composition, exhibit linear relationships with their NIR spectra. Artificial neural networks (ANNs) enable modelling of systems with a non-linear property-spectrum relationship; they also allow several properties of a sample to be determined with a single model, so they are effective alternatives to linear calibration methods. In this work, the ability of ANNs to determine both linear and non-linear parameters of bitumens simultaneously, without first transforming the original variables, was assessed. Based on the results, ANNs allow the simultaneous determination of several linear and non-linear physical properties typical of bitumens.
