Similar literature
20 similar articles found (search time: 15 ms)
1.
2.
3.
Thyroid function diagnosis is an important classification problem. We reanalyzed human thyroid data, previously studied by multivariate analysis, using two well-known neural network approaches. The first is the self-organizing map (SOM), which clusters the patients and visualizes how they are distributed according to their laboratory tests. We found that the SOM consists of three well-separated clusters corresponding to hyperthyroid, hypothyroid, and normal patients, and that a patient's position in the map provides more detailed information. In addition, the missing-value SOM, which we had introduced to investigate QSAR problems, also proved useful for this kind of classification problem. We estimated the classification rates for thyroid disease using a Bayesian regularized neural network (BRNN) and found that its prediction accuracy is better than that of multivariate analysis. The automatic relevance determination (ARD) method of BRNN was verified to be effective by directly calculating classification rates using BRNN without ARD for all possible combinations of laboratory tests.

4.
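Cong Yong (丛湧), Xue Ying (薛英). Acta Physico-Chimica Sinica (《物理化学学报》), 2013, 29(8): 1639-1647
A quantitative structure-activity relationship (QSAR) study was carried out on 89 benzoisothiazole and benzothiazine non-nucleoside inhibitors of hepatitis C virus (HCV) NS5B polymerase. Two feature selection methods, genetic algorithm combined with partial least squares (GA-PLS) and linear stepwise regression analysis (LSRA), were used to select the optimal descriptor subsets, and multiple linear regression and partial least squares linear regression models were then built. For the first time, a genetic algorithm coupled with support vector machine (GA-SVM) approach was also used to build nonlinear support vector regression models on the descriptor subsets chosen by each of the two feature selection methods. The models built with all three machine learning methods gave satisfactory predictions. The three QSAR models built on the 6 descriptors selected by LSRA gave test-set correlation coefficients of 0.958-0.962, with GA-SVM giving the best prediction accuracy (0.962). The three QSAR models built on the 7 descriptors selected by GA-PLS gave test-set correlation coefficients of 0.918-0.960, with the partial least squares regression model performing best (0.960). This work provides an effective method for predicting the biological activity of HCV inhibitors, and the method can be extended to other similar QSAR studies.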

5.
6.
7.
8.
In computer-aided drug design, finding new leads in a large compound library requires pattern recognition studies that assess the diversity and similarity of the compounds; in combinatorial library design, more attention is given to designing target-focused libraries subject to diversity and drug-likeness criteria. This review presents current state-of-the-art applications of Kohonen self-organizing maps (SOM) to compound pattern recognition, to comparing the properties of molecular surfaces, to distinguishing drug-like from non-drug-like molecules, to splitting a dataset into suitable training and test sets before constructing a quantitative structure-activity relationship (QSAR) model, and to comparing and designing combinatorial libraries. The Kohonen self-organizing map will continue to play an important role in drug discovery and library design.
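
As an illustration of the Kohonen map mentioned in the abstract above, the following is a minimal sketch of SOM training in plain Python. It is not taken from the reviewed work: the function names (`best_matching_unit`, `train_som`), the rectangular grid, the linearly decaying learning rate and the Gaussian neighbourhood are all assumptions chosen for brevity, and descriptors are assumed to be pre-scaled to the range 0 to 1.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Random initial weights in [0, 1); assumes inputs are scaled to [0, 1].
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for i in range(dim):
                        # Pull the unit toward x, scaled by neighbourhood weight.
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, calling `best_matching_unit(weights, x)` for each compound gives its position on the map; compounds that land on the same or neighbouring units are the ones the map treats as similar.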

9.
Predictive accuracy is the main concern of computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model based on structurally analogous chemicals will predict better than one derived from diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling", which builds local QSAR models for the subsets obtained by clustering the training set according to structural similarity. For validation and prediction, the validation set and test set were first classified into the subsets corresponding to those of the training set, and the prediction was then performed by the relevant local model for each subset. The approach was validated on two independent data sets by local modeling and prediction of baseline toxicity for the fathead minnow. In this process, hierarchical clustering was employed for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results indicated that the local models based on the subsets predicted much better than the global model based on the whole training set, which is consistent with the hypothesis. The approach is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
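
The "clustering first, and then modeling" scheme described above can be sketched roughly as follows. This is a simplified illustration in plain Python, not the authors' implementation: single-linkage agglomerative clustering stands in for their hierarchical clustering, 1-nearest-neighbour assignment for k-nearest neighbor, and a univariate least-squares line on the first descriptor for partial least squares. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the two closest clusters
        del clusters[j]
    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for PLS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_local_models(X, y, n_clusters):
    """Cluster the training set, then fit one local model per cluster."""
    labels = agglomerative(X, n_clusters)
    models = {}
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        models[k] = fit_line([X[i][0] for i in idx], [y[i] for i in idx])
    return labels, models


def predict_local(X_train, labels, models, x_new):
    """Assign x_new to the cluster of its nearest training point, then predict."""
    nearest = min(range(len(X_train)), key=lambda i: dist(X_train[i], x_new))
    a, b = models[labels[nearest]]
    return a + b * x_new[0]
```

In this toy setting, a call such as `labels, models = fit_local_models(X, y, 2)` followed by `predict_local(X, labels, models, x_new)` routes each new compound to the local model of its most similar training compounds, which is the core idea of the scheme.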

10.
11.
We describe a toxicity alerting system for uncharacterized compounds, based on comprehensive tables of substructure fragments that are indicative of toxicity risk. These tables were derived computationally by analyzing the RTECS database and the World Drug Index. We provide, free of charge, a Java applet for structure drawing and toxicity risk assessment. In an independent investigation, we compared the toxicity classification performance of naive Bayesian clustering, k-nearest-neighbor classification, and support vector machines. To visualize the chemical space of both toxic and drug-like molecules, we trained a large self-organizing map (SOM) with all compounds from the RTECS database and the IDDB. In summary, we found that a support vector machine performed best at classifying compounds of defined toxicity into the appropriate toxicity classes. SOMs also performed excellently in separating toxic from nontoxic substances. Although these two methods are limited to compounds that are structurally similar to known toxic substances, our fragment-based approach extends predictions to compounds that are structurally dissimilar to those in the training set.

12.
The volatile and semi-volatile components of ice wine aroma were determined by developing a rapid headspace solid-phase microextraction-gas chromatography-time-of-flight mass spectrometry (SPME-GC-TOF-MS) analytical method (Part I) and applying it to the analysis of 137 samples produced in Canada and the Czech Republic and collected directly from the producing wineries (Part II). In this Part III study, the complex data matrix resulting from the analysis of the 58 compounds selected for each sample, as described in Part II, was interpreted critically using a self-organizing map (SOM) technique. The results are discussed in terms of the relative characterization of samples according to their geographical origin, grape variety, and vintage year. Where clear clustering was obtained, the compounds most responsible for the observed differentiation were identified and discussed further.

13.
14.
15.
This work addresses the supervised classification of industrial wood species (seven different types in the present study) through their thermo-oxidative stability, which is evaluated by pressure differential scanning calorimetry (PDSC) following the ASTM E2009 standard. The aims are to maximize the rate of correct classification and to reduce the cost of this activity. The classification problem was approached in two ways: by applying novel nonparametric functional data analysis techniques, based on kernel estimation, to the original PDSC curves, and by applying machine learning classification approaches to different multivariate data sets. The multivariate data sets were obtained, on the one hand, by estimating the fractal (Hausdorff) dimension of the PDSC curves by several methods, together with the parameters selected from fitting a nonlinear model to the PDSC curves, and, on the other hand, by applying principal component analysis or partial least squares to the thermograms. The results show that PDSC curves can be used to discriminate wood samples when these innovative and traditional statistical techniques are applied. In the best case, a probability of correct classification of 0.92 was obtained. PDSC represents a new alternative to images, spectra, and other thermal signals, such as thermogravimetric analysis, for classification purposes. Copyright © 2013 John Wiley & Sons, Ltd.

16.
A novel method for modeling 3D QSAR has been developed. The method involves repeated training of a series of self-organizing maps (SOM). The resulting networks were used to process the data of one reference molecule. A scheme for analyzing such data with PLS has been proposed and tested on steroid data with corticosteroid binding globulin (CBG) affinity. The predictivity of the CBG models, measured with the SDEP parameter, is among the best reported. Although the 3D QSAR model for the colchicinoid series is far less predictive, it allows for a discussion of the relative influence of the structural motifs of these compounds.

17.
This paper presents a novel method of mining biological data using a self-organizing map (SOM). After a set of protein sequences is partitioned using the SOM, conventional homology alignment is applied to each cluster to determine the conserved local motif (biological pattern) for that cluster. These local motifs are then treated as rules for prediction and classification. In an application to the prediction of HIV protease cleavage sites in proteins, we found that the rules derived from this method are much more robust than those derived from the decision tree method.

18.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). The advantages of PLS over MLR are illustrated with typical applications. The genetic algorithm (GA) is a novel optimization technique that can be used as a search engine in variable selection. A novel hybrid approach to variable selection developed in our group, combining GA and PLS (GAPLS), is described. A more advanced method for comparative molecular field analysis (CoMFA) modeling, called GA-based region selection (GARGS), is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can also be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools for chemometrics and QSAR.

19.
Most models in quantitative structure-activity relationship (QSAR) research, built with techniques such as ordinary least squares regression, principal components regression, partial least squares regression, and multivariate adaptive regression splines, consist of a linear parametric part and a random error part. The random errors in those models are assumed to be independent and identically distributed. However, the independence assumption is not reasonable in many cases. Dependence among the errors should be taken into account, as is done in Kriging, which has been used successfully for modeling computer experiments. The aim of this paper is to apply Kriging models to QSAR. Our experiments show that Kriging models can significantly improve on the performance of models obtained by many existing methods.
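
To make the idea of correlated errors concrete, here is a minimal Kriging sketch in plain Python. It is an illustration under stated assumptions rather than the paper's model: the trend is reduced to a constant mean estimated by the sample average (a full Kriging model would use a linear parametric part with a generalized least squares estimate), the correlation function is Gaussian with a fixed, hand-picked `theta`, and no nugget term is included. All function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def corr(a, b, theta):
    """Gaussian correlation between two descriptor vectors."""
    return math.exp(-theta * sum((u - v) ** 2 for u, v in zip(a, b)))


def fit_kriging(X, y, theta=1.0):
    """Fit a constant-mean Kriging model (theta is assumed, not optimized)."""
    mu = sum(y) / len(y)                      # simple estimate of the mean
    R = [[corr(a, b, theta) for b in X] for a in X]
    alpha = solve(R, [yi - mu for yi in y])   # R^{-1} (y - mu)
    return {"X": X, "mu": mu, "alpha": alpha, "theta": theta}


def predict_kriging(model, x):
    """Predict mu + r(x)^T R^{-1} (y - mu) at a new point x."""
    r = [corr(x, xi, model["theta"]) for xi in model["X"]]
    return model["mu"] + sum(ri * ai for ri, ai in zip(r, model["alpha"]))
```

Because the residuals are modeled as correlated, the predictor reproduces the training responses exactly at the training points and falls back to the mean far away from them, in contrast to a pure regression fit with independent errors.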

20.