Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Chemical fingerprints encode the presence or absence of molecular features and are available in many large databases. Using a variation of the Ant Colony Optimization (ACO) paradigm, we describe a binary classifier based on feature selection from fingerprints. We discuss the algorithm and possible cross-validation procedures. As a real-world example, we use our algorithm to analyze a Plasmodium falciparum inhibition assay and contrast its performance with other machine learning paradigms in use today (decision tree induction, random forests, support vector machines, artificial neural networks). Our algorithm matches established paradigms in predictive power, yet supplies the medicinal chemist and basic researcher with easily interpretable results. Furthermore, models generated with our paradigm are easy to implement and can complement virtual screenings by additionally exploiting the precalculated fingerprint information.

2.
3.
4.
5.
6.
7.
8.
This work describes the first approach in the development of a comprehensive classification method for bitterness of small molecules. The data set comprises 649 bitter and 13 530 randomly selected molecules from the MDL Drug Data Repository (MDDR) which are analyzed by circular fingerprints (MOLPRINT 2D) and information-gain feature selection. The feature selection proposes substructural features which are statistically correlated to bitterness. Classification is performed on the selected features via a naïve Bayes classifier. The substructural features upon which the classification is based are able to discriminate between bitter and random compounds, and thus we propose they are also functionally responsible for causing the bitter taste. Such substructures include various sugar moieties as well as highly branched carbon scaffolds. Cynaropicrine contains a number of the substructural features found to be statistically associated with bitterness and thus was correctly predicted to be bitter by our model. In contrast, promethazine and saccharin both contain fewer of these substructural features, and thus their bitterness was not identified. Two different classes of bitter compounds were identified, namely those which are larger and contain mainly oxygen and carbon and often sugar moieties, and those which are rather smaller and contain additional nitrogen and/or sulfur fragments. The classifier correctly predicts 72.1% of the bitter compounds. Feature selection reduces the number of false positives but also increases the number of false negatives, leaving 69.5% of bitter compounds correctly predicted. Overall, this work provides both one of the largest databases of bitter compounds presently available and a relatively reliable classification method.
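The pipeline in this abstract (information-gain feature selection followed by a naïve Bayes classifier on binary substructure features) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy data, and the Laplace smoothing are assumptions made purely for illustration, and real MOLPRINT 2D features are not computed here.

```python
import math

def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class count."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(fingerprints, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present."""
    n = len(labels)
    present = [y for fp, y in zip(fingerprints, labels) if feature in fp]
    absent = [y for fp, y in zip(fingerprints, labels) if feature not in fp]
    h_after = 0.0
    for part in (present, absent):
        if part:
            h_after += len(part) / n * entropy(sum(part), len(part) - sum(part))
    return entropy(sum(labels), n - sum(labels)) - h_after

def select_features(fingerprints, labels, k):
    """Return the k features with the highest information gain."""
    features = sorted(set().union(*fingerprints))
    features.sort(key=lambda f: information_gain(fingerprints, labels, f), reverse=True)
    return features[:k]

def train_naive_bayes(fingerprints, labels, features):
    """Bernoulli naive Bayes; Laplace smoothing is an assumption of this sketch."""
    model = {"prior": {}, "p": {}}
    for cls in (0, 1):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        model["prior"][cls] = len(rows) / len(labels)
        for f in features:
            hits = sum(1 for fp in rows if f in fp)
            model["p"][(cls, f)] = (hits + 1) / (len(rows) + 2)
    return model

def predict(model, fingerprint, features):
    """Return the class (1 = bitter, 0 = random) with the higher log-posterior."""
    scores = {}
    for cls in (0, 1):
        score = math.log(model["prior"][cls])
        for f in features:
            p = model["p"][(cls, f)]
            score += math.log(p if f in fingerprint else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: each molecule is the set of substructure features it contains.
toy_fps = [{"sugar", "ring"}, {"sugar", "branched"}, {"sugar"},
           {"ring"}, {"amine", "ring"}, {"amine"}]
toy_labels = [1, 1, 1, 0, 0, 0]

selected = select_features(toy_fps, toy_labels, k=2)
model = train_naive_bayes(toy_fps, toy_labels, selected)
print(selected, predict(model, {"sugar", "ring"}, selected))
```

Restricting the classifier to the top-ranked features mirrors the trade-off the abstract reports: fewer features can lower false positives while missing some true bitter compounds.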

9.
10.
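To address the problems that traditional contact-based driver fatigue detection methods interfere with driving and that detection algorithms have low recognition rates, this paper proposes an eye-state recognition method based on sparse representation. The K-SVD (K-means singular value decomposition) method is used to construct an overcomplete redundant dictionary from the input training set, and orthogonal matching pursuit is used to compute a sparse representation of each test image. The class of the test image, and hence the eye state it shows, is then determined from the error between the reconstructed image and the test image. In the experiments, K-SVD and OMP (orthogonal matching pursuit) were compared with other dictionary learning and sparse representation methods; the results show that the K-SVD dictionary learning algorithm combined with OMP achieved good recognition performance.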
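The classification step described above (sparse-code the test image with OMP, then assign the class with the smallest reconstruction error) can be sketched as follows. This is a simplified illustration, not the paper's method: hand-written unit-norm atoms stand in for a K-SVD-learned dictionary, one dictionary is kept per class, and the 4-dimensional "images", class names, and sparsity level are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def omp_residual(atoms, signal, sparsity):
    """Orthogonal matching pursuit with at most `sparsity` atoms; returns the residual."""
    chosen = []
    residual = signal[:]
    for _ in range(min(sparsity, len(atoms))):
        candidates = [i for i in range(len(atoms)) if i not in chosen]
        # Greedy step: pick the atom most correlated with the current residual.
        best = max(candidates, key=lambda i: abs(dot(atoms[i], residual)))
        if abs(dot(atoms[best], residual)) < 1e-12:
            break  # nothing left that this dictionary can explain
        chosen.append(best)
        # Orthogonal step: least-squares refit on all chosen atoms (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in chosen] for i in chosen]
        coeffs = solve(gram, [dot(atoms[i], signal) for i in chosen])
        residual = [s - sum(c * atoms[i][k] for c, i in zip(coeffs, chosen))
                    for k, s in enumerate(signal)]
    return residual

def classify(dictionaries, signal, sparsity=2):
    """Assign the class whose dictionary reconstructs the signal with the smallest error."""
    errors = {}
    for label, atoms in dictionaries.items():
        r = omp_residual(atoms, signal, sparsity)
        errors[label] = math.sqrt(dot(r, r))
    return min(errors, key=errors.get)

# Hypothetical per-class dictionaries of unit-norm atoms (stand-ins for K-SVD output).
dictionaries = {
    "open": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "closed": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
}
print(classify(dictionaries, [0.9, 0.4, 0.1, 0.0]))
```

The least-squares refit after each selection is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every atom already chosen, so no atom is picked twice.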

11.
Researchers are developing conceptually based models linking the structure and dynamics of molecular charge density to certain properties. Here we report on our efforts to identify features within the charge density that are indicative of instability and metastability. Towards this, we use our extensions to the quantum theory of atoms in molecules that capitalize on a molecule’s ridges to define a natural simplex over the charge density. The resulting simplicial complex can be represented at various levels by its 0‐, 1‐, and 2‐skeleton (dependent sets of points, lines, and surfaces). We show that the geometry of these n‐skeletons retains critical information regarding the structure and stability of molecular systems while greatly simplifying charge density analysis. As an example, we use our methods to uncover the fingerprints of instability and metastability in two much‐discussed systems, that is, the di‐benzene complex and the He and adamantane inclusion complex.

12.
13.
In many modern chemoinformatics systems, molecules are represented by long binary fingerprint vectors recording the presence or absence of particular features or substructures, such as labeled paths or trees, in the molecular graphs. These long fingerprints are often compressed to much shorter fingerprints using a simple modulo operation. As the length of the fingerprints decreases, their typical density and overlap tend to increase, and so does any similarity measure based on overlap, such as the widely used Tanimoto similarity. Here we show that this correlation between shorter fingerprints and higher similarity can be thought of as a systematic error introduced by the fingerprint folding algorithm and that this systematic error can be corrected mathematically. More precisely, given two molecules and their compressed fingerprints of a given length, we show how a better estimate of their uncompressed overlap, hence of their similarity, can be derived to correct for this bias. We show how the correction can be implemented not only for the Tanimoto measure but also for all other commonly used measures. Experiments on various data sets and fingerprint sizes demonstrate how, with a negligible computational overhead, the correction noticeably improves the sensitivity and specificity of chemical retrieval.
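The folding bias described in this abstract is easy to reproduce. The sketch below represents a fingerprint as the set of its on-bit indices, folds it with a modulo operation, and compares Tanimoto similarity before and after folding. The bit indices are hypothetical, and the paper's mathematical correction is not reproduced here because the abstract does not state it.

```python
def fold(on_bits, length):
    """Compress a fingerprint (set of on-bit indices) to `length` bits by modulo."""
    return {bit % length for bit in on_bits}

def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| of two on-bit sets."""
    union = len(a | b)
    # Treating two empty fingerprints as identical is a convention assumed here.
    return len(a & b) / union if union else 1.0

# Two hypothetical molecules that share a single feature (bit 3) before folding.
mol_a = {3, 70, 131, 260, 520}
mol_b = {3, 6, 67, 196, 901}

print(tanimoto(mol_a, mol_b))                      # similarity on the long fingerprints
print(tanimoto(fold(mol_a, 64), fold(mol_b, 64)))  # similarity after folding to 64 bits
```

Folding to 64 bits maps distinct features of the two molecules onto the same positions, so the overlap grows from one shared bit out of nine to three out of five. That inflation is the systematic error the abstract proposes to correct.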

14.
15.
Summary: Recently, the development of computer programs which permit the de novo design of molecular structures satisfying a set of steric and chemical constraints has become a burgeoning area of research and many operational systems have been reported in the literature. Experience with PRO_LIGAND—the de novo design methodology embodied in our in-house molecular design and simulation system PRO-METHEUS—has suggested that the addition of a genetic algorithm (GA) structure refinement procedure can add value to an already useful tool. Starting with the set of designed molecules as an initial population, the GA can combine features from both high- and low-scoring structures and, over a number of generations, produce individuals of better score than any of the starting structures. This paper describes how we have implemented such a procedure and demonstrates its efficacy in improving two sets of molecules generated by different de novo design projects.

16.
17.
18.
Toxicity of chemicals induced by different factors is an important consideration, especially during the drug research and development process. Thus, there is an urgent need to develop computationally effective models that can predict the toxicity or adverse effects of chemicals for a specific class of chemicals. In this study, random forest (RF) was used to classify five toxicity data sets from the Distributed Structure‐Searchable Toxicity database network, using substructure fingerprints calculated directly from simple molecular structure. Three model validation approaches, out‐of‐bag validation incorporated in RF, fivefold cross‐validation, and an independent validation set, were used for assessing the prediction capability of our models. The chemical space analysis of data sets was explored by multidimensional scaling plots, and outlying molecules were also detected by the proximity measure in RF. At the same time, the important substructure fingerprints, recognized by the RF technique, gave some insights into the structure features related to toxicity of chemicals. The results obtained showed that these in silico classification models with substructure patterns and RF are applicable for potential toxicity prediction of chemical compounds. Copyright © 2012 John Wiley & Sons, Ltd.

19.
20.
