Similar literature
1.
2.
3.
4.
In this paper we study different representational spaces of molecule data sets, based on 2D representation models, for building QSAR models that predict the activity of 37 benzylamino enaminone derivatives. Approximations based on classical similarity calculated from fingerprint representations of molecules, and on isomorphism obtained using sub-graph matching algorithms, are compared with fragmentation-based approximations using partial least squares and genetic algorithms. The influence of the anchored position of a non-common moiety and of the kind of substituents in the common core structure of the data set is analysed, demonstrating the anomalous behaviour of some molecules and therefore the difficulty of building prediction models. These problems are solved by considering approximate similarity models. These models tune the prediction equations based on the size of the substituent and the anchored position, by adjusting the contribution of each substituent in the similarity measurements calculated between the molecules of the data set.

5.
The rivality index (RI) is a normalized distance measurement between a molecule and its first nearest neighbours, providing a robust prediction of the activity of a molecule based on the known activity of its nearest neighbours. Negative values of the RI describe molecules that would be correctly classified by a statistical algorithm; conversely, positive values describe molecules detected as outliers by the classification algorithms. In this paper, we describe a classification algorithm based on the RI and propose four weighting schemes (kernels) for its calculation, based on measuring different characteristics of the neighbourhood of each molecule of the dataset at established values of the neighbour threshold. The results demonstrate that the proposed RI-based classification algorithm generates more reliable and robust classification models than many of the most widely used and well-known machine learning algorithms. These results have been validated and corroborated using 20 balanced and unbalanced benchmark datasets of different sizes and modelability. The classification models generated provide valuable information about the molecules of the dataset, the applicability domain of the models and the reliability of the predictions.
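
The sign convention described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the form used here, (d_same - d_other) / (d_same + d_other) with nearest-neighbour Euclidean distances, is an assumption chosen only because it is normalized to [-1, 1] and is negative when the closest neighbour shares the molecule's class. The function name `rivality_index` and the toy descriptor vectors are hypothetical, and the four kernels mentioned in the abstract are not modelled.

```python
import math

def rivality_index(points, labels, i):
    """Assumed RI form: compare the nearest same-class and other-class neighbours.

    points: list of descriptor vectors (tuples of floats)
    labels: class label of each point
    i:      index of the molecule being scored
    """
    # Distance to the nearest neighbour with the same class (excluding i itself).
    d_same = min(math.dist(points[i], p)
                 for j, p in enumerate(points)
                 if j != i and labels[j] == labels[i])
    # Distance to the nearest neighbour with a different class.
    d_other = min(math.dist(points[i], p)
                  for j, p in enumerate(points)
                  if labels[j] != labels[i])
    # Negative when the same-class neighbour is closer, positive otherwise.
    return (d_same - d_other) / (d_same + d_other)

# Toy one-dimensional descriptor space (hypothetical data).
points = [(0.0,), (1.0,), (5.0,), (6.0,), (1.5,)]
labels = [0, 0, 1, 1, 1]
ri_inlier = rivality_index(points, labels, 0)   # surrounded by its own class
ri_outlier = rivality_index(points, labels, 4)  # class-1 point inside class-0 region
```

In this toy set the first molecule gets a negative value and the last one a positive value, matching the inlier/outlier reading in the abstract. The sketch assumes every class has at least two members and that no two points coincide.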

6.
We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

7.
Feature extraction is essential for estimating the chemical properties of molecules using machine learning. Recently, graph neural networks have attracted attention for feature extraction from molecules. However, existing methods focus only on specific structural information, such as node relationships. In this paper, we propose a novel graph convolutional neural network that performs feature extraction while simultaneously considering multiple structures. Specifically, we propose feature extraction paths specialized in node, edge, and three-dimensional structures. Moreover, we propose an attention mechanism to aggregate the features extracted by the paths. The attention aggregation enables us to select useful features dynamically. The experimental results showed that the proposed method outperformed previous methods.

8.
Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature‐function relationship assumption: the macroscopic features of a molecule, including solvation free energy, are a functional of microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out with the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson–Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave‐one‐out test gives an optimal root‐mean‐square error (RMSE) of 1.05 kcal/mol. FFT predictions of SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.

9.
10.
We present a method for simultaneous three-dimensional (3D) structure generation and pharmacophore-based alignment using a self-organizing algorithm called Stochastic Proximity Embedding (SPE). Current flexible molecular alignment methods either start from a single low-energy structure for each molecule and tweak bonds or torsion angles, or choose from multiple conformations of each molecule. Methods that generate structures and align them iteratively (e.g., genetic algorithms) are often slow. In earlier work, we used SPE to generate good-quality 3D conformations by iteratively adjusting pairwise distances between atoms based on a set of geometric rules, and showed that it samples conformational space better and runs faster than earlier programs. In this work, we run SPE on the entire ensemble of molecules to be aligned. Additional information about which atoms or groups of atoms in each molecule correspond to points in the pharmacophore can come from an automatically generated hypothesis or be specified manually. We add distance terms to SPE to bring pharmacophore points from different molecules closer in space, and also to line up normal/direction vectors associated with these points. We also permit pharmacophore points to be constrained to lie near external coordinates from a binding site. The aligned 3D molecular structures are nearly correct if the pharmacophore hypothesis is chemically feasible; postprocessing by minimization of suitable distance and energy functions further improves the structures and weeds out infeasible hypotheses. The method can be used to test 3D pharmacophores for a diverse set of active ligands, starting from only a hypothesis about corresponding atoms or groups.

11.
This paper outlines an application of the theory of simulated annealing to molecular matching problems. Three cooling schedules are examined: linear, exponential and dynamic cooling. The objective function is the sum of the elements of the difference distance matrix between the two molecules, generated by continual reordering of one molecule. Extensive tests of the algorithms have been performed on random coordinate data together with two related protein structures. Combinatorial problems, inherent in the assignment of atom correspondences, are effectively overcome by simulated annealing. The algorithms outlined here can readily optimize molecular matching problems with 150 atoms.
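
As a rough illustration of the objective described in this abstract, the sketch below anneals over atom reorderings of one molecule, scoring each ordering by the summed absolute differences between the two intramolecular distance matrices. Only the exponential schedule is shown. The function names, the pair-swap move, the Metropolis acceptance rule and all parameter values (`t_start`, `alpha`, `sweeps`) are assumptions made for the example, not details taken from the paper.

```python
import math
import random

def distance_matrix(coords):
    """Intramolecular distance matrix for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def mismatch(da, db, perm):
    """Sum of |DA[i][j] - DB[perm[i]][perm[j]]|: the difference distance matrix total."""
    n = len(perm)
    return sum(abs(da[i][j] - db[perm[i]][perm[j]])
               for i in range(n) for j in range(n))

def anneal_matching(coords_a, coords_b, t_start=1.0, alpha=0.95, sweeps=200, seed=0):
    """Search for an atom correspondence by simulated annealing (exponential cooling)."""
    rng = random.Random(seed)
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = mismatch(da, db, perm)
    best_perm, best_cost = perm[:], cost
    t = t_start
    for _ in range(sweeps):
        for _ in range(n * n):
            # Propose a reordering: swap the partners of two atoms.
            i, j = rng.sample(range(n), 2)
            perm[i], perm[j] = perm[j], perm[i]
            new_cost = mismatch(da, db, perm)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
                cost = new_cost
                if cost < best_cost:
                    best_perm, best_cost = perm[:], cost
            else:
                perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        t *= alpha  # exponential cooling: T_k = alpha^k * T_0
    return best_perm, best_cost

# Hypothetical example: molecule B is molecule A with its atoms renumbered.
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
mol_b = [mol_a[2], mol_a[0], mol_a[3], mol_a[1]]
best_perm, best_cost = anneal_matching(mol_a, mol_b)
```

Recomputing the full objective after every swap costs O(n²) per move; for the 150-atom problems mentioned above, an incremental update touching only the two affected rows and columns would be the natural refinement.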

12.
Accurate computational methods that can help to predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set for a Bayesian classifier, which should distinguish members of each class. The domain sequences of both classes are modelled with Markov chains. In a jackknife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.

13.
The DOCK program explores possible orientations of a molecule within a macromolecular active site by superimposing atoms onto precomputed site points. Here we compare a number of different search methods, including an exhaustive matching algorithm based on a single docking graph. We evaluate the performance of each method by screening a small database of molecules against a variety of macromolecular targets. By varying the amount of sampling, we can monitor the time convergence of scores and rankings. We show not only that the site point–directed search is tenfold faster than a random search, but also that the single graph matching algorithm boosts the speed of database screening up to 60-fold. The new algorithm, in fact, outperforms the bipartite graph matching algorithm currently used in DOCK. The results indicate that a critical issue for rapid database screening is the extent to which a search method biases run time toward the highest-ranking molecules. The single docking graph matching algorithm will be incorporated into DOCK version 4.0. © 1997 John Wiley & Sons, Inc. J Comput Chem 18: 1175–1189

14.
The many applications of the distance matrix, D(G), and the Wiener branching index, W(G), in chemistry are briefly outlined. W(G) is defined as one half the sum of all the entries in D(G). A recursion formula is developed enabling W(G) to be evaluated for any molecule whose graph G exists in the form of a tree. This formula, which represents the first general recursion formula for trees of any kind, is valid irrespective of the valence of the vertices of G or of the degree of branching in G. Several closed expressions giving W(G) for special classes of tree molecules are derived from the general formula. One illustrative worked example is also presented. Finally, it is shown how the presence of an arbitrary number of heteroatoms in tree-like molecules can readily be accommodated within our general formula by appropriately weighting the vertices and edges of G.
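
The definition quoted in this abstract, W(G) as one half the sum of all entries in D(G), can be checked directly on small graphs. The sketch below is a brute-force evaluation by breadth-first search from every vertex; it is not the recursion formula developed in the paper, and the name `wiener_index` and the adjacency-list encoding of hydrogen-suppressed graphs are choices made for the example.

```python
from collections import deque

def wiener_index(adjacency):
    """Return W(G): half the sum of all entries of the distance matrix D(G).

    adjacency maps each vertex to a list of its neighbours (unweighted graph).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search yields one row of D(G): distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # Every pair was counted twice, once from each endpoint.
    return total // 2

# Hydrogen-suppressed carbon skeletons of two C4 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
```

For these two skeletons the function gives 10 for n-butane and 9 for isobutane, which shows how the index decreases as branching increases.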

15.
Finding a set of molecules, which closely resemble a given lead molecule, from a database containing potentially billions of chemical structures is an important but daunting problem. Similar molecular shapes are particularly important, given that in biology small organic molecules frequently act by binding into a defined and complex site on a macromolecule. Here, we present a new method for molecular shape comparison, named ultrafast shape recognition (USR), capable of screening billions of compounds for similar shapes using a single computer and without the need of aligning the molecules before testing for similarity. Despite its extremely fast comparison rate, USR will be shown to be highly accurate at describing, and hence comparing, molecular shapes.
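
The abstract does not spell out how shapes are encoded, so the following is a sketch of the commonly described USR formulation, stated here as an assumption: each molecule is reduced to 12 numbers, namely three moments of the atomic distance distribution measured from four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that atom), and two molecules are compared through an inverse mean absolute difference. The exact normalisation of the second and third moments varies between implementations; the choice below (standard deviation and signed cube root) is one option, and all function names are hypothetical.

```python
import math

def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]

def usr_descriptor(coords):
    """12-element shape descriptor from atom coordinates; no alignment is needed."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor

def usr_similarity(desc_a, desc_b):
    """Score in (0, 1]; 1 means identical descriptors."""
    mean_abs = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + mean_abs)

# Hypothetical five-atom geometry, a translated copy, and a copy scaled by 2.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.2, 0.0), (3.4, 1.0, 0.8), (0.2, -1.1, 0.5)]
shifted = [(x + 10.0, y - 4.0, z + 2.5) for x, y, z in mol]
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in mol]
sim_shifted = usr_similarity(usr_descriptor(mol), usr_descriptor(shifted))
sim_scaled = usr_similarity(usr_descriptor(mol), usr_descriptor(scaled))
```

Because the descriptor is built only from distances, the translated copy scores essentially 1 while the scaled copy scores lower, and comparing two molecules costs just 12 subtractions, which is consistent with the screening speed claimed in the abstract.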

16.
A focused collection of organic synthesis reactions for computer-based molecule construction is presented. It is inspired by real-world chemistry and has been compiled in close collaboration with medicinal chemists to achieve high practical relevance. Virtual molecules assembled from existing starting materials connected by these reactions are expected to have an enhanced chance of being amenable to real chemical synthesis. About 50% of the reactions in the dataset are ring-forming reactions, which fosters the assembly of novel ring systems and innovative chemotypes. A comparison with a recent survey of the reactions used in early drug discovery revealed considerable overlap with the collection presented here. The dataset is available, encoded as computer-readable Reaction SMARTS expressions, from the Supporting Information of this paper.

17.
The densities of high-energy molecules in the solid state were calculated with a simplified scheme based on molecular surface electrostatic potentials (MSEP). The MSEP scheme for density estimation, originally developed by Politzer et al., was further modified to calculate the electrostatic potential on a simpler van der Waals surface. Forty-one energetic molecules containing at least one nitro group were selected from among a variety of molecular types and density values, and were used to test the suitability of the MSEP scheme for predicting the densities of solid energetic molecules. For comparison purposes, we utilized the group additivity method (GAM) incorporating the parameter sets developed by Stine (Stine-81) and by Ammon (Ammon-98 and -00). The average absolute error in densities from our MSEP scheme was 0.039 g/cc. The results based on our MSEP scheme were slightly better than the GAM results. In addition, the errors in densities generated by the MSEP scheme were almost the same for various molecule types, while those predicted by GAM were somewhat dependent upon the molecule types.

18.
Generation of the list of near-neighbor pairs of atoms not bonded to each other is a key feature of many programs for calculating the energy and energy derivatives for large molecules. Because this step can take a significant amount of CPU time, more efficient nonbonded list generation can speed up the energy calculations. In this article, a novel nonbonded list generation algorithm, BYCC, is introduced. It combines certain features of other algorithms and achieves more rapid nonbonded list generation; a speedup factor of approximately 2.5 for a molecule of 5000 atoms with a cutoff in the 10 Å range is obtained on Hewlett-Packard (HP) and Alpha processors, without greatly increasing memory requirements.
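
To make the task concrete, here is a minimal sketch of nonbonded list generation using a plain grid of cubic cells whose edge equals the cutoff. This is a generic cell-based scheme, not the BYCC algorithm itself, whose details the abstract does not give; the function name `nonbonded_pairs` and the representation of bonds as a set of `(i, j)` tuples with `i < j` are assumptions for the example.

```python
import math
from collections import defaultdict
from itertools import product

def nonbonded_pairs(coords, bonded, cutoff):
    """List atom pairs (i, j), i < j, that are within `cutoff` and not bonded.

    coords: list of (x, y, z) tuples
    bonded: set of (i, j) tuples with i < j that must be excluded
    """
    # Bin atoms into cubic cells with edge length equal to the cutoff.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # Any partner within the cutoff lies in this cell or one of its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    # The i < j condition ensures each pair is reported exactly once.
                    if (i < j and (i, j) not in bonded
                            and math.dist(coords[i], coords[j]) <= cutoff):
                        pairs.append((i, j))
    return sorted(pairs)

# Hypothetical five-atom system with one bond between atoms 0 and 1.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (9.0, 9.0, 9.0), (-0.5, 0.2, 0.1)]
bonds = {(0, 1)}
pair_list = nonbonded_pairs(atoms, bonds, cutoff=2.0)
```

Binning limits distance checks to atoms in adjacent cells, so the work grows roughly linearly with the number of atoms at fixed density instead of quadratically as in an all-pairs loop.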

19.
With convergent synthesis in mind, we defined a new parameter, called molecular centrality, to quantify the degree of centrality of each atom and bond in a molecule; it is defined based on squared node distances. The centrality becomes higher as the location of an atom gets closer to the center of a molecule. Validation with 40 organic compounds whose total syntheses have been reported made it clear that our molecular centrality is an effective index for evaluating synthetically important bonds. Additionally, it was confirmed that the highest attaching bond centrality of each product correlated moderately with the change in molecular complexity at each step. The parameter is quantitative among molecules and suitable for statistical analysis.

20.
This paper reports a method for identifying those molecules in a database of rigid 3D structures whose molecular electrostatic potential (MEP) grids are most similar to that of a user-defined target molecule. The most important features of an MEP grid are encoded in field-graphs, and a target molecule is matched against a database molecule by a comparison of the corresponding field-graphs. The matching is effected using a maximal common subgraph isomorphism algorithm, which provides an alignment of the target molecule's field-graph with those of each of the database molecules in turn. These alignments are used in the second stage of the search algorithm to calculate the intermolecular MEP similarities. Several different ways of generating field-graphs are evaluated, in terms of the effectiveness of the resulting similarity measures and of the associated computational costs. The most appropriate procedure has been implemented in an operational system that searches a corporate database containing ca. 173,000 3D structures.
