Similar literature
Found 20 similar articles; search took 31 ms
1.
2.
3.
SMILES strings are the most compact text-based molecular representations. Implicitly, they contain the information needed to compute all kinds of molecular structures and, thus, the molecular properties derived from these structures. We show that this implicit information can be accessed directly at the SMILES string level, without the time-consuming explicit conversion of SMILES strings into molecular graphs or 3D structures followed by 2D or 3D QSPR calculations. Our method is based on the fragmentation of SMILES strings into overlapping substrings of a defined size, which we call LINGOs. The integral set of LINGOs derived from a given SMILES string, the LINGO profile, is a hologram of the SMILES representation of the molecule described. LINGO profiles provide input for QSPR models and for the calculation of intermolecular similarities at very low computational cost. The octanol/water partition coefficient (LlogP) QSPR model achieved a correlation coefficient R² = 0.93, a root-mean-square error RRMS = 0.49 log units, a goodness-of-prediction correlation coefficient Q² = 0.89, and a QRMS = 0.61 log units. The intrinsic aqueous solubility (LlogS) QSPR model achieved R² = 0.91, Q² = 0.82, RRMS = 0.60 log units, and QRMS = 0.89 log units. Integral Tanimoto coefficients computed from LINGO profiles provided sharp discrimination between random and bioisosteric pairs extracted from the Accelrys Bioster Database. Average similarities (LINGOsim) were 0.07 for the random pairs and 0.36 for the bioisosteric pairs.
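The fragmentation step described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the substring length `q = 4` is an assumed default (the abstract says only "a defined size"), no SMILES preprocessing is applied, and the similarity below is a generic multiset Tanimoto (shared counts over combined counts), which may differ from the integral Tanimoto coefficient the authors use. The function names are hypothetical.

```python
from collections import Counter


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count all overlapping substrings of length q in a SMILES string.

    q = 4 is an assumed default; the abstract only says "a defined size".
    Strings shorter than q are kept whole (an assumption of this sketch).
    """
    if not smiles:
        return Counter()
    if len(smiles) < q:
        return Counter([smiles])
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: sum of min counts over sum of max counts.

    This is a generic stand-in, not necessarily the authors' formula.
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    combined = sum(max(a[k], b[k]) for k in keys)
    return shared / combined


# Usage: compare two small alcohols/amines written as SMILES.
profile_a = lingo_profile("CCCCO")
profile_b = lingo_profile("CCCCN")
similarity = lingo_tanimoto(profile_a, profile_b)
```

Because the profile is built by plain string slicing, no molecular graph is ever constructed, which is the source of the low computational cost the abstract mentions.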

4.
5.
6.
7.
Bioisosteres are functional groups or atoms that are structurally different but can form similar intermolecular interactions. Potential bioisosteres were identified here by analysing the X-ray crystallographic structures of sets of different ligands complexed with a fixed protein. The protein was used to align the ligands with each other, and pairs of ligands were then compared to identify substructural features with high volume overlap that occurred in approximately the same region of geometric space. The resulting pairs of substructural features can suggest potential bioisosteric replacements for use in lead-optimisation studies. Experiments with 12 sets of ligand–protein complexes from the Protein Data Bank demonstrate the effectiveness of the procedure.

8.
We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

9.
10.
11.
12.
Recognition of small molecules by proteins depends on three-dimensional molecular surface complementarity. However, the dominant techniques for analyzing the similarity of small molecules are based on two-dimensional chemical structure, with such techniques often outperforming three-dimensional techniques in side-by-side comparisons of correlation to biological activity. This paper introduces a new molecular similarity method, termed morphological similarity (MS), that addresses the apparent paradox. Two sets of molecule pairs are identified from a set of ligands whose protein-bound states are known crystallographically. Pairs that bind the same protein sites form the first set, and pairs that bind different sites form the second. MS is shown to separate the two sets significantly better than a benchmark 2D similarity technique. Further, MS agrees with crystallographic observation of bound ligand states, independent of information about bound states. MS is efficient to compute and can be practically applied to large libraries of compounds.

13.
A methodology for performing sequence-free comparison of functional sites in protein structures is introduced. The method is based on a new notion of similarity among superimposed groups of amino acid residues that evaluates both geometry and physico-chemical properties. The method is specifically designed to handle disconnected and sparsely distributed sets of residues. A genetic algorithm is employed to find the superimposition of protein segments that maximizes their similarity. The method was evaluated by performing an all-to-all comparison on two separate sets of ligand-binding sites, comprising 47 protein-FAD (Flavin-Adenine Dinucleotide) and 64 protein-NAD (Nicotinamide-Adenine Dinucleotide) complexes, and comparing the results with those of an existing sequence-based structural alignment tool (TM-Align). The quality of the two methodologies is judged by, among other things, their capacity to correctly predict the similarities in the protein-ligand contact patterns of each pair of binding sites. The results show that the sequence-free method significantly improves on the sequence-based one: 23 significant binding-site homologies were detected by the new method but missed by the sequence-based one.

14.
Bioisosterism involving replacement of a carboxylic acid substituent by 1H-tetrazole, yielding deprotonated carboxylate and tetrazolate under physiological conditions, is a well-known synthetic strategy in medicinal chemistry. To improve our overall understanding of bioisosterism, we have used this example to study the geometrical and energetic aspects of the functional group replacement. Specifically, we use crystal structure informatics and high-level ab initio calculations to study the hydrogen bond (H-bond) energy landscapes of the protonated and deprotonated bioisosteric pairs. Each pair exhibits very similar H-bond environments in crystal structures retrieved from the CSD, and the attractive energies of these H-bonds are also very similar. However, by comparison with -COOH and -COO(-), the H-bond environments around 1H-tetrazole and tetrazolate substituents extend further, by about 1.2 Å, from the core of the connected molecule. Analysis of pairs of PDB structures containing ligands which differ only in having a tetrazole or a carboxyl substituent and which are bound to the same protein indicates that the protein binding site must flex sufficiently to form strong H-bonds to either substituent. A survey of DrugBank shows a rather small number of tetrazole-containing drugs in the 'approved' and 'experimental' drug sections of that database.

15.
16.
This paper reports a method for identifying those molecules in a database of rigid 3D structures whose molecular electrostatic potential (MEP) grids are most similar to that of a user-defined target molecule. The most important features of an MEP grid are encoded in field-graphs, and a target molecule is matched against a database molecule by a comparison of the corresponding field-graphs. The matching is effected using a maximal common subgraph isomorphism algorithm, which provides an alignment of the target molecule's field-graph with those of each of the database molecules in turn. These alignments are used in the second stage of the search algorithm to calculate the intermolecular MEP similarities. Several different ways of generating field-graphs are evaluated, in terms of the effectiveness of the resulting similarity measures and of the associated computational costs. The most appropriate procedure has been implemented in an operational system that searches a corporate database containing ca. 173,000 3D structures.

17.
This paper aims to demonstrate the applicability of statistical spectroscopy and genetic algorithms to similarity studies. Statistical moments of the intensity distributions are used as a basis for defining similarity distances between pairs of model spectra. Each model spectrum is taken as a sum of two Gaussian distributions characterized by different parameters. As a result, dissimilarity maps are presented.
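The idea of a moment-based distance between two-Gaussian model spectra can be illustrated numerically. The sketch below is an assumption-laden example, not the authors' procedure: it uses the mean plus central moments of order 2 to 4 as the descriptor, a plain Euclidean distance between descriptors, and a uniform sampling grid `XS` chosen for illustration. All function names and parameter values are hypothetical.

```python
import math


def two_gaussian_spectrum(xs, a1, mu1, s1, a2, mu2, s2):
    """Model spectrum: sum of two Gaussian bands sampled on the grid xs."""
    def band(x, amp, mu, sigma):
        return amp * math.exp(-((x - mu) ** 2) / (2.0 * sigma * sigma))
    return [band(x, a1, mu1, s1) + band(x, a2, mu2, s2) for x in xs]


def moment_vector(xs, ys, max_order=4):
    """Mean followed by central moments of order 2..max_order.

    Intensities ys are treated as an unnormalised distribution over xs.
    """
    total = sum(ys)
    mean = sum(x * y for x, y in zip(xs, ys)) / total
    central = [
        sum(((x - mean) ** k) * y for x, y in zip(xs, ys)) / total
        for k in range(2, max_order + 1)
    ]
    return [mean] + central


def moment_distance(m1, m2):
    """Euclidean distance between two moment vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))


# Illustrative grid: 4001 points from -20 to 20 in steps of 0.01.
XS = [i * 0.01 - 20.0 for i in range(4001)]

# Usage: a spectrum and a copy of it shifted by one unit along x.
spec_a = two_gaussian_spectrum(XS, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0)
spec_b = two_gaussian_spectrum(XS, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0)
d_ab = moment_distance(moment_vector(XS, spec_a), moment_vector(XS, spec_b))
```

A rigid shift changes only the mean, so with this descriptor the distance between the two example spectra is essentially the size of the shift. Evaluating such a distance over a grid of band parameters is one way to produce a dissimilarity map of the kind the abstract mentions.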

18.
Summary: This paper describes techniques for calculating the degree of similarity between an input query molecule and each of the molecules in a database of 3-D chemical structures. The intermolecular similarity measure used is the number of atoms in the 3-D common substructure (CS) between the two molecules being compared. The identification of 3-D CSs is very demanding of computational resources, even when an efficient clique detection algorithm is used for this purpose. Two types of upper-bound calculation are described which reduce the number of exact CS searches that need to be carried out to identify those molecules in a database which are similar to a 3-D target molecule.
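The role of an upper bound in such a search can be sketched as follows. The abstract does not say which two bounds the authors use, so this example assumes a simple one: a common substructure cannot contain more atoms of an element than both molecules possess, so the sum of per-element minimum counts bounds the CS size from above. The exact CS search is represented by a caller-supplied function `exact_cs_size`, a hypothetical placeholder for a clique-detection routine.

```python
from collections import Counter


def cs_upper_bound(query_atoms, candidate_atoms):
    """Upper bound on common-substructure size from element counts alone.

    Assumed bound for illustration; not necessarily one of the two
    upper-bound calculations described in the paper.
    """
    query_counts = Counter(query_atoms)
    candidate_counts = Counter(candidate_atoms)
    return sum(min(n, candidate_counts[element])
               for element, n in query_counts.items())


def screen(query_atoms, database, threshold, exact_cs_size):
    """Run the expensive exact search only where the bound allows a hit.

    database maps a molecule name to its list of element symbols;
    exact_cs_size is a hypothetical caller-supplied exact CS routine.
    """
    hits = []
    for name, atoms in database.items():
        if cs_upper_bound(query_atoms, atoms) < threshold:
            continue  # cannot reach the threshold, skip the exact search
        size = exact_cs_size(query_atoms, atoms)
        if size >= threshold:
            hits.append((name, size))
    return hits
```

The bound is cheap to compute, so every database molecule it rejects saves one exact search. The tighter the bound, the more exact searches are avoided.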

19.
Starting from either the exchange or the exchange‐correlation density together with Bader's definition of an atom in a molecule, an atomic hole density function can be defined. Contour maps of atomic hole density functions show how the electron density of each atom in a molecule is partially delocalized into the rest of the atoms in the molecule. The degree of delocalization of the atomic density ultimately depends on the nature of the atom studied and its environment. Atomic hole density functions are also used to define an atomic similarity measure, which allows for the quantitative assessment of the degree of atomic transferability in different molecular environments. In this article, contour maps for the N atom in the (N2, CN, NO+) series and the O atom in the (CO, H2CO, and HCOOH) series are presented at the Hartree–Fock and CISD levels of theory. Moreover, the transferability of N and O within the two series is studied by means of atomic similarity measures. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 1361–1374, 2000

20.