Similar literature
20 similar articles found; search took 15 ms
1.
A new method is proposed for the evaluation of numerical similarity measures for large molecules, defined in terms of their electron density (ED) distributions. The technique is based on the Molecular Electron Density Lego Assembler (MEDLA) approach, proposed earlier for the generation of ab initio quality electron densities for proteins and other macromolecules. The reliability of the approach is tested using a family of 13 substituted aromatic systems for which both standard ab initio electron density computations and the MEDLA technique are applicable. These tests also provide additional examples for evaluating the accuracy of the MEDLA technique. Electron densities for a series of 13 substituted benzenes were calculated using the standard ab initio method with STO-3G, 3-21G, and 6-31G** basis sets as well as the MEDLA approach with a 6-31G** database of electron density fragments. For each type of calculation, pairwise similarity measures of these compounds were calculated using a point-by-point numerical comparison of the EDs. From these results, 2D similarity maps were constructed, serving as an aid for quick visual comparisons for the entire molecular family. The MEDLA approach is shown to give virtually equivalent numerical similarity measures and similarity maps as the standard ab initio method using a 6-31G** basis set. By contrast, significant differences are found between the standard ab initio 6-31G** results and the standard ab initio results obtained with smaller STO-3G and 3-21G basis sets. These tests indicate that the MEDLA-based similarity measures faithfully mimic the actual, standard ab initio 6-31G** similarity measures, suggesting the MEDLA method as a reliable technique to assess the shape similarities of proteins and other macromolecules. 
The speed of the MEDLA computations allows rapid, pairwise comparisons of the actual EDs for a series of molecules, requiring no more computer time than other simplified, less detailed representations of molecular shape. The MEDLA method also reduces the need to store large volumes of numerical density data on disk, as these densities can be quickly recomputed when needed. For these reasons, the proposed MEDLA similarity analysis technique is likely to become a useful tool in computational drug design. © 1995 John Wiley & Sons, Inc.

2.
The evaluation of the electron density based similarity function scales quadratically with the size of the molecules for simplified, atomic shell densities. Because the function's atom-atom terms decay exponentially, most interatomic contributions are numerically negligible in large systems. An improved algorithm for the evaluation of the Quantum Molecular Similarity function is presented. This procedure identifies all non-negligible terms without computing unnecessary interatomic squared distances, thus effectively making the similarity evaluation scale linearly. A minimalist dynamic electron density model is also presented. Approximate, single shell densities together with the proposed algorithm facilitate fast electron density based alignments of macromolecules.
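The idea of skipping negligible atom-atom terms can be illustrated with a generic cell-list sketch. This is not the algorithm of the paper, which is not described in the abstract; it is a standard spatial-binning technique, and the single-Gaussian pair term, the function name `similarity_cell_list`, and the parameters `alpha` and `cutoff` are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def similarity_cell_list(mol_a, mol_b, alpha=1.0, cutoff=4.0):
    """Sum exp(-alpha * r^2) over atom pairs closer than `cutoff`.

    Atoms are (x, y, z) tuples. Atoms of mol_b are binned into cubic cells
    of edge `cutoff`, so each atom of mol_a is only compared with atoms in
    its own cell and the 26 neighbouring cells (hypothetical pair term).
    """
    def cell_of(point):
        return tuple(math.floor(c / cutoff) for c in point)

    grid = defaultdict(list)
    for b in mol_b:
        grid[cell_of(b)].append(b)

    total = 0.0
    for a in mol_a:
        home = cell_of(a)
        for offset in product((-1, 0, 1), repeat=3):
            cell = tuple(i + d for i, d in zip(home, offset))
            for b in grid.get(cell, ()):
                r2 = sum((p - q) ** 2 for p, q in zip(a, b))
                if r2 <= cutoff * cutoff:
                    total += math.exp(-alpha * r2)
    return total
```

With a fixed cutoff and roughly uniform atom density, each atom sees a bounded number of neighbours, so the work grows about linearly with the number of atoms. Unlike the procedure described in the abstract, this sketch still computes squared distances for candidates inside neighbouring cells.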

3.
Some of the most widely used indices in molecular similarity searching are intrinsically symmetric in nature, meaning that each molecule under comparison contributes equally to the similarity index. For example, the Carbó and the Hodgkin–Richards similarity indices are related, respectively, to the geometric and arithmetic averages of the molecular self-similarities. This work introduces the asymmetric forms of an entire family of field-based molecular similarity indices. By incorporating a weighted contribution of each molecule into the similarity index, the newly obtained asymmetric forms allow for measuring and modulating the similarity of one molecule in the context of another and thus have the potential of alleviating the size dependency often observed in chemical similarity searching.
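The two symmetric indices named above can be sketched as follows, assuming both molecular fields are sampled on the same grid so that the overlap integral reduces to a dot product. The function names and the toy data are illustrative only, and the asymmetric forms introduced in the paper are not reproduced here because the abstract does not give them.

```python
import math


def overlap(a, b):
    """Discretised overlap Z_AB of two fields sampled on the same grid."""
    return sum(x * y for x, y in zip(a, b))


def carbo_index(a, b):
    # Z_AB normalised by the geometric mean of the self-similarities
    return overlap(a, b) / math.sqrt(overlap(a, a) * overlap(b, b))


def hodgkin_richards_index(a, b):
    # Z_AB normalised by the arithmetic mean of the self-similarities
    return 2.0 * overlap(a, b) / (overlap(a, a) + overlap(b, b))


# Toy fields: `large` has the same shape as `small` but twice the magnitude.
small = [1.0, 2.0, 0.0]
large = [2.0, 4.0, 0.0]

print(carbo_index(small, large))             # 1.0
print(hodgkin_richards_index(small, large))  # 0.8
```

The toy pair shows the difference in normalisation: the Carbó index ignores the change in magnitude, while the Hodgkin–Richards index is lowered by it.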

4.
A DNA sequence is a finite sequence of letters in the 4-letter DNA alphabet Σ = {A, C, G, T}. A set of condensed matrices was constructed to represent DNA sequences based on the sieve ratios of trinucleotides in the sequence. Then, leading eigenvalues of these matrices were computed and considered as invariants for the DNA sequences. Similarity and dissimilarity analyses based on condensed matrices are given for the exon-1 genes of beta-globins of eleven species.

5.
Maxwell-Boltzmann statistics provides an adequate mathematical background for defining similarity measures involving molecular energy surfaces and electrostatic potential maps. Boltzmann similarity measures are described and various illustrative examples are used to show the practical viability of the theory. A new molecular similarity index is also presented. Finally, hybrid measures involving Boltzmann and density distributions are defined.

6.
A general theory of similarity measures for library search systems is presented. It shows that once the spectral features to be used in the comparison of two spectra have been selected and their relative importance has been fixed, the characteristics of the similarity measure are fully controlled by two independent parameters. These two parameters control whether a forward or a reverse search and whether an identity, a similarity, or a classification search is conducted.

7.
A new type of molecular representation is introduced that is based on activity class characteristic substructures extracted from random fragment populations. Mapping of characteristic substructures is used to determine atom match rates in active molecules. Comparison of match rates of bonded atoms defines a hierarchical molecular fragmentation scheme. Active compounds are encoded as fragmentation pathways isolated from core trees. These paths are amenable to biological sequence alignment methods in combination with substructure-based scoring functions. From multiple core path alignments, consensus fragment sequences are derived that represent compound activity classes. Consensus fragment sequences weighted by increasing structural specificity can also be used to map molecules and search databases for active compounds.

8.
9.
On the similarity of DNA primary sequences   Cited by: 3 (self-citations: 0, citations by others: 3)
We consider numerical characterization of graphical representations of DNA primary sequences. In particular, we consider graphical representation of DNA of beta-globins of several species, including human, on the basis of the approach of A. Nandy in which nucleic bases are associated with a walk over integral points of a Cartesian x, y-coordinate system. With a so-generated graphical representation of DNA, we associate a distance/distance matrix, the elements of which are given by the quotient of the Euclidean and the graph theoretical distances, that is, the through-space and through-bond distances for pairs of bases of the graphical representation of DNA. We use eigenvalues of so-constructed matrices to characterize individual DNA sequences. The eigenvalues are used to construct numerical sequences, which are subsequently used for similarity/dissimilarity analysis. The results of such analysis have been compared and combined with similarity tables based on the frequency of occurrence of pairs of bases.
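A minimal sketch of the construction described above, under stated assumptions: the assignment of bases to axis directions in `STEPS` is assumed rather than taken from the abstract, the graph theoretical distance is taken to be the separation of two bases along the sequence, and the leading eigenvalue is estimated by plain power iteration. All function names are hypothetical.

```python
import math

# Assumed axis assignment for the walk; the abstract does not spell it out.
STEPS = {"G": (1, 0), "A": (-1, 0), "C": (0, 1), "T": (0, -1)}


def walk(seq):
    """Return the lattice point reached after each base of the sequence."""
    x = y = 0
    points = []
    for base in seq:
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def dd_matrix(seq):
    """D/D matrix: Euclidean distance divided by separation along the walk."""
    pts = walk(seq)
    n = len(pts)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = math.dist(pts[i], pts[j]) / (j - i)
    return m


def leading_eigenvalue(m, iterations=500):
    """Power iteration; adequate for small symmetric non-negative matrices."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam
```

For a walk that never turns, such as `"GGGG"`, every off-diagonal entry equals 1 and the leading eigenvalue is the sequence length minus one; turns and back-tracking lower the entries and hence the eigenvalue.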

10.
Similarity measures between pairs of molecular wave functions are described. They are based on the geometrical structure of the LCAO–MO framework and upon multivariate analysis ideas. The theoretical framework is presented, and formulae for some integrals needed are given. Two main measures, distance and correlation coefficients, are used. Distance and correlation matrices induce relationships in the whole MO set, which can be depicted through minimal spanning tree techniques. Furthermore, principal component analysis allows a two-dimensional visualization of the MO manifold geometrical relationships. Various examples are given in order to obtain information on how basis set, environment, excitation, bending, stretching, and electronegativity affect the induced order. For this purpose “ab initio” SCF–LCAO–MO calculations with double- and single-zeta quality basis sets have been used for various simple molecular structures: H2O, NH3, CH4, N2, O2, C2, NO, CN, and CO. The results obtained can open the way to LCAO–MO taxonomy. Using this information, other areas of interest are connected with similarity measures (SCF and CI, localization procedures, etc.), proving in this manner their potential utility.

11.
In many modern chemoinformatics systems, molecules are represented by long binary fingerprint vectors recording the presence or absence of particular features or substructures, such as labeled paths or trees, in the molecular graphs. These long fingerprints are often compressed to much shorter fingerprints using a simple modulo operation. As the length of the fingerprints decreases, their typical density and overlap tend to increase, and so does any similarity measure based on overlap, such as the widely used Tanimoto similarity. Here we show that this correlation between shorter fingerprints and higher similarity can be thought of as a systematic error introduced by the fingerprint folding algorithm and that this systematic error can be corrected mathematically. More precisely, given two molecules and their compressed fingerprints of a given length, we show how a better estimate of their uncompressed overlap, hence of their similarity, can be derived to correct for this bias. We show how the correction can be implemented not only for the Tanimoto measure but also for all other commonly used measures. Experiments on various data sets and fingerprint sizes demonstrate how, with a negligible computational overhead, the correction noticeably improves the sensitivity and specificity of chemical retrieval.
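The folding bias described above can be reproduced with a small sketch. Fingerprints are represented as sets of on-bit positions, folding is the plain modulo operation mentioned in the abstract, and the bit positions are made up for illustration. The correction derived in the paper is not implemented, since the abstract does not state its formula.

```python
def fold(on_bits, length):
    """Fold a sparse fingerprint (set of on-bit positions) to `length` bits."""
    return {bit % length for bit in on_bits}


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical on-bit positions in a 1024-bit fingerprint.
fp_a = {3, 17, 130, 258, 700, 901}
fp_b = {3, 42, 131, 258, 513, 944}

print(tanimoto(fp_a, fp_b))                      # 0.2
print(tanimoto(fold(fp_a, 16), fold(fp_b, 16)))  # 0.42857142857142855
```

Folding to 16 bits makes unrelated features collide on the same positions, so the shared-bit count rises from 2 to 3 while the union shrinks from 10 to 7, which is the inflation of similarity that the abstract treats as a systematic error.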

12.
13.
A family of related techniques for the reduction of square-cell configurations (“animals”) to simpler ones by cell-shedding processes provides physically motivated, novel approaches for shape characterization and similarity criteria as well as similarity measures based on equivalence relations. The two main algorithms, cs_k, k = 1, 2, involve the simultaneous “shedding” of all cells having precisely k sides exposed on the periphery of the animal; the shedding steps are repeated as long as the resulting structure is an animal. Since the termination criteria of these two algorithms are different, they can be combined sequentially into composite algorithms, leading to various alternative shape characterizations and equivalence relations. The third main algorithm, cs32, involves incomplete elimination of peripheral cells of a given type, thus retaining some additional local shape features inherited from the original animal. Following the introduction of these transformations, some of their properties are derived and several examples are discussed. © 1997 John Wiley & Sons, Inc. Int J Quant Chem 62: 353–361, 1997
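The shedding step can be sketched as follows, with several simplifying assumptions that go beyond the abstract: an animal is taken to be a non-empty, edge-connected set of unit cells, a side counts as exposed whenever the neighbouring cell is absent (holes are not distinguished from the outer periphery), and the process stops just before a step that would leave something that is no longer an animal. The function names are hypothetical.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def exposed_sides(cell, cells):
    """Number of sides of `cell` not shared with another cell."""
    x, y = cell
    return sum((x + dx, y + dy) not in cells for dx, dy in NEIGHBOURS)


def is_animal(cells):
    """True if the set of cells is non-empty and edge-connected."""
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    stack = [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in NEIGHBOURS:
            nxt = (x + dx, y + dy)
            if nxt in cells and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(cells)


def shed(cells, k):
    """Repeatedly shed all cells with exactly k exposed sides."""
    current = frozenset(cells)
    while True:
        doomed = {c for c in current if exposed_sides(c, current) == k}
        remaining = current - doomed
        if not doomed or not is_animal(remaining):
            return current
        current = remaining


square = {(x, y) for x in range(3) for y in range(3)}
plus_shape = shed(square, 2)  # the four corners are shed, leaving a plus
```

For the 3 × 3 square, shedding with k = 2 removes the four corners and stops at the five-cell plus shape, whereas k = 1 would disconnect the corners from the centre, so under the stopping rule assumed here the square is returned unchanged.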

14.
The calculation of quantum similarity measures from second-order density functions contracted to intracule and extracule densities obtained at the Hartree-Fock level is presented and applied to a series of atoms (He, Li, Be, and Ne), isoelectronic molecules (C2H2, HCN, CNH, CO, and N2), and model hydrogen-transfer processes (H2/H+, H2/Hot, H2/H). Second-order quantum similarity measures and indices are found to be suitable measures for quantitatively analyzing electron-pair density reorganizations in atoms, molecules, and chemical processes. For the molecular series, a comparative analysis between the topology of pairwise similarity functions as computed from one-electron, intracule, and extracule densities is carried out, and the assignment of each particular local similarity maximum to a molecular alignment is discussed. In the comparative study of the three hydrogen-transfer reactions considered, second-order quantum similarity indices are found to be more sensitive than first-order indices for analyzing the electron-density reorganization between the reactant complex and the transition state, thus providing additional insights for a better understanding of the mechanistic aspects of each process. Received: 7 July 1997 / Accepted: 29 October 1997

15.
Simple and accurate relationships between atomic and nuclear quantum similarity measures and their constituent elements were found. These results complement findings in previous studies in which quantum self‐similarity measures in atoms and nuclei were linked to the atomic and mass numbers, respectively. The models were validated on a large test set, and the general trends in the behavior of the quantum similarity measures for these quantum objects were made clear. © 2000 John Wiley & Sons, Inc. Int J Quant Chem 77: 685–692, 2000

16.
Concepts and definitions of tagged sets and convex sets are applied with the aim of discovering a general mathematical pattern enveloping the quantum similarity measures framework. As a consequence, several aspects of the quantum similarity theoretical structure become beautifully related to a mathematical construction, adopting the form of some interwoven essential formalism, connecting quantum theory, molecular similarity, convexity and tagged sets. This revised version was published online in July 2006 with corrections to the Cover Date.

17.
18.
String comparison techniques were developed and applied for measuring the molecular similarity of chemical structures. The molecular structures were encoded as a sequence of numbers representing counts of paths of different lengths. The similarity index between two compounds was calculated as the difference between the gains of information derived through comparison of the corresponding molecular path sequences. Ranks between the structures of the studied database obtained according to this similarity were used as basic data for deriving correspondences between the elements of the set of compounds. The method was applied to a group of 41 barbiturates. Correlation equations were calculated for different groups of compounds grouped according to the displayed similarity. The correlation equations and the corresponding statistics were obtained using standard computer programs. A special algorithm for computing the similarity index and the correlation matrix (outlined very briefly) was developed and implemented on a VAX 11/750.

19.
The use of the molecular quantum similarity overlap measure for molecular alignment is investigated. A new algorithm is presented, the quantum similarity superposition algorithm (QSSA), expressing the relative positions of two molecules in terms of mutual translation in three Cartesian directions and three Euler angles. The quantum similarity overlap is then used to optimize the mutual positions of the molecules. A comparison is made with TGSA, a topogeometrical approach, and the influence of differences on molecular clustering is discussed.

20.
A theoretical measure of molecular similarity based on ab initio computations of electron density derived from molecular orbital wave functions is first applied to a model series (CH3CH2CH3, CH3OCH3, and CH3SCH3) and then to the rings in a series of prostaglandins and to some histamine H2 antagonists. Comparison in terms of valence electron density seems to be a good basis for structure-activity studies.
