Similar literature
 20 similar documents found (search time: 45 ms)
1.
A wide variety of computational algorithms have been developed that strive to capture the chemical similarity between two compounds for use in virtual screening and lead discovery. One limitation of such approaches is that, while a returned similarity value reflects the perceived degree of relatedness between any two compounds, there is no direct correlation between this value and the expectation or confidence that any two molecules will in fact be equally active. A lack of a common framework for interpretation of similarity measures also confounds the reliable fusion of information from different algorithms. Here, we present a probabilistic framework for interpreting similarity measures that directly correlates the similarity value to a quantitative expectation that two molecules will in fact be equipotent. The approach is based on extensive benchmarking of 10 different similarity methods (MACCS keys, Daylight fingerprints, maximum common subgraphs, rapid overlay of chemical structures (ROCS) shape similarity, and six connectivity-based fingerprints) against a database of more than 150,000 compounds with activity data against 23 protein targets. Given this unified and probabilistic framework for interpreting chemical similarity, principles derived from decision theory can then be applied to combine the evidence from different similarity measures in such a way that both capitalizes on the strengths of the individual approaches and maintains a quantitative estimate of the likelihood that any two molecules will exhibit similar biological activity.

2.
Fingerprint scaling is a method to increase the performance of similarity search calculations. It is based on the detection of bit patterns in keyed fingerprints that are signatures of specific compound classes. Application of scaling factors to consensus bits that are mostly set on emphasizes signature bit patterns during similarity searching and has been shown to improve search results for different fingerprints. Similarity search profiling has recently been introduced as a method to analyze similarity search calculations. Profiles separately monitor correctly identified hits and other detected database compounds as a function of similarity threshold values and make it possible to estimate whether virtual screening calculations can be successful or to evaluate why they fail. This similarity search profile technique has been applied here to study fingerprint scaling in detail and better understand effects that are responsible for its performance. In particular, we have focused on the qualitative and quantitative analysis of similarity search profiles under scaling conditions. Therefore, we have carried out systematic similarity search calculations for 23 biological activity classes under scaling conditions over a wide range of scaling factors in a compound database containing approximately 1.3 million molecules and monitored these calculations in similarity search profiles. Analysis of these profiles confirmed increases in hit rates as a consequence of scaling and revealed that scaling influences similarity search calculations in different ways. Based on scaled similarity search profiles, compound sets could be divided into different categories. In a number of cases, increases in search performance under scaling conditions were due to a more significant relative increase in correctly identified hits than detected false-positives. 
This was also consistent with the finding that preferred similarity threshold values increased due to fingerprint scaling, which was well illustrated by similarity search profiling.

3.
This paper compares 22 different similarity coefficients when they are used for searching databases of 2D fragment bit-strings. Experiments with the National Cancer Institute's AIDS and IDAlert databases show that the coefficients fall into several well-marked clusters, in which the members of a cluster will produce comparable rankings of a set of molecules. These clusters provide a basis for selecting combinations of coefficients for use in data fusion experiments. The results of these experiments provide a simple way of increasing the effectiveness of fragment-based similarity searching systems.

4.
Differences in molecular complexity and size are known to bias the evaluation of fingerprint similarity. For example, complex molecules tend to produce fingerprints with higher bit density than simple ones, which often leads to artificially high similarity values in search calculations. We introduce here a variant of the Tversky coefficient that makes it possible to modulate or eliminate molecular complexity effects when evaluating fingerprint similarity. This has enabled us to study in detail the role of molecular complexity in similarity searching and the relationship between reference and active database compounds. Balancing complexity effects leads to constant distributions of similarity values for reference and database molecules, independent of how compound contributions are weighted. When searching for active compounds with varying complexity, hit rates can be optimized by modulating complexity effects, rather than eliminating them, and adjusting relative compound weights. For reference molecules and active database compounds having different complexity, preferred parameter settings are identified.
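As background to the abstract above, the following is a minimal sketch of the standard Tversky coefficient on sets of "on" bit positions. It is not the complexity-balancing variant the authors introduce, whose exact form is not given here. The function name and the example fingerprints are hypothetical.

```python
def tversky(ref_bits, db_bits, alpha=1.0, beta=1.0):
    """Standard Tversky similarity between two sets of 'on' bit positions.

    alpha weights bits unique to the reference molecule, beta weights bits
    unique to the database molecule. alpha = beta = 1 gives the Tanimoto
    coefficient.
    """
    ref_bits, db_bits = set(ref_bits), set(db_bits)
    common = len(ref_bits & db_bits)
    only_ref = len(ref_bits - db_bits)
    only_db = len(db_bits - ref_bits)
    denom = common + alpha * only_ref + beta * only_db
    return common / denom if denom else 0.0


# Hypothetical fingerprints: a simple reference and a more complex
# database molecule that contains all of the reference's bits.
simple = {1, 2, 3}
complex_ = {1, 2, 3, 4, 5, 6, 7, 8, 9}

print(tversky(simple, complex_))            # symmetric (Tanimoto) weighting
print(tversky(simple, complex_, 1.0, 0.0))  # ignore bits unique to the database molecule
```

Changing the relative weights changes how strongly the extra bits of the larger molecule are penalised, which is the kind of size and complexity dependence the abstract discusses.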

5.
Similarity searches using combinations of seven different similarity coefficients and six different representations have been carried out on the Dictionary of Natural Products database. The objective was to discover if any special methods of searching apply to this database, which is very different in nature from the many synthetic databases that have been the subject of previous studies of similarity searching. Search effectiveness was assessed by a recall analysis of the search outputs from sets of pharmacologically active target structures. The different target sets produce exceptional but contradictory results for the Russell-Rao and Forbes coefficients, which have been shown to be due to a dependence on molecular size; these are the coefficients of choice in the case of large and small structures, respectively. Rankings from these results have been combined using a data fusion scheme and some small gains in performance were normally obtained by using substructural fingerprints and molecular holograms in combination with the Squared Euclidean or Tanimoto coefficients.

6.
A statistical approach named the conditional correlated Bernoulli model is introduced for modeling of similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are rationalized as dependent Bernoulli variables and conditional distributions of Tanimoto similarity values of database compounds given a reference molecule are assessed. The conditional correlated Bernoulli model is utilized in the context of virtual screening to estimate the position of a compound obtaining a certain similarity value in a database ranking. Through the generation of receiver operating characteristic curves from cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search might be. The comparison of curves for different fingerprints makes it possible to identify fingerprints that are most likely to identify new active molecules in a database search given a set of known reference molecules.

7.
8.
9.
This paper describes techniques for calculating the degree of similarity between an input query molecule and each of the molecules in a database of 3-D chemical structures. The inter-molecular similarity measure used is the number of atoms in the 3-D common substructure (CS) between the two molecules being compared. The identification of 3-D CSs is very demanding of computational resources, even when an efficient clique detection algorithm is used for this purpose. Two types of upper-bound calculation are described that reduce the number of exact CS searches which need to be carried out to identify those molecules in a database that are similar to a 3-D target molecule.
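The abstract does not say which two upper bounds are used. As an illustration of the general idea only, the sketch below uses one simple bound that is an assumption of this example: a common substructure cannot contain more atoms of an element than either molecule has. Molecules whose bound falls below a cutoff can be skipped without running the expensive exact search. All names and data are hypothetical.

```python
from collections import Counter


def cs_upper_bound(atoms_a, atoms_b):
    """Upper bound on common-substructure size from element counts alone."""
    count_a, count_b = Counter(atoms_a), Counter(atoms_b)
    # For each element, at most min(count in A, count in B) atoms can match.
    return sum(min(count_a[element], count_b[element]) for element in count_a)


def screen(query_atoms, database, min_common):
    """Return names of molecules that still need an exact CS search."""
    return [
        name
        for name, atoms in database.items()
        if cs_upper_bound(query_atoms, atoms) >= min_common
    ]


# Hypothetical data: molecules given as lists of element symbols.
query = ["C", "C", "C", "O", "N"]
database = {
    "mol_1": ["C", "C", "N", "N", "S"],
    "mol_2": ["C", "C", "C", "O", "N", "N"],
}
print(screen(query, database, min_common=4))
```

The bound is cheap to compute and never underestimates the true common-substructure size, so filtering on it cannot discard a molecule that the exact search would have accepted.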

10.
An analysis method termed similarity search profiling has been developed to evaluate fingerprint-based virtual screening calculations. The analysis is based on systematic similarity search calculations using multiple template compounds over the entire value range of a similarity coefficient. In graphical representations, numbers of correctly identified hits and other detected database compounds are separately monitored. The resulting profiles make it possible to determine whether a virtual screening trial can in principle succeed for a given compound class, search tool, similarity metric, and selection criterion. As a test case, we have analyzed virtual screening calculations using a recently designed fingerprint on 23 different biological activity classes in a compound source database containing approximately 1.3 million molecules. Based on our predefined selection criteria, we found that virtual screening analysis was successful for 19 of 23 compound classes. Profile analysis also makes it possible to determine compound class-specific similarity threshold values for similarity searching.
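The profile described above can be sketched as a pair of counts per similarity threshold. This is a simplified illustration that assumes one precomputed similarity score per database compound, whereas the abstract describes searches with multiple template compounds. The function name, scores, and activity labels are hypothetical.

```python
def search_profile(scores, is_active, thresholds):
    """For each threshold, count active hits and other detected compounds.

    scores     -- similarity value of each database compound to the template
    is_active  -- True where the compound belongs to the activity class
    thresholds -- similarity cutoffs at which to evaluate the search
    """
    profile = []
    for t in thresholds:
        hits = sum(1 for s, a in zip(scores, is_active) if s >= t and a)
        others = sum(1 for s, a in zip(scores, is_active) if s >= t and not a)
        profile.append((t, hits, others))
    return profile


# Hypothetical scores for five database compounds.
scores = [0.9, 0.8, 0.6, 0.4, 0.3]
is_active = [True, False, True, False, False]
for threshold, hits, others in search_profile(scores, is_active, [0.5, 0.85]):
    print(threshold, hits, others)
```

Plotting the two counts against the threshold shows whether any cutoff retrieves hits without being swamped by other database compounds.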

11.
12.
13.
A quantum similarity measure between two molecules is normally identified with the maximum value of the overlap of the corresponding molecular electron densities. The electron density overlap is a function of the mutual positioning of the compared molecules, so measuring similarity requires solving a multiple-maxima problem. Collapsing the molecular electron densities into the nuclei provides the essential information toward a global maximization of the overlap similarity function, the maximization of which, in this limit case, appears to be related to the so-called assignment problem. Three levels of approach are then proposed for a global search scanning of the similarity function. In addition, atom-atom similarity Lorentzian potential functions are defined for a rapid completion of the function scanning. Performance is tested among these three levels of simplification and the Monte Carlo and simplex methods. Results reveal the present algorithms as accurate, rapid, and unbiased techniques for density-based molecular alignments. © 1997 by John Wiley & Sons, Inc. J Comput Chem 18: 826–846, 1997

14.
A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (~76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.

15.
Current systems for similarity-based virtual screening use similarity measures in which all the fragments in a fingerprint contribute equally to the calculation of structural similarity. This paper discusses the weighting of fragments on the basis of their frequencies of occurrence in molecules. Extensive experiments with sets of active molecules from the MDL Drug Data Report and the World of Molecular Bioactivity databases, using fingerprints encoding Tripos holograms, Pipeline Pilot ECFC_4 circular substructures and Sunset Molecular keys, demonstrate clearly that frequency-based screening is generally more effective than conventional, unweighted screening. The results suggest that standardising the raw occurrence frequencies by taking the square root of the frequencies will maximise the effectiveness of virtual screening. An upper-bound analysis shows the complex interactions that can take place between representations, weighting schemes and similarity coefficients when similarity measures are computed, and provides a rationalisation of the relative performance of the various weighting schemes.
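The square-root standardisation mentioned above can be illustrated with a short sketch. The use of the continuous (dot-product) form of the Tanimoto coefficient is an assumption of this example, since the abstract does not state which coefficient is paired with the weighting. The function name and fragment counts are hypothetical.

```python
import math


def sqrt_weighted_tanimoto(counts_a, counts_b):
    """Tanimoto similarity on square-root-standardised fragment counts.

    counts_a, counts_b -- dicts mapping fragment id to occurrence count
    """
    keys = set(counts_a) | set(counts_b)
    # Standardise raw occurrence frequencies by taking their square root.
    xa = {k: math.sqrt(counts_a.get(k, 0)) for k in keys}
    xb = {k: math.sqrt(counts_b.get(k, 0)) for k in keys}
    dot = sum(xa[k] * xb[k] for k in keys)
    norm_a = sum(v * v for v in xa.values())
    norm_b = sum(v * v for v in xb.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0


# Hypothetical fragment counts for two molecules.
mol_a = {"frag_1": 4, "frag_2": 1}
mol_b = {"frag_1": 1, "frag_2": 1}
print(sqrt_weighted_tanimoto(mol_a, mol_b))
```

Taking the square root damps the influence of fragments that occur many times in one molecule while still letting frequency contribute, which unweighted binary fingerprints cannot do.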

16.
Similarity searching using a single bioactive reference structure is a well-established technique for accessing chemical structure databases. This paper describes two extensions of the basic approach. First, we discuss the use of group fusion to combine the results of similarity searches when multiple reference structures are available. We demonstrate that this technique is notably more effective than conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules; conversely, the technique will do little to improve the search performance if the actives are structurally homogeneous. Second, we make the assumption that the nearest neighbors resulting from a similarity search, using a single bioactive reference structure, are also active and use this assumption to implement approximate forms of group fusion, substructural analysis, and binary kernel discrimination. This approach, called turbo similarity searching, is notably more effective than conventional similarity searching.
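Group fusion can be sketched as follows. The abstract does not specify the fusion rule, so taking the maximum similarity over all reference structures is an assumption of this example, as is the use of the Tanimoto coefficient on bit sets. All names and fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def group_fusion_max(references, database):
    """Rank database molecules by their best similarity to any reference."""
    fused = {
        name: max(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two structurally different reference actives.
references = [{1, 2, 3}, {7, 8, 9}]
database = {"x": {1, 2, 3, 4}, "y": {7, 8}, "z": {5, 6}}
for name, score in group_fusion_max(references, database):
    print(name, round(score, 3))
```

With structurally diverse references, a database molecule only needs to resemble one of them to rank highly, which is consistent with the abstract's observation that the benefit is largest for heterogeneous sets of actives.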

17.
Molecules with similar shapes and features often have similar biological activity. Several computational approaches search chemical databases for new leads or templates based on overall molecular shape similarity. However, active molecules often present critical subshapes that are required for binding, which may be missed by comparing overall shape similarity. We present a new approach to compare molecular shapes of different sizes and to calculate subshape similarity. We developed a skeletal representation of the shape which is topologically unrelated to covalent chemical connectivity. This simplifies rotational and translational sampling. We test initial possible alignments by matching similar triangles. This triangle-matching filter rapidly eliminates most geometrically impossible matches. Surviving matches are filtered further in successive stages. These stages involve direction, feature, and shape matching procedures. Our approach is applied to several situations demonstrating lead discovery and evolution.

18.
This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.

19.
A method was required to objectively determine whether various melanins were synthesised from different precursors, and whether synthesis from the same precursor was reproducible. Melanins have a complex, heterogeneous, polymeric structure, making them difficult to characterise and compare. Pyrolysis chromatography may be useful for such large molecules that are not amenable to analysis by other methods, but the resulting chromatograms are usually complex and difficult to compare. Techniques used to objectively differentiate between such chromatograms often employ statistical methods that are difficult to use and interpret without specialised knowledge. Melanins were analysed by pyrolysis/gas chromatography/mass spectrometry (PY/GC/MS). Software was developed to automate the conversion of the resulting total ion current (TIC) chromatograms to pseudo-mass spectra (PMS), consisting of a row vector of chromatographic peak areas (analogous to ion abundances) ordered along a retention time axis (analogous to m/z ratio). The National Institute of Standards and Technology (NIST, USA) mass spectral search program, which is used widely as an objective measure of the degree of similarity between mass spectra, was then used to generate match factors for comparisons of the generated PMS. Match factors between melanins synthesised from the same precursors were not significantly different, while match factors between melanins synthesised from different precursors were significantly smaller. The reproducibility of the pyrolysis technique was reasonable, with the majority of relative standard deviation (RSD) values of match factors from melanins synthesised from the same precursor, being below 10% (n=5 for each melanin type). 
While the method does not allow the unequivocal identification of individual melanin types in isolation, it can be used to compare melanins from different sources and objectively estimate the degree of similarity between them on the basis of significant differences between their pyrograms.

20.
An alternative to experimental high-throughput screening is the virtual screening of compound libraries on the computer. In the absence of a detailed structure of the receptor protein, candidate molecules are compared with a known reference by mutually superimposing their skeletons and scoring their similarity. Since molecular shape highly depends on the adopted conformation, an efficient conformational screening is performed using a knowledge-based approach. A comprehensive torsion library has been compiled from crystal data stored in the Cambridge Structural Database. For molecular comparison, a strategy is followed that considers shape-associated physicochemical properties in space such as steric occupancy, electrostatics, lipophilicity and potential hydrogen-bonding. Molecular shape is approximated by a set of Gaussian functions not necessarily located at the atomic positions. The superposition is performed in two steps: first by a global alignment search operating on multiple rigid conformations and then by conformationally relaxing the best scored hits of the global search. A normalized similarity scoring is used to allow for a comparison of molecules with rather different shape and size. The approach has been implemented on a cluster of parallel processors. As a case study, the search for ligands binding to the dopamine receptor is given.
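To illustrate how a shape built from Gaussian functions can be scored, the sketch below uses the closed-form overlap integral of two spherical Gaussians, exp(-a|r-A|^2) and exp(-b|r-B|^2), which equals (pi/(a+b))^(3/2) * exp(-a*b/(a+b) * |A-B|^2). The cosine-style normalisation is an assumption of this example, since the abstract does not state which normalisation is used. Names, exponents, and coordinates are hypothetical.

```python
import math


def gaussian_overlap(center_a, alpha, center_b, beta):
    """Overlap integral of two unit-height spherical Gaussians in 3-D."""
    dist_sq = sum((p - q) ** 2 for p, q in zip(center_a, center_b))
    total = alpha + beta
    return (math.pi / total) ** 1.5 * math.exp(-alpha * beta / total * dist_sq)


def shape_overlap(mol_a, mol_b):
    """Overlap of two shapes, each a list of (center, exponent) Gaussians."""
    return sum(
        gaussian_overlap(ca, a, cb, b) for ca, a in mol_a for cb, b in mol_b
    )


def normalized_similarity(mol_a, mol_b):
    """Overlap scaled by the self-overlaps, so identical shapes score 1."""
    self_a = shape_overlap(mol_a, mol_a)
    self_b = shape_overlap(mol_b, mol_b)
    return shape_overlap(mol_a, mol_b) / math.sqrt(self_a * self_b)


# Hypothetical shapes in a fixed relative orientation (no alignment search).
small = [((0.0, 0.0, 0.0), 1.0)]
large = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0), ((3.0, 0.0, 0.0), 1.0)]
print(normalized_similarity(small, large))
```

The sketch scores a single superposition only. The global alignment search and conformational relaxation described in the abstract would repeatedly evaluate a score of this kind while moving one molecule relative to the other.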
