Similar articles
Found 20 similar articles; search time: 328 ms
1.
Fingerprint scaling is a method to increase the performance of similarity search calculations. It is based on the detection of bit patterns in keyed fingerprints that are signatures of specific compound classes. Application of scaling factors to consensus bits that are mostly set on emphasizes signature bit patterns during similarity searching and has been shown to improve search results for different fingerprints. Similarity search profiling has recently been introduced as a method to analyze similarity search calculations. Profiles separately monitor correctly identified hits and other detected database compounds as a function of similarity threshold values and make it possible to estimate whether virtual screening calculations can be successful or to evaluate why they fail. This similarity search profile technique has been applied here to study fingerprint scaling in detail and better understand effects that are responsible for its performance. In particular, we have focused on the qualitative and quantitative analysis of similarity search profiles under scaling conditions. Therefore, we have carried out systematic similarity search calculations for 23 biological activity classes under scaling conditions over a wide range of scaling factors in a compound database containing approximately 1.3 million molecules and monitored these calculations in similarity search profiles. Analysis of these profiles confirmed increases in hit rates as a consequence of scaling and revealed that scaling influences similarity search calculations in different ways. Based on scaled similarity search profiles, compound sets could be divided into different categories. In a number of cases, increases in search performance under scaling conditions were due to a more significant relative increase in correctly identified hits than detected false-positives. 
This was also consistent with the finding that preferred similarity threshold values increased due to fingerprint scaling, which was well illustrated by similarity search profiling.
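
The scaling idea in this abstract can be illustrated with a minimal sketch. It assumes fingerprints are stored as sets of on-bit positions, that "consensus bits" are bits set in at least a chosen fraction of a class's compounds, and that scaling enters as a per-bit weight in a Tanimoto-style ratio. The function names, the `min_fraction` cutoff, and the weighting formula are illustrative assumptions, not the authors' published procedure.

```python
def consensus_bits(class_fps, min_fraction=0.9):
    """Return bits set in at least `min_fraction` of the class fingerprints.

    `min_fraction` is a hypothetical cutoff chosen for illustration.
    """
    counts = {}
    for fp in class_fps:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    threshold = min_fraction * len(class_fps)
    return {bit for bit, n in counts.items() if n >= threshold}


def scaled_tanimoto(a, b, consensus, factor=1.0):
    """Tanimoto-style ratio in which consensus bits count `factor` times.

    With factor=1.0 this reduces to the ordinary Tanimoto coefficient.
    """
    def weight(bits):
        return sum(factor if bit in consensus else 1.0 for bit in bits)

    union = weight(a | b)
    return weight(a & b) / union if union else 0.0


actives = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]
signature = consensus_bits(actives)
unscaled = scaled_tanimoto({1, 2, 3}, {1, 2, 6}, signature, factor=1.0)
scaled = scaled_tanimoto({1, 2, 3}, {1, 2, 6}, signature, factor=3.0)
```

In this toy case the two compounds share exactly the consensus bits, so raising the factor raises their similarity, which is one way a preferred similarity threshold could shift upward under scaling.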

2.
An analysis method termed similarity search profiling has been developed to evaluate fingerprint-based virtual screening calculations. The analysis is based on systematic similarity search calculations using multiple template compounds over the entire value range of a similarity coefficient. In graphical representations, numbers of correctly identified hits and other detected database compounds are separately monitored. The resulting profiles make it possible to determine whether a virtual screening trial can in principle succeed for a given compound class, search tool, similarity metric, and selection criterion. As a test case, we have analyzed virtual screening calculations using a recently designed fingerprint on 23 different biological activity classes in a compound source database containing approximately 1.3 million molecules. Based on our predefined selection criteria, we found that virtual screening analysis was successful for 19 of 23 compound classes. Profile analysis also makes it possible to determine compound class-specific similarity threshold values for similarity searching.

3.
Similarity searching using molecular fingerprints is a widely used approach for the identification of novel hits. A fingerprint search involves many pairwise comparisons of bit string representations of known active molecules with those precomputed for database compounds. Bit string overlap, as evaluated by various similarity metrics, is used as a measure of molecular similarity. Results of a number of studies focusing on fingerprints suggest that it is difficult, if not impossible, to develop generally applicable search parameters and strategies, irrespective of the compound classes under investigation. Rather, more or less, each individual search problem requires an adjustment of calculation conditions. Thus, there is a need for diagnostic tools to analyze fingerprint-based similarity searching. We report an analysis of fingerprint search calculations on different sets of structurally diverse active compounds. Calculations on five biological activity classes were carried out with two fingerprints in two compound source databases, and the results were analyzed in histograms. Tanimoto coefficient (Tc) value ranges where active compounds were detected were compared to the distribution of Tc values in the database. The analysis revealed that compound class-specific effects strongly influenced the outcome of these fingerprint calculations. Among the five diverse compound sets studied, very different search results were obtained. The analysis described here can be applied to determine Tc intervals where scaffold hopping occurs. It can also be used to benchmark fingerprint calculations or estimate their probability of success.
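
The Tanimoto coefficient and the histogram analysis mentioned in this abstract can be sketched as follows. The sketch assumes fingerprints are represented as sets of on-bit positions; the function names and the equal-width binning are illustrative choices, not details taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def tc_histogram(query, database, n_bins=10):
    """Count database fingerprints per equal-width Tc interval over [0, 1]."""
    counts = [0] * n_bins
    for fp in database:
        tc = tanimoto(query, fp)
        # Tc = 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(tc * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


query = {1, 2, 3}
database = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
histogram = tc_histogram(query, database, n_bins=4)
```

Building one such histogram for the known actives and one for the whole database gives the kind of side-by-side comparison of Tc ranges that the abstract describes.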

4.
A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (~76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.

5.
The large sizes of today's chemical databases require efficient algorithms to perform similarity searches. It can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching an individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup over these algorithms by indexing the set of all query chemicals so that redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm by developing a similarity search program called Symmetric inDexing or SymDex. SymDex shows a maximum speedup of over 232% compared to the state-of-the-art single query search algorithm over real data for various fingerprint lengths. Considerable speedup is even seen for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.

6.
7.
Combination of fingerprint-based similarity coefficients using data fusion   Cited 3 times in total (self-citations: 0; citations by others: 3)
Many different types of similarity coefficients have been described in the literature. Since different coefficients take into account different characteristics when assessing the degree of similarity between molecules, it is reasonable to combine them to further optimize the measures of similarity between molecules. This paper describes experiments in which data fusion is used to combine several binary similarity coefficients to get an overall estimate of similarity for searching databases of bioactive molecules. The results show that search performances can be improved by combining coefficients with little extra computational cost. However, there is no single combination which gives a consistently high performance for all search types.
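
One way to combine several binary similarity coefficients is to fuse the rankings they produce. The sketch below assumes fingerprints stored as sets of on-bit positions, three common coefficients (Tanimoto, Dice, cosine), and a sum-of-ranks fusion rule. The abstract does not state which coefficients or fusion rule were used, so all of these are illustrative assumptions.

```python
import math


def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def dice(a, b):
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 0.0


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def rank_positions(scores):
    """Rank 1 goes to the highest score; ties keep database order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def fused_ranking(query, database, coefficients=(tanimoto, dice, cosine)):
    """Return database indices ordered by the sum of their per-coefficient ranks."""
    rank_sums = [0] * len(database)
    for coefficient in coefficients:
        scores = [coefficient(query, fp) for fp in database]
        for index, rank in enumerate(rank_positions(scores)):
            rank_sums[index] += rank
    return sorted(range(len(database)), key=lambda i: rank_sums[i])


query = {1, 2, 3, 4}
database = [{9}, {1, 2, 3, 4}, {1, 2, 5}]
ranking = fused_ranking(query, database)
```

Tanimoto and Dice are monotonically related and therefore always rank compounds identically, so a fusion only gains information when it includes coefficients that can order compounds differently.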

8.
An alternative to experimental high-throughput screening is the virtual screening of compound libraries on the computer. In the absence of a detailed structure of the receptor protein, candidate molecules are compared with a known reference by mutually superimposing their skeletons and scoring their similarity. Since molecular shape highly depends on the adopted conformation, an efficient conformational screening is performed using a knowledge-based approach. A comprehensive torsion library has been compiled from crystal data stored in the Cambridge Structural Database. For molecular comparison, a strategy is followed that considers shape-associated physicochemical properties in space such as steric occupancy, electrostatics, lipophilicity and potential hydrogen bonding. Molecular shape is approximated by a set of Gaussian functions not necessarily located at the atomic positions. The superposition is performed in two steps: first by a global alignment search operating on multiple rigid conformations and then by conformationally relaxing the best-scoring hits of the global search. A normalized similarity scoring is used to allow for a comparison of molecules with rather different shape and size. The approach has been implemented on a cluster of parallel processors. As a case study, the search for ligands binding to the dopamine receptor is given.

9.
10.
11.
Similarity searching using a single bioactive reference structure is a well-established technique for accessing chemical structure databases. This paper describes two extensions of the basic approach. First, we discuss the use of group fusion to combine the results of similarity searches when multiple reference structures are available. We demonstrate that this technique is notably more effective than conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules; conversely, the technique will do little to improve the search performance if the actives are structurally homogeneous. Second, we make the assumption that the nearest neighbors resulting from a similarity search, using a single bioactive reference structure, are also active and use this assumption to implement approximate forms of group fusion, substructural analysis, and binary kernel discrimination. This approach, called turbo similarity searching, is notably more effective than conventional similarity searching.
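
Both extensions in this abstract can be sketched in a few lines. The sketch assumes fingerprints stored as sets of on-bit positions, the Tanimoto coefficient as the similarity measure, and a MAX rule for fusion (each database compound keeps its best score against any reference). The abstract does not specify the fusion rule or the number of neighbors, so these, like the function names, are illustrative assumptions.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion(references, database):
    """Score each database fingerprint by its best match to any reference (MAX rule)."""
    return [max(tanimoto(ref, fp) for ref in references) for fp in database]


def turbo_search(reference, database, n_neighbors=2):
    """Approximate group fusion starting from a single reference.

    The top `n_neighbors` hits of a first search are assumed to be active and
    are added to the reference set before the fused second pass.
    """
    first_pass = [tanimoto(reference, fp) for fp in database]
    nearest = sorted(range(len(database)), key=lambda i: -first_pass[i])[:n_neighbors]
    references = [reference] + [database[i] for i in nearest]
    return group_fusion(references, database)


database = [{1, 2, 3, 4}, {4, 5, 6}, {7, 8}]
fused = group_fusion([{1, 2}, {7, 8}], database)
turbo = turbo_search({1, 2, 3}, database, n_neighbors=1)
```

In the toy run, the second compound shares nothing with the original reference but picks up a nonzero score through the assumed-active nearest neighbor, which is the mechanism by which the second pass can reach structurally more distant compounds.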

12.
13.
14.
Recent trends in the computer-aided design of diverse and focussed combinatorial libraries are surveyed. First, chemical data input, storage and retrieval including chemical database management and virtual chemical structure enumeration are outlined as background. Then, the optimization of ADMET parameters, diversity maximization, molecular similarity search, QSAR-based virtual screening, pharmacophore search and molecular docking are discussed.

15.
A similarity-search system is described for proton-NMR spectroscopy. In order to achieve fast retrieval of reference compounds, the 1H-NMR spectra of the database and of the unknown are encoded as bit strings. The individual bits of the binary signature describe different features of the spectra. Part of the coupling information is coded in such a way that effects of magnetic field strength are taken into account. The encoding thus permits a fast search for identical and structurally similar reference compounds in the database even when the spectra were recorded at different magnetic field strengths. Because the search consists of a weighted comparison of bits, each of them describing different spectral features, a choice of different kinds of searches is possible with the same signature by selecting appropriate weight vectors. Thus specific spectroscopic features can be selected for the search. Such a context-sensitive similarity-search system allows, for example, a search for compounds having similar multiplicities or similar subspectra in a given (e.g., aromatic) region of the spectrum. Furthermore, by adjusting two “software knobs” which influence the normalization of the search results, the user can choose between the two extremes of forward and reverse search, and between an identity search, similarity search or classification search. The results were tested on a small library containing 550 spectra including some mixtures and duplicates recorded under different experimental conditions at 250 and 400 MHz.

16.
刘琪  邓勇  王川  石铁流  李亦学 《中国化学》2006,24(9):1247-1254
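Clustering is a widely used method in microarray data analysis. The function of an unknown gene is usually predicted from the similarity of its expression to that of known genes across different biological states. However, the reliability of this approach to functional annotation through expression similarity has not yet been assessed. This paper uses Gene Ontology to study comprehensively the correlation between expression similarity and gene functional similarity. The study shows that although the similarity of expression profiles and the similarity of gene function are dependent to some degree, the correlation is weak. Among the three Gene Ontology categories, expression profile similarity is more helpful for annotating cellular component than for annotating biological process or molecular function. The results of this study provide some guidance for the prediction of gene function.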

17.
A new method for the computerized search and identification of infrared spectra has been developed and evaluated. Based on cross-correlation, the search system utilizes all spectral information in a digitized spectrum when it attempts to match an unknown spectrum to one in a small library of known spectra. To evaluate a spectral match, the search program calculates the cross-correlation function between the unknown and known (library) spectra which indicates their degree of similarity and allows library spectra to be ranked in order of probability of match to the unknown spectrum. In this study, several small infrared spectral libraries of structurally similar compounds were searched under conditions which examined the sensitivity of the search method to chemical and instrumental variations. Because the correlation technique is slower than conventional file-searching methods, it will probably find greatest use in the search of small collections of similar spectra or as a match-ranking procedure following preliminary selection by a faster search method.
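
Ranking library spectra by cross-correlation can be sketched as below. The sketch assumes spectra are equal-length lists of intensities sampled on a common wavenumber grid, and it scores a match by the peak of the mean-centered, normalized cross-correlation over a small range of shifts. The normalization, the `max_lag` parameter, and the function names are illustrative assumptions; the abstract does not describe how the original program derives a score from the cross-correlation function.

```python
import math


def normalized_cross_correlation(x, y, max_lag=0):
    """Peak normalized cross-correlation of two spectra over lags in [-max_lag, max_lag]."""
    if len(x) != len(y):
        raise ValueError("spectra must be sampled on the same grid")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    if norm == 0:
        return 0.0  # a flat spectrum carries no shape information
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Only overlapping points contribute at a given shift.
        s = sum(xc[i] * yc[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, s / norm)
    return best


def rank_library(unknown, library, max_lag=0):
    """Return library names ordered from best to worst match."""
    scores = {
        name: normalized_cross_correlation(unknown, spectrum, max_lag)
        for name, spectrum in library.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


unknown = [0, 1, 0, 2, 0]
library = {"a": [2, 0, 1, 0, 0], "b": [0, 1, 0, 2, 0]}
ranking = rank_library(unknown, library)
```

Every library spectrum costs one pass per lag, and each pass touches every data point, which is consistent with the abstract's remark that the method suits small collections or a re-ranking step after a faster pre-selection.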

18.
19.
The de novo design program Skelgen has been used to design inhibitor structures for four targets of pharmaceutical interest. The designed structures are compared to modeled binding modes of known inhibitors (i) visually and (ii) by means of a novel similarity measure considering the size and spatial proximity of the maximum common substructure of two small molecules. It is shown that the Skelgen algorithm generates representatives of many inhibitor classes within a very short time and that the new similarity measure is useful for comparing and clustering designed structures. The results demonstrate the necessity of properly defining search constraints in practical applications of de novo design.

20.
The data bank contains the infrared spectra of ninety thousand pure compounds, twelve thousand poly compounds, and one thousand drugs. Each spectrum can be retrieved by serial number, chemical name, commercial name, number of atoms of each element, or molecular formula, as well as by the appearance of its spectral peaks. The system includes a program for spectrum information input, a program for spectrum information search, and a program for spectrum peak search; attached to it are a spectrum information data bank, a spectrum peak code data bank, and a spectrum figure data bank. The system is written in Visual Basic and runs under Windows. The spectrum information data bank and the spectrum figure data bank are managed with Microsoft Access. The input program lets users add the information data and spectrum figures of new compounds to the data banks themselves. The information search program retrieves all the information data and the spectrum figure of a compound of interest from any one item of its information data. The peak search program finds the spectra whose peak shapes are most similar to those of an unknown spectrum by peak-to-peak comparison. When the wavenumbers and transmittances of the main peaks in the spectrum of an unknown sample are entered, the peak search is performed and several of the most similar hits are reported, including their similarity scores, spectrum serial numbers, sample states, melting points, molecular formulas, and spectrum images. If the search result is not satisfactory, the system suggests ways to modify the spectrum parameters, and the search can be performed again.
