Similar literature
 Found 20 similar articles; search took 421 ms
1.
Similarity-based methods for virtual screening are widely used. However, conventional searching using 2D chemical fingerprints or 2D graphs may retrieve only compounds which are structurally very similar to the original target molecule. Of particular current interest then is scaffold hopping, that is, the ability to identify molecules that belong to different chemical series but which could form the same interactions with a receptor. Reduced graphs provide summary representations of chemical structures and, therefore, offer the potential to retrieve compounds that are similar in terms of their gross features rather than at the atom-bond level. Using only a fingerprint representation of such graphs, we have previously shown that actives retrieved were more diverse than those found using Daylight fingerprints. Maximum common substructures give an intuitively reasonable view of the similarity between two molecules. However, their calculation using graph-matching techniques is too time-consuming for use in practical similarity searching in larger data sets. In this work, we exploit the low cardinality of the reduced graph in graph-based similarity searching. We reinterpret the reduced graph as a fully connected graph using the bond-distance information of the original graph. We describe searches, using both the maximum common induced subgraph and maximum common edge subgraph formulations, on the fully connected reduced graphs and compare the results with those obtained using both conventional chemical and reduced graph fingerprints. We show that graph matching using fully connected reduced graphs is an effective retrieval method and that the actives retrieved are likely to be topologically different from those retrieved using conventional 2D methods.
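The step of reinterpreting a reduced graph as a fully connected graph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the node names and the example graph are hypothetical, and for brevity the distances are counted along the edges of the reduced graph itself, whereas the abstract takes bond distances from the original molecular graph.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: edge-count distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def fully_connect(adjacency):
    """Return {(u, v): distance} for every pair of nodes in a connected graph."""
    nodes = sorted(adjacency)
    dist = {n: shortest_path_lengths(adjacency, n) for n in nodes}
    return {(u, v): dist[u][v] for u, v in combinations(nodes, 2)}


# Hypothetical reduced graph: donor - ring - linker - acceptor (a simple chain).
reduced = {
    "donor": ["ring"],
    "ring": ["donor", "linker"],
    "linker": ["ring", "acceptor"],
    "acceptor": ["linker"],
}
edges = fully_connect(reduced)
```

Every pair of nodes now carries a distance label, so a subgraph-matching routine can compare pairs directly instead of tracing paths through the sparse graph.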

2.
In this review, we discuss a number of computational methods that have been developed or adapted for molecule classification and virtual screening (VS) of compound databases. In particular, we focus on approaches that are complementary to high-throughput screening (HTS). The discussion is limited to VS methods that operate at the small molecular level, which is often called ligand-based VS (LBVS), and does not take into account docking algorithms or other structure-based screening tools. We describe areas that greatly benefit from combining virtual and biological screening and discuss computational methods that are most suitable to contribute to the integration of screening technologies. Relevant approaches range from established methods such as clustering or similarity searching to techniques that have only recently been introduced for LBVS applications such as statistical methods or support vector machines. Finally, we discuss a number of representative applications at the interface between VS and HTS.

3.
Fragment‐based searching and abstract representation of molecular features through reduced graphs have separately been used for virtual screening. Here, we combine these two approaches and apply the algorithm RedFrag to virtual screens retrospectively and prospectively. It uses a new type of reduced graph that does not suffer from information loss during its construction and bypasses the necessity of feature definitions. Built upon chemical epitopes resulting from molecule fragmentation, the reduced graph embodies physico‐chemical and 2D‐structural properties of a molecule. Reduced graphs are compared with a continuous‐similarity‐distance‐driven maximal common subgraph algorithm, which calculates similarity at the fragmental and topological levels. The performance of the algorithm is evaluated by retrieval experiments utilizing precompiled validation sets. By predicting and experimentally testing ligands for endothiapepsin, a challenging model protease, the method is assessed in a prospective setting. Here, we identified five novel ligands with affinities as low as 2.08 μM. © 2015 Wiley Periodicals, Inc.

4.
We propose a novel method to prioritize libraries for combinatorial synthesis and high-throughput screening that assesses the viability of a particular library on the basis of the aggregate physical-chemical properties of the compounds using a naïve Bayesian classifier. This approach prioritizes collections of related compounds according to the aggregate values of their physical-chemical parameters in contrast to single-compound screening. The method is also shown to be useful in screening existing noncombinatorial libraries when the compounds in these libraries have been previously clustered according to their molecular graphs. We show that the method used here is comparable or superior to the single-compound virtual screening of combinatorial libraries and noncombinatorial libraries and is superior to the pairwise Tanimoto similarity searching of a collection of combinatorial libraries.

5.
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step for biological analysis such as putative cell type identification. However, scRNA-seq data are high-dimensional and highly sparse, which makes traditional clustering methods less effective at reflecting the similarity between cells. Since the genetic network fundamentally defines the functions of a cell, and deep learning shows strong advantages in network representation learning, we propose a novel scRNA-seq clustering framework, ScGSLC, based on graph similarity learning. ScGSLC integrates scRNA-seq data and a protein-protein interaction network into a graph. It then employs a graph convolutional network to embed the graphs and clusters the cells by the calculated similarity between graphs. Unsupervised clustering results on nine public data sets demonstrate that ScGSLC performs better than the state-of-the-art methods.

6.
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of the resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to manageable sizes while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. The above process, designed to handle primarily data sets of sizes much smaller than the ones currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it also ignores all screening information related to compounds with activity less than the threshold chosen and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address the deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe in detail this new approach and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.

7.
8.
The emergence of large chemical databases imposes a need for organizing the compounds in these databases. Mapping the chemical graph in particular, and a molecular equivalence class represented by a labeled pseudograph in general, to a unique number or string facilitates high-throughput browsing, grouping, and searching of the chemical database. Computing this number using a naming adaptation of the Morgan algorithm, we observed a large classification noise in which nonisomorphic graphs were mapped to the same number. Our extensions to that algorithm greatly reduced the classification noise.
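The classification noise mentioned above can be reproduced with a small Python sketch. This is a simplified rendition of Morgan-style extended connectivity, not the authors' naming adaptation or their extensions; the function names and the two test graphs are hypothetical. It shows two nonisomorphic graphs (a triangular prism and the bipartite graph K3,3) receiving the same key.

```python
def morgan_values(adjacency, max_rounds=10):
    """Refine node values by repeatedly summing neighbor values (extended connectivity)."""
    values = {n: len(nbrs) for n, nbrs in adjacency.items()}  # start from node degree
    n_classes = len(set(values.values()))
    for _ in range(max_rounds):
        new_values = {n: sum(values[m] for m in adjacency[n]) for n in adjacency}
        new_n = len(set(new_values.values()))
        if new_n <= n_classes:  # stop once a round no longer separates more nodes
            break
        values, n_classes = new_values, new_n
    return values


def graph_key(adjacency):
    """Order-independent summary: the sorted multiset of final node values."""
    return tuple(sorted(morgan_values(adjacency).values()))


# Triangular prism: two triangles (0-1-2 and 3-4-5) joined by three rungs.
prism = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}
# K3,3: every node in {0, 1, 2} is bonded to every node in {3, 4, 5}.
k33 = {0: [3, 4, 5], 1: [3, 4, 5], 2: [3, 4, 5],
       3: [0, 1, 2], 4: [0, 1, 2], 5: [0, 1, 2]}

collision = graph_key(prism) == graph_key(k33)
```

Both graphs are 3-regular, so neighbor sums never distinguish any node and the keys coincide, even though only the prism contains triangles.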

9.
In this paper we introduce a quantitative model that relates chemical structural similarity to biological activity, and in particular to the activity of lead series of compounds in high-throughput assays. From this model we derive the optimal screening collection makeup for a given fixed size of screening collection, and identify the conditions under which a diverse collection of compounds or a collection focusing on particular regions of chemical space is the appropriate strategy. We derive from the model a diversity function that may be used to assess compounds for acquisition or libraries for combinatorial synthesis by their ability to complement an existing screening collection. The diversity function is linked directly through the model to the goal of more frequent discovery of lead series from high-throughput screening. We show how the model may also be used to derive relationships between collection size and probabilities of lead discovery in high-throughput screening, and to guide the judicious application of structural filters.

10.
11.
On the basis of the recently introduced reduced graph concept of ErG (extending reduced graphs), a straightforward weighting approach to include additional (e.g., structural or SAR) knowledge into similarity searching procedures for virtual screening (wErG) is proposed. This simple procedure is exemplified with three data sets, for which interaction patterns available from X-ray structures of native or peptidomimetic ligands with their target protein are used to significantly improve retrieval rates of known actives from the MDL Drug Report database. The results are compared to those of other virtual screening techniques such as Daylight fingerprints, FTrees, UNITY, and various FlexX docking protocols. Here, it is shown that wErG exhibits a very good and stable performance independent of the target structure. On the basis of this (and the fact that ErG retrieves structurally more dissimilar compounds due to its potential to perform scaffold-hopping), the combination of wErG and FlexX is successfully explored. Overall, wErG is not only an easily applicable weighting procedure that efficiently identifies actives in large data sets but it is also straightforward to understand for both medicinal and computational chemists and can, therefore, be driven by several aspects of project-related knowledge (e.g., X-ray, NMR, SAR, and site-directed mutagenesis) in a very early stage of the hit identification process.

12.
13.
Previously (Hähnke et al., J Comput Chem 2010, 31, 2810) we introduced the concept of nonlinear dimensionality reduction for canonization of two‐dimensional layouts of molecular graphs as foundation for text‐based similarity searching using our Pharmacophore Alignment Search Tool (PhAST), a ligand‐based virtual screening method. Here we apply these methods to three‐dimensional molecular conformations and investigate the impact of these additional degrees of freedom on virtual screening performance and assess differences in ranking behavior. Best‐performing variants of PhAST are compared with 16 state‐of‐the‐art screening methods with respect to significance estimates for differences in screening performance. We show that PhAST sorts new chemotypes on early ranks without sacrificing overall screening performance. We succeeded in combining PhAST with other virtual screening techniques by rank‐based data fusion, significantly improving screening capabilities. We also present a parameterization of double dynamic programming for the problem of small molecule comparison, which allows for the calculation of structural similarity between compounds based on one‐dimensional representations, opening the door to a holistic approach to molecule comparison based on textual representations. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011.
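Rank-based data fusion, as mentioned in this abstract, can be sketched in a few lines of Python. The abstract does not say which fusion rule was used, so the sum-of-ranks rule below is an assumption, as are the function name, the compound labels, and the scores. The sketch also assumes every method scores the same set of compounds and that higher scores are better.

```python
def rank_fusion(score_lists):
    """Fuse several {compound: score} dicts by summing per-method ranks (1 = best)."""
    fused = {}
    for scores in score_lists:
        ordered = sorted(scores, key=scores.get, reverse=True)  # higher score = better
        for rank, compound in enumerate(ordered, start=1):
            fused[compound] = fused.get(compound, 0) + rank
    # A lower summed rank is better; ties are broken by compound name for determinism.
    return sorted(fused, key=lambda c: (fused[c], c))


# Hypothetical scores from two screening methods on the same three compounds.
method_a = {"cpd1": 0.9, "cpd2": 0.5, "cpd3": 0.1}
method_b = {"cpd1": 0.2, "cpd2": 0.8, "cpd3": 0.7}
fused_order = rank_fusion([method_a, method_b])
```

Working on ranks rather than raw scores avoids having to put the methods' scores on a common scale.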

14.
Complementarity of molecular surfaces is crucial for molecular recognition. A method for representation of molecular shape is presented. We decompose the molecular surface into commensurate patches with defined shape by fitting hyperbolic paraboloids onto a triangulated isosurface of the Gaussian model of a molecule. As a result of this decomposition we obtain a 3D graph representation of the molecular shape, which can be used for complete and partial shape matching and isosteric group searching. To point out the possibilities and limitations of shape-only models, we challenged our method with three scenarios in a virtual screening contest: rigid body alignment, consensus shape filtering, and target-specific screening.

15.
We present a ligand-based virtual screening technique (PhAST) for rapid hit and lead structure searching in large compound databases. Molecules are represented as strings encoding the distribution of pharmacophoric features on the molecular graph. In contrast to other text-based methods using SMILES strings, we introduce a new form of text representation that describes the pharmacophore of molecules. This string representation opens the opportunity for revealing functional similarity between molecules by sequence alignment techniques in analogy to homology searching in protein or nucleic acid sequence databases. We favorably compared PhAST with other current ligand-based virtual screening methods in a retrospective analysis using the BEDROC metric. In a prospective application, PhAST identified two novel inhibitors of 5-lipoxygenase product formation with minimal experimental effort. This outcome demonstrates the applicability of PhAST to drug discovery projects and provides an innovative concept of sequence-based compound screening with substantial scaffold hopping potential.
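The idea of comparing pharmacophore strings by sequence alignment can be illustrated with a standard Needleman-Wunsch global alignment score in Python. This is a generic sketch, not PhAST itself: the one-letter alphabet (D donor, A acceptor, R aromatic, L lipophilic), the example strings, and the match, mismatch, and gap scores are all assumptions made for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for two strings, keeping only one table row in memory."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical pharmacophore strings for a query and two database molecules.
query = "DRRA"
same_features = global_alignment_score(query, "DRRA")
one_substitution = global_alignment_score(query, "DRLA")
one_deletion = global_alignment_score(query, "DRA")
```

Ranking database strings by such a score is what makes the analogy to homology searching in sequence databases concrete.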

16.
A methodology is introduced to assign energy-based scores to two-dimensional (2D) structural features based on three-dimensional (3D) ligand-target interaction information and utilize interaction-annotated features in virtual screening. Database molecules containing such fragments are assigned cumulative scores that serve as a measure of similarity to active reference compounds. The Interaction Annotated Structural Features (IASF) method is applied to mine five high-throughput screening (HTS) data sets and often identifies more hits than conventional fragment-based similarity searching or ligand-protein docking.

17.
Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

18.
Similarity searching using reduced graphs   Total citations: 3, self-citations: 0, citations by others: 3
Reduced graphs provide summary representations of chemical structures. In this work, the effectiveness of reduced graphs for similarity searching is investigated. Different types of reduced graphs are introduced that aim to summarize features of structures that have the potential to form interactions with receptors while retaining the topology between the features. Similarity searches have been carried out across a variety of different activity classes. The effectiveness of the reduced graphs at retrieving compounds with the same activity as known target compounds is compared with searching using Daylight fingerprints. The reduced graphs are shown to be effective for similarity searching and to retrieve more diverse active compounds than those found using Daylight fingerprints; they thus represent a complementary similarity searching tool.

19.
Target-oriented substructure-based virtual screening (sSBVS) of molecules is a promising approach in drug discovery. Yet, there are doubts whether sSBVS is suitable also for extrapolation, that is, for detecting molecules that are very different from those used for training. Herein, we evaluate the predictive power of classic virtual screening methods, namely, similarity searching using Tanimoto coefficient (MTC) and Naive Bayes (NB). As could be expected, these classic methods perform better in interpolation than in extrapolation tasks. Consequently, to enhance the predictive ability for extrapolation tasks, we introduce the Shadow approach, in which inclusion relations between substructures are considered, as opposed to the classic sSBVS methods that assume independence between substructures. Specifically, we discard contributions from substructures included in ("shaded" by) others which are, in turn, included in the molecule of interest. Indeed, the Shadow classifier significantly outperforms both MTC (p-value = 3.1 × 10^-16) and NB (p-value = 3.5 × 10^-9) in detecting hits sharing low similarity with the training active molecules.
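The Tanimoto-based baseline can be sketched in Python as follows. The abstract does not spell out MTC, so the sketch assumes it scores a candidate by its maximum Tanimoto coefficient to any training active; the function names and the integer substructure identifiers are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two substructure sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_tanimoto(candidate, actives):
    """Score a candidate by its best Tanimoto match among the training actives."""
    return max(tanimoto(candidate, active) for active in actives)


# Hypothetical substructure identifiers for one candidate and two training actives.
candidate = {1, 2, 3, 4}
actives = [{1, 2}, {3, 4, 5, 6}]
score = max_tanimoto(candidate, actives)
```

Because the score depends only on the nearest training active, it naturally favors interpolation, which is consistent with the weaker extrapolation performance reported for the classic methods.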

20.
Screening a library of molecular graphs for an exact or approximate match with one particular molecular graph, the query graph, is reduced to list comparisons. The lists contain lengths of shortest paths in graph Voronoi regions. This induces the notion of shortest path similarity. All graphs that are shortest path similar to the query graph are efficiently retrievable. The same applies to approximate or similarity matching. For the retrieval of all superstructures of a query, shortest path lists are modified to distance patterns. This also allows algorithmic support for query construction.
