
1.

Toward an improved clustering of large data sets using maximum common substructures and topological fingerprints

Böcker A 《Journal of chemical information and modeling》2008,48(11):2097-2107


2.

A hierarchical clustering approach for large compound libraries

Böcker A Derksen S Schmidt E Teckentrup A Schneider G 《Journal of chemical information and modeling》2005,45(4):807-815


3.

Using consensus-shape clustering to identify promiscuous ligands and protein targets and to choose the right query for shape-based virtual screening

Pérez-Nueno VI Ritchie DW 《Journal of chemical information and modeling》2011,51(6):1233-1248

Ligand-based shape matching approaches have become established as important and popular virtual screening (VS) techniques. However, despite their relative success, many authors have discussed how best to choose the initial query compounds and which of their conformations should be used. Furthermore, it is increasingly the case that pharmaceutical companies have multiple ligands for a given target and these may bind in different ways to the same pocket. Conversely, a given ligand can sometimes bind to multiple targets, and this is clearly of great importance when considering drug side-effects. We recently introduced the notion of spherical harmonic-based "consensus shapes" to help deal with these questions. Here, we apply a consensus shape clustering approach to the 40 protein-ligand targets in the DUD data set using PARASURF/PARAFIT. Results from clustering show that in some cases the ligands for a given target are split into two subgroups which could suggest they bind to different subsites of the same target. In other cases, our clustering approach sometimes groups together ligands from different targets, and this suggests that those ligands could bind to the same targets. Hence spherical harmonic-based clustering can rapidly give cross-docking information while avoiding the expense of performing all-against-all docking calculations. We also report on the effect of the query conformation on the performance of shape-based screening of the DUD data set and the potential gain in screening performance by using consensus shapes calculated in different ways. We provide details of our analysis of shape-based screening using both PARASURF/PARAFIT and ROCS, and we compare the results obtained with shape-based and conventional docking approaches using MSSH/SHEF and GOLD. The utility of each type of query is analyzed using commonly reported statistics such as enrichment factors (EF) and receiver-operator-characteristic (ROC) plots as well as other early performance metrics. 
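The enrichment factor (EF) and ROC statistics named in this abstract can be illustrated with a minimal sketch. The function names, the convention that a higher score means a better rank, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """EF at a top fraction: hit rate in the top slice divided by the overall hit rate.

    Assumes higher scores rank first (an illustrative convention).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n * fraction))
    # Sort by score only, best first; Python's sort is stable for ties.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(active for _, active in ranked[:n_top])
    return (hits / n_top) / (n_actives / n)


def roc_auc(scores, is_active):
    """Area under the ROC curve as the probability that an active outscores a decoy."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    # Ties between an active and a decoy count as half a win.
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy screen: 10 compounds, the 2 actives happen to score highest.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
auc = roc_auc(toy_scores, toy_labels)
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for a sketch but would normally be replaced by a rank-based formula for large screens.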

4.

Lead Finder docking and virtual screening evaluation with Astex and DUD test sets

Novikov FN Stroylov VS Zeifman AA Stroganov OV Kulkov V Chilov GG 《Journal of computer-aided molecular design》2012,26(6):725-735

Lead Finder is a molecular docking program. Sampling uses an original implementation of the genetic algorithm that involves a number of additional optimization procedures. Lead Finder's scoring functions employ a set of semi-empiric molecular mechanics functionals that have been parameterized independently for docking, binding energy predictions and rank-ordering for virtual screening. Sampling and scoring both utilize a staged approach, moving from fast but less accurate algorithm versions to computationally more intensive but more accurate versions. Lead Finder includes tools for the preparation of full atom protein and ligand models. In this exercise, Lead Finder achieved a 72.9% docking success rate on the Astex test set when the original author-prepared full atom models were used, and a 74.1% success rate when the structures were prepared by Lead Finder. The major cause of docking failures was scoring errors resulting from the use of imperfect solvation models. In many cases, docking errors could be corrected by the proper protonation and the use of correct cyclic conformations of ligands. In virtual screening experiments on the DUD test set an early enrichment factor of several tens was achieved on average. However, the area under the ROC curve ("AUC ROC") ranged from 0.70 to 0.74 depending on the screening protocol used, and the separation from the null model was not perfect: 0.12-0.15 units of AUC ROC. We assume that effective virtual screening in the whole range of the enrichment curve, and not just at the early enrichment stages, requires more accurate solvation modeling and accounting for the protein backbone flexibility.

5.

Aerosol time-of-flight mass spectrometry data analysis: a benchmark of clustering algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Rebotier TP Prather KA 《Analytica chimica acta》2007,585(1):38-54

Airborne particulate matter is an important component of atmospheric pollution, affecting human health, climate, and visibility. Modern instruments allow single particles to be analyzed one-by-one in real time, and offer the promise of determining the sources of individual particles based on their mass spectral signatures. The large number of particles to be apportioned makes clustering a necessary step. The goal of this study is to use mass spectral data to compare the accuracy and speed of several clustering algorithms: ART-2a, several variants of hierarchical clustering, and K-means. Repeated simulations with various algorithms and different levels of data preprocessing suggest that hierarchical clustering methods using derivatives of Ward's algorithm discriminate sources with fewer errors than ART-2a, which itself discriminates much better than point-wise hierarchical clustering methods. In most cases, K-means algorithms do almost as well as the best hierarchical clustering. These efficient algorithms (clustering derived from Ward's algorithm, ART-2a and K-means) are most accurate when the relative peak areas have been pre-scaled by taking the square root. Analysis times vary within a factor of 30, and when accuracy above 95% is required, run times scale up as the square of the number of particles. Algorithms derived from Ward's remain the most accurate under a wide range of conditions and conversely, for an equal accuracy, can deliver a shorter list of clusters, allowing faster and maybe on-the-fly classification.

6.

A cluster-based strategy for assessing the overlap between large chemical libraries and its application to a recent acquisition

Engels MF Gibbs AC Jaeger EP Verbinnen D Lobanov VS Agrafiotis DK 《Journal of chemical information and modeling》2006,46(6):2651-2660


7.

Restrepo G Mesa H Llanos EJ 《Journal of chemical information and modeling》2007,47(3):761-770

We discussed three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All of them decompose the dendrograms into subsets. In the case of the triples and partition indices, these subsets correspond to binary partitions containing some clusters, while in the cluster index, a novel dissimilarity method introduced in this paper, the subsets are exclusively clusters. In chemical applications, the dendrograms gather clusters that contain similarity information of the data set under study. Thereby, the cluster index is the most suitable dissimilarity measure between dendrograms resulting from chemical investigation. An application example of the three measures is shown to remark upon the advantages of the cluster index over the other two methods in similarity studies. Finally, the cluster index is used to measure the differences between five dendrograms obtained when applying five common hierarchical clustering algorithms on a database of 1000 molecules.

8.

Comparison of 2D fingerprint types and hierarchy level selection methods for structural grouping using Ward's clustering

Wild DJ Blankley CJ 《Journal of chemical information and computer sciences》2000,40(1):155-162

Four different two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) and nine methods of selecting optimal cluster levels from the output of a hierarchical clustering algorithm were evaluated for their ability to select clusters that represent chemical series present in some typical examples of chemical compound data sets. The methods were evaluated using a Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as with compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection methods for use in this type of clustering.

9.

Computation of 3D queries for ROCS based virtual screens

Gregory J. Tawa J. Christian Baber Christine Humblet 《Journal of computer-aided molecular design》2009,23(12):853-868

Rapid overlay of chemical structures (ROCS) is a method that aligns molecules based on shape and/or chemical similarity. It is often used in 3D ligand-based virtual screening. Given a query consisting of a single conformation of an active molecule, ROCS can generate highly enriched hit lists. Typically the chosen query conformation is a minimum energy structure. Can better enrichment be obtained using conformations other than the minimum energy structure? To answer this question a methodology has been developed called CORAL (COnformational analysis, Rocs ALignment). For a given set of molecule conformations it computes optimized conformations for ROCS screening. It does so by clustering all conformations of a chosen molecule set using pairwise ROCS combo scores. The best representative conformation is that which has the highest average overlap with the rest of the conformations in the cluster. It is these best representative conformations that are then used for virtual screening. CORAL was tested by performing virtual screening experiments with the 40 DUD (Directory of Useful Decoys) data sets. Both CORAL and minimum energy queries were used. The recognition capability of each query was quantified as the area under the ROC curve (AUC). Results show that the CORAL AUC values are on average larger than the minimum energy AUC values. This demonstrates that one can indeed obtain better ROCS enrichments with conformations other than the minimum energy structure. As a result, CORAL analysis can be a valuable first step in virtual screening workflows using ROCS.
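The selection rule stated in this abstract (the representative is the conformation with the highest average overlap with the rest of its cluster) can be sketched as follows. The sketch assumes a precomputed symmetric matrix of pairwise scores and an existing cluster assignment; the function name and the numbers are illustrative placeholders, not CORAL's actual code or real ROCS output.

```python
def best_representative(similarity, members):
    """Return the member index with the highest mean similarity to the other members.

    similarity: square, symmetric matrix (list of lists) of pairwise scores.
    members: indices of the conformations that belong to one cluster.
    """
    if not members:
        raise ValueError("cluster must contain at least one conformation")
    if len(members) == 1:
        return members[0]

    def mean_overlap(i):
        # Average score against every other member; the self-score is excluded.
        others = [similarity[i][j] for j in members if j != i]
        return sum(others) / len(others)

    return max(members, key=mean_overlap)


# Placeholder pairwise scores for three conformations in one cluster.
pairwise = [
    [2.0, 1.5, 0.4],
    [1.5, 2.0, 1.2],
    [0.4, 1.2, 2.0],
]
representative = best_representative(pairwise, [0, 1, 2])
```

Here conformation 1 is chosen because its mean score against the other two (1.35) exceeds that of conformation 0 (0.95) and conformation 2 (0.80).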

10.

Feature selection for hierarchical clustering

F. Questier B. Walczak D.L. Massart C. Boucon S. de Jong 《Analytica chimica acta》2002,466(2):311-324

Feature selection is a valuable technique in data analysis for information-preserving data reduction. This paper describes a feature selection approach for hierarchical clustering based on genetic algorithms using a fitness function that tries to minimize the difference between the dissimilarity matrix of the original feature set and the one of the reduced feature sets. Clustering trees based on reduced feature sets are comparable with those based on the complete feature set. Special measures to favor small reduced feature sets are discussed.
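The fitness idea in this abstract, comparing the dissimilarity matrix of the full feature set with that of a reduced set, can be sketched as below. Euclidean distance and a sum of squared differences are assumptions chosen for illustration; the abstract does not state which dissimilarity or difference measure the authors used, and the genetic algorithm search itself is omitted.

```python
import math


def distance_matrix(rows, features):
    """Pairwise Euclidean distances between objects, using only the given feature indices."""
    features = list(features)
    return [
        [math.sqrt(sum((a[f] - b[f]) ** 2 for f in features)) for b in rows]
        for a in rows
    ]


def matrix_difference(rows, subset):
    """Sum of squared differences between full-set and reduced-set distances.

    Lower is better: 0.0 means the subset reproduces every pairwise distance.
    """
    full = distance_matrix(rows, range(len(rows[0])))
    reduced = distance_matrix(rows, subset)
    n = len(rows)
    return sum(
        (full[i][j] - reduced[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy data: the third feature is constant, so dropping it loses nothing.
data = [[0, 0, 5], [3, 4, 5], [6, 8, 5]]
loss_without_constant = matrix_difference(data, [0, 1])
loss_single_feature = matrix_difference(data, [0])
```

A search procedure would evaluate many candidate subsets with a function like `matrix_difference`, possibly adding a penalty on subset size to favor small feature sets, as the abstract mentions.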

11.

Computational fragment-based screening using RosettaLigand: the SAMPL3 challenge

Kumar A Zhang KY 《Journal of computer-aided molecular design》2012,26(5):603-616

The SAMPL3 fragment-based virtual screening challenge provides a valuable opportunity for researchers to test their programs, methods and screening protocols in a blind testing environment. We participated in the SAMPL3 challenge and evaluated our virtual fragment screening protocol, which involves RosettaLigand as the core component, by screening a 500-fragment Maybridge library against bovine pancreatic trypsin. Our study reaffirmed that the real test for any virtual screening approach would be in a blind testing environment. The analyses presented in this paper also showed that virtual screening performance can be improved if a set of known active compounds is available and parameters and methods that yield better enrichment are selected. Our study also highlighted that to achieve accurate orientation and conformation of ligands within a binding site, selecting an appropriate method to calculate partial charges is important. Another finding is that using multiple receptor ensembles in docking does not always yield better enrichment than individual receptors. On the basis of our results and retrospective analyses from the SAMPL3 fragment screening challenge, we anticipate that the chances of success in a fragment screening process could be increased significantly with careful selection of receptor structures, protein flexibility, sufficient conformational sampling within the binding pocket and accurate assignment of ligand and protein partial charges.

12.

Homology model-based virtual screening for GPCR ligands using docking and target-biased scoring

Radestock S Weil T Renner S 《Journal of chemical information and modeling》2008,48(5):1104-1117

The current study investigates the combination of two recently reported techniques for the improvement of homology model-based virtual screening for G-protein coupled receptor (GPCR) ligands. First, ligand-supported homology modeling was used to generate receptor models that were in agreement with mutagenesis data and structure-activity relationship information of the ligands. Second, interaction patterns from known ligands to the receptor were applied for scoring and rank ordering compounds from a virtual library using ligand-receptor interaction fingerprint-based similarity (IFS). Our approach was evaluated in retrospective virtual screening experiments for antagonists of the metabotropic glutamate receptor (mGluR) subtype 5. The results of our approach were compared to the results obtained by conventional scoring functions (Dock-Score, PMF-Score, Gold-Score, ChemScore, and FlexX-Score). The IFS led to significantly higher enrichment rates relative to the competing scoring functions. Though using a target-biased scoring approach, the results were not biased toward the chemical classes of the reference structures. Our results indicate that the presented approach has the potential to serve as a general setup for successful structure-based GPCR virtual screening.

13.

Counting clusters using R-NN curves

Guha R Dutta D Wild DJ Chen T 《Journal of chemical information and modeling》2007,47(4):1308-1318


14.

On pattern matching of X-ray powder diffraction data

Ivanisevic I Bugay DE Bates S 《The journal of physical chemistry. B》2005,109(16):7781-7787

We introduce a novel pattern matching algorithm optimized for X-ray powder diffraction (XRPD) data and useful for data from other types of analytical techniques (e.g., Raman, IR). The algorithm is based on hierarchical clustering with a similarity metric that compares peak positions using the full peak profile. It includes heuristics developed from years of experience manually matching XRPD data, and preprocessing algorithms that reduce the effects of common problems associated with XRPD (e.g., preferred orientation and poor particle statistics). This algorithm can find immediate application in automated polymorph screening and salt selection, common tasks in the development of pharmaceuticals.

15.

Development of purely structure-based pharmacophores for the topoisomerase I-DNA-ligand binding pocket

Malgorzata N. Drwal Keli Agama Yves Pommier Renate Griffith 《Journal of computer-aided molecular design》2013,27(12):1037-1049

Purely structure-based pharmacophores (SBPs) are an alternative method to ligand-based approaches and have the advantage of describing the entire interaction capability of a binding pocket. Here, we present the development of SBPs for topoisomerase I, an anticancer target with an unusual ligand binding pocket consisting of protein and DNA atoms. Different approaches to cluster and select pharmacophore features are investigated, including hierarchical clustering and energy calculations. In addition, the performance of SBPs is evaluated retrospectively and compared to the performance of ligand- and complex-based pharmacophores. SBPs emerge as a valid method in virtual screening and a complementary approach to ligand-focussed methods. The study further reveals that the choice of pharmacophore feature clustering and selection methods has a large impact on the virtual screening hit lists. A prospective application of the SBPs in virtual screening reveals that they can be used successfully to identify novel topoisomerase inhibitors.

16.

Optimization of high throughput virtual screening by combining shape-matching and docking methods

Lee HS Choi J Kufareva I Abagyan R Filikov A Yang Y Yoon S 《Journal of chemical information and modeling》2008,48(3):489-497

Receptor flexibility is a critical issue in structure-based virtual screening methods. Although a multiple-receptor conformation docking is an efficient way to account for receptor flexibility, it is still too slow for large molecular libraries. It was reported that a fast ligand-centric, shape-based virtual screening was more consistent for hit enrichment than a typical single-receptor conformation docking. Thus, we designed a "distributed docking" method that improves virtual high throughput screening by combining a shape-matching method with a multiple-receptor conformation docking. Database compounds are classified in advance based on shape similarities to one of the crystal ligands complexed with the target protein. This classification enables us to pick the appropriate receptor conformation for a single-receptor conformation docking of a given compound, thereby avoiding time-consuming multiple docking. In particular, this approach utilizes cross-docking scores of known ligands to all available receptor structures in order to optimize the algorithm. The present virtual screening method was tested for reidentification of known PPARgamma and p38 MAP kinase active compounds. We demonstrate that this method improves the enrichment while maintaining the computation speed of a typical single-receptor conformation docking.

17.

Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension

Armstrong MS Finn PW Morris GM Richards WG 《Journal of computer-aided molecular design》2011,25(8):785-790


18.

ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics

M. Stuart Armstrong Garrett M. Morris Paul W. Finn Raman Sharma Loris Moretti Richard I. Cooper W. Graham Richards 《Journal of computer-aided molecular design》2010,24(9):789-801

We present ElectroShape, a novel ligand-based virtual screening method, that combines shape and electrostatic information into a single, unified framework. Building on the ultra-fast shape recognition (USR) approach for fast non-superpositional shape-based virtual screening, it extends the method by representing partial charge information as a fourth dimension. It also incorporates the chiral shape recognition (CSR) method, which distinguishes enantiomers. It has been validated using release 2 of the Directory of useful decoys (DUD), and shows a near doubling in enrichment ratio at 1% over USR and CSR, and improvements as measured by Receiver Operating Characteristic curves. These improvements persisted even after taking into account the chemotype redundancy in the sets of active ligands in DUD. During the course of its development, ElectroShape revealed a difference in the charge allocation of the DUD ligand and decoy sets, leading to several new versions of DUD being generated as a result. ElectroShape provides a significant addition to the family of ultra-fast ligand-based virtual screening methods, and its higher-dimensional shape recognition approach has great potential for extension and generalisation.
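The idea of treating partial charge as a fourth coordinate can be illustrated with a simplified, USR-style moment descriptor that works in any number of dimensions. This is a sketch under stated assumptions, not the published ElectroShape algorithm: the four reference points follow the usual USR choice (centroid, atom closest to it, atom farthest from it, atom farthest from that one), the three moments are one common variant (mean, standard deviation, cube root of the third central moment), and the `scale` factor converting charge to a length is an arbitrary placeholder. The chirality handling of CSR is omitted.

```python
import math


def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def shape_descriptor(points):
    """12-number descriptor: 3 moments of atom distances to each of 4 reference points."""
    dim = len(points[0])
    centroid = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    closest = min(points, key=lambda p: _dist(p, centroid))
    farthest = max(points, key=lambda p: _dist(p, centroid))
    far_from_far = max(points, key=lambda p: _dist(p, farthest))
    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([_dist(p, ref) for p in points]))
    return descriptor


def similarity(d1, d2):
    """Inverse scaled Manhattan distance; 1.0 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1))


def with_charge(coords, charges, scale=1.0):
    """Append scaled partial charge as a fourth coordinate (scale is a placeholder)."""
    return [tuple(xyz) + (scale * q,) for xyz, q in zip(coords, charges)]


# Toy geometry: identical 3D shape, different (made-up) charge patterns.
xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
charged = shape_descriptor(with_charge(xyz, [0.4, -0.4, 0.0, 0.0]))
neutral = shape_descriptor(with_charge(xyz, [0.0, 0.0, 0.0, 0.0]))
```

Because the descriptor is built only from distances, two molecules with the same 3D shape score 1.0 in three dimensions but drop below 1.0 once differing charges move their atoms apart along the fourth axis. Note that distance-only descriptors cannot tell mirror images apart, which is the gap the abstract says CSR addresses.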

19.

Drug–target interaction prediction by integrating multiview network data

《Computational Biology and Chemistry》2017

Drug–target interaction (DTI) prediction is a challenging step in further drug repositioning, drug discovery and drug design. The advent of high-throughput technologies brings convenience to the development of DTI prediction methods. With the generation of a high number of data sets, many mathematical models and computational algorithms have been developed to identify the potential drug–target pairs. However, most existing methods are proposed based on single view data. By integrating the drug and target data from different views, we aim to get more stable and accurate prediction results. In this paper, a multiview DTI prediction method based on clustering is proposed. We first introduce a model for single view drug–target data. The model is formulated as an optimization problem, which aims to identify the clusters in both the drug similarity network and the target protein similarity network, and at the same time make the clusters with more known DTIs be connected together. Then the model is extended to multiview network data by maximizing the consistency of the clusters in each view. An approximation method is proposed to solve the optimization problem. We apply the proposed algorithms to two views of data. Comparisons with some existing algorithms show that the multiview DTI prediction algorithm can produce more accurate predictions. For the considered data set, we finally predict 54 possible DTIs. From the similarity analysis of the drugs/targets, enrichment analysis of DTIs and genes in each cluster, it is shown that the predicted DTIs have a high possibility to be true.

20.

Consensus scoring criteria for improving enrichment in virtual screening

Yang JM Chen YF Shen TW Kristal BS Hsu DF 《Journal of chemical information and modeling》2005,45(4):1134-1146

MOTIVATION: Virtual screening of molecular compound libraries is a potentially powerful and inexpensive method for the discovery of novel lead compounds for drug development. The major weakness of virtual screening, the inability to consistently identify true positives (leads), is likely due to our incomplete understanding of the chemistry involved in ligand binding and the subsequently imprecise scoring algorithms. It has been demonstrated that combining multiple scoring functions (consensus scoring) improves the enrichment of true positives. Previous efforts at consensus scoring have largely focused on empirical results, but they have yet to provide a theoretical analysis that gives insight into real features of combinations and data fusion for virtual screening. RESULTS: We demonstrate that combining multiple scoring functions improves the enrichment of true positives only if (a) each of the individual scoring functions has relatively high performance and (b) the individual scoring functions are distinctive. Notably, these two prediction variables are previously established criteria for the performance of data fusion approaches using either rank or score combinations. This work, thus, establishes a potential theoretical basis for the probable success of data fusion approaches to improve yields in in silico screening experiments. Furthermore, it is similarly established that the second criterion (b) can, in at least some cases, be functionally defined as the area between the rank versus score plots generated by the two (or more) algorithms. Because rank-score plots are independent of the performance of the individual scoring function, this establishes a second theoretically defined approach to determining the likely success of combining data from different predictive algorithms. This approach is, thus, useful in practical settings in the virtual screening process when the performance of at least two individual scoring functions (such as in criterion a) can be estimated as having a high likelihood of having high performance, even if no training sets are available. We provide initial validation of this theoretical approach using data from five scoring systems with two evolutionary docking algorithms on four targets, thymidine kinase, human dihydrofolate reductase, and estrogen receptors of antagonists and agonists. Our procedure is computationally efficient, able to adapt to different situations, and scalable to a large number of compounds as well as to a greater number of combinations. Results of the experiment show a fairly significant improvement (vs single algorithms) in several measures of scoring quality, specifically "goodness-of-hit" scores, false positive rates, and "enrichment". This approach (available online at http://gemdock.life.nctu.edu.tw/dock/download.php) has practical utility for cases where the basic tools are known or believed to be generally applicable, but where specific training sets are absent.
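The rank and score combinations mentioned in this abstract can be sketched in a few lines. The min-max normalization, the plain averaging, and the convention that a higher score is better are assumptions made for illustration; the abstract does not specify the exact fusion rules, and the numbers below are made up.

```python
def to_ranks(scores):
    """Rank 1 is the best (highest) score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def normalize(scores):
    """Min-max scale scores to the range 0..1 so different functions are comparable."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def rank_combination(score_lists):
    """Average rank per compound across scoring functions (lower is better)."""
    rank_lists = [to_ranks(scores) for scores in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]


def score_combination(score_lists):
    """Average normalized score per compound across scoring functions (higher is better)."""
    scaled = [normalize(scores) for scores in score_lists]
    n = len(score_lists[0])
    return [sum(s[i] for s in scaled) / len(scaled) for i in range(n)]


# Two hypothetical scoring functions applied to the same three compounds.
function_a = [0.9, 0.1, 0.5]
function_b = [30.0, 10.0, 25.0]
fused_ranks = rank_combination([function_a, function_b])
fused_scores = score_combination([function_a, function_b])
```

In this toy case both functions agree on the ordering, so fusion changes nothing; the abstract's criterion (b) suggests that fusion is only likely to help when the functions are individually good and rank compounds differently.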