Similar literature
 Found 20 similar documents; search took 31 ms
1.
Efficient recognition of tautomeric compound forms in large corporate or commercially available compound databases is a difficult and labor-intensive task. Our data indicate that up to 0.5% of commercially available compound collections for bioscreening contain tautomers. Although in large registry databases, such as Beilstein and CAS, tautomers are found in an automated fashion using high-performance computational technologies, their real-time recognition in nonregistry corporate databases, as a rule, remains problematic. We have developed an effective algorithm for tautomer searching based on the proprietary chemoinformatics platform. This algorithm reduces the compound to a canonical structure, which enables rapid, automated computer searching of most of the known tautomeric transformations that occur in databases of organic compounds. Another useful extension of this methodology is the ability to search effectively for different forms of compounds that contain ionic and semipolar bonds. The computations are performed in the Windows environment on a standard personal computer, which is a practical advantage. The practical application of the proposed methodology is illustrated by several examples of successful recovery of tautomers and different forms of ionic compounds from real commercially available nonregistry databases.

2.
This review focuses on the possibilities and limits of nontarget screening of emerging contaminants, with emphasis on recent applications and developments in data evaluation and compound identification by liquid chromatography-high-resolution mass spectrometry (HRMS). The general workflow includes determination of the elemental composition from accurate mass, a further search for the molecular formula in compound libraries or general chemical databases, and a ranking of the proposed structures using further information, e.g., from mass spectrometry (MS) fragmentation and retention times. The success of nontarget screening is limited in some way by the preselection of relevant compounds from a large data set. Recently developed approaches show that statistical analysis in combination with suspect and nontarget screening is a useful way to preselect relevant compounds. Currently, the unequivocal identification of unknowns still requires information from an authentic standard, which has to be measured or is already available in user-defined MS/MS reference databases or libraries containing HRMS spectral information and retention times. In this context, we discuss the advantages and future needs of publicly available MS and MS/MS reference databases and libraries, which have mostly been created for the metabolomic field. A big step forward has been achieved with computer-based tools for cases where no MS library or MS database entry is found for a compound. The numerous search results from a large chemical database can be condensed to only a few by in silico fragmentation. This has been demonstrated for selected compounds and metabolites in recent publications. Still, only very few compounds have been identified or tentatively identified in environmental samples by nontarget screening. The availability of comprehensive MS libraries with a focus on environmental contaminants would tremendously improve the situation.
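
The first workflow step named in this abstract, determining the elemental composition from an accurate mass, can be illustrated with a minimal sketch. It is not the method of the review: it assumes the measured value has already been converted to a neutral monoisotopic mass (adduct correction is not shown), restricts the search to C, H, N and O, and applies no chemical plausibility rules. The function name `formula_candidates` and its parameters are hypothetical.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_candidates(neutral_mass, ppm=5.0, max_counts=(20, 40, 6, 8)):
    """Return (C, H, N, O) count tuples whose mass lies within `ppm` of neutral_mass.

    Assumption: `neutral_mass` is the neutral monoisotopic mass, not an ion m/z.
    """
    tol = neutral_mass * ppm * 1e-6
    max_c, max_h, max_n, max_o = max_counts
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
        if heavy > neutral_mass + tol:
            continue  # heavy atoms alone already exceed the target mass
        # Pick the hydrogen count that best fits the remaining mass.
        h = round((neutral_mass - heavy) / MONO["H"])
        if 0 <= h <= max_h:
            mass = heavy + h * MONO["H"]
            if abs(mass - neutral_mass) <= tol:
                hits.append((c, h, n, o))
    return hits


if __name__ == "__main__":
    # 194.0804 u is close to the monoisotopic mass of C8H10N4O2.
    for counts in formula_candidates(194.0804):
        print(dict(zip("CHNO", counts)))
```

Each candidate formula returned this way would then be looked up in a compound library, and the resulting structures ranked with fragmentation and retention-time information, as the abstract describes.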

3.
We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
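
The overlap rate described in this abstract can be illustrated with a small sketch. It does not reproduce the CACTVS hash codes or the authors' scoring scheme: it assumes each record already carries a canonical-tautomer key (any string that is identical for all tautomers of one compound) and that exact structural duplicates have been removed beforehand. The function name and the toy keys are hypothetical.

```python
from collections import Counter


def tautomer_overlap_rate(records):
    """Fraction of records sharing their canonical-tautomer key with another record.

    records: list of (record_id, canonical_key) pairs from one database.
    Assumption: exact duplicates were removed, so a shared key means the
    entries differ only in tautomeric representation.
    """
    if not records:
        return 0.0
    counts = Counter(key for _, key in records)
    overlapping = sum(1 for _, key in records if counts[key] > 1)
    return overlapping / len(records)


if __name__ == "__main__":
    toy_db = [("r1", "K1"), ("r2", "K1"), ("r3", "K2"), ("r4", "K3")]
    print(tautomer_overlap_rate(toy_db))
```

Running the same count over the union of several databases, rather than one at a time, would correspond to the cross-database overlap mentioned at the end of the abstract.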

4.
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, the exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest and to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.

5.
The Molecular Property Diagnostic Suite (MPDS) is a Galaxy-based open source drug discovery and development platform. MPDS web portals are designed for several diseases, such as tuberculosis, diabetes mellitus, and other metabolic disorders, and are specifically aimed at evaluating and estimating the drug-likeness of a given molecule. MPDS consists of three modules, namely data libraries, data processing, and data analysis tools, which are configured and interconnected to assist drug discovery for specific diseases. The data library module encompasses vast information on chemical space; the MPDS compound library comprises 110.31 million unique molecules generated from public domain databases. Every molecule is assigned a unique ID and card, which provides complete information for the molecule. Some of the modules in the MPDS are specific to the diseases, while others are non-specific. Importantly, a suitably altered protocol can be effectively generated for another disease-specific MPDS web portal by modifying some of the modules. Thus, the MPDS suite of web portals shows great promise to emerge as disease-specific portals of great value, integrating chemoinformatics, bioinformatics, molecular modelling, and structure- and analogue-based drug discovery approaches.

6.
Dendrimers and dendrons offer an excellent platform for developing novel drug delivery systems and medicines. The rational design and further development of these repetitively branched systems are restricted by difficulties in scalable synthesis and structural determination, which can be overcome by judicious use of molecular modelling and molecular simulations. A major difficulty in using in silico studies to design dendrimers lies in the laborious generation of their structures. Current modelling tools utilise automated assembly of simpler dendrimers or the inefficient manual assembly of monomer precursors to generate more complicated dendrimer structures. Herein we describe two novel graphical user interface toolkits written in Python that provide an improved degree of automation for rapid assembly of dendrimers and generation of their 2D and 3D structures. Our first toolkit uses the RDKit library, SMILES nomenclature of monomers and SMARTS reaction nomenclature to generate SMILES and mol files of dendrimers without 3D coordinates. These files are used for simple graphical representations and for storing the structures in databases. The second toolkit assembles complex-topology dendrimers from monomers to construct 3D dendrimer structures to be used as starting points for simulation using existing and widely available software and force fields. Both tools were validated for ease of use in prototyping dendrimer structures, and the second toolkit was especially relevant for dendrimers of high complexity and size.

7.
Protein-protein interactions are fundamental in mediating biological processes including metabolism, cell growth, and signaling. The ability to selectively inhibit or induce protein activity or complex formation is a key feature in controlling disease. For those situations in which protein-protein interactions derive substantial affinity from short linear peptide sequences, or motifs, we can develop search algorithms for peptidomimetic compounds that resemble the short peptide's structure but are not compromised by poor pharmacological properties. SAAMCO is a Web service (http://bioware.ucd.ie/~saamco) that facilitates the screening of motifs with known structures against bioactive compound databases. It is built on an algorithm that defines compound similarity based on the presence of appropriate amino acid side chain fragments and a favorable Root Mean Squared Deviation (RMSD) between compound and motif structure. The methodology is efficient because the available compound databases are preprocessed and fast regular expression searches filter potential matches before time-intensive 3D superposition is performed. The required input information is minimal, and the compound databases have been selected to maximize the availability of information on biological activity. "Hits" are accompanied by a visualization window and links to source database entries. Motif matching can be defined on partial or full similarity, which will increase or reduce, respectively, the number of potential mimetic compounds. The Web server provides the functionality for rapid screening of known or putative interaction motifs against prepared compound libraries using a novel search algorithm. The tabulated results can be analyzed by linking to appropriate databases and by visualization.
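
The two-stage idea in this abstract, a cheap regular expression filter followed by a costly geometric check, can be sketched as follows. This is not SAAMCO's algorithm or data format. It assumes a hypothetical preprocessing step that has encoded each compound as a string of one-letter amino acid codes for its side chain fragments, and it replaces the 3D superposition with a plain RMSD over coordinates that are assumed to be already aligned and listed in the same order as the motif.

```python
import math
import re


def rmsd(a, b):
    """Root mean squared deviation between two equal-length lists of (x, y, z) points.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def screen(motif_regex, motif_xyz, compounds, max_rmsd=1.0):
    """Regex prefilter first, geometric check only on the survivors.

    compounds: iterable of (name, fragment_string, xyz) tuples (hypothetical format).
    """
    pattern = re.compile(motif_regex)
    hits = []
    for name, fragment_string, xyz in compounds:
        if not pattern.fullmatch(fragment_string):
            continue  # rejected without touching the coordinates
        score = rmsd(motif_xyz, xyz)
        if score <= max_rmsd:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    motif_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    library = [
        ("c1", "FW", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
        ("c2", "FA", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
        ("c3", "FW", [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]),
    ]
    print(screen("F[WY]", motif_xyz, library))
```

Using `fullmatch` corresponds to requiring full similarity; switching to `pattern.search` would be one way to model partial matching, which the abstract notes increases the number of candidate compounds.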

8.
9.
Due to the huge amount of data generated in drug discovery programs, their success strongly depends on both the workflows and the platforms used to manage and, more importantly, to integrate different chemical and biological data sources. At the Experimental Therapeutics Program in the Spanish National Cancer Research Center (CNIO), we have directed our efforts toward the design and optimal implementation of those key processes that enable dynamic workflows and interfaces between the different information blocks. Our approach focuses on the development of a common chemical and biological repository (CCBR) that gathers all data that pass quality control criteria. An integral web application (WACBIP) was designed to query the CCBR while providing decision-making tools. Currently, our CCBR contains more than 43,000 unique structures as well as experimental data from more than 350 different biological assays. As input sources of the CCBR, we federated a series of Laboratory Information Management Systems (LIMS) that cover the following sections: chemical synthesis, analytical department, compound logistics, biochemical and cellular data (including high-throughput and high-content screenings; HTS and HCS), computational chemistry (in-silico chemogenomics and physico-chemical profiling) and in-vivo pharmacology. With regard to the last section, an integral In-Vivo Management e-Biobook (IVMB) that handles the entire workflow of in-vivo labs was designed and implemented. Herein we describe the processes and tools that we have developed and implemented, balancing purchase and development, for centralizing discovery information as well as providing decision-making and project management tools, a clear unmet need in public organizations and networks.

10.
11.
Asparagine-linked glycans (N-glycans) are important in biological processes. Yet their structural complexity and the lack of databases hinder progress in glycomics and glycobiology. We present a way to generate very large N-glycan structure databases in silico and to use them in high-throughput composition and primary structure determination of N-glycans attached to peptides, based on collision-induced dissociation (CID) tandem mass spectrometric (MS/MS) data. The database and the integrated search engine are called Glyquest and are available to the glycomics community.

12.
13.
14.
A central problem in structure-based drug design is understanding protein-ligand interactions quantitatively and qualitatively. Several recent studies have highlighted from a qualitative perspective the nature of these interactions and their utility in drug discovery. However, a common limitation is a lack of adequate tools to mine these interactions comprehensively, since exhaustive searches of the protein data bank are time-consuming and difficult to perform. Consequently, fundamental questions remain unanswered: How unique or how common are the protein-ligand interactions observed in a given drug design project when compared to all complexed structures in the protein data bank? Which interaction patterns might explain the affinity of a tool compound toward unwanted targets? To answer these questions and to enable the systematic and comprehensive study of protein-ligand interactions, we introduce PROLIX (Protein Ligand Interaction Explorer), a tool that uses sophisticated fingerprint representations of protein-ligand interaction patterns for rapid data mining in large crystal structure databases. Our implementation strategy pursues a branch-and-bound technique that enables mining against thousands of complexes within a few seconds. Key elements of PROLIX include (i) an intuitive interface that enables users to formulate complex queries easily, (ii) exceptional speed for results retrieval, and (iii) a sophisticated results summarization. Herein we describe the algorithms developed to enable complex queries and fast retrieval of search results, as well as the intuitive aspects of the user interface and summarization viewer.

15.
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This review outlines the most common and recent methodological developments for automatically identifying analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis–Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
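
The matched molecular pair and matched molecular series ideas summarized here rest on indexing fragmented molecules by their shared core. The sketch below illustrates only that indexing step and is not taken from any method covered by the review. It assumes a hypothetical upstream fragmentation that emits (compound ID, core, substituent) triples with canonical core strings; a real pipeline would produce these with a chemoinformatics toolkit, and one compound may contribute several triples, one per cut.

```python
from collections import defaultdict
from itertools import combinations


def analogue_series(fragmentations, min_size=2):
    """Group compounds by shared core.

    fragmentations: iterable of (compound_id, core, substituent) triples
    (hypothetical format). Returns {core: {compound_id: substituent}} for
    cores shared by at least `min_size` compounds.
    """
    by_core = defaultdict(dict)
    for compound_id, core, substituent in fragmentations:
        by_core[core][compound_id] = substituent
    return {core: members for core, members in by_core.items() if len(members) >= min_size}


def matched_pairs(series):
    """Yield (core, id_a, id_b, 'subst_a>>subst_b') for every pair within each series."""
    for core, members in series.items():
        for (id_a, sub_a), (id_b, sub_b) in combinations(sorted(members.items()), 2):
            yield core, id_a, id_b, f"{sub_a}>>{sub_b}"


if __name__ == "__main__":
    frags = [
        ("m1", "c1ccccc1[*]", "[*]Cl"),
        ("m2", "c1ccccc1[*]", "[*]F"),
        ("m3", "C1CCCCC1[*]", "[*]O"),
    ]
    series = analogue_series(frags)
    for pair in matched_pairs(series):
        print(pair)
```

Because the grouping is a single pass over a dictionary, the cost grows with the number of fragmentations rather than with the number of compound pairs, which is why this style of indexing suits very large data sets.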

16.
17.
A recurrent problem in organic chemistry is the generation of new molecular structures that conform to some predetermined set of structural constraints, imposed in an endeavor to build certain required properties into the newly generated structure. An example of this is the pharmacophore model, used in medicinal chemistry to guide de novo design or selection of suitable structures from compound databases. We propose here a method that efficiently links up a selected number of required atom positions while at the same time directing the emergent molecular skeleton to avoid forbidden positions. The linkage process takes place on a lattice whose unit step length and overall geometry are designed to match typical architectures of organic molecules. We use an optimization method to select from the many different graphs possible. The approach is demonstrated in an example where crystal structures of the same (in this case rigid) ligand complexed with different proteins are available.

18.
PreSSAPro is software, available to the scientific community as a free web service, designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures, both in large but non-homogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha–beta proteins. Predictions are improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
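
The notion of an amino acid propensity for a secondary structure can be made concrete with a short sketch. The abstract does not give the formula used by PreSSAPro; the code assumes the common definition in which the propensity is the frequency of a residue within a structural state divided by its overall frequency, so values above 1 indicate a preference for that state. The function name and the toy strings are hypothetical.

```python
from collections import Counter


def propensities(sequence, structure):
    """Return {(residue, state): propensity} from aligned sequence and state strings.

    Assumed definition:
        propensity = (share of `state` positions occupied by `residue`)
                     / (share of all positions occupied by `residue`)
    """
    if len(sequence) != len(structure) or not sequence:
        raise ValueError("sequence and structure must be non-empty and of equal length")
    total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(structure)
    pair_counts = Counter(zip(sequence, structure))
    return {
        (res, state): (n / state_counts[state]) / (residue_counts[res] / total)
        for (res, state), n in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy data: H = helix, E = strand. Real tables need large, curated data sets.
    table = propensities("AAGG", "HHHE")
    for key in sorted(table):
        print(key, round(table[key], 3))
```

Computing one such table per structural class (all-alpha, all-beta, alpha–beta) from class-specific data, rather than one table from a mixed data set, is the kind of distinction the abstract credits with improving predictions.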

19.
Modeling off-target effects is one major goal of chemical biology, particularly in its applications to drug discovery. Here, we describe a new approach that allows the extraction of structure-activity relationships from large chemogenomic spaces starting from a single chemical structure. Several public source databases, offering a vast amount of data on structure and activity for a large number of different targets, have been investigated for their usefulness in automated structure-activity relationship (SAR) extraction. SAR tables were constructed by assembling, around each query structure, similar structures that have an activity record for a particular target. Quantitative series enrichment analysis (QSEA) was applied to these SAR tables to identify trends and to transform these trends into topomer CoMFA models. Overall, more than 1700 SAR tables with topomer CoMFA models have been obtained from the ChEMBL, PubChem, and ChemBank databases. These models were able to highlight the structural trends associated with various off-target effects of marketed drugs, including cases where other structural similarity metrics would not have detected an off-target effect. These results indicate the usefulness of the QSEA approach, particularly where it can be applied to public databases, in providing a new means, beyond a simple similarity between ligand structures, to capture SAR trends and thereby contribute to success in drug discovery.

20.
PlantPIs is a web querying system for a database collection of plant protease inhibitor data. Protease inhibitors in plants are naturally occurring proteins that inhibit the function of endogenous and exogenous proteases. This paper reports the design and development of a web framework that provides a clear and very flexible way of querying plant protease inhibitor data. The web resource is based on a relational database, containing publicly accessible data on plant protease inhibitors, and a graphical user interface providing all the necessary browsing tools, including a data export function. PlantPIs contains information extracted principally from the MEROPS database, filtered, annotated and compared with data stored in other public protein and gene databases, using both automated techniques and domain expert evaluations. The data are organized to allow flexible and easy access to the stored information. The database is accessible at http://www.plantpis.ba.itb.cnr.it/.
