Similar literature
Found 20 similar articles (search time: 31 ms)
1.
An interactive computer system has been designed to handle all the data associated with the National Cancer Institute's (NCI) drug screening program. The system resides on the NIH DEC System 10 computers and allows interactive access to the entire NCI screening data system. This contains over 20 separate databases, including a chemistry file of about 400,000 structures and a biology file of approximately 1.5 million test records. New compounds and test data are added daily to the files, and the system also controls and records all the daily operations of the screening program, such as acquisition, shipping, and biological testing of chemicals.

2.
Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
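The kinds of overlap statistics listed in this abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' procedure: it assumes every structure has already been reduced to a canonical identifier string, and the function name `overlap_report` and the toy identifiers are hypothetical.

```python
from collections import Counter

def overlap_report(databases):
    """databases: dict mapping a database name to a list of canonical
    structure identifiers (assumed precomputed; duplicates allowed)."""
    unique = {name: set(ids) for name, ids in databases.items()}
    # Internal duplication rate: share of records that repeat an identifier.
    dup_rate = {name: 1 - len(unique[name]) / len(ids)
                for name, ids in databases.items() if ids}
    # Number of databases in which each compound occurs.
    occurrence = Counter(cid for ids in unique.values() for cid in ids)
    # Compounds found in exactly one database.
    only_in = {name: {cid for cid in ids if occurrence[cid] == 1}
               for name, ids in unique.items()}
    # Pairwise overlap of identical compounds.
    names = sorted(unique)
    pairwise = {(a, b): len(unique[a] & unique[b])
                for i, a in enumerate(names) for b in names[i + 1:]}
    return dup_rate, only_in, pairwise

# Toy identifiers, purely illustrative.
toy = {"NCI": ["a", "b", "b", "c"], "ACD": ["b", "d"], "WDI": ["c", "d", "e"]}
dup_rate, only_in, pairwise = overlap_report(toy)
```

Similarity overlap and diversity, which the abstract also mentions, need fingerprint comparisons and are outside the scope of this sketch.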

3.
The NCI Drug Information System (DIS) is a collection of 24 interactively searchable databases which contain all the data associated with NCI's drug screening program. Data flow into all of these databases on a daily basis, and maintenance procedures have been developed which provide a high degree of currency to the files. An extensive security system controls both write access and read access to the DIS and matches both to the authorization possessed by each specific user. Detailed usage statistics are collected automatically. The cost of the overall system in terms of both manpower and machine time is discussed briefly.

4.
Shape Signatures, a new 3-dimensional molecular comparison method, has been adapted to rank ligands of the serotonin receptors. A set of 825 agonists and 400 antagonists together with approximately 10,000 randomly chosen compounds from the NCI database were used in this study. Both 1D and 2D Shape Signature databases were created, and enrichment studies were carried out. Results from these studies reveal that the 1D Shape Signature approach is highly efficient in separating agonists from a mixture of molecules which includes compounds randomly selected from the NCI database taken as inactives. It is also equally effective at separating agonists and antagonists from a pool of active ligands for the serotonin receptor. Parallel enrichment studies using 2D shape signatures showed high selectivity with more restricted coverage due to the high specificity of 2D signatures. The influence of conformational variation of the shape signature on enrichment was explored by docking a subset of ligands into the crystal structure of serotonin N-acetyltransferase. Enrichment studies on the resulting "docked" conformations produced only slightly improved results compared with the CORINA-generated conformations.
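Enrichment studies of this kind are commonly summarized with an enrichment factor: the hit rate among the top-ranked fraction of a screened list divided by the hit rate in the whole list. The sketch below uses that common definition as an assumption; the abstract does not state which enrichment measure the authors used, and the function name and toy ranking are hypothetical.

```python
def enrichment_factor(ranked_labels, fraction):
    """ranked_labels: list of booleans ordered best score first,
    True marking a known active. fraction: top share to inspect (0 to 1)."""
    n = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n * fraction))
    hits_top = sum(ranked_labels[:n_top])
    # Hit rate in the top slice relative to the hit rate overall.
    return (hits_top / n_top) / (total_actives / n)

# Toy ranking: 3 actives among 10 compounds, 2 of them in the top 20%.
ranking = [True, True, False, True, False, False, False, False, False, False]
ef_top20 = enrichment_factor(ranking, 0.2)
```

A value above 1 means the ranking places actives near the top more often than random selection would.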

5.
The NCI drug screening program tests over 10,000 chemicals per year for activity against cancer. The associated Drug Information System (DIS) captures all the raw testing data and provides for its validation. The large quantity of numeric data gathered during testing is maintained within the DIS in a database that is interactively searchable and automatically updated at regular intervals.

6.
7.
We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS’s tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
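The notion of tautomeric overlap within a database can be sketched as a grouping problem. The example below assumes each record already carries two precomputed strings: an identifier for the structure exactly as drawn and an identifier for its canonical tautomer. It does not call CACTVS, and the names `tautomer_overlap_rate`, `exact_id` and `tautomer_id` are hypothetical.

```python
from collections import defaultdict

def tautomer_overlap_rate(records):
    """records: list of (exact_id, tautomer_id) pairs for one database.
    Returns the share of records for which another record is a different
    tautomeric representation of the same compound."""
    if not records:
        raise ValueError("records must not be empty")
    # Collect the distinct drawn forms seen for each canonical tautomer.
    forms = defaultdict(set)
    for exact_id, tautomer_id in records:
        forms[tautomer_id].add(exact_id)
    overlapping = sum(1 for _, tautomer_id in records
                      if len(forms[tautomer_id]) > 1)
    return overlapping / len(records)

# Toy records: the keto and enol forms share canonical tautomer "T1".
toy_records = [("keto", "T1"), ("enol", "T1"), ("x", "T2"), ("y", "T3")]
rate = tautomer_overlap_rate(toy_records)
```

Exact duplicates (same `exact_id` twice) are deliberately not counted here, since they are duplicates rather than tautomeric overlap.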

8.
New data, tools and services recently made available on the web server (http://cactus.nci.nih.gov) of the Computer-Aided Drug Design (CADD) Group, NCI, NIH, developed in the context of chemoinformatics and drug development work, are presented. These tools are designed for searching for structures in very large databases of small molecules. One of them is a web service, the Chemical Structure Look-up Service (CSLS), for very rapid structure look-up in an aggregated collection of more than 80 databases comprising more than 27 million unique structures at the time of this writing. CSLS contains pointers to the entries in toxicology-related databases, catalogues of commercially available samples, drugs, assay results data sets, and databases in several other categories. CSLS allows the user to find out very rapidly in which one(s) of all these databases a given structure occurs independent of the representation of the input structure, by making use of InChIs as well as new CACTVS hashcode-based identifiers. These latter, calculable, identifiers are designed to take into account tautomerism, different resonance structures drawn for charged species, and presence of additional fragments. They make possible fine-tunable yet rapid compound identification and database overlap analyses in very large compound collections.

9.
One of the hallmarks of Parkinson’s disease (PD), a long-term neurodegenerative syndrome, is the accumulation of alpha-synuclein (α-syn) fibrils. Despite numerous studies and efforts, inhibition of α-syn protein aggregation is still a challenge. To overcome this issue, we propose an in silico pharmacophore-based repositioning strategy to find a pharmaceutical drug that, in addition to its defined role, can be used to prevent aggregation of the α-syn protein. A ligand-based pharmacophore model was developed, and the best model was selected using validation parameters including 72 % sensitivity, 98 % specificity and a goodness score of about 0.7. The optimal model has three hydrogen bond donor (HBD) groups, three hydrogen bond acceptor (HBA) groups, and two aromatic rings (AR). The FDA-approved entries in the ZINC15 database were screened with the pharmacophore model derived from inhibitor compounds. The model identified 22 hits as promising candidate drugs for Parkinson's therapy. It is noteworthy that, among these, 10 drugs have been reported to inhibit α-syn aggregation or to treat or reduce Parkinson's pathogenesis. This model was used to virtually screen the ZINC and NCI databases and natural products from the pomegranate. The results of this screen were filtered to remove compounds unable to cross the blood-brain barrier, compounds with poor oral bioavailability, etc. Finally, the selected compounds from the ZINC and NCI databases were combined and structurally clustered. The remaining compounds were grouped into 28 clusters, and 17 compounds were introduced as final candidates.
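Sensitivity and specificity, the validation parameters quoted in this abstract, follow directly from a confusion matrix. The sketch below uses the standard definitions; the counts are invented solely so that the arithmetic reproduces 72 % and 98 % and are not taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """tp/fn: actives retrieved/missed by the model;
    tn/fp: inactives rejected/wrongly retrieved."""
    sensitivity = tp / (tp + fn)  # share of actives the model retrieves
    specificity = tn / (tn + fp)  # share of inactives the model rejects
    return sensitivity, specificity

# Hypothetical validation set: 25 actives and 100 decoys.
sens, spec = sensitivity_specificity(tp=18, fn=7, tn=98, fp=2)
```

The goodness score mentioned in the abstract is a separate composite measure and is not computed here.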

10.
Web-based tools for mining the NCI databases for anticancer drug discovery (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we describe the development of a set of integrated Web-based tools for mining the National Cancer Institute's (NCI) anticancer databases for anticancer drug discovery. For data mining, three different correlation algorithms were implemented, which included the commonly used Pearson's correlation algorithm available from the NCI's COMPARE program, the Spearman's and Kendall's correlation algorithms. In addition, we implemented the p-value test to evaluate the significance of the correlation results. These Web-based data mining tools allow robust analysis of the correlation between the in vitro anticancer activity of the drugs in the NCI anticancer database, the protein levels and mRNA levels of molecular targets (genes) in the NCI 60 human cancer cell lines for identification of potential lead compounds for a specific molecular target and for study of the molecular mechanism of action of a drug. Examples were provided to identify PKC ligands using a lead compound and to identify potential ErbB-2 inhibitors using the mRNA levels of ErbB-2 in the NCI 60 tumor cell lines.
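The three correlation measures named in this abstract can be written out in a few lines of standard-library Python. This is a minimal sketch of the textbook definitions, not the authors' Web tools or the COMPARE program: Kendall's coefficient is computed in its tau-a form (no tie correction), p-values are omitted, and the two toy vectors stand in for an activity profile and a target expression profile across cell lines.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Toy profiles: monotonic but non-linear relationship.
activity = [1, 2, 3, 4, 5]
expression = [1, 4, 9, 16, 25]
```

For these toy profiles the rank-based measures reach 1 while Pearson's coefficient stays below 1, which illustrates why offering all three can be useful.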

11.
Virtual screening of large libraries of organic compounds combined with pharmacological high throughput screening is widely used for drug discovery in the pharmaceutical industry. Our aim was to explore the efficiency of using a biased 3D database comprising secondary metabolites from antiinflammatory medicinal plants as a source for the virtual screening. For this study pharmacophore models of cyclooxygenase I and II (COX-1, COX-2), key enzymes in the inflammation process, were generated with structure-based as well as common feature based modeling, resulting in three COX hypotheses. Four different multiconformational 3D databases limited in molecular weight to between 300 and 700 Da were applied to the screening in order to compare and analyze the obtained hit rates. Two of them were created in-house (DIOS, NPD). The database DIOS consists of 2752 compounds from phytochemical reports of antiinflammatory medicinal plants described by the ethnopharmacological source 'De materia medica' of Pedanius Dioscorides, whereas NPD contains almost 80,000 compounds gathered arbitrarily from natural sources. In addition, two available multiconformational 3D libraries comprising marketed and development drug substances (WDI and NCI), mainly originating from synthesis, were used for comparison. As a test of the pharmacophore models' capability in natural sources, the models were used to search for known COX inhibitory natural products. This was achieved with some exceptions, which are discussed in the paper. Depending on the hypothesis used, WDI and NCI library searches produced hit rates in the range of 6.6% to 13.7%. A slight increase of the number of molecules assessed for binding was achieved with the database of natural products (NPD). Using the biased 3D database DIOS, however, the average increase of efficiency reached 77% to 133% compared to the hit rates resulting from WDI and NCI. The statistical benefit of combining an ethnopharmacological approach with the potential of computer-aided drug discovery by in silico screening was demonstrated, as exemplified by the targets COX-1 and COX-2.

12.
May PM, Murray K. Talanta, 1991, 38(12): 1419-1426
The thermodynamic database of the JESS (Joint Expert Speciation System) software package is described. It overcomes many existing problems associated with solution-chemistry databases. The system is fully interactive. Reactions can be expressed in any form. Any number of equilibrium constants, enthalpy, entropy and Gibbs free energy values can be associated with a reaction. Supplementary data such as background electrolyte, temperature, ionic strength, method of determination and original literature reference are also stored. Data can be readily transferred between databases. Currently, the thermodynamic database that is being distributed with JESS contains over 12,000 reactions and over 20,000 equilibrium constants. These data span interactions in aqueous solution of some 100 metal ions with more than 650 ligands.
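The quantities this abstract lists for a reaction are linked by standard thermodynamic relations, ΔG° = −RT ln K and ΔG° = ΔH° − TΔS°. The sketch below only illustrates those textbook relations; it is not JESS code, and the function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)

def gibbs_from_log_k(log10_k, temperature_k=298.15):
    """Standard Gibbs energy change (J/mol) from log10 of the
    equilibrium constant: dG = -R * T * ln(10) * log10(K)."""
    return -R * temperature_k * math.log(10) * log10_k

def entropy_from_gibbs_enthalpy(delta_g, delta_h, temperature_k=298.15):
    """Standard entropy change (J/(mol*K)) from dG = dH - T * dS."""
    return (delta_h - delta_g) / temperature_k

# One log unit of K corresponds to roughly -5.7 kJ/mol at 25 degrees C.
dg_per_log_unit = gibbs_from_log_k(1.0)
```

These relations hold for a fixed temperature and standard state; corrections for ionic strength, which the database records as supplementary data, are not modeled here.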

13.
A multivariate insight into the in vitro antitumour screen database of the NCI by means of the SIMCA package makes it possible to propose hypotheses on the mechanism of action of novel anticancer compounds. As an example, the application of multivariate analysis to the NCI standard database provided clues to the classification of drugs whose mechanism is either unknown or controversial. Moreover, the influence of intrinsic biochemical cell line properties (molecular targets) on the sensitivity to drug treatment could be evaluated simultaneously for classes of compounds which act by the same mechanism. Interestingly, the present approach can also provide a correlation between the molecular targets and the therapeutical fingerprint of novel active compounds, thus suggesting specific biochemical studies for the investigation of new mechanisms of drug action and resistance. The statistical approach reported here represents a valuable tool for handling the enormous data sets deriving from recent genome-wide investigations of gene expression in the NCI cell lines.

14.
We have systematically enumerated graph representations of scaffold topologies for up to eight-ring molecules and four-valence atoms, thus providing coverage of the lower portion of the chemical space of small molecules (Pollock et al. J. Chem. Inf. Model., this issue). Here, we examine scaffold topology distributions for several databases: ChemNavigator and PubChem for commercially available chemicals, the Dictionary of Natural Products, a set of 2742 launched drugs, WOMBAT, a database of medicinal chemistry compounds, and two subsets of PubChem, "actives" and DSSTox comprising toxic substances. We also examined a virtual database of exhaustively enumerated small organic molecules, GDB (Fink et al. Angew. Chem., Int. Ed. 2005, 44, 1504-1508), and we contrast the scaffold topology distribution from these collections to the complete coverage of up to eight-ring molecules. For reasons related, perhaps, to synthetic accessibility and complexity, scaffolds exhibiting six rings or more are poorly represented. Among all collections examined, PubChem has the greatest scaffold topological diversity, whereas GDB is the most limited. More than 50% of all entries (13 000 000+ actual and 13 000 000+ virtual compounds) exhibit only eight distinct topologies, one of which is the nonscaffold topology that represents all treelike structures. However, most of the topologies are represented by a single or very small number of examples. Within topologies, we found that three-way scaffold connections (3-nodes) are much more frequent compared to four-way (4-node) connections. Fused rings have a slightly higher frequency in biologically oriented databases. Scaffold topologies can be the first step toward an efficient coarse-grained classification scheme of the molecules found in chemical databases.
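The ring count that bounds the enumerated topologies, and the distinction between treelike and ring-containing structures, can be read off a molecular graph through its cyclomatic number, bonds − atoms + connected components. The sketch below assumes a molecule is given as an atom count plus a list of atom-index pairs with no repeated bonds; it is not the authors' enumeration code.

```python
def ring_count(num_atoms, bonds):
    """Cyclomatic number of a molecular graph.
    bonds: list of (i, j) atom-index pairs, each bond listed once."""
    parent = list(range(num_atoms))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components

# Heavy-atom graphs: a six-membered ring, two fused six-membered rings,
# and a three-atom chain (treelike, so zero rings).
six_ring = [(i, (i + 1) % 6) for i in range(6)]
fused = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]
chain = [(0, 1), (1, 2)]
```

A result of zero corresponds to the treelike, nonscaffold case mentioned in the abstract.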

15.
Consideration of stereochemistry early in the identification and optimization of lead compounds can improve the efficiency and efficacy of the drug discovery process and reduce the time spent on subsequent drug development. These improvements can result by focusing on specific enantiomers that have the desired potential therapeutic effect (eutomers), while removing from consideration enantiomers that may have no, or even undesirable, effects (distomers). A virtual screening campaign that correctly takes stereochemical information into account can, in theory, be utilized to provide information about the relative binding affinities of enantiomers. Thus, the proper enumeration of the relevant stereoisomers in general, and enantiomeric pairs in particular, of chiral compounds is crucial if one is to use virtual screening as an effective drug discovery tool. As is obvious, in cases where no stereochemical information is provided for chiral compounds in a 2D chemical database, then each possible stereoisomer should be generated for construction of the subsequent 3D database to be used for virtual screening. However, acute problems can arise in 3D database construction when relative stereochemistry is encoded in a 2D database for a chiral compound containing multiple stereogenic atoms but absolute stereochemistry is not implied. In this case, we report that generation of enantiomeric pairs is imperative in database development if one is to obtain accurate docking results. A study is described on the impact of the neglect of enantiomeric pairs on virtual screening using the human homolog of murine double minute 2 (MDM2) protein, the product of a proto-oncogene, as the target. Docking in MDM2 with GLIDE 4.0 was performed using the NCI Diversity Set 3D database and, for comparison, a set of enantiomers we created corresponding to mirror image structures of the single enantiomers of chiral compounds present in the NCI Diversity Set. Our results demonstrate that potential lead candidates may be overlooked when databases contain 3D structures representing only a single enantiomer of racemic chiral compounds.
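The two enumeration cases described in this abstract can be sketched on R/S descriptor strings. With no stereochemical information, every combination of descriptors is generated; with relative stereochemistry only, the recorded configuration and its mirror image form the pair to include. This is a simplified illustration, not the authors' database pipeline: it treats each stereocenter as an independent R/S label, ignores meso forms and double-bond stereochemistry, and all function names are hypothetical.

```python
from itertools import product

def enumerate_stereoisomers(n_centers):
    """All 2**n R/S assignments for n stereocenters (meso forms not merged)."""
    return ["".join(p) for p in product("RS", repeat=n_centers)]

def mirror(config):
    """Mirror image: every stereocenter descriptor is inverted."""
    swap = {"R": "S", "S": "R"}
    return "".join(swap[c] for c in config)

def enantiomeric_pairs(n_centers):
    """Group the enumerated configurations into mirror-image pairs."""
    pairs = set()
    for config in enumerate_stereoisomers(n_centers):
        pairs.add(tuple(sorted((config, mirror(config)))))
    return sorted(pairs)

# A compound drawn as "RS" with only relative stereochemistry known
# should enter the 3D database as both "RS" and its mirror image "SR".
recorded = "RS"
to_dock = sorted({recorded, mirror(recorded)})
```

In a real workflow the mirrored structures would be generated by a cheminformatics toolkit on 3D coordinates or stereo-annotated structure files rather than on descriptor strings.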

16.
Over the last two decades, the volume of chemical and biological data has been constantly increasing. Converting data sets into knowledge is both expensive and time-consuming; as a result, workflow technology, with platforms such as KNIME, was developed to facilitate searching through multiple heterogeneous data sources, filtering for specific criteria, and then extracting hidden information from these large data sets. Before any QSAR modeling, manual data curation is strongly recommended. This can be done for small datasets, but for the extensive data recently accumulated in public databases, manual processing is hardly feasible. In this work, we suggest using KNIME as an automated workflow solution for data curation and for the development and validation of predictive QSAR models from a huge dataset. In this study, we used 250250 structures from the NCI database; only 3520 compounds passed through our workflow successfully with their corresponding experimental log P. This property was investigated as a case study to improve some existing log P calculation algorithms.

17.
18.
We have developed a method, given a database of molecules and associated activities, to identify molecular substructures that are associated with many different biological activities. These may be therapeutic areas (e.g. antihypertensive) and/or mechanism-based activities (e.g. renin inhibitor). This information helps us avoid chemical classes that are likely to have unanticipated side effects and also can suggest combinatorial libraries that might have activity on a variety of receptor targets. The method was applied to the USPDI and MDDR databases. There are clearly substructures in each database that occur in many compounds and span a variety of therapeutic categories. Some of these are expected, but some are not.
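One simple way to picture the idea in this abstract is to count, for each substructure, how many distinct activity classes its parent compounds cover. The sketch below is only that counting step under strong assumptions: substructures are taken as precomputed labels per compound, the abstract does not describe the authors' actual scoring, and the function name and toy data are hypothetical.

```python
from collections import defaultdict

def substructures_by_activity_span(compounds, min_activities):
    """compounds: iterable of (substructure_labels, activity_labels) pairs,
    both given as sets of strings. Returns (substructure, n_activities)
    for substructures seen with at least min_activities distinct activities,
    widest span first."""
    activities_of = defaultdict(set)
    for substructures, activities in compounds:
        for substructure in substructures:
            activities_of[substructure].update(activities)
    spans = [(s, len(a)) for s, a in activities_of.items()
             if len(a) >= min_activities]
    return sorted(spans, key=lambda item: (-item[1], item[0]))

# Toy data, purely illustrative.
toy_compounds = [
    ({"piperidine", "phenyl"}, {"antihypertensive"}),
    ({"piperidine"}, {"antipsychotic"}),
    ({"phenyl", "amide"}, {"antihypertensive"}),
]
broad = substructures_by_activity_span(toy_compounds, min_activities=2)
```

A raw count like this favors very common substructures, so a practical analysis would likely also normalize by how often each substructure occurs.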

19.
20.
Microsequencing of proteins recovered from two-dimensional (2-D) gels is being used systematically to identify proteins in the master human keratinocyte 2-D gel database. To date, about 250 protein spots recorded in human 2-D gel databases have been microsequenced and, of these, 145 are recorded in the keratinocyte database under the entry partial amino acid sequence. Coomassie Brilliant Blue-stained protein spots cut from several (up to 40) dry gels were concentrated by elution-concentration gel electrophoresis, electroblotted onto PVDF membranes and digested in situ with trypsin. Eluting peptides were separated by reversed-phase HPLC, collected individually and sequenced. Computer search using the FASTA and TFASTA programs from Genetics Computer Group indicated that 110 of the microsequenced polypeptides shared significant similarity with proteins contained in the PIR, Mipsx or GenEMBL databases. Only 35 polypeptides corresponded to hitherto unknown proteins. Peptide sequences of all 145 proteins are listed together with their coordinates (apparent molecular weight and pI) in the keratinocyte database.
