Similar literature
1.
Web-based tools for mining the NCI databases for anticancer drug discovery   Cited by: 1 (self-citations: 0; citations by others: 1)
In this paper, we describe the development of a set of integrated Web-based tools for mining the National Cancer Institute's (NCI) anticancer databases for anticancer drug discovery. For data mining, three correlation algorithms were implemented: the commonly used Pearson's correlation algorithm available from the NCI's COMPARE program, and the Spearman's and Kendall's correlation algorithms. In addition, we implemented the p-value test to evaluate the significance of the correlation results. These Web-based data mining tools allow robust analysis of the correlation between the in vitro anticancer activity of the drugs in the NCI anticancer database and the protein levels and mRNA levels of molecular targets (genes) in the NCI 60 human cancer cell lines, for identification of potential lead compounds for a specific molecular target and for study of the molecular mechanism of action of a drug. Examples were provided to identify PKC ligands using a lead compound and to identify potential ErbB-2 inhibitors using the mRNA levels of ErbB-2 in the NCI 60 tumor cell lines.
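
The three correlation measures named in this abstract can be illustrated with a small standard-library sketch. It is not the COMPARE implementation or the authors' code: the function names are hypothetical, Kendall's coefficient is shown in its simplest tau-a form (no tie correction), and the p-value test mentioned in the abstract is not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return 2 * score / (n * (n - 1))
```

In a setting like the one described, `x` might hold one compound's activity values across the cell lines and `y` a target's mRNA levels across the same lines. The rank-based measures respond to any monotone relationship, whereas Pearson's coefficient measures linear association only.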

2.
Natural products, historically a major resource for drug discovery, have recently been gaining more attention owing to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process by reducing cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. Thus, it is imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, as a public database, contains large amounts of natural products associated with bioactivity data. In this review, we introduce the information system provided at PubChem and systematically describe the applications of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers to perform data mining on natural products using PubChem.

3.
High-throughput screening (HTS) campaigns in pharmaceutical companies have accumulated a large amount of data for several million compounds over a couple of hundred assays. Despite the general awareness that rich information is hidden inside the vast amount of data, little has been reported on systematic data mining methods that can reliably extract relevant knowledge of interest to chemists and biologists. We developed a data mining approach based on an algorithm called ontology-based pattern identification (OPI) and applied it to our in-house HTS database. We identified nearly 1500 scaffold families with statistically significant structure-HTS activity profile relationships. Among them, dozens of scaffolds were characterized as leading to artifactual results stemming from the screening technology employed, such as assay format and/or readout. Four types of compound scaffolds can be characterized based on this data mining effort: tumor cytotoxic, general toxic, potential reporter gene assay artifact, and target family specific. The OPI-based data mining approach can reliably identify compounds that are not only structurally similar but also share statistically significant biological activity profiles. Statistical tests such as the Kruskal-Wallis test and analysis of variance (ANOVA) can then be applied to the discovered scaffolds for effective assignment of relevant biological information. The scaffolds identified by our HTS data mining efforts are an invaluable resource for designing SAR-robust diversity libraries, generating in silico biological annotations of compounds on a scaffold basis, and providing novel target family specific scaffolds for focused compound library design.

4.
The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. During recent years, visual data mining has become an important tool within the life sciences and drug discovery area, with the potential to help prevent data analysis from turning into a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for the visualization, exploration, and analysis of multivariate data sets, aimed at chemists, who usually have little experience with classical data mining tools. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time, interactive data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. We present examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.

5.
The UTAB Database contains information concerned with the uptake/accumulation, translocation, adhesion, and biotransformation of both xenobiotic organic chemicals and heavy metals by vascular plants. UTAB can be used to estimate the accumulation of chemicals in vegetation and their subsequent movement through the food chain. The database contains actual data from papers in the published literature dating from 1926 for organic chemicals and from 1976 for heavy metals. At present the database comprises more than 37,000 records pertaining to 900 different organic chemicals, 21 heavy metals, and over 350 plant species. Each record contains information on a single combination of species, chemical, and dose. Other information includes the application and destination sites, amount accumulated, rates of uptake or translocation, products and sites of biotransformation, experimental condition parameters, and the source paper. Thus, the database can be used to quickly obtain specific data pertaining to a chemical, plant species, mine spoil, etc., or it can be used for the comparative analysis of a set of data pertaining to groups of chemicals and plants.

6.
The rapid expansion of structural information for protein-ligand binding sites is potentially an important source of information in structure-based drug design and in understanding ligand cross reactivity and toxicity. We have developed a large database of ligand binding sites extracted automatically from the Protein Data Bank. This has been combined with a method for calculating binding site similarity based on geometric hashing to create a relational database for the retrieval of site similarity and binding site superposition. It contains an all-against-all comparison of binding sites and holds known protein-ligand binding sites, which are made accessible to data mining. Here we demonstrate its utility in two structure-based applications: in determining site similarity and in aiding the derivation of a receptor-based pharmacophore model. The database is available from http://www.bioinformatics.leeds.ac.uk/sb/.

7.
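System design and system evaluation are discussed. First, a computer flowchart was designed according to how the relevant ICP-AES information flows, is converted, stored, and processed in the computer; then the design of the conceptual, logical, and physical structures of the database was discussed in accordance with the requirements of database normalization. The database includes data for more than 28,000 ICP emission spectral lines. Operating results show that the system is reasonably designed, holds a large amount of information, and has a convenient and practical user interface.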

8.
Using data mining techniques, we have studied a subset (1400) of compounds from the large public National Cancer Institute (NCI) compounds data repository. We first carried out a functional class identity assignment for the 60 NCI cancer testing cell lines via hierarchical clustering of gene expression data. Comprising nine clinical tissue types, the 60 cell lines were placed into six classes: melanoma, leukemia, renal, lung, and colorectal, plus a sixth class of mixed tissue cell lines not found in any of the other five classes. We then carried out supervised machine learning, using the GI(50) values tested on a panel of 60 NCI cancer cell lines. For separate 3-class and 2-class problem clustering, we successfully carried out clear cell line class separation at high stringency, p < 0.01 (Bonferroni corrected t-statistic), using feature reduction clustering algorithms embedded in RadViz, an integrated high dimensional analytic and visualization tool. We started with the 1400 compound GI(50) values as input and selected only those compounds, or features, significant in carrying out the classification. With this approach, we identified two small sets of compounds that were most effective in carrying out complete class separation of the melanoma, non-melanoma classes and leukemia, non-leukemia classes. To validate these results, we showed that these two compound sets' GI(50) values were highly accurate classifiers using five standard analytical algorithms. One compound set was most effective against the melanoma class cell lines (14 compounds), and the other set was most effective against the leukemia class cell lines (30 compounds). The two compound classes were both significantly enriched in two different types of substituted p-quinones. 
The melanoma cell line class of 14 compounds was comprised of 11 compounds that were internal substituted p-quinones, and the leukemia cell line class of 30 compounds was comprised of 6 compounds that were external substituted p-quinones. Attempts to subclassify melanoma or leukemia cell lines based upon their clinical cancer subtype met with limited success. For example, using GI(50) values for the 30 compounds we identified as effective against all leukemia cell lines, we could subclassify acute lymphoblastic leukemia (ALL) origin cell lines from non-ALL leukemia origin cell lines without significant overlap from non-leukemia cell lines. Based upon clustering using GI(50) values for the 60 cancer cell lines laid out by the RadViz algorithm, these two compound subsets did not overlap with clusters containing any of the NCI's 92 compounds of known mechanism of action, a few of which are quinones. Given their structural patterns, the two p-quinone subtypes we identified would clearly be expected to possess different redox potentials/substrate specificities for enzymatic reduction in vivo. These two p-quinone subtypes represent valuable information that may be used in the elucidation of pharmacophores for the design of compounds to treat these two cancer tissue types in the clinic.

9.
For a set of a priori given radionuclides, extracted from a general nuclide data library, the authors use median estimates of the gamma-peak areas and estimates of their errors to produce a list of possible radionuclides matching gamma-ray line(s) and some measure of the reliability of this assignment.

An a priori determined list of nuclides is obtained by searching for a match with the energy information of the database. This procedure is performed in an interactive graphic mode by markers that superimpose the energy information provided by a general gamma-ray data library on the spectral data. This library of experimental data includes approximately 17,000 gamma-energy lines related to 756 known gamma emitter radionuclides listed by ICRP.


10.
The NASA Mars Science Laboratory rover will carry the first Laser Induced Breakdown Spectroscopy experiment in space: ChemCam. We have developed a laboratory model which mimics ChemCam's main characteristics. We used a set of target samples relevant to Mars geochemistry, and we recorded individual spectra. We propose a data reduction scheme for Laser Induced Breakdown Spectroscopy data incorporating de-noising, continuum removal, and peak fitting. Known effects of the Martian atmosphere are confirmed with our experiment: better Signal-to-Noise Ratio on Mars compared to Earth, narrower peak width, and essentially no self-absorption. The wavelength shift of emission lines from air to Mars pressure is discussed. The National Institute of Standards and Technology vacuum database is used for wavelength calibration and to identify the elemental lines. Our Martian database contains 1336 lines for 32 elements: H, Li, Be, B, C, N, O, F, Na, Mg, Al, Si, P, S, Cl, K, Ar, Ca, Ti, V, Cr, Mn, Fe, Ni, Cu, Zn, As, Rb, Sr, Cs, Ba, and Pb. It is a subset of the National Institute of Standards and Technology database to be used for Martian geochemistry. Finally, synthetic spectra can be built from the Martian database. Correlation calculations help to distinguish between elements in case of uncertainty. This work is used to create tools and support data for the interpretation of ChemCam results.

11.
An information database on internal pH gradients in chromatofocusing was developed. The Windows 95/98/NT–based database contains information on model and experimental pH gradients and has a user-friendly interface for information retrieval and new data input. The database also contains two embedded programs: CF for simulating pH gradients by experimental conditions and ABC for rapidly calculating particular sections in gradient profiles. Examples of the use of embedded programs are given, and experimental and model pH gradients are compared.

12.
Combinatorial chemistry and high-throughput screening technologies produce huge amounts of data on a regular basis. Sieving through these libraries of compounds and their associated assay data to identify appropriate series for follow-up is a daunting task, which has created a need for computational techniques that can find coherent islands of structure-activity relationships in this sea. Structural unit analysis (SUA) examines an entire data set so as to identify the molecular substructures or fragments that distinguish compounds with high activity from those with average activity. The algorithm is iterative and follows set heuristics in order to generate the structural units. It produces graphs that represent a set of units, which become SUA rules. Finding all of the input structures that match these graphs generates clusters. The Apriori algorithm for association rule mining is adapted to explore all of the combinations of structural units that define useful series. User-defined constraints are applied toward series selection and the refinement of rules. The significance of a series is determined by applying statistical methods appropriate to each data set. Application to the NCI-H23 (DTP Human Tumor Cell Line Screen) database serves to illustrate the process by which structural series are identified. An application of the method to scaffold hopping is then discussed in connection with proprietary screening data from a lead optimization project directed toward the treatment of respiratory tract infections at Bayer Healthcare. SUA was able to successfully identify promising alternative core structures in addition to identifying compounds with above-average activity and selectivity.
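
The abstract says the Apriori algorithm is adapted to explore combinations of structural units but does not describe the adaptation. As a rough illustration only, the sketch below is the generic textbook frequent-itemset form of Apriori, with each compound treated as the set of structural-unit labels it contains. The function name, the unit labels, and the use of an absolute support count are all assumptions made for this example.

```python
def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions (generic Apriori, not the SUA variant)."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unions of frequent sets that have exactly `size` items.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: keep a candidate only if all its subsets with one
        # item removed were frequent at the previous level.
        candidates = {c for c in joined if all(c - {i} in level for i in c)}
    return frequent


# Hypothetical data: each set lists the structural units found in one compound.
compounds = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent_units = apriori(compounds, min_support=3)
```

With these five hypothetical compounds and a support threshold of 3, every single unit and every pair of units is frequent, while the triple `{A, B, C}` occurs in only two compounds and is dropped. The prune step is what keeps the search tractable: no combination is counted unless all of its smaller sub-combinations already passed the threshold.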

13.
Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.

14.
15.
16.
PlantPIs is a web querying system for a database collection of plant protease inhibitor data. Protease inhibitors in plants are naturally occurring proteins that inhibit the function of endogenous and exogenous proteases. In this paper, the design and development of a web framework providing a clear and very flexible way of querying plant protease inhibitor data is reported. The web resource is based on a relational database containing publicly accessible data on plant protease inhibitors, and a graphical user interface providing all the necessary browsing tools, including a data exporting function. PlantPIs contains information extracted principally from the MEROPS database, filtered, annotated, and compared with data stored in other public protein and gene databases, using both automated techniques and domain expert evaluations. The data are organized to allow flexible and easy access to the stored information. The database is accessible at http://www.plantpis.ba.itb.cnr.it/.

17.
In this paper, Legendre moments are calculated to extract the global information from a set of two-dimensional polyacrylamide gel electrophoresis map images. The dataset contains 18 samples belonging to two different cell lines (PACA44 and T3M4) of control (untreated) and drug-treated pancreatic ductal carcinoma cells. The aim of this work was to obtain the correct classification of the 18 samples, using the Legendre moments as discriminant variables. For each image, the Legendre moments up to a maximum order of 100 were computed. Stepwise linear discriminant analysis (LDA) was performed in order to select the moments with the highest discriminating power. The results demonstrate that the Legendre moments can be successfully applied for fast classification purposes and similarity analysis.
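
For readers unfamiliar with Legendre moments, the sketch below shows one common discrete approximation. It assumes the image is mapped onto the square [-1, 1] x [-1, 1] with pixel centres as sample points, so the moment of order (p, q) is approximated by (2p+1)(2q+1)/(rows*cols) times the sum of P_p(x) P_q(y) f(x, y) over all pixels. The abstract does not state the authors' exact discretisation, and the function names here are hypothetical.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with the three-term
    recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, x
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1) * x * curr - k * prev) / (k + 1)
    return curr


def legendre_moment(image, p, q):
    """Approximate the Legendre moment of order (p, q) of a grey-level
    image given as a list of rows; p acts along columns, q along rows."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        y = (2 * r + 1) / rows - 1.0      # pixel-centre coordinate in [-1, 1]
        py = legendre(q, y)
        for c in range(cols):
            x = (2 * c + 1) / cols - 1.0  # pixel-centre coordinate in [-1, 1]
            total += legendre(p, x) * py * image[r][c]
    return (2 * p + 1) * (2 * q + 1) * total / (rows * cols)


# A uniform 4 x 4 test image: only the order-(0, 0) moment is non-zero.
flat = [[1.0] * 4 for _ in range(4)]
m00 = legendre_moment(flat, 0, 0)
m10 = legendre_moment(flat, 1, 0)
```

Computing `legendre_moment(image, p, q)` for all orders up to a chosen maximum yields the feature vector that a later classification step, such as the stepwise LDA mentioned above, could select from. This direct double loop is meant for clarity rather than speed.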

18.
As high-resolution biological transmission electron microscopy (TEM) has increased in popularity over recent years, the volume of data and the number of projects underway have risen dramatically. A robust tool for effective data management is essential to efficiently process large data sets and extract maximum information from the available data. We present the Electron Microscopy Electronic Notebook (EMEN), a portable, object-oriented, web-based tool for TEM data archival and project management. EMEN has several unique features. First, the database is logically organized and annotated so multiple collaborators at different geographical locations can easily access and interpret the data without assistance. Second, the database was designed to provide flexibility to the user, so it can be used much as a lab notebook would be, while maintaining a structure suitable for data mining and direct interaction with data-processing software. Finally, as an object-oriented database, the database structure is dynamic and can be easily extended to incorporate information not defined in the original database specification.

19.
Warmr: a data mining tool for chemical data   Cited by: 5 (self-citations: 0; citations by others: 5)

20.
A database of lipid phase transition temperatures and enthalpy changes   Cited by: 1 (self-citations: 0; citations by others: 1)
The systematic study of the mesomorphic phase properties of synthetic and biologically derived lipids began some 30 years ago. In the past decade, interest in this area has grown enormously. As a result, there exists a wealth of information on lipid phase behavior, but unfortunately, these data have, until now, been scattered throughout the literature in a variety of books, proceedings, and journals. The data have recently been compiled in a centralized database with a view to providing ready access to the same and to the appropriate literature. The compilation facilitates review of what has thus far been accomplished and highlights what remains to be done in this active research area. As such, it represents a convenient summary of the existing data which, when evaluated, will enable us to identify where deficits exist in the data, to reveal the fundamental physicochemical principles upon which lipid phase behavior is based, and to understand more completely lipid phase relations in biological, reconstituted, and formulated systems. The compilation consists of a tabulation of all known mesomorphic and polymorphic phase transition temperatures and enthalpy changes for synthetic and biologically derived lipids in the dry and in the partially and fully hydrated states. Also included is the effect on these thermodynamic values of pH, and of salt and metal ion concentration and other additives such as proteins, drugs, etc. The methods used in making the measurements and the experimental conditions are reported. Bibliographic information includes complete literature referencing and list of authors. As of this writing, the database is current through June 1990 and contains 9500 records. Each record contains 28 fields. Here, we describe how the database originated, its scope and contents, data abstraction procedures, and issues relating to mesophase and lipid nomenclature, data analysis, and evaluation, and database maintenance and distribution.
