Similar documents
20 similar documents found (search time: 31 ms)
1.
2.
Experimental programs have been underway for several years to determine the environmental effects of chemical compounds, mixtures, and the like. Among these programs is the National Toxicology Program (NTP) on rodent carcinogenicity. Because these experiments are costly and time-consuming, the rate at which test articles (i.e., chemicals) can be tested is limited. The ability to predict the outcome of the analysis at various points in the process would facilitate informed decisions about the allocation of testing resources. To assist human experts in organizing an empirical testing regime, and to try to shed light on mechanisms of toxicity, we constructed toxicity models using various machine learning and data mining methods, both existing and those of our own devising. These models took the form of decision trees, rule sets, neural networks, rules extracted from trained neural networks, and Bayesian classifiers. As a training set, we used recent results from rodent carcinogenicity bioassays conducted by the NTP on 226 test articles. We performed 10-way cross-validation on each of our models to approximate their expected error rates on unseen data. The data set consists of physical-chemical parameters of test articles, alerting chemical substructures, Salmonella mutagenicity assay results, subchronic histopathology data, and information on route, strain, and sex/species for 744 individual experiments. These results contribute to the ongoing process of evaluating and interpreting the data collected from chemical toxicity studies.
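The 10-way cross-validation mentioned above can be sketched in a few lines; the fold-splitting logic below is generic, and the majority-class baseline in the usage example is a hypothetical stand-in for the authors' decision trees, neural networks, and Bayesian classifiers.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and partition them into k near-equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(items, labels, k, train_and_test):
    """Estimate the error rate: hold out each fold once, train on the rest."""
    folds = k_fold_indices(len(items), k)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [(items[i], labels[i]) for i in range(len(items))
                 if i not in held_out]
        for i in fold:
            if train_and_test(train, items[i]) != labels[i]:
                errors += 1
    return errors / len(items)
```

With `k=10` this yields the paper's 10-way scheme: each experiment is scored exactly once by a model that never saw it during training.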

3.
A public bacterial mutagenicity database was classified into 2-D structural families using a set of specific algorithms and clustering techniques that find overlapping classes of compounds based upon chemical substructures. Structure-activity relationships were learned from the biological activity of the compounds within each class and used to identify rules that define substructures potentially responsible for mutagenic activity. In addition, this method of analysis was used to compare the pharmacologically relevant substructures of test compounds with their potentially toxic substructures, making this a potentially valuable in silico profiling tool for lead selection and optimization.

4.
Many databases exist that contain information on genes useful as background knowledge in machine learning analysis of microarray data. The Gene Ontology and Gene Ontology Annotation projects are among the most comprehensive of these. We demonstrate how inductive logic programming (ILP) can be used to build classification rules for microarray data that naturally incorporate the Gene Ontology, and annotations to it, as background knowledge without discarding the inherent graph structure of the ontology. The ILP rules generated are parsimonious and easy to interpret. Copyright © 2010 John Wiley & Sons, Ltd.
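Preserving the ontology's graph structure means a rule can test whether a gene is annotated with a GO term or any of its descendants. A minimal sketch of that background predicate, using hypothetical term identifiers and a plain `is_a` adjacency map, not the authors' ILP encoding:

```python
def ancestors(term, is_a):
    """All terms reachable from `term` via is_a edges in the GO DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        for parent in is_a.get(t, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def annotated_with(gene, go_term, annotations, is_a):
    """True if the gene carries go_term directly or via a more specific
    descendant term (i.e. go_term is an ancestor of a direct annotation)."""
    for t in annotations.get(gene, ()):
        if t == go_term or go_term in ancestors(t, is_a):
            return True
    return False
```

An ILP rule such as "class(G) :- annotated_with(G, 'GO:0006915')" then matches genes annotated anywhere below that term, which is what keeping the DAG intact buys.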

5.
Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.

6.
To a significant degree, database development processes are based upon human activities, which are susceptible to various errors. Propagation of errors during processing decreases the value of the original data as well as that of any database products. Data quality is a critical issue that every database producer must handle as an inseparable part of database management. Within the Thermodynamics Research Center (TRC), a systematic approach to implementing database integrity rules was established through the use of modern database technology, statistical methods, and thermodynamic principles. The four major functions of the system (error prevention, database integrity enforcement, scientific data integrity protection, and database traceability) are detailed in this paper.

7.
Mass spectrometry (MS) is widely used for the identification of chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from the limited coverage of existing databases, causing identification to fail for any compound not present in the database. Among the computational approaches for mining metabolite structures from MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use them to screen compound libraries. This can be carried out by calibrating multi-task artificial neural networks on large datasets, with mass spectra as inputs and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an on-line open repository. These data were used to train and evaluate deep-learning-based approaches that predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. The effects of data sparseness and the impact of different data-curation and dimensionality-reduction strategies on output accuracy were evaluated. Moreover, extensive diagnostics were carried out to assess modelling advantages and drawbacks as a function of the explored chemical space.
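Once a fingerprint has been predicted from a spectrum, candidate structures are retrieved by ranking a compound library against it. The standard choice for comparing binary fingerprints is the Tanimoto coefficient (an assumption here, since the abstract does not name the measure); a sketch with fingerprints as sets of on-bit indices:

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def screen_library(predicted_fp, library):
    """Rank library compounds by similarity to the predicted fingerprint.

    `library` maps compound names to fingerprint bit sets; the names in
    the test below are illustrative placeholders.
    """
    ranked = sorted(library.items(),
                    key=lambda kv: tanimoto(predicted_fp, kv[1]),
                    reverse=True)
    return [(name, round(tanimoto(predicted_fp, fp), 3))
            for name, fp in ranked]
```

The top-ranked library entries become the structural hypotheses for the unknown spectrum, which is how fingerprint prediction bypasses the reference-spectrum coverage problem.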

8.
We have developed a method, given a database of molecules and associated activities, to identify molecular substructures that are associated with many different biological activities. These may be therapeutic areas (e.g. antihypertensive) and/or mechanism-based activities (e.g. renin inhibitor). This information helps us avoid chemical classes that are likely to have unanticipated side effects and also can suggest combinatorial libraries that might have activity on a variety of receptor targets. The method was applied to the USPDI and MDDR databases. There are clearly substructures in each database that occur in many compounds and span a variety of therapeutic categories. Some of these are expected, but some are not.
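The core tabulation (how often each substructure occurs, and how many distinct activity categories it spans) reduces to two counters; a minimal sketch, with substructure and activity names as illustrative strings rather than real USPDI/MDDR entries:

```python
from collections import defaultdict

def substructure_spread(records):
    """For each substructure, count occurrences and distinct activity classes.

    `records` is a list of (substructures, activities) pairs, one per
    database compound; both elements are iterables of strings.
    Returns (substructure, n_compounds, n_classes), widest spread first.
    """
    counts = defaultdict(int)
    classes = defaultdict(set)
    for substructs, activities in records:
        for s in substructs:
            counts[s] += 1
            classes[s].update(activities)
    return sorted(((s, counts[s], len(classes[s])) for s in counts),
                  key=lambda x: (-x[2], -x[1]))
```

Substructures that top this ranking are the "promiscuous" ones the paper flags: frequent in the database and active across unrelated therapeutic categories.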

9.
In 2001, the European Commission published a policy statement ("White Paper") on future chemicals regulation and risk reduction that proposed the use of non-animal test systems and tailor-made testing approaches, including (Q)SARs, to reduce financial costs and the number of test animals employed. The authors have compiled a database containing data submitted within the EU chemicals notification procedure. From these data, (Q)SARs for the prediction of local irritation/corrosion and/or sensitisation potential were developed and published. These (Q)SARs, together with an expert system supporting their use, will be submitted for official validation and application within regulatory hazard assessment strategies. The main features are: (i) two sets of structural alerts for the prediction of skin sensitisation hazard classification as defined by the European risk phrase R43, comprising 15 rules for chemical substructures deemed to be sensitising by direct action with cells or proteins, and three rules for substructures acting indirectly, i.e., requiring biochemical transformation; and (ii) a decision support system (DSS) for the prediction of skin and/or eye lesion potential built from information extracted from our database. This DSS combines SARs defining reactive chemical substructures relevant for local lesions to be classified with QSARs for the prediction of the absence of such a potential. The role of the BfR database, and of the (Q)SARs derived from it, in current and future (EU) testing strategies for irritation and sensitisation is discussed.
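A two-tier structural-alert rule base of this kind (direct-acting alerts plus alerts requiring metabolic activation) can be sketched as substructure patterns paired with a classification outcome. The alert names below are illustrative placeholders, not the BfR rules themselves:

```python
def classify_sensitiser(compound_substructs, direct_alerts, indirect_alerts):
    """Apply structural alerts in two tiers: direct-acting alerts fire
    as-is; indirect alerts flag substructures that only sensitise after
    biochemical transformation."""
    fired = sorted(a for a in direct_alerts if a in compound_substructs)
    if fired:
        return ("R43: direct-acting", fired)
    fired = sorted(a for a in indirect_alerts if a in compound_substructs)
    if fired:
        return ("R43: requires metabolic activation", fired)
    return ("no alert", [])
```

In the real DSS the absence-of-potential QSARs would be consulted before concluding "no alert"; the sketch stops at the alert-matching step.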

10.
The NCI Developmental Therapeutics Program Human Tumor cell line data set is a publicly available database that contains cellular assay screening data for over 40 000 compounds tested in 60 human tumor cell lines. The database also contains microarray assay gene expression data for the cell lines, and so it provides an excellent information resource particularly for testing data mining methods that bridge chemical, biological, and genomic information. In this paper we describe a formal knowledge discovery approach to characterizing and data mining this set and report the results of some of our initial experiments in mining the set from a chemoinformatics perspective.

11.


12.
Natural products, historically a major resource for drug discovery, have been gaining renewed attention due to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating drug development and reducing its cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. It is therefore imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, a public database, contains a large number of natural products with associated bioactivity data. In this review, we introduce the information system provided by PubChem and systematically describe the application of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers performing data mining on natural products using PubChem.
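PubChem's programmatic access goes through its PUG REST web service, whose URLs follow a documented input/operation/output pattern. A minimal sketch that builds a property request for one compound identifier (CID); the network call is separated out so the URL construction can be used on its own:

```python
import json
from urllib.request import urlopen

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(cid, props=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL fetching the given properties for one CID."""
    return f"{PUG}/compound/cid/{cid}/property/{','.join(props)}/JSON"

def fetch_properties(cid):
    """Fetch and decode the property table (requires network access)."""
    with urlopen(property_url(cid)) as resp:
        return json.load(resp)["PropertyTable"]["Properties"][0]
```

For example, `property_url(2244)` targets CID 2244 (aspirin); batch retrieval works the same way with a comma-separated CID list in the path.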

13.
In this paper we propose a new method, based on measurements of structural similarity, for the clustering of chemical databases. The proposed method allows dynamic adjustment of the size and number of the cells or clusters into which the database is classified. Classification is carried out using measurements of structural similarity obtained from the matching of molecular graphs, and the process is open to the use of different similarity indexes and different measurements of matching. It consists of projecting the similarity measures obtained among the elements of the database into a new similarity space. The ability to dynamically readjust the dimension and characteristics of the projection space to the most favorable conditions of the problem under study, together with its simplicity and computational efficiency, makes the proposed method appropriate for use with medium and large databases. The clustering method increases the performance of screening processes in chemical databases, facilitating the retrieval of chemical compounds that share all, or a subset, of the substructures in a given pattern. The work was validated on a database of 498 natural compounds of wide molecular diversity extracted from the free SPECS and BIOSPECS B.V. database.

14.
The use of two types of parallel computer hardware for increasing the efficiency of processing in chemical structure databases is discussed. The distributed array processor can be used for the clustering of 2-D chemical structure databases by means of the Jarvis-Patrick clustering method, and for the ranking of output in an experimental system for substructure searching in the 3-D macromolecules of the Protein Data Bank. The Inmos transputer can be used in the construction of PC-based systems for 2-D substructure searching and in the identification of the maximal substructures common to pairs of 3-D molecules.
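The Jarvis-Patrick method mentioned above clusters two items together when each appears in the other's list of k nearest neighbours and they share a minimum number of common neighbours. A sequential sketch (the paper's point is that this parallelises well on array hardware), using a caller-supplied similarity function and union-find for the merging:

```python
def jarvis_patrick(items, sim, k=3, kmin=2):
    """Jarvis-Patrick clustering: merge i and j when each lists the other
    among its k nearest neighbours and they share >= kmin neighbours."""
    n = len(items)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: -sim(items[i], items[j]))
        nbrs.append(set(order[:k]))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if (j in nbrs[i] and i in nbrs[j]
                    and len(nbrs[i] & nbrs[j]) >= kmin):
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

The expensive part, computing the n-by-n neighbour table, is embarrassingly parallel, which is exactly what made the method a good fit for the distributed array processor.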

15.
16.

Background  

Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs.

17.
The use of the mass spectral simulation system MASSIS is reported and its performance evaluated. The search for substructures matching fragments stored in four pivot databases was realised using the Ullmann algorithm. Special cleavage rules, such as the McLafferty rearrangement, the retro-Diels-Alder reaction, elimination of a small neutral molecule, and oxygen migration, are processed through shortest-path and depth-first search algorithms. For a search in the database of small fragments, the key step is to determine the tautomeric fragments; a match can then be obtained using a subgraph isomorphism algorithm. A string match is used to determine peak intensity: if the local environment of an atom is the same as that found in the database of fragment-intensity relationships, that intensity value is assigned to the query atom. System performance was evaluated on a set of tests: a comparison of peaks with a relative intensity greater than 5% shows that the system performs very well (>90%) for routine organic compounds.
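The fragment matching above rests on subgraph isomorphism. The Ullmann algorithm adds candidate-matrix refinement for speed; the plain backtracking core it refines can be sketched on label-annotated adjacency lists:

```python
def subgraph_match(pattern, target):
    """Find one label-preserving subgraph isomorphism from `pattern` into
    `target`. Graphs are dicts: node -> (label, set of neighbour nodes).
    Returns a pattern->target node mapping, or None if none exists."""
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]
        p_label, p_nbrs = pattern[p]
        for t, (t_label, t_nbrs) in target.items():
            if t in mapping.values() or t_label != p_label:
                continue
            # every already-mapped pattern neighbour must land on a
            # target neighbour of the candidate node t
            if all(mapping[q] in t_nbrs for q in p_nbrs if q in mapping):
                mapping[p] = t
                result = extend(mapping)
                if result:
                    return result
                del mapping[p]
        return None

    return extend({})
```

This is exponential in the worst case; Ullmann's refinement step prunes the candidate sets between assignments, which is what makes it practical on chemical graphs.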

18.
19.
Computer-readable databases have become an integral part of chemical research, from planning data acquisition to interpretation of the information generated. The databases available today are numerical, spectral, and bibliographic. Data representation by different schemes (relational, hierarchical, and object-based) is demonstrated. A quality index (QI) sheds light on the quality of the data. The objectives, prospects, and impact of database activity on expert systems are discussed. The number and size of corporate databases available on international networks have grown beyond a manageable number, leading to databases about their contents. Subsets of corporate or small databases have been developed by groups of chemists. The features and role of knowledge-based, or intelligent, databases are described.

20.
Chemical fragment spaces are combinations of molecular fragments and connection rules. They offer the possibility to encode an enormously large number of chemical structures in a very compact format. Fragment spaces are useful both in similarity-based (2D) and structure-based (3D) de novo design applications. We present disconnection and filtering rules leading to several thousand unique, medium size fragments when applied to databases of druglike molecules. We evaluate alternative strategies to select subsets of these fragments, with the aim of maximizing the coverage of known druglike chemical space with a strongly reduced set of fragments. For these evaluations, we use the Ftrees fragment space method. We assess a diversity-oriented selection method based on maximum common substructures and a method biased toward high frequency of occurrence of fragments and find that they are complementary to each other.
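A diversity-oriented subset selection like the one assessed above is commonly implemented as greedy MaxMin: repeatedly add the fragment farthest from everything already selected. The sketch below substitutes a generic caller-supplied distance for the paper's maximum-common-substructure criterion:

```python
def maxmin_select(fragments, dist, n_select):
    """Greedy MaxMin diversity picking: start from the first fragment,
    then repeatedly add the fragment whose minimum distance to the
    already-selected set is largest."""
    if not fragments or n_select < 1:
        return []
    selected = [fragments[0]]
    remaining = list(fragments[1:])
    while remaining and len(selected) < n_select:
        best = max(remaining,
                   key=lambda f: min(dist(f, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A frequency-biased selection would instead rank fragments by occurrence count; the paper's finding is that the two strategies cover complementary parts of druglike space.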


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号