Similar documents
20 similar documents found (search time: 31 ms)
1.
2.
Experimental programs have been underway for several years to determine the environmental effects of chemical compounds, mixtures, and the like. Among these programs is the National Toxicology Program (NTP) on rodent carcinogenicity. Because these experiments are costly and time-consuming, the rate at which test articles (i.e., chemicals) can be tested is limited. The ability to predict the outcome of the analysis at various points in the process would facilitate informed decisions about the allocation of testing resources. To assist human experts in organizing an empirical testing regime, and to try to shed light on mechanisms of toxicity, we constructed toxicity models using various machine learning and data mining methods, both existing and those of our own devising. These models took the form of decision trees, rule sets, neural networks, rules extracted from trained neural networks, and Bayesian classifiers. As a training set, we used recent results from rodent carcinogenicity bioassays conducted by the NTP on 226 test articles. We performed 10-way cross-validation on each of our models to approximate their expected error rates on unseen data. The data set consists of physical-chemical parameters of test articles, alerting chemical substructures, Salmonella mutagenicity assay results, subchronic histopathology data, and information on route, strain, and sex/species for 744 individual experiments. These results contribute to the ongoing process of evaluating and interpreting the data collected from chemical toxicity studies.
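The 10-way cross-validation mentioned above can be sketched in a few lines; the fold-splitting logic below is generic, and the majority-class baseline in the usage example is a hypothetical stand-in for the authors' decision trees, neural networks, and Bayesian classifiers.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and partition them into k near-equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(items, labels, k, train_and_test):
    """Estimate the error rate: hold out each fold once, train on the rest."""
    folds = k_fold_indices(len(items), k)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [(items[i], labels[i]) for i in range(len(items))
                 if i not in held_out]
        for i in fold:
            if train_and_test(train, items[i]) != labels[i]:
                errors += 1
    return errors / len(items)
```

With `k=10` this yields the paper's 10-way scheme: each experiment is scored exactly once by a model that never saw it during training.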

3.
A public bacterial mutagenicity database was classified into 2-D structural families using a set of specific algorithms and clustering techniques that find overlapping classes of compounds based upon chemical substructures. Structure-activity relationships were learned from the biological activity of the compounds within each class and used to identify rules that define substructures potentially responsible for mutagenic activity. In addition, this method of analysis was used to compare the pharmacologically relevant substructures of test compounds with their potentially toxic substructures, making this a potentially valuable in silico profiling tool for lead selection and optimization.

4.
Many databases exist that contain information on genes useful as background knowledge in machine learning analysis of microarray data. The Gene Ontology and Gene Ontology Annotation projects are among the most comprehensive of these. We demonstrate how inductive logic programming (ILP) can be used to build classification rules for microarray data that naturally incorporate the Gene Ontology, and annotations to it, as background knowledge without discarding the inherent graph structure of the ontology. The ILP rules generated are parsimonious and easy to interpret. Copyright © 2010 John Wiley & Sons, Ltd.
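Preserving the ontology's graph structure means a rule can test whether a gene is annotated with a GO term or any of its descendants. A minimal sketch of that background predicate, using hypothetical term identifiers and a plain `is_a` adjacency map, not the authors' ILP encoding:

```python
def ancestors(term, is_a):
    """All terms reachable from `term` via is_a edges in the GO DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        for parent in is_a.get(t, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def annotated_with(gene, go_term, annotations, is_a):
    """True if the gene carries go_term directly or via a more specific
    descendant term (i.e. go_term is an ancestor of a direct annotation)."""
    for t in annotations.get(gene, ()):
        if t == go_term or go_term in ancestors(t, is_a):
            return True
    return False
```

An ILP rule such as "class(G) :- annotated_with(G, 'GO:0006915')" then matches genes annotated anywhere below that term, which is what keeping the DAG intact buys.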

5.
Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.

6.
To a significant degree, database development processes are based upon human activities, which are susceptible to various errors. Propagation of errors during processing decreases the value of the original data as well as that of any database products. Data quality is a critical issue that every database producer must handle as an inseparable part of database management. Within the Thermodynamics Research Center (TRC), a systematic approach to implementing database integrity rules was established through the use of modern database technology, statistical methods, and thermodynamic principles. The four major functions of the system (error prevention, database integrity enforcement, scientific data integrity protection, and database traceability) are detailed in this paper.

7.
Mass spectrometry (MS) is widely used for the identification of chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from the limited coverage of existing databases, causing identification to fail for any compound not present in the database. Among the computational approaches for mining metabolite structures from MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use them to screen compound libraries. This can be carried out by calibrating multi-task artificial neural networks on large datasets, with mass spectra as inputs and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an on-line open repository. These data were used to train and evaluate deep-learning-based approaches that predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. The effects of data sparseness and the impact of different data-curation and dimensionality-reduction strategies on output accuracy were evaluated. Moreover, extensive diagnostics were carried out to assess modelling advantages and drawbacks as a function of the explored chemical space.
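Once a fingerprint has been predicted from a spectrum, candidate structures are retrieved by ranking a compound library against it. The standard choice for comparing binary fingerprints is the Tanimoto coefficient (an assumption here, since the abstract does not name the measure); a sketch with fingerprints as sets of on-bit indices:

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def screen_library(predicted_fp, library):
    """Rank library compounds by similarity to the predicted fingerprint.

    `library` maps compound names to fingerprint bit sets; the names in
    the test below are illustrative placeholders.
    """
    ranked = sorted(library.items(),
                    key=lambda kv: tanimoto(predicted_fp, kv[1]),
                    reverse=True)
    return [(name, round(tanimoto(predicted_fp, fp), 3))
            for name, fp in ranked]
```

The top-ranked library entries become the structural hypotheses for the unknown spectrum, which is how fingerprint prediction bypasses the reference-spectrum coverage problem.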

8.
We have developed a method, given a database of molecules and associated activities, to identify molecular substructures that are associated with many different biological activities. These may be therapeutic areas (e.g. antihypertensive) and/or mechanism-based activities (e.g. renin inhibitor). This information helps us avoid chemical classes that are likely to have unanticipated side effects and also can suggest combinatorial libraries that might have activity on a variety of receptor targets. The method was applied to the USPDI and MDDR databases. There are clearly substructures in each database that occur in many compounds and span a variety of therapeutic categories. Some of these are expected, but some are not.
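The core tabulation (how often each substructure occurs, and how many distinct activity categories it spans) reduces to two counters; a minimal sketch, with substructure and activity names as illustrative strings rather than real USPDI/MDDR entries:

```python
from collections import defaultdict

def substructure_spread(records):
    """For each substructure, count occurrences and distinct activity classes.

    `records` is a list of (substructures, activities) pairs, one per
    database compound; both elements are iterables of strings.
    Returns (substructure, n_compounds, n_classes), widest spread first.
    """
    counts = defaultdict(int)
    classes = defaultdict(set)
    for substructs, activities in records:
        for s in substructs:
            counts[s] += 1
            classes[s].update(activities)
    return sorted(((s, counts[s], len(classes[s])) for s in counts),
                  key=lambda x: (-x[2], -x[1]))
```

Substructures that top this ranking are the "promiscuous" ones the paper flags: frequent in the database and active across unrelated therapeutic categories.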

9.
In 2001, the European Commission published a policy statement ("White Paper") on future chemicals regulation and risk reduction that proposed the use of non-animal test systems and tailor-made testing approaches, including (Q)SARs, to reduce financial costs and the number of test animals employed. The authors have compiled a database containing data submitted within the EU chemicals notification procedure. From these data, (Q)SARs for the prediction of local irritation/corrosion and/or sensitisation potential were developed and published. These (Q)SARs, together with an expert system supporting their use, will be submitted for official validation and application within regulatory hazard assessment strategies. The main features are: (i) two sets of structural alerts for the prediction of skin sensitisation hazard classification as defined by the European risk phrase R43, comprising 15 rules for chemical substructures deemed to be sensitising by direct action with cells or proteins, and three rules for substructures acting indirectly, i.e., requiring biochemical transformation; and (ii) a decision support system (DSS) for the prediction of skin and/or eye lesion potential built from information extracted from our database. This DSS combines SARs defining reactive chemical substructures relevant for local lesions to be classified with QSARs for the prediction of the absence of such a potential. The role of the BfR database, and of the (Q)SARs derived from it, in current and future (EU) testing strategies for irritation and sensitisation is discussed.
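A two-tier structural-alert rule base of this kind (direct-acting alerts plus alerts requiring metabolic activation) can be sketched as substructure patterns paired with a classification outcome. The alert names below are illustrative placeholders, not the BfR rules themselves:

```python
def classify_sensitiser(compound_substructs, direct_alerts, indirect_alerts):
    """Apply structural alerts in two tiers: direct-acting alerts fire
    as-is; indirect alerts flag substructures that only sensitise after
    biochemical transformation."""
    fired = sorted(a for a in direct_alerts if a in compound_substructs)
    if fired:
        return ("R43: direct-acting", fired)
    fired = sorted(a for a in indirect_alerts if a in compound_substructs)
    if fired:
        return ("R43: requires metabolic activation", fired)
    return ("no alert", [])
```

In the real DSS the absence-of-potential QSARs would be consulted before concluding "no alert"; the sketch stops at the alert-matching step.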

10.
The NCI Developmental Therapeutics Program Human Tumor cell line data set is a publicly available database that contains cellular assay screening data for over 40 000 compounds tested in 60 human tumor cell lines. The database also contains microarray assay gene expression data for the cell lines, and so it provides an excellent information resource particularly for testing data mining methods that bridge chemical, biological, and genomic information. In this paper we describe a formal knowledge discovery approach to characterizing and data mining this set and report the results of some of our initial experiments in mining the set from a chemoinformatics perspective.

11.


12.
Natural products, historically a major resource for drug discovery, have been gaining renewed attention due to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating drug development and reducing its cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. It is therefore imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, a public database, contains a large number of natural products with associated bioactivity data. In this review, we introduce the information system provided by PubChem and systematically describe the application of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers performing data mining on natural products using PubChem.
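PubChem's programmatic access goes through its PUG REST web service, whose URLs follow a documented input/operation/output pattern. A minimal sketch that builds a property request for one compound identifier (CID); the network call is separated out so the URL construction can be used on its own:

```python
import json
from urllib.request import urlopen

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(cid, props=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL fetching the given properties for one CID."""
    return f"{PUG}/compound/cid/{cid}/property/{','.join(props)}/JSON"

def fetch_properties(cid):
    """Fetch and decode the property table (requires network access)."""
    with urlopen(property_url(cid)) as resp:
        return json.load(resp)["PropertyTable"]["Properties"][0]
```

For example, `property_url(2244)` targets CID 2244 (aspirin); batch retrieval works the same way with a comma-separated CID list in the path.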

13.
In this paper we propose a new method, based on measurements of structural similarity, for the clustering of chemical databases. The proposed method allows dynamic adjustment of the size and number of the cells or clusters into which the database is classified. Classification is carried out using measurements of structural similarity obtained from the matching of molecular graphs, and the process is open to the use of different similarity indexes and different measurements of matching. It consists of projecting the similarity measures obtained among the elements of the database into a new similarity space. The ability to dynamically readjust the dimension and characteristics of the projection space to the most favorable conditions of the problem under study, together with its simplicity and computational efficiency, makes the proposed method appropriate for use with medium and large databases. The clustering method increases the performance of screening processes in chemical databases, facilitating the retrieval of chemical compounds that share all, or a subset, of the substructures in a given pattern. The work was validated on a database of 498 natural compounds of wide molecular diversity extracted from the free SPECS and BIOSPECS B.V. database.

14.
The use of two types of parallel computer hardware for increasing the efficiency of processing in chemical structure databases is discussed. The distributed array processor can be used for the clustering of 2-D chemical structure databases by means of the Jarvis-Patrick clustering method, and for the ranking of output in an experimental system for substructure searching in the 3-D macromolecules of the Protein Data Bank. The Inmos transputer can be used in the construction of PC-based systems for 2-D substructure searching and in the identification of the maximal substructures common to pairs of 3-D molecules.
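The Jarvis-Patrick method mentioned above clusters two items together when each appears in the other's list of k nearest neighbours and they share a minimum number of common neighbours. A sequential sketch (the paper's point is that this parallelises well on array hardware), using a caller-supplied similarity function and union-find for the merging:

```python
def jarvis_patrick(items, sim, k=3, kmin=2):
    """Jarvis-Patrick clustering: merge i and j when each lists the other
    among its k nearest neighbours and they share >= kmin neighbours."""
    n = len(items)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: -sim(items[i], items[j]))
        nbrs.append(set(order[:k]))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if (j in nbrs[i] and i in nbrs[j]
                    and len(nbrs[i] & nbrs[j]) >= kmin):
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

The expensive part, computing the n-by-n neighbour table, is embarrassingly parallel, which is exactly what made the method a good fit for the distributed array processor.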

15.
16.

Background  

Chemical compounds affecting a bioactivity can usually be classified into several groups, each of which shares a characteristic substructure. We call these substructures "basic active structures" or BASs. The extraction of BASs is challenging when the database of compounds contains a variety of skeletons. Data mining technology, associated with the work of chemists, has enabled the systematic elaboration of BASs.

17.
The use of the mass spectral simulation system MASSIS is reported and its performance evaluated. The search for substructures matching fragments stored in four pivot databases was realised using the Ullmann algorithm. Special cleavage rules, such as the McLafferty rearrangement, the retro-Diels-Alder reaction, elimination of a small neutral molecule, and oxygen migration, are processed through shortest-path and depth-first search algorithms. For a search in the database of small fragments, the key step is to determine the tautomeric fragments; a match can then be obtained using a subgraph isomorphism algorithm. A string match is used to determine peak intensity: if the local environment of an atom is the same as that found in the database of fragment-intensity relationships, that intensity value is assigned to the query atom. System performance was evaluated on a set of tests: a comparison of peaks with a relative intensity greater than 5% shows that the system performs very well (>90%) for routine organic compounds.
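The fragment matching above rests on subgraph isomorphism. The Ullmann algorithm adds candidate-matrix refinement for speed; the plain backtracking core it refines can be sketched on label-annotated adjacency lists:

```python
def subgraph_match(pattern, target):
    """Find one label-preserving subgraph isomorphism from `pattern` into
    `target`. Graphs are dicts: node -> (label, set of neighbour nodes).
    Returns a pattern->target node mapping, or None if none exists."""
    p_nodes = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]
        p_label, p_nbrs = pattern[p]
        for t, (t_label, t_nbrs) in target.items():
            if t in mapping.values() or t_label != p_label:
                continue
            # every already-mapped pattern neighbour must land on a
            # target neighbour of the candidate node t
            if all(mapping[q] in t_nbrs for q in p_nbrs if q in mapping):
                mapping[p] = t
                result = extend(mapping)
                if result:
                    return result
                del mapping[p]
        return None

    return extend({})
```

This is exponential in the worst case; Ullmann's refinement step prunes the candidate sets between assignments, which is what makes it practical on chemical graphs.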

18.
19.
Computer-readable databases have become an integral part of chemical research, from planning data acquisition to interpretation of the information generated. The databases available today are numerical, spectral, and bibliographic. Data representation by different schemes (relational, hierarchical, and object-based) is demonstrated. A quality index (QI) sheds light on the quality of the data. The objectives, prospects, and impact of database activity on expert systems are discussed. The number and size of corporate databases available on international networks have grown beyond a manageable number, leading to databases about their contents. Subsets of corporate or small databases have been developed by groups of chemists. The features and role of knowledge-based, or intelligent, databases are described.

20.
Chemical fragment spaces are combinations of molecular fragments and connection rules. They offer the possibility to encode an enormously large number of chemical structures in a very compact format. Fragment spaces are useful both in similarity-based (2D) and structure-based (3D) de novo design applications. We present disconnection and filtering rules leading to several thousand unique, medium size fragments when applied to databases of druglike molecules. We evaluate alternative strategies to select subsets of these fragments, with the aim of maximizing the coverage of known druglike chemical space with a strongly reduced set of fragments. For these evaluations, we use the Ftrees fragment space method. We assess a diversity-oriented selection method based on maximum common substructures and a method biased toward high frequency of occurrence of fragments and find that they are complementary to each other.
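A diversity-oriented subset selection like the one assessed above is commonly implemented as greedy MaxMin: repeatedly add the fragment farthest from everything already selected. The sketch below substitutes a generic caller-supplied distance for the paper's maximum-common-substructure criterion:

```python
def maxmin_select(fragments, dist, n_select):
    """Greedy MaxMin diversity picking: start from the first fragment,
    then repeatedly add the fragment whose minimum distance to the
    already-selected set is largest."""
    if not fragments or n_select < 1:
        return []
    selected = [fragments[0]]
    remaining = list(fragments[1:])
    while remaining and len(selected) < n_select:
        best = max(remaining,
                   key=lambda f: min(dist(f, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A frequency-biased selection would instead rank fragments by occurrence count; the paper's finding is that the two strategies cover complementary parts of druglike space.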


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号