Similar Articles
1.
The requirement of aligning each individual molecule in a data set severely limits the types of molecules that can be analysed with traditional structure–activity relationship (SAR) methods. Inductive logic programming (ILP) solves this problem by using relations between objects. Another advantage of this methodology is its ability to include background knowledge expressed in first-order logic. However, previous molecular ILP representations have not been effective in describing the electronic structure of molecules. We present a more unified and comprehensive representation based on Richard Bader's quantum topological atoms in molecules (AIM) theory, in which critical points in the electron density are connected through a network. AIM theory provides a wealth of chemical information about individual atoms and their bond connections, enabling a more flexible and chemically relevant representation. To obtain even more relevant rules with higher coverage, we apply manual postprocessing and interpretation of the ILP rules. We have tested the usefulness of the new representation in SAR modelling by classifying compounds of low/high mutagenicity and a set of factor Xa inhibitors of high and low affinity.
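To illustrate why a relational representation avoids alignment, here is a minimal Python sketch (not the authors' representation): a molecule is stored as atoms plus bond critical points carrying an electron density value, and a hand-written first-order rule is tested against those facts. The class names, field names, density values and threshold are hypothetical; a real ILP system would learn such rules rather than have them written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    ident: str    # hypothetical atom identifier, e.g. "n1"
    element: str  # element symbol


@dataclass(frozen=True)
class BondCriticalPoint:
    a: str        # identifier of the first bonded atom
    b: str        # identifier of the second bonded atom
    rho: float    # electron density at the critical point (illustrative values)


def rule_covers(atoms, bcps, elem1, elem2, min_rho):
    """Hypothetical rule, written in Prolog-like form:

        active(M) :- atom(M, X, elem1), atom(M, Y, elem2),
                     bcp(M, X, Y, Rho), Rho >= min_rho.

    True if some bond critical point joins an elem1 atom and an elem2 atom
    with density at least min_rho. No coordinates are used, so no
    molecular alignment is needed."""
    element = {atom.ident: atom.element for atom in atoms}
    for cp in bcps:
        pair = (element[cp.a], element[cp.b])
        if cp.rho >= min_rho and pair in ((elem1, elem2), (elem2, elem1)):
            return True
    return False


atoms = [Atom("c1", "C"), Atom("n1", "N"), Atom("o1", "O")]
bcps = [BondCriticalPoint("c1", "n1", 0.30), BondCriticalPoint("n1", "o1", 0.45)]
print(rule_covers(atoms, bcps, "N", "O", 0.40))  # True
print(rule_covers(atoms, bcps, "C", "O", 0.10))  # False: no C-O critical point
```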

2.
In chemoinformatics, searching for compounds which are structurally diverse and share a biological activity is called scaffold hopping. Scaffold hopping is important since it can be used to obtain alternative structures when the compound under development has unexpected side-effects. Pharmaceutical companies use scaffold hopping when they wish to circumvent prior patents for targets of interest. We propose a new method for scaffold hopping using inductive logic programming (ILP). ILP uses the observed spatial relationships between pharmacophore types in pretested active and inactive compounds and learns human-readable rules describing the diverse structures of active compounds. The ILP-based scaffold hopping method is compared to two previous algorithms (chemically advanced template search, CATS, and CATS3D) on 10 data sets with diverse scaffolds. The comparison shows that the ILP-based method is significantly better than random selection while the other two algorithms are not. In addition, the ILP-based method retrieves new active scaffolds which were not found by CATS and CATS3D. The results show that the ILP-based method is at least as good as the other methods in this study. ILP produces human-readable rules, which makes it possible to identify the three-dimensional features that lead to scaffold hopping. A minor variant of a rule learnt by ILP for scaffold hopping was subsequently found to cover an inhibitor identified by an independent study. This provides a successful result in a blind trial of the effectiveness of ILP to generate rules for scaffold hopping. We conclude that ILP provides a valuable new approach for scaffold hopping.
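The abstract does not give the form of the learned rules beyond "spatial relationships between pharmacophore types". As a hedged illustration only, the Python sketch below encodes a molecule as labelled 3D pharmacophore points and checks a hypothetical two-condition rule. Because the rule depends only on inter-point distances, a translated copy of a molecule is covered just as the original is; the feature types, distances, tolerance and compound names are invented for the example.

```python
import math


def has_pair(features, type1, type2, distance, tol):
    """True if the molecule has a `type1` and a `type2` pharmacophore point
    whose separation is within `tol` of `distance` (same length unit as the
    coordinates)."""
    for i, (t1, p1) in enumerate(features):
        for j, (t2, p2) in enumerate(features):
            if i != j and t1 == type1 and t2 == type2:
                if abs(math.dist(p1, p2) - distance) <= tol:
                    return True
    return False


def rule(features):
    """Hypothetical learned rule: a donor-acceptor pair about 5 units apart
    and an aromatic-acceptor pair about 3 units apart."""
    return (has_pair(features, "donor", "acceptor", 5.0, 0.5)
            and has_pair(features, "aromatic", "acceptor", 3.0, 0.5))


def screen(library):
    """Return the names of compounds covered by the rule, in library order."""
    return [name for name, features in library.items() if rule(features)]


library = {
    "a": [("donor", (0, 0, 0)), ("acceptor", (5, 0, 0)), ("aromatic", (5, 3, 0))],
    # Same molecule translated in space: still covered, no alignment needed.
    "a_moved": [("donor", (10, -2, 7)), ("acceptor", (15, -2, 7)), ("aromatic", (15, 1, 7))],
    "b": [("donor", (0, 0, 0)), ("acceptor", (9, 0, 0)), ("aromatic", (9, 3, 0))],
}
print(screen(library))  # ['a', 'a_moved']
```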

3.
Neural networks and inductive logic programming (ILP) have been compared with linear regression for modelling the QSAR of the inhibition of E. coli dihydrofolate reductase (DHFR) by 2,4-diamino-5-(substituted benzyl)pyrimidines and, in the subsequent paper [Hirst, J.D., King, R.D. and Sternberg, M.J.E., J. Comput.-Aided Mol. Design, 8 (1994) 421], the inhibition of rodent DHFR by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazines. Cross-validation trials provide a statistically rigorous assessment of the predictive capabilities of the methods: training and testing data are selected randomly, and all methods are developed using identical training data. For the ILP analysis, molecules are represented by attributes other than Hansch parameters. Neural networks and ILP perform better than linear regression using the attribute representation, but the difference is not statistically significant. The major benefit of the ILP analysis is the formulation of understandable rules relating the activity of the inhibitors to their chemical structure.
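The abstract does not state which significance test was applied. As one simple, hedged way to check whether one method beats another across paired cross-validation runs, the sketch below computes an exact two-sided sign test on per-run errors; the function name and the error values in the example are invented.

```python
from math import comb


def sign_test_p(errors_a, errors_b):
    """Exact two-sided sign test on paired errors (lower is better).

    Ties are dropped; under the null hypothesis each method is equally
    likely to win a run, so the number of wins follows Binomial(n, 0.5)."""
    wins_a = sum(a < b for a, b in zip(errors_a, errors_b))
    wins_b = sum(b < a for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-fold errors for two methods over six folds.
ilp = [0.40, 0.35, 0.50, 0.42, 0.38, 0.45]
linreg = [0.44, 0.33, 0.55, 0.47, 0.36, 0.49]
print(sign_test_p(ilp, linreg))  # 0.6875: 4 wins vs 2 is not significant
```

With only a handful of folds, even a consistent winner gives a modest p-value (six wins out of six yields 0.03125), which is one reason small cross-validation trials often find no significant difference.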

4.
One of the largest available data sets for developing a quantitative structure-activity relationship (QSAR), the inhibition of dihydrofolate reductase (DHFR) by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazine derivatives, has been used for a sixfold cross-validation trial of neural networks, inductive logic programming (ILP) and linear regression. No statistically significant difference was found between the predictive capabilities of the methods. However, the representation of molecules by attributes, which is integral to the ILP approach, provides understandable rules about drug-receptor interactions.
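A sixfold cross-validation trial can be organised as in the Python sketch below, which is a generic illustration rather than the study's protocol: data are shuffled once with a fixed seed, dealt into six folds, and each fold serves once as the test set, so every method is trained on identical data. The function name and seed are assumptions.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Shuffle indices 0..n_items-1 once and deal them into k folds.

    Yields (train, test) index lists; each item is tested exactly once.
    Reusing the same seed gives every compared method identical splits."""
    if not 2 <= k <= n_items:
        raise ValueError("need 2 <= k <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


for train, test in k_fold_splits(12, 6):
    print(len(train), len(test))  # prints "10 2" six times
```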

5.
Warmr: a data mining tool for chemical data (cited 5 times: 0 self-citations, 5 by others)

10.
Traditional 3D‐quantitative structure–activity relationship (QSAR)/structure–activity relationship (SAR) methodologies are sensitive to the quality of an alignment step which is required to make molecular structures comparable. Even though many methods have been proposed to solve this problem, they often result in a loss of model interpretability. The requirement of alignment is a restriction imposed by traditional regression methods due to their failure to represent relations between data objects directly. Inductive logic programming (ILP) is a class of machine‐learning methods able to describe relational data directly. We propose a new methodology which is aimed at using the richness in molecular interaction fields (MIFs) without being restricted by any alignment procedure. A set of MIFs is computed and further compressed by finding their minima corresponding to the sites of strongest interaction between a molecule and the applied test probe. ILP uses these minima to build easily interpretable rules about activity expressed as pharmacophore rules in the powerful language of first‐order logic. We use a set of previously published inhibitors of factor Xa of the benzamidine family to discuss the problems, requirements and advantages of the new methodology. Copyright © 2007 John Wiley & Sons, Ltd.
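The compression step, keeping only the minima of each interaction field, can be sketched in Python as a search for grid points that are lower than all of their neighbours and below an energy cutoff. This is an assumption about how the minima are located; the grid layout (nested lists indexed [x][y][z]), the cutoff of 0 and the example values are hypothetical. Each minimum could then become a fact such as (probe, position, energy) for the ILP system.

```python
from itertools import product


def field_minima(grid, cutoff=0.0):
    """Return ((x, y, z), energy) for every grid point whose energy is below
    `cutoff` and strictly lower than all of its (up to 26) neighbours."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    minima = []
    for x, y, z in product(range(nx), range(ny), range(nz)):
        energy = grid[x][y][z]
        if energy >= cutoff:
            continue  # only favourable (negative) interactions are kept
        neighbours = (
            grid[x + dx][y + dy][z + dz]
            for dx, dy, dz in product((-1, 0, 1), repeat=3)
            if (dx, dy, dz) != (0, 0, 0)
            and 0 <= x + dx < nx and 0 <= y + dy < ny and 0 <= z + dz < nz
        )
        if all(energy < other for other in neighbours):
            minima.append(((x, y, z), energy))
    return minima


# Hypothetical 3x3x3 field with one deep well and one shallower corner value.
grid = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
grid[1][1][1] = -2.0
grid[0][0][0] = -1.0
print(field_minima(grid))  # [((1, 1, 1), -2.0)]
```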

11.
Normalization of cDNA microarray data is an obligatory step in microarray experiments because non-specific errors are relatively frequent. Generally, normalization of microarray data is based on a null hypothesis and a variance model. Yang's model (Yang et al., 2001) includes at least two types of noise: additive noise and multiplicative noise. Usually, background is considered an additive noise on the signal, and the variation between signal pixels is the representative multiplicative noise. In this study, the relation between the signal (spot intensity minus background intensity) and the background was examined, and the influence of background, as a representative additive factor, on normalization was investigated. Although this relation has not previously been considered a factor affecting normalization, taking the signal/background ratio into account during normalization could improve the accuracy of microarray data. Background-dependent normalization decreased the number of genes whose expression levels changed significantly and made their distribution more consistent across the whole range of signal intensities. Printing-pin-dependent normalization was also carried out, treating the printing pin as a representative multiplicative noise. It improved the distribution of spots in the Cy3-Cy5 scatter plot, but its effect was slight. These studies suggest that the signals have some influence on the local backgrounds, and that this must be considered when normalizing cDNA microarray data.
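The abstract does not give the normalization formula. A minimal Python sketch of one plausible background-dependent scheme follows, assuming most genes are unchanged: each spot's log-ratio M = log2(Cy5 signal / Cy3 signal) is computed from background-subtracted intensities, spots are grouped by their signal/background ratio, and the median M of each group is subtracted. Grouping into equal-size bins is a simple stand-in for the smoothing (for example loess) that such methods usually use; the function name, the bin count and the way the two channels' ratios are combined are assumptions.

```python
import math
from statistics import median


def background_dependent_normalize(spots, n_bins=4):
    """spots: list of (cy5_fg, cy5_bg, cy3_fg, cy3_bg) intensities.

    Returns background-dependent normalized log-ratios in input order."""
    records = []  # (log signal/background ratio, log-ratio M) per spot
    for cy5_fg, cy5_bg, cy3_fg, cy3_bg in spots:
        s5, s3 = cy5_fg - cy5_bg, cy3_fg - cy3_bg  # background-subtracted signals
        if min(s5, s3, cy5_bg, cy3_bg) <= 0:
            raise ValueError("signals and backgrounds must be positive")
        m = math.log2(s5 / s3)
        # Assumption: summarise both channels by geometric means.
        sb = math.log2(math.sqrt(s5 * s3) / math.sqrt(cy5_bg * cy3_bg))
        records.append((sb, m))

    order = sorted(range(len(records)), key=lambda i: records[i][0])
    size = math.ceil(len(order) / n_bins)
    normalized = [0.0] * len(records)
    for start in range(0, len(order), size):
        group = order[start:start + size]  # spots with similar signal/background
        centre = median(records[i][1] for i in group)
        for i in group:
            normalized[i] = records[i][1] - centre
    return normalized


# Hypothetical spots: four with a uniform 2x dye bias, one truly up 8x.
spots = [(250, 50, 150, 50), (450, 50, 250, 50), (1050, 50, 550, 50),
         (130, 30, 80, 30), (850, 50, 150, 50)]
print(background_dependent_normalize(spots, n_bins=1))  # [0.0, 0.0, 0.0, 0.0, 2.0]
```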

13.
Li-Juan Tang, Hai-Long Wu. Talanta, 2009, 79(2): 260-1694
One problem with discriminant analysis of microarray data is that each sample is represented by a large number of genes that may be irrelevant, insignificant or redundant. Methods of variable selection are therefore of great significance in microarray data analysis. To circumvent this problem, a new gene mining approach is proposed based on the similarity between the probability density functions of each gene for the class of interest and for the other classes. This method identifies significant genes that are informative for discriminating each individual class, rather than maximizing the separability of all classes, so that genes carrying important information about particular disease subtypes can be selected. Based on the significant genes mined for individual classes, a support vector machine with a local kernel transform is constructed to classify different diseases. The combination of the gene mining approach with the support vector machine is demonstrated for cancer classification using two public data sets. The results show that significant genes are identified for each cancer and that the classification model performs satisfactorily in training and prediction on both data sets.
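The abstract does not specify the similarity measure between the two densities. In the hedged Python sketch below, each gene's expression values for the class of interest and for all other classes are turned into histograms on shared bins, and their overlap coefficient (the sum of bin-wise minima) serves as the similarity: genes with the lowest overlap are ranked as most informative for that class. Histograms, the bin count, the function names and the example data are assumptions, not the authors' method.

```python
def density_overlap(values_in, values_out, n_bins=10):
    """Overlap coefficient of two normalized histograms on shared bins
    (1.0 = identical distributions, 0.0 = no overlap)."""
    lo = min(min(values_in), min(values_out))
    hi = max(max(values_in), max(values_out))
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values

    def histogram(values):
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return [c / len(values) for c in counts]

    p, q = histogram(values_in), histogram(values_out)
    return sum(min(a, b) for a, b in zip(p, q))


def rank_genes(expr, labels, target, n_bins=10):
    """expr: dict gene -> expression values (one per sample); labels: class
    of each sample. Returns genes from most to least specific to `target`."""
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == target]
        outside = [v for v, lab in zip(values, labels) if lab != target]
        scores[gene] = density_overlap(inside, outside, n_bins)
    return sorted(scores, key=scores.get)


labels = ["A", "A", "A", "B", "B", "B"]
expr = {"g1": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],   # separates class A
        "g2": [1.0, 5.0, 3.0, 1.0, 5.0, 3.0]}   # same distribution in both
print(rank_genes(expr, labels, "A"))  # ['g1', 'g2']
```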

14.
This work reports how the use of a standard integrated circuit (IC) fabrication process can improve the potential of silicon nitride layers as substrates for microarray technology. It has been shown that chemical mechanical polishing (CMP) substantially improves the fluorescent intensity of positive control gene and test gene microarray spots on both low-pressure chemical vapor deposition (LPCVD) and plasma-enhanced chemical vapor deposition (PECVD) silicon nitride films, while maintaining a low fluorescent background. This results in the improved discrimination of low expressing genes. The results for the PECVD silicon nitride, which has been previously reported as unsuitable for microarray spotting, are particularly significant for future devices that hope to incorporate microelectronic control and analysis circuitry, due to the film's use as a final passivating layer.

15.
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples which have a large number of gene expression levels as features. To select relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
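Correlation-based feature selection (CFS, due to Hall) scores a gene subset of size k by its merit, k * mean(r_cf) / sqrt(k + k(k - 1) * mean(r_ff)), where r_cf is a gene-class correlation and r_ff a gene-gene correlation, so redundant genes are penalised. The Python sketch below pairs that merit with greedy forward selection; using the absolute Pearson correlation (CFS implementations often use symmetrical uncertainty on discretised data instead), the stopping rule and the expression values are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def cfs_merit(subset, genes, target):
    """Merit of a gene subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(genes[g], target)) for g in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(genes[a], genes[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(genes, target, tol=1e-12):
    """Greedily add the gene that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(genes)  # sorted so ties are broken deterministically
    while remaining:
        gene = max(remaining, key=lambda g: cfs_merit(selected + [g], genes, target))
        merit = cfs_merit(selected + [gene], genes, target)
        if merit <= best + tol:  # tolerance guards against rounding noise
            break
        selected.append(gene)
        remaining.remove(gene)
        best = merit
    return selected


target = [0, 0, 0, 1, 1, 1]  # two classes coded 0/1
genes = {
    "g1": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],
    "g2": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],  # exact copy of g1 (redundant)
    "g3": [0.5, 0.1, 0.9, 0.4, 0.2, 0.6],  # weakly related
}
print(forward_select(genes, target))  # ['g1']
```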

16.
Du W, Gu T, Tang LJ, Jiang JH, Wu HL, Shen GL, Yu RQ. Talanta, 2011, 85(3): 1689-1694
As a greedy search algorithm, classification and regression tree (CART) is prone to overfitting when modelling microarray gene expression data. A straightforward solution is to filter out irrelevant genes by identifying significant ones. Some significant genes have multimodal expression patterns that show systematic differences among samples within the same class, and these are difficult to identify with existing methods. To address this, a strategy called unimodal transform of variables selected by interval segmentation purity (UTISP) is proposed for CART modelling. First, significant genes exhibiting varied expression patterns are identified by a variable selection method based on interval segmentation purity. Then, a unimodal transform is applied to provide unimodal feature variables for CART modelling via feature extraction. Because significant genes with complex expression patterns are identified and unimodal features are extracted in advance, this strategy can improve the performance of CART in combating overfitting or underfitting when modelling microarray data. The strategy is demonstrated on two microarray data sets. The results show that UTISP-based CART outperforms k-nearest neighbors and CART coupled with other gene identification strategies, indicating that UTISP-based CART holds great promise for microarray data analysis.
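The abstract does not define interval segmentation purity. Under the assumption that it works roughly like the sketch below, samples are sorted by a gene's expression, cut into equal-count intervals, and each interval contributes the count of its majority class; a gene whose classes occupy separate intervals scores 1.0 even if, as in the invented example, both classes have the same mean and a mean-based test would miss it. The function name and the equal-count intervals are hypothetical choices.

```python
from collections import Counter


def interval_purity(values, labels, n_intervals=3):
    """Fraction of samples that belong to the majority class of their
    expression interval (intervals hold roughly equal numbers of samples)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    size = -(-len(order) // n_intervals)  # ceiling division
    majority_total = 0
    for start in range(0, len(order), size):
        interval_labels = [labels[i] for i in order[start:start + size]]
        majority_total += Counter(interval_labels).most_common(1)[0][1]
    return majority_total / len(values)


labels = ["A", "A", "B", "B", "A", "A"]
bimodal = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]  # class A low and high, B in between
noise = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print(interval_purity(bimodal, labels))  # 1.0
print(interval_purity(noise, labels))    # 4/6, about 0.67
```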

17.
The NCI Developmental Therapeutics Program Human Tumor cell line data set is a publicly available database that contains cellular assay screening data for over 40 000 compounds tested in 60 human tumor cell lines. The database also contains microarray assay gene expression data for the cell lines, and so it provides an excellent information resource particularly for testing data mining methods that bridge chemical, biological, and genomic information. In this paper we describe a formal knowledge discovery approach to characterizing and data mining this set and report the results of some of our initial experiments in mining the set from a chemoinformatics perspective.

18.
In the past several years, a new set of technologies based on whole genome analysis have revolutionized the study of gene expression. These microarray or "gene chip" technologies, which arose out of the development of large-scale sequencing approaches, are now coming into increasing use, generating a far greater volume of data than the data representing the sequences themselves. This review focuses on the current state of development of these technologies, and the available approaches to manage and analyze the information they generate. The applicability of this technology to general problems in biomedicine is also discussed.

19.
The present research aimed to enhance the productivity of pharmaceutically active compounds (PhACs) from Streptomyces SUK 25 in submerged fermentation, using response surface methodology (RSM) as an optimization tool. In addition, the characteristics and mechanism of action of the PhACs against methicillin-resistant Staphylococcus aureus were determined, and the techno-economics of PhAC production were estimated. The independent factors were incubation time, pH, temperature, shaker rotation speed, and the concentrations of glucose, mannitol, and asparagine; the responses were the dry weight of crude extract, the minimum inhibitory concentration, and the inhibition zone, all determined by RSM. The PhACs were characterized using GC-MS and FTIR, while the mechanism of action was determined using gene ontology extracted from DNA microarray data. Under the best operating conditions (12 days, pH 7, 28 °C, 120 rpm, 1 g/L glucose, 3 g/L mannitol, and 0.5 g/L asparagine), the crude extract dry mass was 8.20 mg/L, the minimum inhibitory concentration (MIC) was 8.00 µg/mL, and the inhibition zone was 17.60 mm, with an R² coefficient of 0.70. The GC-MS and FTIR spectra confirmed the presence of 21 PhACs, and several functional groups were detected. The gene ontology analysis revealed that 485 genes were upregulated and nine genes were downregulated. The specific and annual operating cost of PhAC production was 48.61 U.S. dollars (USD) per 100 mg, compared with a market price of USD 164.3 per 100 mg, indicating that production is cheaper than purchasing at the market price.
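RSM fits a second-order polynomial to the responses and reports its R² (1 - SS_res/SS_tot). The Python sketch below, using only the standard library, fits a full quadratic in two coded factors by ordinary least squares; the study itself used seven factors, and the function names, design and true coefficients in the example are invented for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quadratic_terms(x1, x2):
    """Full second-order model: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_response_surface(points, y):
    """Least-squares fit via the normal equations; returns (beta, R^2)."""
    X = [quadratic_terms(a, b) for a, b in points]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * t for b, t in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical 3x3 factorial in coded units (-1, 0, 1) for two factors,
# e.g. pH and temperature, with responses generated from a known surface.
true_beta = [8.0, 0.5, -0.3, -1.2, -0.8, 0.2]
points = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
y = [sum(c * t for c, t in zip(true_beta, quadratic_terms(a, b))) for a, b in points]
beta, r2 = fit_response_surface(points, y)
print([round(b, 6) for b in beta], round(r2, 6))  # recovers true_beta, R^2 near 1
```

With real, noisy responses the same fit yields an R² below 1; the abstract reports 0.70 for its seven-factor model.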

20.
Analytical Letters, 2012, 45(13): 2117-2134
Abstract

Rapid and efficient diagnosis is essential in the management of drug‐resistant tuberculosis. In the present study, a DNA microarray technique based on a differential hybridization method is described for detecting mutations in the RNA polymerase beta subunit (rpoB) gene of Mycobacterium tuberculosis (M. tuberculosis) in cultures and clinical specimens. Mutations in rpoB confer resistance to rifampin, an important first‐line antituberculosis drug. The differential hybridization approach relies mainly on the effect of a single-base mismatch on the melting temperature of the hybridized DNA; therefore, any point mutation in the rpoB gene that results in rifampin resistance can be detected efficiently. Developing the DNA microarray involved designing dozens of oligonucleotide probes to identify rifampin‐resistant and ‐sensitive strains. The method comprises isolating genomic DNA from samples containing M. tuberculosis cells, amplifying the rpoB gene coding sequence to produce a fluorescently labelled product, and hybridizing this product with the oligonucleotide arrays. The results demonstrated the capability of the DNA microarray to provide clinically relevant information about the rpoB gene of mycobacterial organisms. The DNA microarray offers a reliable diagnostic test for rapidly detecting multidrug resistance caused by gene mutations in mycobacteria.
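The probe set described above can be illustrated with a hedged Python sketch: for each position of interest, a wild-type probe centred on that base plus one probe per alternative base, so that a single-base mismatch sits mid-probe where, by a common design assumption, it most lowers duplex stability. A rough Wallace-rule melting temperature, 2 °C per A/T plus 4 °C per G/C (only meaningful for short oligonucleotides), is included. The reference string is a toy sequence, not rpoB, and the function names, probe length and labels are assumptions.

```python
def centered_probes(reference, position, length=17):
    """Wild-type probe centred on 0-based `position`, plus one probe for each
    alternative base at that position. Labels use 1-based numbering."""
    half = length // 2
    start = position - half
    if start < 0 or start + length > len(reference):
        raise ValueError("probe window runs off the reference sequence")
    window = reference[start:start + length]
    wild = window[half]
    probes = {"wild_type": window}
    for base in "ACGT":
        if base != wild:
            probes[f"{wild}{position + 1}{base}"] = window[:half] + base + window[half + 1:]
    return probes


def wallace_tm(seq):
    """Rough perfect-match melting temperature in deg C for short oligos."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc


probes = centered_probes("AAAAACAAAAA", position=5, length=5)
print(probes)  # {'wild_type': 'AACAA', 'C6A': 'AAAAA', 'C6G': 'AAGAA', 'C6T': 'AATAA'}
print(wallace_tm(probes["wild_type"]))  # 12
```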
