Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
2.
3.
4.
5.
A Bayesian network (BN) is a knowledge representation formalism that has proven to be a promising tool for analyzing gene expression data. Several problems still restrict its successful application. Typical gene expression databases contain measurements for thousands of genes but no more than several hundred samples, whereas most existing BN learning algorithms do not scale beyond a few hundred variables. Current methods therefore produce poor-quality BNs when applied to such high-dimensional datasets. We propose a hybrid method that combines constraint-based and score-and-search learning and is effective for learning gene networks from DNA microarray data. In the first phase, a novel algorithm generates a skeleton BN based on dependency analysis. In the second phase, the resulting BN structure is searched using a scoring metric combined with the knowledge learned in the first phase. Computational tests show that the proposed method achieves more accurate results than state-of-the-art methods. The method also scales to datasets with more than several hundred variables.

6.
Background: Data made available through large cancer consortia such as The Cancer Genome Atlas are a rich source of information to be studied across and between cancers. In recent years, network approaches have been applied to such data to uncover the complex interrelationships between mutational and expression profiles, but these approaches lack direct testing for mutation-associated expression changes. In this pan-cancer study we analyze mutation and gene expression information in an integrative manner by considering the networks generated by testing for differences in expression in direct association with specific mutations. We relate our findings among the 19 cancers examined to identify commonalities and differences as well as their characteristics.
Results: Using somatic mutation and gene expression information across 19 cancers, we generated mutation–expression networks per cancer. On evaluation we found that our generated networks were significantly enriched for known cancer-related genes, for example in skin cutaneous melanoma (p < 0.01 using Network of Cancer Genes 4.0). Our framework identified that while different cancers contained commonly mutated genes, there was little concordance between the associated gene expression changes among cancers. Comparison between cancers showed a greater overlap of network nodes for cancers with a higher overall non-silent mutation load than for those with a lower overall non-silent mutation load.
Conclusions: This study offers a framework that explores network information through co-analysis of somatic mutations and gene expression profiles. Our pan-cancer application of this approach suggests that while mutations are frequently shared among cancer types, the impact they have on the surrounding networks via gene expression changes varies. Despite this finding, there are some cancers for which mutation-associated network behaviour appears to be similar, suggesting a potential framework for uncovering related cancers for which similar therapeutic strategies may be applicable. Our framework for understanding relationships among cancers has been integrated into an interactive R Shiny application, PAn Cancer Mutation Expression Networks (PACMEN), which contains dynamic and static visualizations of the mutation–expression networks. PACMEN also features tools for further examination of network topology characteristics among cancers.

7.
8.
The activity of various additives promoting siloxane equilibration reactions is examined and quantified on model compounds. We found in particular that the “superbase” phosphazene derivative P4-tBu can promote very fast exchanges (a few seconds at 90 °C) even at low concentration (<0.1 wt %). We demonstrate that permanent silicone networks can be transformed into reprocessable and recyclable dynamic networks simply by introducing such additives. Annealing at high temperature degrades the additives and deactivates the dynamic features of the silicone networks, reverting them to permanent networks. A simple rheological experiment and the corresponding model make it possible to extract the critical kinetic parameters needed to predict and control such deactivation.

9.
We present a Bayesian inference approach to estimating conformational state populations from a combination of molecular modeling and sparse experimental data. Unlike alternative approaches, our method is designed for use with small molecules and emphasizes high-resolution structural models, using inferential structure determination with reference potentials and Markov chain Monte Carlo to sample the posterior distribution of conformational states. As an application of the method, we determine solution-state conformational populations of the 14-membered macrocycle cineromycin B, using a combination of previously published sparse nuclear magnetic resonance (NMR) observables and replica-exchange molecular dynamics/quantum mechanical (QM)-refined conformational ensembles. Our results agree better with experimental data than previous modeling efforts. Bayes factors are calculated to quantify the consistency of computational modeling with experiment, and the relative importance of reference potentials and other model parameters. © 2014 Wiley Periodicals, Inc.
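
The core idea of combining modeled populations with a sparse observable can be sketched on a toy problem. The example below is only an illustration under loud assumptions: three hypothetical conformational states, a single made-up observable with Gaussian error, and a plain Metropolis sampler over state indices. It omits reference potentials and nuisance-parameter sampling, and all names and numbers are invented for the sketch.

```python
import math
import random


def exact_posterior(prior, predicted, observed, sigma):
    """Posterior over states: prior times a Gaussian likelihood, normalised."""
    weights = [
        p * math.exp(-((x - observed) ** 2) / (2 * sigma ** 2))
        for p, x in zip(prior, predicted)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def metropolis_populations(prior, predicted, observed, sigma,
                           n_steps=50000, seed=0):
    """Estimate state populations by Metropolis sampling over state indices."""
    rng = random.Random(seed)

    def weight(i):
        # Unnormalised posterior weight of state i.
        return prior[i] * math.exp(
            -((predicted[i] - observed) ** 2) / (2 * sigma ** 2)
        )

    state = 0
    counts = [0] * len(prior)
    for _ in range(n_steps):
        proposal = rng.randrange(len(prior))  # symmetric uniform proposal
        if rng.random() < weight(proposal) / weight(state):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]


# Hypothetical inputs: modeled populations and one predicted observable per state.
PRIOR = [0.5, 0.3, 0.2]
PREDICTED = [3.0, 6.0, 9.0]
OBSERVED, SIGMA = 6.5, 1.5
```

Calling `metropolis_populations(PRIOR, PREDICTED, OBSERVED, SIGMA)` should give frequencies close to `exact_posterior(...)`; in a realistic setting the state space is too large to enumerate, which is why sampling is used.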

10.
11.
The three-dimensional structures of proteins determine their functions, and incorrect folding of their β-strands can be the cause of many diseases. There are two major approaches for determining protein structures: computational prediction and experimental methods that employ technologies such as cryo-electron microscopy. Because experimental methods are costly, involve lengthy processes with extended wait times, and can yield incomplete results, computational prediction is an attractive alternative. β-sheet structure prediction, the focus of the present paper, is a major part of overall protein structure prediction. Prediction of other substructures, such as α-helices, is simpler and has lower computational time complexity. Brute force methods are the most common approach, and dynamic programming is also used to generate all possible conformations. The current study introduces the Subset Sum Approach (SSA) for direct search space generation, which is shown to outperform the dynamic programming approach in terms of both time and space. For the first time, the present work calculates both the state space cardinality of the dynamic programming approach and the search space cardinality of the general brute force approaches. Under a set of pruning rules, SSA demonstrates higher efficiency with respect to both time and accuracy than state-of-the-art methods.

12.
13.
14.
One of the exciting problems in systems biology research is to decipher how the genome controls the development of complex biological systems. Gene regulatory networks (GRNs) help in identifying regulatory interactions between genes and offer useful information about the functional role of individual genes in a cellular system. Discovering GRNs leads to a wide range of applications, including identifying disease-related pathways that provide novel tentative drug targets, predicting disease response, and assisting in the diagnosis of various diseases, including cancer. Reconstruction of GRNs from available biological data is still an open problem. This paper proposes a recurrent neural network (RNN) based model of GRNs, hybridized with a generalized extended Kalman filter for weight updates in the backpropagation through time training algorithm. The RNN is a complex neural network that offers a good compromise between biological closeness and mathematical flexibility for modeling GRNs, and it is able to capture complex, non-linear and dynamic relationships among variables. Gene expression data are inherently noisy, and the Kalman filter performs well on estimation problems even with noisy data. Hence, we applied a non-linear version of the Kalman filter, known as the generalized extended Kalman filter, for weight updates during RNN training. The developed model has been tested on four benchmark networks: the DNA SOS repair network, the IRMA network, and two synthetic networks from the DREAM Challenge. A comparison of our results with other state-of-the-art techniques shows the superiority of the proposed model. Further, 5% Gaussian noise was introduced into the dataset; the noise had a negligible effect on the results of the proposed model, demonstrating its noise tolerance.

15.
Identification of significant interactions between genes and chemical compounds/drugs is an important issue in toxicogenomic studies as well as in drug discovery and development. There are some online and offline computational tools for toxicogenomic data analysis that identify biomarker genes and their regulatory chemical compounds/drugs. However, none of them has yet considered the identification of significant interactions between genes and compounds. Therefore, in this paper, we discuss two approaches, the moving range chart (MRC) and the logistic moving range chart (LMRC), for identifying significant up-regulatory (UpR) and down-regulatory (DnR) gene-compound interactions as well as toxicogenomic biomarkers and their regulatory chemical compounds/drugs. We investigated the performance of both the MRC and LMRC approaches using simulated datasets. Simulation results show that both approaches perform almost equally in the absence of outliers. In the presence of outliers, however, the LMRC performs much better than the MRC. In the analysis of real-life toxicogenomic data, the proposed LMRC approach detected some important down-regulated biomarker genes that were not detected by other approaches. We therefore propose using the LMRC for robust identification of significant interactions between genes and chemical compounds/drugs.
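
For orientation, a classical individuals/moving-range control chart can be sketched as follows. This is the textbook chart, not the paper's MRC or LMRC procedure (the abstract does not give those details, and the logistic transformation is omitted here); the function names, the interaction scores, and the gene-compound labels are hypothetical.

```python
from statistics import mean


def moving_range_limits(values):
    """Return (lower, center, upper) limits of a classical individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Moving ranges of span 2: absolute differences of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of span 2.
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def flag_interactions(scores):
    """Split labels into those above the upper and below the lower limit."""
    lower, _, upper = moving_range_limits(list(scores.values()))
    up = [label for label, value in scores.items() if value > upper]
    down = [label for label, value in scores.items() if value < lower]
    return up, down


# Hypothetical interaction scores for ten gene-compound pairs.
SCORES = {
    "g1:c1": 0.1, "g2:c1": -0.1, "g3:c1": 0.0, "g4:c1": 0.2, "g5:c1": -0.2,
    "g6:c1": 0.1, "g7:c1": 5.0, "g8:c1": 0.0, "g9:c1": -0.1, "g10:c1": 0.1,
}
```

Because the limits are computed from the same data, extreme values widen them and can mask one another, which is consistent with the abstract's observation that the plain MRC degrades in the presence of outliers.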

16.
Gene regulatory network inference is currently a topic of intensive research in systems biology. In this paper, gene regulatory networks are inferred from time-series microarray data using an evolutionary model. A non-linear differential equation model is adopted. Gene expression programming (GEP) is applied to identify the structure of the model, and least mean square (LMS) is used to optimize the parameters of the ordinary differential equations (ODEs). The proposed approach was first verified on synthetic noise-free and noisy time-series data, and its effectiveness was then confirmed on three real time-series expression datasets. Finally, a gene regulatory network was constructed with 12 yeast genes. Experimental results demonstrate that our model can effectively improve the prediction accuracy for microarray time-series data.

17.
Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure–Activity Relationship (QSAR) studies thanks to their ability to handle multitask problems. However, the outcomes of neural networks depend on the tuning of several hyperparameters, and small variations in these can strongly affect performance. Hence, optimization is a fundamental step in training neural networks, although in many cases it can be computationally very expensive. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused mainly on parsimonious optimization, taking into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of hyperparameters, we applied experimental design strategies to determine their effects on neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.
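
The cost difference between grid search and random search can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the objective is a cheap toy function standing in for training and validating a network, and the hyperparameter names and grid values are hypothetical rather than taken from the study.

```python
import itertools
import math
import random


def toy_validation_loss(learning_rate, hidden_units):
    """Hypothetical stand-in for a trained network's validation loss."""
    return (math.log10(learning_rate) + 2) ** 2 + ((hidden_units - 64) / 64) ** 2


def grid_search(objective, grid):
    """Evaluate every combination; return (best_params, best_loss, n_evals)."""
    names = list(grid)
    candidates = [
        dict(zip(names, combo))
        for combo in itertools.product(*(grid[name] for name in names))
    ]
    best = min(candidates, key=lambda params: objective(**params))
    return best, objective(**best), len(candidates)


def random_search(objective, grid, n_trials, seed=0):
    """Evaluate n_trials random combinations drawn from the same grid."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_trials)
    ]
    best = min(candidates, key=lambda params: objective(**params))
    return best, objective(**best), n_trials


# Hypothetical search space: 4 x 5 = 20 combinations.
GRID = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "hidden_units": [16, 32, 64, 128, 256],
}
```

Grid search always pays for all 20 evaluations here and is guaranteed to return the best grid point, whereas `random_search(toy_validation_loss, GRID, n_trials=5)` pays for 5 and may or may not reach it; the number of grid combinations grows multiplicatively with each added hyperparameter.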

18.
19.
Identification of disease genes using computational methods is an important issue in biomedical and bioinformatics research. Based on the observation that diseases with the same or similar phenotypes share the same biological characteristics, researchers have tried to identify disease genes using machine learning tools. In recent attempts, semi-supervised learning methods known as positive-unlabeled learning have been used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes is first extracted using a co-training schema. Then, the similarity graph of genes is built using metric learning, concentrating on the multi-rank-walk method to perform inference from labeled genes. Finally, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data by choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) the noise robustness of PEGPUL, achieved through the multilevel schema. To assess PEGPUL, we applied it to 12950 genes, comprising 949 positive genes from six classes of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.

20.
A reliable selection of a representative subset of chemical compounds has been reported to be crucial for numerous tasks in computational chemistry and chemoinformatics. We investigated the usability of an approach based on the k-medoid algorithm for this task, in particular for experimental design and for the split between training and validation sets. We compared the performance of models derived from such a selection with that of models derived using several other approaches, such as space-filling design and D-optimal design. We validated the performance on four datasets with different endpoints, representing toxicity, physicochemical properties and others. Compared with the models derived from the compounds selected by the other examined approaches, those derived with the k-medoid selection show high reliability for experimental design, as their performance was consistently among the best for all examined datasets. Of all the models derived with the examined approaches, those derived with the k-medoid approach were the only ones that showed significantly improved performance compared with a random selection for all datasets, over the whole examined range of selected compounds, and for each dimensionality of the search space. Copyright © 2012 John Wiley & Sons, Ltd.
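
A k-medoid selection of representative compounds can be sketched as below. This is a simple alternating (assign, then update) variant with a naive initialisation, chosen for brevity; the abstract does not say which k-medoid variant, descriptors, or distance the authors used, so the Euclidean distance, the function names, and the toy descriptor vectors are all assumptions.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Return sorted indices of k medoids using alternating assign/update."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    if len(set(map(tuple, points))) != n:
        raise ValueError("points must be distinct")
    dist = [[math.dist(p, q) for q in points] for p in points]
    medoids = list(range(k))  # naive initialisation: first k points (assumption)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance within a cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def split_by_medoids(points, k):
    """Use the medoids as the training set and the rest as validation."""
    train = k_medoids(points, k)
    validation = [i for i in range(len(points)) if i not in train]
    return train, validation


# Hypothetical 2-D descriptor vectors forming three well-separated groups.
COMPOUNDS = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
```

With `split_by_medoids(COMPOUNDS, 3)` the training set contains one compound from each group, which is the sense in which the selected subset is representative. Alternating k-medoids can stop in a local optimum, so real use would typically try several initialisations.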

