Similar Articles
20 similar articles found.
1.
Noise in cancer sequencing data is a problem that cannot be ignored. Finding an effective way to reduce the noise in these cancer data is an important issue in the analysis of gene co-expression networks. In this paper, we apply a sparse and low-rank method, Robust Principal Component Analysis (RPCA), to address the noise problem in integrated multi-cancer data from The Cancer Genome Atlas (TCGA). We then build a gene co-expression network from the integrated data after noise reduction. Finally, we mine nodes and pathways in the denoised networks. Experiments show that after RPCA denoising, the gene expression data are more orderly than before, and the constructed networks contain more pathway enrichment information than those built from unprocessed data. Moreover, using the betweenness centrality of the nodes in the denoised network, we find several abnormally expressed genes and pathways that have been shown to be associated with many cancers. The experimental results indicate that our method is reasonable and effective, and we also identify some candidate genes that may be linked to multiple cancers.
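The network-building and node-mining steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the RPCA denoising step is not shown, the input is assumed to be an already denoised gene-by-sample expression table, and the 0.8 correlation threshold and gene names are hypothetical. Genes are linked when the absolute Pearson correlation of their profiles reaches the threshold, and betweenness centrality is computed with Brandes' algorithm.

```python
import math
from collections import deque
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_network(expr, threshold=0.8):
    """Link two genes when |r| >= threshold. expr maps gene -> expression list."""
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:                      # BFS that counts shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                      # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}
```

Genes with high betweenness sit on many shortest paths between other genes, which is why they are natural candidates for closer inspection.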

2.
3.
4.
5.
6.
One of the exciting problems in systems biology research is deciphering how the genome controls the development of complex biological systems. Gene regulatory networks (GRNs) help identify regulatory interactions between genes and offer valuable information about the functional role of individual genes in a cellular system. Discovering GRNs enables a wide range of applications: identifying disease-related pathways that provide novel candidate drug targets, predicting disease response, and diagnosing various diseases, including cancer. Reconstructing GRNs from available biological data is still an open problem. This paper proposes a recurrent neural network (RNN) based model of GRNs, hybridized with a generalized extended Kalman filter for weight updates in the backpropagation-through-time training algorithm. An RNN is a complex neural network that offers a better trade-off between biological plausibility and mathematical flexibility for modeling GRNs, and it can capture complex, non-linear, and dynamic relationships among variables. Gene expression data are inherently noisy, and the Kalman filter performs well for estimation problems even with noisy data. We therefore applied a non-linear version of the Kalman filter, the generalized extended Kalman filter, to update the weights during RNN training. The model has been tested on four benchmark networks: the DNA SOS repair network, the IRMA network, and two synthetic networks from the DREAM Challenge. A comparison with other state-of-the-art techniques shows the superiority of the proposed model. Further, when 5% Gaussian noise was added to the dataset, the noise had a negligible effect on the results of the proposed model, demonstrating its noise tolerance.
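The idea of Kalman-filter weight updates for an RNN model of a GRN can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' method: it uses a commonly cited discrete-time RNN formulation, x_i(t+1) = (Δt/τ_i)·σ(Σ_j w_ij x_j(t) + β_i) + (1 − Δt/τ_i)·x_i(t), and a standard extended Kalman filter on one-step predictions (teacher forcing) rather than a generalized EKF embedded in backpropagation through time. The noise settings `p0`, `q`, `r`, the multiplicative reading of "5% Gaussian noise", and all helper names are hypothetical.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate(W, beta, tau, x0, steps, dt=1.0):
    """Iterate the assumed RNN model of a GRN from initial state x0."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        nxt = []
        for i in range(len(x)):
            a = dt / tau[i]
            u = sum(W[i][j] * x[j] for j in range(len(x))) + beta[i]
            nxt.append(a * sigmoid(u) + (1 - a) * x[i])
        xs.append(nxt)
    return xs


def add_noise(xs, level=0.05, seed=0):
    """Multiply each value by (1 + level * N(0, 1)); one possible reading of '5% noise'."""
    rng = random.Random(seed)
    return [[v * (1 + level * rng.gauss(0, 1)) for v in x] for x in xs]


def predict(theta, x, i, a):
    """One-step prediction for gene i; theta = [w_i1..w_in, beta_i]."""
    s = sigmoid(sum(th * f for th, f in zip(theta, x + [1.0])))
    return a * s + (1 - a) * x[i], s


def ekf_fit_gene(xs, i, tau_i, dt=1.0, epochs=20, p0=10.0, q=1e-4, r=1e-3):
    """Estimate gene i's incoming weights and bias with a standard EKF."""
    m = len(xs[0]) + 1
    theta = [0.0] * m
    P = [[p0 if row == col else 0.0 for col in range(m)] for row in range(m)]
    a = dt / tau_i
    for _ in range(epochs):
        for t in range(len(xs) - 1):
            x, y = xs[t], xs[t + 1][i]
            pred, s = predict(theta, x, i, a)
            H = [a * s * (1 - s) * f for f in x + [1.0]]   # Jacobian dh/dtheta
            for k in range(m):
                P[k][k] += q                                # random-walk process noise
            PH = [sum(P[k][j] * H[j] for j in range(m)) for k in range(m)]
            S = sum(H[k] * PH[k] for k in range(m)) + r     # innovation variance
            K = [v / S for v in PH]                         # Kalman gain
            theta = [th + k * (y - pred) for th, k in zip(theta, K)]
            # P <- P - K (H P); H P equals PH transposed because P is symmetric
            P = [[P[k][j] - K[k] * PH[j] for j in range(m)] for k in range(m)]
    return theta


def one_step_mse(xs, i, theta, tau_i, dt=1.0):
    a = dt / tau_i
    errs = [(xs[t + 1][i] - predict(theta, xs[t], i, a)[0]) ** 2 for t in range(len(xs) - 1)]
    return sum(errs) / len(errs)
```

Because the measurement for each gene is a scalar, the innovation variance `S` is a number and no matrix inversion is needed; fitting genes one at a time keeps the covariance matrices small.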

7.
Acquired resistance is a major obstacle to the therapeutic efficacy of gefitinib in non-small-cell lung cancer (NSCLC). Current knowledge about the role of long non-coding RNAs (lncRNAs) in this phenomenon is insufficient. In this study, we searched RNA sequencing data for lncRNAs associated with acquired resistance to gefitinib in NSCLC, and constructed a functional lncRNA-mRNA co-expression network and a protein-protein interaction (PPI) network to analyze their putative target genes and biological functions. The expression levels of 14 prominently dysregulated lncRNAs and mRNAs were verified using real-time PCR. The expression changes of 39 lncRNAs and 121 mRNAs showed common patterns in our two pairs of gefitinib-sensitive and gefitinib-resistant NSCLC cell lines. The co-expression network included 1235 connections among these common differentially expressed lncRNAs and mRNAs. The significantly enriched signaling pathways based on dysregulated mRNAs mainly involved the Hippo signaling pathway; proteoglycans in cancer; and valine, leucine, and isoleucine biosynthesis. The results show that lncRNAs play an important part in acquired gefitinib resistance in NSCLC by regulating mRNA expression and function, and may represent potential new molecular biomarkers and therapeutic targets for gefitinib-resistant NSCLC.

8.
Ordinary differential equations (ODEs) have been widely used for modeling and analyzing dynamic gene networks in systems biology. In this paper, we propose an optimization method that can infer a gene regulatory network from time-series gene expression data. Specifically, we consider the following four cases: (1) reconstruction of a gene network from synthetic gene expression data with noise, (2) reconstruction of a gene network from synthetic gene expression data with time delay, (3) reconstruction of a gene network from synthetic gene expression data with both noise and time delay, and (4) reconstruction of a gene network from experimental time-series data of the budding yeast cell cycle.
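The abstract does not specify its optimization method, so the sketch below swaps in the simplest ODE-based inference technique as an illustration: ordinary least squares on a linear model dx/dt = A·x, using forward differences of the time series as derivative estimates. The network `A`, step size, and helper names are hypothetical, and noise and time delays (cases 1 to 3) are not handled here.

```python
def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def euler(A, x0, h, steps):
    """Generate a synthetic time series from dx/dt = A x with explicit Euler steps."""
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([x[i] + h * sum(A[i][j] * x[j] for j in range(len(x)))
                   for i in range(len(x))])
    return xs


def infer_linear_ode(xs, h):
    """Least-squares estimate of A: row i solves (X^T X) a_i = X^T d_i."""
    n = len(xs[0])
    X = xs[:-1]
    D = [[(xs[t + 1][i] - xs[t][i]) / h for i in range(n)] for t in range(len(xs) - 1)]
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(n)] for a in range(n)]
    rows = []
    for i in range(n):
        Xtd = [sum(x[a] * d[i] for x, d in zip(X, D)) for a in range(n)]
        rows.append(solve(XtX, Xtd))
    return rows
```

Entry A[i][j] is read as the regulatory influence of gene j on gene i; on noiseless Euler-generated data the estimate recovers A up to rounding error.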

9.
10.
Gene networks (GNs) have become one of the most important approaches for modeling biological processes. They are very useful for understanding the complex biological processes that occur in living organisms. Currently, one of the biggest challenges in any study involving GNs is assuring their quality. To this end, recent works use artificial data sets or a direct comparison with prior biological knowledge. However, these approaches are not entirely accurate, because they only take direct gene–gene interactions into account for validation and leave aside the weak (indirect) relationships. We propose a new measure, named gene network coherence (GNC), to rate the coherence of an input network according to different biological databases. The measure considers not only the direct gene–gene relationships but also the indirect ones, giving a more complete and fairer evaluation of the input network; hence, our approach is able to use all of the information stored in the networks. A GNC JAVA-based implementation is available at: http://fgomezvela.github.io/GNC/. The results achieved in this work show that GNC outperforms the classical approaches for assessing GNs in three different experiments using different biological databases and input networks. According to the results, we conclude that the proposed measure, which considers the information inherent in both direct and indirect gene–gene relationships, offers a new, robust solution to the problem of biologically validating GNs.
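To illustrate why counting indirect relationships changes a validation score, here is a toy sketch. It is explicitly not the published GNC formula (see the linked implementation for that); the 1/d weighting, the reference network, and the function names are hypothetical. Each input edge earns 1/d, where d is the shortest-path distance between its two genes in a reference network built from a database, so direct interactions score 1, indirect ones score less, and unreachable pairs score 0.

```python
from collections import deque


def distances_from(graph, source):
    """BFS hop distances from source in an undirected adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def coherence_score(input_edges, reference):
    """Toy score: mean of 1/d over input edges (0 if unreachable or a self-loop)."""
    if not input_edges:
        return 0.0
    total = 0.0
    for a, b in input_edges:
        d = distances_from(reference, a).get(b)
        if d:  # None means unreachable; 0 would mean a == b
            total += 1.0 / d
    return total / len(input_edges)
```

With a reference chain A–B–C and input edges A–B, A–C, and A–D, a direct-only check credits 1 of 3 edges, whereas this score also gives partial credit to the indirect A–C link.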

11.
Differential gene expression analysis and proteomics have had a significant impact on the elucidation of concerted cellular processes, since the simultaneous measurement of hundreds to thousands of individual objects at the level of RNA and protein ensembles became technically feasible. The availability of such data sets has promised a profound understanding of phenomena at an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or the characterization of cells and tissues displaying distinct patho-physiological states. However, transforming these data into context, i.e., linking distinct expression or abundance patterns with phenotypic observables and, furthermore, enabling a sound biological interpretation at the level of reaction networks and concerted pathways, is still a major shortcoming. This shortcoming stems largely from the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview of computational procedures for analyzing genomic and proteomic data, introducing a sequential analysis workflow: explorative statistics to derive a first candidate gene/protein list that is relevant from a purely statistical viewpoint, followed by co-regulation and network analysis to biologically expand this core list toward functional networks and pathways. The review of these procedures is complemented by example applications tailored to the identification of disease-associated proteins. Optimization of the computational procedures involved, in conjunction with the continuous growth of additional biological data, clearly has the potential to boost our understanding of processes at a cell-wide level.
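The first, purely statistical step of the workflow above produces a candidate list from per-gene or per-protein tests. The abstract does not name a procedure; one common choice, assumed here, is the Benjamini–Hochberg step-up procedure, which controls the false discovery rate across thousands of simultaneous tests. The p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at FDR level alpha (BH step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # remember the largest rank whose p-value is under its BH threshold
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])
```

For p-values [0.01, 0.045, 0.025, 0.005, 0.5], the candidates are indices 0, 2, and 3: the value 0.045 exceeds its threshold of 4/5 × 0.05 = 0.04 and is dropped even though it is below 0.05.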

12.
13.
14.
A Bayesian network (BN) is a knowledge representation formalism that has proven to be a promising tool for analyzing gene expression data. Several problems still restrict its successful application. Typical gene expression databases contain measurements for thousands of genes and no more than several hundred samples, but most existing BN learning algorithms do not scale beyond a few hundred variables. Current methods produce poor-quality BNs when applied to such high-dimensional datasets. We propose a hybrid method, combining constraint-based analysis with score-based search, that is effective for learning gene networks from DNA microarray data. In the first phase of this method, a novel algorithm generates a skeleton BN based on dependency analysis. The resulting BN structure is then searched using a scoring metric combined with the knowledge learned in the first phase. Computational tests have shown that the proposed method achieves more accurate results than state-of-the-art methods. The method also scales to datasets with more than several hundred variables.
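The two-phase structure described above can be sketched with simple stand-ins, assuming discrete data stored as a list of dicts. This is not the paper's novel skeleton algorithm: phase 1 here keeps variable pairs whose marginal mutual information exceeds a hypothetical threshold, and phase 2 is greedy hill-climbing with the BIC score that may only add directed edges between skeleton neighbours, with an acyclicity check.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, a, b):
    n = len(data)
    ca, cb = Counter(r[a] for r in data), Counter(r[b] for r in data)
    cab = Counter((r[a], r[b]) for r in data)
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def skeleton(data, variables, threshold=0.05):
    """Phase 1 (simplified): undirected pairs with marginal MI above threshold."""
    return {frozenset(p) for p in combinations(variables, 2)
            if mutual_information(data, *p) > threshold}


def bic_node(data, child, parents):
    """BIC contribution of one node given its parent list."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    par = Counter(tuple(r[p] for p in parents) for r in data)
    ll = sum(c * math.log(c / par[pa]) for (pa, _), c in joint.items())
    q = 1
    for p in parents:
        q *= len({r[p] for r in data})
    r_child = len({r[child] for r in data})
    return ll - 0.5 * math.log(n) * q * (r_child - 1)


def creates_cycle(parents, u, v):
    """Adding u -> v creates a cycle iff v is already an ancestor of u."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False


def hybrid_learn(data, variables, threshold=0.05):
    """Phase 2: greedily add the skeleton-restricted edge with the best BIC gain."""
    skel = skeleton(data, variables, threshold)
    parents = {v: [] for v in variables}
    while True:
        best_gain, best_edge = 0.0, None
        for pair in skel:
            a, b = tuple(pair)
            for u, v in ((a, b), (b, a)):
                if u in parents[v] or v in parents[u] or creates_cycle(parents, u, v):
                    continue
                gain = bic_node(data, v, parents[v] + [u]) - bic_node(data, v, parents[v])
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            return parents
        u, v = best_edge
        parents[v].append(u)
```

Restricting the search to skeleton pairs is what keeps the score-based phase tractable as the number of genes grows, since most candidate edges are ruled out before any scoring happens.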

15.
Computational intelligence (CI) is a well-established paradigm; current CI systems have many of the characteristics of biological computers and can perform a variety of tasks that are difficult to do with conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association, and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state of the art in CI applications to bioinformatics and to motivate research in new trend-setting directions. In this article, we present an overview of CI techniques in bioinformatics. We show how CI techniques, including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems, and support vector machines, can be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss some representative methods as inspiring examples of how CI can be used to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future research directions are also presented, and an extensive bibliography is included.

16.
Metastases are the main cause of death in patients with advanced breast cancer (BC). Although chemotherapy and hormone therapy are current treatment strategies, drug resistance is frequent and still not completely understood. In this study, a bioinformatics analysis was performed on BC patients to explore the molecular mechanisms associated with BC metastasis. Microarray gene expression profiles of metastatic and non-metastatic BC patients were downloaded from the Gene Expression Omnibus (GEO). Raw data were normalized and merged using the ComBat tool. Pathways enriched with differentially expressed genes were identified, and a pathway co-expression network was generated using Pearson's correlation. From this network, which includes 17 pathways and 128 interactions, we identified the pathways that most influence the network efficiency. Moreover, the protein interaction network was investigated to identify hub genes of the pathway network. The prognostic role of the network was evaluated with a survival analysis using an independent dataset. In conclusion, the pathway co-expression network could contribute to understanding the mechanism and development of BC metastases.
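One way to rank pathways by their influence on network efficiency, assumed here since the abstract does not give the exact measure, is to compute the global efficiency E = (1 / (n(n−1))) Σ_{i≠j} 1/d_ij of the pathway network and record how much it drops when each pathway node is removed. The star-shaped network in the example and its node names are hypothetical.

```python
from collections import deque


def global_efficiency(adj):
    """Average of 1/d over ordered node pairs; unreachable pairs contribute 0."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for t, d in dist.items() if t != s)
    return total / (n * (n - 1))


def remove_node(adj, node):
    return {v: {w for w in nbrs if w != node} for v, nbrs in adj.items() if v != node}


def efficiency_drop(adj):
    """Efficiency lost when each node is deleted (larger = more influential)."""
    base = global_efficiency(adj)
    return {v: base - global_efficiency(remove_node(adj, v)) for v in adj}
```

In a star of one hub pathway and three peripheral ones, removing the hub disconnects everything and drops efficiency from 0.75 to 0, whereas removing a peripheral node slightly raises the efficiency of what remains.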

17.
Grouped gene selection is the most important task in analyzing microarray data of rat liver regeneration. Many existing gene selection methods cannot highlight the interactions among the selected genes. In rat liver regeneration, one of the most important events, involved in many biological processes, is the proliferation of rat hepatocytes, so it can be used as a measure of the method's effectiveness. Here we propose an adaptive sparse group lasso to select genes in groups for rat hepatocyte proliferation. Weighted gene co-expression network analysis (WGCNA) was used to identify modules corresponding to gene pathways, and based on these modules a strategy for dividing genes into groups was proposed. A strategy of adaptive gene selection was also presented, which assesses gene significance and introduces the adaptive lasso penalty. Moreover, an improved blockwise descent algorithm was proposed. Experimental results demonstrated that the proposed method improves classification accuracy and selects fewer significant genes, which act jointly in groups and have direct or indirect effects on rat hepatocyte proliferation. The effectiveness of the method was verified using rat hepatocyte proliferation.
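The core of an adaptive sparse group lasso is its penalty, λ1 Σ_j w_j |β_j| + λ2 Σ_g v_g ‖β_g‖₂, whose proximal operator is element-wise soft-thresholding followed by group-wise shrinkage. The sketch below is not the paper's improved blockwise descent: it substitutes plain proximal gradient descent (ISTA) on a squared-error loss instead of a classification loss, and assumes the gene groups are given (for example, from co-expression modules). All helper names and example values are hypothetical.

```python
import math


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_weights(beta_init, gamma=1.0, eps=1e-6):
    """Adaptive-lasso style weights: large initial coefficients get small penalties."""
    return [1.0 / (abs(b) + eps) ** gamma for b in beta_init]


def prox_sparse_group(z, groups, step, lam1, lam2, w, v):
    """Prox of step*(lam1*sum_j w_j|b_j| + lam2*sum_g v_g*||b_g||_2)."""
    out = [soft_threshold(zj, step * lam1 * w[j]) for j, zj in enumerate(z)]
    for g, idx in enumerate(groups):
        norm = math.sqrt(sum(out[j] ** 2 for j in idx))
        scale = max(0.0, 1.0 - step * lam2 * v[g] / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] *= scale  # the whole group is zeroed when its norm is small
    return out


def fit_sgl(X, y, groups, lam1, lam2, w, v, step, iters=500):
    """ISTA for (1/2n)||y - X b||^2 + penalty; requires step <= 1/L for convergence."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [b - step * g for b, g in zip(beta, grad)]
        beta = prox_sparse_group(z, groups, step, lam1, lam2, w, v)
    return beta
```

The group term drops entire modules at once, while the weighted lasso term can still remove individual genes inside a retained module; this combination is what lets the method select genes "in groups".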

18.
Gene expression profiles from microarray technology are usually difficult to interpret as a simpler pattern. One approach to resolving the complexity of gene expression profiles is the application of artificial neural networks (ANNs). A potential difficulty with this strategy, however, is that the non-linear nature of ANNs makes them essentially a 'black-box' computation process. Adding a fuzzy logic approach is useful because it can complement ANNs by explicitly specifying membership functions during computation. We employed a hybrid approach of neural networks and fuzzy logic to further analyze a published microarray study of gene responses to eight bacteria in human macrophages. The original analysis by hierarchical clustering found common gene responses to all bacteria but did not address individual responses. Our method allowed exploration of the host's gene response to each individual bacterium. We implemented a two-layer, feed-forward neural network based on the principle of 'competitive learning' (i.e., 'winner-take-all'). The weights of the trained neural network were fed into a fuzzy logic inference system. A new measure, called the impact rating (IR), was also introduced to explore the degree of importance of each gene. To assess the reliability of the IR values, a bootstrap re-sampling method was applied to the dataset and a confidence level was obtained for each IR. Our approach successfully uncovered the unique features of the host response to each individual bacterium. Further, applying gene ontology (GO) annotation to the genes with high IR values in each response suggested new biological pathways for individual host-pathogen interactions.
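The 'winner-take-all' competitive learning rule mentioned above can be sketched as follows: for each input, only the output unit whose weight vector is closest moves toward that input. This illustrates the learning principle only; the fuzzy inference stage, the impact rating, and the bootstrap are not reproduced, and the learning rate, epoch count, data, and function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_competitive(samples, k, lr=0.1, epochs=50, seed=0):
    """Winner-take-all: only the unit nearest to each sample is updated."""
    rng = random.Random(seed)
    weights = [list(s) for s in rng.sample(samples, k)]  # initialise from data points
    for _ in range(epochs):
        for x in samples:
            win = min(range(k), key=lambda c: sq_dist(weights[c], x))
            weights[win] = [w + lr * (xi - w) for w, xi in zip(weights[win], x)]
    return weights


def assign(samples, weights):
    """Index of the winning unit for each sample."""
    return [min(range(len(weights)), key=lambda c: sq_dist(weights[c], x)) for x in samples]
```

After training, each unit's weight vector approximates the centre of the inputs it wins, and those weights are the kind of quantity that a downstream fuzzy inference step can take as input.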

19.
High-throughput screening (HTS) campaigns in pharmaceutical companies have accumulated a large amount of data for several million compounds over a couple of hundred assays. Despite the general awareness that rich information is hidden inside this vast amount of data, little has been reported on systematic data mining methods that can reliably extract knowledge of interest to chemists and biologists. We developed a data mining approach based on an algorithm called ontology-based pattern identification (OPI) and applied it to our in-house HTS database. We identified nearly 1500 scaffold families with statistically significant structure-HTS activity profile relationships. Among them, dozens of scaffolds were characterized as leading to artifactual results stemming from the screening technology employed, such as assay format and/or readout. Four types of compound scaffolds can be characterized based on this data mining effort: tumor cytotoxic, general toxic, potential reporter gene assay artifact, and target family specific. The OPI-based data mining approach can reliably identify compounds that are not only structurally similar but also share statistically significant biological activity profiles. Statistical tests such as the Kruskal-Wallis test and analysis of variance (ANOVA) can then be applied to the discovered scaffolds to assign relevant biological information effectively. The scaffolds identified by our HTS data mining efforts are an invaluable resource for designing SAR-robust diversity libraries, generating in silico biological annotations of compounds on a scaffold basis, and providing novel target-family-specific scaffolds for focused compound library design.
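The Kruskal-Wallis test mentioned above compares activity distributions across groups without assuming normality. A minimal sketch of its H statistic, H = 12 / (N(N+1)) · Σ_i R_i² / n_i − 3(N+1), with average ranks for ties and the standard tie correction, is shown below; the grouping of activity values by scaffold in the example is hypothetical. Under the null hypothesis, H is approximately chi-square distributed with (number of groups − 1) degrees of freedom, from which a p-value can be obtained.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic with tie correction; each group is a list of activity values."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])   # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    denom = 1 - ties / (n ** 3 - n)
    return h / denom if denom > 0 else 0.0     # all-equal values carry no signal
```

For two fully separated groups of three values each, such as [[1, 2, 3], [4, 5, 6]], H = 27/7 ≈ 3.857.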

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417