Similar literature
 Found 20 similar documents; search time 31 ms
1.
To explore the pathogenic mechanisms of microRNA (miRNA) in diverse diseases, many researchers have concentrated on discovering potential associations between miRNAs and diseases using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated uncorrelated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework incorporates two steps: a novel semi-supervised Kmeans (SS-Kmeans) to extract reliable negative samples from unknown miRNA-disease pairs, and a subagging method to generate diverse training sample sets that make full use of those reliable negative samples for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy at identifying miRNA-disease associations.
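The two-step idea in the abstract above (select reliable negatives from unlabeled pairs, then subsample them to train an ensemble) can be sketched as follows. This is only an illustration under stated assumptions: a distance-to-positive-centroid filter stands in for SS-Kmeans, a nearest-centroid rule stands in for the RVFL network, and all function names and parameters (`reliable_negatives`, `subagging_scores`, `keep_fraction`, `subsample`) are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reliable_negatives(positives, unlabeled, keep_fraction=0.5):
    """Step 1 (stand-in for SS-Kmeans, an assumption): keep the unlabeled
    samples that lie farthest from the centroid of the known positives."""
    c = centroid(positives)
    ranked = sorted(unlabeled, key=lambda x: sq_dist(x, c), reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


def subagging_scores(positives, negatives, queries,
                     n_models=10, subsample=0.5, seed=0):
    """Step 2: subagging. Each base model sees a subsample of the negatives
    drawn WITHOUT replacement; the votes of all models are averaged."""
    rng = random.Random(seed)
    m = max(1, int(len(negatives) * subsample))
    pos_c = centroid(positives)
    totals = [0.0] * len(queries)
    for _ in range(n_models):
        neg_c = centroid(rng.sample(negatives, m))
        for i, q in enumerate(queries):
            # Base learner (stand-in for RVFL): nearest-centroid vote.
            if sq_dist(q, pos_c) < sq_dist(q, neg_c):
                totals[i] += 1.0
    return [t / n_models for t in totals]
```

A score near 1 means most base models place the query closer to the positives than to their own subsample of reliable negatives. Sampling without replacement is what distinguishes subagging from ordinary bagging.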

2.
As a large group of small non-coding RNAs (ncRNAs), Piwi-interacting RNAs (piRNAs) have been found to be associated with various diseases. Identifying disease-associated piRNAs can provide promising candidate molecular targets for drug design. Although a few computational ensemble methods have been developed for identifying piRNA-disease associations, the low-quality negative associations used during training, which may even include positive associations, limit further improvement in predictive performance. In this study, we proposed a new computational predictor named iPiDA-sHN to predict potential piRNA-disease associations. iPiDA-sHN represented the piRNA-disease pairs by incorporating piRNA sequence information, the known piRNA-disease association network, and the disease semantic graph. High-level features of piRNA-disease associations were extracted by a Convolutional Neural Network (CNN). A two-step positive-unlabeled learning strategy based on the Support Vector Machine (SVM) was employed to select high-quality negative samples from the unknown piRNA-disease pairs. Finally, the SVM predictor trained with the known piRNA-disease associations and the high-quality negative associations was used to predict new piRNA-disease associations. The experimental results showed that iPiDA-sHN achieved superior predictive ability compared with other state-of-the-art predictors.

3.
It has been shown that generalized F-statistics can perform satisfactorily in identifying differentially expressed genes from microarray data. However, for some complex diseases, a high proportion of false positives may still be identified because of the modest differential expression of disease-related genes and the systematic noise of microarrays. The main purpose of this study is to develop statistical methods for Affymetrix microarray gene expression data so that the impact of non-expressed genes on false positives can be reduced. I proposed two novel generalized F-statistics for identifying differentially expressed genes and a novel approach for estimating adjusting factors. The proposed statistical methods systematically combine filtering of non-expressed genes and identification of differentially expressed genes. For comparison, the discussed statistical methods were applied to an experimental data set from a type 2 diabetes study. In both two- and three-sample analyses, the proposed statistics showed improved control of false positives.

4.
Mining patterns of co-expressed genes across a subset of conditions helps to narrow down the search space for the analysis of gene expression data. Identifying condition-specific key genes from large-scale gene expression data is a challenging task. A condition-specific key gene signifies the functional behavior of a group of co-expressed genes across a subset of conditions and can act as a biomarker of disease. In this paper, we propose a novel approach for identifying condition-specific key genes in Basal-Like Breast Cancer (BLBC) using a biclustering algorithm and a Gene Co-expression Network (GCN). The proposed approach has two stages. In the first stage, significant biclusters are extracted with the 'runibic' biclustering algorithm. The second stage identifies condition-specific key genes from the extracted significant biclusters with the help of the GCN. Using a difference matrix and a gene correlation matrix, we constructed a biologically meaningful and statistically strong GCN. We also present the proposed approach with a process diagram and demonstrate the procedure on bicluster number 93 (Bic93). In the experimental results, 95% and 85% of the extracted biclusters were found to be biologically significant at p-values below 0.05 and 0.01, respectively. We compared the proposed approach with an approach based on Weighted Gene Co-expression Network Analysis (WGCNA). In this comparison, our approach performed effectively and extracted biologically significant biclusters. It also identified condition-specific key genes that cannot be extracted using the WGCNA-based approach. Some of the important known key genes identified are PIK3CA, SHC3, ERBB2, SHC4, PTOV1, STAG1, ZNF215, etc. After rigorous analysis, these key genes can be used as diagnostic and prognostic biomarkers for BLBC. The identified condition-specific key genes can help reduce the analysis time and increase the accuracy of further research such as biomarker identification and drug target discovery.

5.
In the past, the identification and isolation of phenotype-associated genes was a difficult and time-consuming task. However, recent improvements in methods designed to isolate differentially expressed genes have remarkably sped up the process of target gene isolation. The ultimate goal of functional genomics is to apply these technologies to clone phenotype-associated genes irrespective of the availability of probes (e.g., antibodies) and of intimate knowledge of the biological background. We demonstrate the use of a novel subtractive cDNA cloning approach for the isolation and characterization of target genes of the Epstein-Barr virus nuclear antigen 2 (EBNA2). Two different subtractive cDNA libraries specific for two different time periods following activation of a conditional estrogen receptor/EBNA2 (ER/EBNA2) fusion protein were generated. Comparison of the two libraries by cross-hybridization experiments allowed direct and indirect target genes of EBNA2 to be distinguished and led to the identification of a novel direct target gene of EBNA2.

6.
7.
We developed a statistical method that allows each trinucleotide to be associated with a unique frame among the three possible ones in a (protein coding) gene. An extensive gene study in 175 complete bacterial genomes based on this statistical approach resulted in identification of 72 new circular codes. Finding a circular code enables an immediate retrieval of the reading frame locally anywhere in a gene. No knowledge of location of the start codon is required and a short window of only a few nucleotides is sufficient for automatic retrieval. We have therefore developed a factorization method (that explores previously found circular codes) for retrieving the reading frames of bacterial genes. Its principle is new and easy to understand. Neither complex treatment nor specific information on the nucleotide sequences is necessary. Moreover, the method can be used for short regions in nucleotide sequences (less than 25 nucleotides in protein coding genes). Selected additional properties of circular codes and their possible biological consequences are also discussed.
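To make the frame-retrieval property concrete, here is a minimal sketch of how a trinucleotide code can locate the reading frame inside a short window without a start codon. It uses a simple per-frame counting heuristic, not the factorization method of the abstract, and `TOY_CODE` is a hypothetical six-word code chosen for illustration, not one of the 72 codes reported. Every word in it starts with G and contains no other G, so no shifted trinucleotide can belong to the code; that makes it comma-free and therefore circular.

```python
# Hypothetical toy code (an assumption for illustration only).
# Each word starts with G and has no G at positions 2 or 3, so a trinucleotide
# read in a shifted frame never starts with G and never belongs to the code.
TOY_CODE = {"GAA", "GAT", "GTC", "GCA", "GCT", "GAC"}


def retrieve_frame(window, code=TOY_CODE):
    """Return (best_shift, counts), where counts[s] is the number of
    trinucleotides of `window` read at shift s (0, 1 or 2) that belong
    to `code`, and best_shift is the shift with the highest count."""
    counts = []
    for shift in range(3):
        trimers = [window[i:i + 3] for i in range(shift, len(window) - 2, 3)]
        counts.append(sum(t in code for t in trimers))
    best_shift = max(range(3), key=lambda s: counts[s])
    return best_shift, counts


# A sequence written in frame with words of the toy code.
gene = "GAAGTCGCAGATGCT"
# Cut an 11-nucleotide window that starts out of frame (at index 4).
window = gene[4:]
print(retrieve_frame(window))  # (2, [0, 0, 3])
```

The window starts two positions before a codon boundary, and the count is non-zero only at shift 2, so the frame is recovered from the window alone.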

8.
9.
It has recently been shown that cancer genes (oncogenes) tend to have heterogeneous expression across disease samples. It is therefore reasonable to assume that in microarray data only a subset of disease samples will be activated (often referred to as outliers), which presents some new challenges for statistical analysis. In this paper, we study multi-class cancer outlier differential gene expression detection. Statistical methods are proposed that take the expression heterogeneity into account. Through simulation studies and application to public microarray data, we show that the proposed methods can provide more comprehensive analysis results and improve upon traditional differential gene expression detection methods, which often ignore the expression heterogeneity and may lose power. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/orf.html.

10.
Since antiquity, humans have used body fluids like saliva, urine and sweat for the diagnosis of diseases. The amount, color and smell of body fluids are still used in many traditional medical practices to evaluate an illness and make a diagnosis. The development and application of analytical methods for the detailed analysis of body fluids have led to the discovery of numerous disease biomarkers. Recently, mass spectrometry (MS), nuclear magnetic resonance spectroscopy (NMR), and multivariate statistical techniques have been incorporated into a multidisciplinary approach to profile changes in small molecules associated with the onset and progression of human diseases. The goal of these efforts is to identify metabolites that are uniquely correlated with a specific human disease in order to accurately diagnose and treat the malady. In this review we will discuss recent developments in sample preparation, experimental techniques, the identification and quantification of metabolites, and the chemometric tools used to search for biomarkers of human diseases using NMR.

11.
Background: Finding candidate genes associated with a disease is an important issue in biomedical research. Recently, many network-based methods have been proposed that implicitly utilize the modularity principle, which states that genes causing the same or similar diseases tend to form physical or functional modules in gene/protein relationship networks. Of these methods, the random walk with restart (RWR) algorithm is considered to be a state-of-the-art approach, but the modularity principle has not been fully considered in traditional RWR approaches. Therefore, we propose a novel method called ORIENT (neighbor-favoring weight reinforcement) to improve the performance of RWR through proper intensification of the weights of interactions close to the known disease genes. Results: Through extensive simulations over hundreds of diseases, we observed that our approach performs better than the traditional RWR algorithm. In particular, our method worked best when the weights of interactions involving only the nearest neighbor genes of the disease genes were intensified. Interestingly, the performance of our approach was negatively related to the probability with which the random walk will restart, whereas the performance of RWR without the weight-reinforcement was positively related in dense gene/protein relationship networks. We further found that the density of the disease gene-projected sub-graph and the number of paths between the disease genes in a gene/protein relationship network may be explanatory variables for the RWR performance. Finally, a comparison with other well-known gene prioritization tools, including Endeavour, ToppGene, and BioGraph, revealed that our approach shows significantly better performance. Conclusion: Taken together, these findings provide insight to efficiently guide RWR in disease gene prioritization.
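Random walk with restart, as used in the abstract above, can be sketched in a few lines. The walker repeatedly moves to a neighbor with probability proportional to edge weight and, with probability `restart`, jumps back to the known disease genes (the seeds); the stationary probabilities rank candidate genes. The helper `boost_seed_edges` is only a crude, assumed stand-in for the idea of intensifying weights near known disease genes and is not the ORIENT weighting scheme itself. All names and default values are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """adj: dict node -> {neighbor: weight} for an undirected graph
    (each edge listed in both directions). seeds: set of nodes in adj.
    Returns a dict node -> stationary visiting probability."""
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seed distribution.
        nxt = {u: restart * p0[u] for u in nodes}
        # Walk term: spread each node's mass over its weighted neighbors.
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: its mass is dropped
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if diff < tol:
            break
    return p


def boost_seed_edges(adj, seeds, factor=2.0):
    """Assumed stand-in for neighbor-favoring reinforcement: multiply the
    weight of every edge that touches a seed by `factor`."""
    return {
        u: {v: (w * factor if (u in seeds or v in seeds) else w)
            for v, w in nbrs.items()}
        for u, nbrs in adj.items()
    }


# Toy path graph A - B - C - D with gene A as the only known disease gene.
adj = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
plain = random_walk_with_restart(adj, {"A"})
boosted = random_walk_with_restart(boost_seed_edges(adj, {"A"}), {"A"})
ranking = sorted(plain, key=plain.get, reverse=True)
```

On this toy graph the scores fall off with distance from the seed, and boosting the seed's edges keeps more probability mass close to it. The power-iteration form was chosen over a matrix inverse so that the sketch needs no external libraries.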

12.
Rapid detection and discrimination of dangerous biological materials such as bacteria and their spores has become a security aim of considerable importance. Various analytical methods, including FTIR spectroscopy combined with statistical analysis, have been used to identify vegetative bacteria, bacterial spores and background interferants. The present work discusses the application of the FTIR technique, performed in reflectance mode using a Horizontal Attenuated Total Reflectance (HATR) accessory, to the discrimination of biological materials. In comparison with the transmission technique, HATR is more rapid and does not require destruction of the sample, while giving similar absorbance bands. HATR-FTIR results combined with the statistical analyses PCA and HCA demonstrate that this combination provides a novel and accurate microbial identification technique.

13.
Powell JJ 《The Analyst》2002,127(6):842-846
Aluminosilicates are a group of ubiquitous environmental particles that, in some cases, have been implicated in human disease. Characterisation of aluminosilicates in tissue samples requires, first, their in situ identification and, secondly, analysis of their aluminium and silicon content or, at least, aluminium:silicon ratio. Here, histochemical staining, microscopy and X-ray microanalysis were investigated as potential methods for the detection of aluminosilicates in biological samples. In contrast to aluminium phosphate or hydroxide, aluminosilicates were refractory to histochemical staining for aluminium. However, using electron microscopy, back scattered electrons allowed identification of aluminosilicates in tissue-like (gelatine) sections. X-ray microanalysis, using conventional peak:background ratios, did not provide a sufficiently accurate assessment of the aluminium content of various standard aluminosilicates to allow their identification. However, the similar spectral energies of aluminium and silicon allowed spectral peak heights to be directly compared and, using simple standards, aluminium:silicon ratios were found for a range of reference particles. Application of this technique should allow the localisation and identification of aluminosilicates in biological samples.

14.
Gene Ontology (GO) provides GO annotations (GOA) that associate gene products with GO terms that summarize their cellular, molecular and functional aspects in the context of biological pathways. The GO Consortium (GOC) uses various quality assurance measures to ensure the correctness of annotations. Owing to resource limitations, only a small portion of annotations are manually added or checked by GO curators, and a large portion of available annotations are computationally inferred. While computationally inferred annotations provide greater coverage of known genes, they may also introduce annotation errors (noise) that could mislead the interpretation of gene functions and their roles in cellular and biological processes. In this paper, we investigate how to identify noisy annotations, a rarely addressed problem, and propose a novel approach called NoisyGOA. NoisyGOA first measures taxonomic similarity between ontological terms using the GO hierarchy and semantic similarity between genes. Next, it leverages the taxonomic similarity and semantic similarity to predict noisy annotations. We compare NoisyGOA with alternative methods for identifying noisy annotations under different simulated cases of noisy annotations, and on archived GO annotations. NoisyGOA achieved higher accuracy than the alternative methods in the comparison. These results demonstrated that both taxonomic similarity and semantic similarity contribute to the identification of noisy annotations. Our study shows that annotation errors are predictable and that removing noisy annotations improves the performance of gene function prediction. This study can prompt the community to study methods for removing inaccurate annotations, a critical step for annotating gene and pathway functions.

15.
Protein-Protein Interaction Network (PPIN) analysis unveils the molecular-level mechanisms involved in a disease condition. In silico techniques offer a cost-effective way to explore the complex regulatory mechanisms behind epilepsy and to address its clinical and biological issues. In this work, a hierarchical procedure to identify influential genes and regulatory pathways in epilepsy prognosis is proposed. To obtain the key genes and pathways causing epilepsy, two benchmarked datasets devoted exclusively to complex disorders are integrated as an initial step. Using the STRING database, a PPIN is constructed to model protein-protein interactions. Key interactions are then obtained from the established PPIN using network centrality measures followed by a network propagation algorithm, Random Walk with Restart (RWR). The outcome of the method reveals some influential genes behind epilepsy prognosis, along with their associated pathways such as PI3 kinase, VEGF signaling, Ras, and Wnt signaling. In comparison with similar works, our results show improvement in identifying unique molecular functions, biological processes, gene co-occurrences, etc. Also, annotation with CORUM indicates approximately 60% similarity between human protein complexes and the obtained result. We believe that the formulated strategy can encourage broad consideration of indigenous drugs, guided by meticulous identification of the relevant genes, against several combinatorial disorders.

16.
17.
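Wang Lizhi (王立志), Liu Lukuan (刘路宽), Liu Jing (刘晶) 《化学通报》2021,84(10):1023-1030
Exosomes are membranous nanovesicles, 30 to 150 nm in diameter, that are secreted into the extracellular space by all eukaryotic cells and take part in the transmission of biological signals between cells. A large body of experimental evidence shows that exosomes participate in, and play important roles in, many biological functions, including the transfer of biomolecules such as proteins, RNA, and lipids and the regulation of the physiological and pathological processes of many diseases. They are regarded as important biomarkers and drug carriers for disease diagnosis, treatment, and prognosis, so the development of simple, efficient, and economical exosome isolation and purification techniques will aid the early diagnosis and precision treatment of disease. At present, a variety of exosome isolation techniques have been developed that exploit the physicochemical and biochemical properties of exosomes, but standardized, scalable methods for isolating clinical-grade exosomes are still lacking, which limits their clinical application. In addition, characterization of the features, purity, and quantity of the isolated exosomes is an important criterion for judging the quality of an isolation and purification method. This article reviews research progress in exosome isolation and purification techniques and in characterization methods, focusing on the mechanisms, performance, challenges, and prospects of the isolation techniques and on the methods for characterizing exosomes, with the aim of providing new ideas and strategies for exosome isolation and purification.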

18.
Mass spectrometric approaches have recently gained increasing access to molecular immunology, and several methods have been developed that enable detailed chemical structure identification of antigen-antibody interactions. Selective proteolytic digestion and MS-peptide mapping (epitope excision) has been successfully employed for epitope identification of protein antigens. In addition, "affinity proteomics" using partial epitope excision has been developed as an approach with unprecedented selectivity for direct protein identification from biological material. The potential of these methods is illustrated by the elucidation of a beta-amyloid plaque-specific epitope recognized by therapeutic antibodies from transgenic mouse models of Alzheimer's disease. The use of an immobilized antigen, antibody-proteolytic digestion, and analysis by high resolution Fourier transform ion cyclotron resonance mass spectrometry has led to a new approach for the identification of antibody paratope structures (paratope-excision; "parex-prot"). In this method, high resolution MS-peptide data at the low ppm level are required for direct identification of paratopes using protein databases. Mass spectrometric epitope mapping and determination of "molecular antibody-recognition signatures" offer high potential, especially for the development of new molecular diagnostics and the evaluation of new vaccine lead structures.

19.
The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis, where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, simulation from the joint posterior distributions of models is computationally intensive and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, algorithms are needed that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top 2 ranked genes are Z50753 and U00968 for colon, and Y10871_at and Z22536_at for estrogen. (The source codes with GNU General Public License Version 2.0 are freely available to non-commercial users at: http://code.google.com/p/randomfrog/.)

20.