Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
High-dimensional datasets can contain thousands of features, which makes classification computationally expensive. These datasets therefore need a feature selection step before classification. The main idea behind feature selection is to choose a useful subset of features that improves the comprehensibility of a classifier and maximizes the performance of the classification algorithm. In this paper, we propose a one-per-class model for high-dimensional datasets. In the proposed method, we extract a different feature subset for each class in a dataset and run the classification process on each of these subsets. Finally, we merge the prediction results from the feature subsets to determine the final class label of an unknown instance. The originality of the proposed model lies in using an appropriate feature subset for each class. To show the usefulness of the approach, we have developed an application method that follows the proposed model. Our results confirm that the method achieves higher classification accuracy than previously proposed feature selection and classification methods.
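The one-per-class idea in this abstract can be illustrated with a small sketch. It is not the authors' method: the scoring rule (gap between class means divided by the overall spread), the nearest-centroid classifier, and the function names `select_features`, `fit_one_per_class`, and `predict` are all assumptions chosen to keep the example short.

```python
from statistics import mean, pstdev

def select_features(X, y, target, k):
    """Rank features by how well they separate `target` from all other classes."""
    inside = [row for row, label in zip(X, y) if label == target]
    outside = [row for row, label in zip(X, y) if label != target]
    scores = []
    for j in range(len(X[0])):
        spread = pstdev([row[j] for row in X]) or 1.0  # guard against zero spread
        gap = abs(mean(r[j] for r in inside) - mean(r[j] for r in outside))
        scores.append((gap / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def fit_one_per_class(X, y, k=1):
    """Build one model per class: its own feature subset plus two centroids."""
    models = {}
    for target in sorted(set(y)):
        feats = select_features(X, y, target, k)
        pos = [mean(r[j] for r, l in zip(X, y) if l == target) for j in feats]
        neg = [mean(r[j] for r, l in zip(X, y) if l != target) for j in feats]
        models[target] = (feats, pos, neg)
    return models

def predict(models, x):
    """Merge the per-class results: the class with the largest margin wins."""
    def margin(model):
        feats, pos, neg = model
        d_pos = sum((x[j] - p) ** 2 for j, p in zip(feats, pos))
        d_neg = sum((x[j] - n) ** 2 for j, n in zip(feats, neg))
        return d_neg - d_pos  # positive when x is closer to the class centroid
    return max(models, key=lambda c: margin(models[c]))
```

Each class is scored only on the features chosen for it, so a feature that is useless for one class can still be decisive for another.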

2.
In this article, we develop a formalism to obtain the energy levels of an electron in a central force potential confined in a spherical quantum dot of radius r_C, using the proper quantization rule and the Wentzel-Kramers-Brillouin approximation. It is shown that the numerical results are in good agreement with exact solutions. To illustrate this method, we consider the linear harmonic oscillator and the Coulomb potential confined within an impenetrable sphere of radius r_C in three dimensions. © 2013 Wiley Periodicals, Inc.
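As a reminder of how a WKB quantization condition yields energy levels, here is a minimal numerical sketch for the textbook case of an unconfined one-dimensional harmonic oscillator, in units where hbar = m = omega = 1. It does not reproduce the paper's confined three-dimensional formalism or its proper quantization rule. The (n + 1/2) phase term assumes two soft turning points and changes at an impenetrable wall, which this sketch does not handle. The function names and the integration grid are assumptions.

```python
import math

def action(E, potential, x_min, x_max, steps=20000):
    """Midpoint-rule estimate of the integral of p(x) = sqrt(2 (E - V(x)))."""
    h = (x_max - x_min) / steps
    total = 0.0
    for i in range(steps):
        x = x_min + (i + 0.5) * h
        kinetic = 2.0 * (E - potential(x))
        if kinetic > 0.0:  # only the classically allowed region contributes
            total += math.sqrt(kinetic) * h
    return total

def wkb_energy(n, potential, x_min, x_max, e_lo, e_hi, tol=1e-6):
    """Bisect on E until the action integral equals (n + 1/2) * pi."""
    target = (n + 0.5) * math.pi
    while e_hi - e_lo > tol:
        mid = 0.5 * (e_lo + e_hi)
        if action(mid, potential, x_min, x_max) < target:
            e_lo = mid
        else:
            e_hi = mid
    return 0.5 * (e_lo + e_hi)

def harmonic(x):
    """Harmonic oscillator potential V(x) = x^2 / 2."""
    return 0.5 * x * x
```

For this potential the action integral equals pi * E, so the condition reproduces the exact levels E_n = n + 1/2. For example, `wkb_energy(0, harmonic, -6.0, 6.0, 0.0, 10.0)` converges to a value close to 0.5.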

3.
High-throughput analysis of differential gene expression is a powerful tool that can be applied to many areas in molecular cell biology, including differentiation, development, physiology, and pharmacology. In recent years, a variety of techniques have been developed to analyze differential gene expression, including comparative expressed sequence tag sequencing, differential display, representational difference analysis, cDNA or oligonucleotide arrays, and serial analysis of gene expression. This review explains the technologies, their scope, and their impact on science, as well as their costs and possible limitations. The application of differential display is presented as a tool to identify genes induced by darkness or by the yellowing process in rice leaves.

4.
In the past several years, a new set of technologies based on whole genome analysis has revolutionized the study of gene expression. These microarray or "gene chip" technologies, which arose out of the development of large-scale sequencing approaches, are now coming into increasing use, generating a far greater volume of data than the data representing the sequences themselves. This review focuses on the current state of development of these technologies and on the available approaches to manage and analyze the information they generate. The applicability of this technology to general problems in biomedicine is also discussed.

5.
Gene regulatory network inference is currently a topic of intensive research in systems biology. In this paper, gene regulatory networks are inferred via an evolutionary model based on time-series microarray data. A non-linear differential equation model is adopted. Gene expression programming (GEP) is applied to identify the structure of the model, and least mean square (LMS) is used to optimize the parameters of the ordinary differential equations (ODEs). The proposed approach was first verified on synthetic noise-free and noisy time-series data, and its effectiveness was then confirmed on three real time-series expression datasets. Finally, a gene regulatory network was constructed with 12 yeast genes. Experimental results demonstrate that our model can effectively improve the prediction accuracy for microarray time-series data.

6.
The large size of the hyperspectral datasets that are produced with modern mass spectrometric imaging techniques makes it difficult to analyze the results. Unsupervised statistical techniques are needed to extract relevant information from these datasets and reduce the data into a surveyable overview. Multivariate statistics are commonly used for this purpose. Computational power and computer memory limit the resolution at which the datasets can be analyzed with these techniques. We introduce the use of a data format capable of efficiently storing sparse datasets for multivariate analysis. This format is more memory-efficient and therefore increases the possible resolution while decreasing computation time. Three multivariate techniques are compared for both sparse-type data and non-sparse data acquired in two different imaging ToF-SIMS experiments and one LDI-ToF imaging experiment. There is no significant qualitative difference in the use of different data formats for the same multivariate algorithms. All evaluated multivariate techniques could be applied to both the SIMS and the LDI imaging datasets. Principal component analysis is shown to be the fastest choice; however, a small increase in computation time from a VARIMAX optimization increases the decomposition quality significantly. PARAFAC analysis is shown to be very effective in separating different chemical components, but the calculations take a significant amount of time, limiting its use as a routine technique. An effective visualization of the results of the multivariate analysis is as important for the analyst as the computational issues. For this reason, a new visualization technique is presented that combines spectral loadings and spatial scores into one three-dimensional view of the complete datacube.

7.
8.
9.
Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome and their expression. We propose a multi-objective methodology to combine state-of-the-art algorithms into an aggregation scheme in order to obtain optimal methods’ aggregations. The results obtained show a major improvement in sensitivity when our methodology is compared to the performance of individual methods for gene finding and gene expression problems. The methodology proposed here is an automatic method generator, and a step forward to exploit all already existing methods, by providing alternative optimal methods’ aggregations to answer concrete queries for a certain biological problem with a maximized accuracy of the prediction. As more approaches are integrated for each of the presented problems, de novo accuracy can be expected to improve further.

10.
A DNA fragment of 2,042 bp containing a novel β-mannanase gene, man5A, was identified from the genome of the mannan-degrading bacterium Bacillus circulans CGMCC1554. The open reading frame of man5A comprised 978 bp encoding a protein of 326 amino acids with a predicted molecular weight of 32 kDa. The amino acid sequence of the encoded mannanase, MAN5A, showed the highest identity (78.5%) to β-mannanases belonging to glycosyl hydrolases family 5. The gene man5A was efficiently expressed in Escherichia coli and Pichia pastoris with the highest activity of 541 U/ml in a 3-L fermenter. Recombinant MAN5A purified from E. coli had a high specific activity of 4,839 U/mg, which is much higher than that of enzymes that showed high sequence identity. The enzyme showed maximum activity at pH 7.6 and 60 °C and resistance to trypsin. After hydrolysis of LBG, oligomannosides accounted for 76% of the hydrolysis products. All these properties collectively make MAN5A a better candidate than current mannanases for use in the food and feed industry.

11.
There is growing interest in the application of machine learning techniques in bioinformatics. The supervised machine learning approach has been widely applied to bioinformatics and has been very successful in this research area. With this learning approach, researchers first develop a large training set, which is a time-consuming and costly process. Moreover, the proportion of positive and negative examples in the training set may not represent the real-world data distribution, which causes concept drift. Active learning avoids these problems. Unlike most conventional learning methods, where the training set used to derive the model remains static, the classifier can actively choose the training data, and the size of the training set increases. We introduced an algorithm for performing active learning with a support vector machine and applied the algorithm to gene expression profiles of colon cancer, lung cancer, and prostate cancer samples. We compared the classification performance of active learning with that of passive learning. The results showed that employing the active learning method can achieve high accuracy and significantly reduce the need for labeled training instances. For lung cancer classification, to achieve 96% of the total positives, only 31 labeled examples were needed in active learning, whereas in passive learning 174 labeled examples were required. This amounts to a reduction of over 82% achieved by active learning. In active learning the areas under the receiver operating characteristic (ROC) curves were over 0.81, while in passive learning the areas under the ROC curves were below 0.50.
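Why choosing which samples to label can cut the labeling cost is easiest to see in a toy setting. The sketch below is not the SVM-based algorithm from the abstract. It swaps in a one-dimensional threshold learner that always queries the sample it is least certain about, the midpoint of the still-undecided range. The names `active_threshold` and `oracle` are hypothetical.

```python
def active_threshold(xs, oracle):
    """Locate the class boundary in 1-D data with as few label queries as possible.

    `oracle(x)` returns True for a positive sample; labels are assumed to be
    noise-free and monotone (all negatives lie below all positives).
    """
    xs = sorted(xs)
    lo, hi = 0, len(xs)  # xs[:lo] are known negative, xs[hi:] known positive
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2  # the most uncertain remaining sample
        queries += 1
        if oracle(xs[mid]):
            hi = mid          # boundary is at or before mid
        else:
            lo = mid + 1      # boundary is after mid
    return lo, queries
```

On 200 samples this needs at most 8 label queries, whereas a passive learner that labels samples in arbitrary order may need most of the 200 to pin the boundary down as precisely. Real data are noisy and high-dimensional, so the savings reported in the abstract are smaller than this idealized case.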

12.
13.
We searched for metastasis-related genes in adenoid cystic carcinoma by suppression subtractive hybridization analysis of high and low metastasis cell lines. Twelve genes (ten previously identified and two novel sequences) were identified as being expressed at lower levels in high metastasis cell line Acc-M when compared to low metastasis cell line Acc-2. The known sequences corresponded to the genes for cysteine-rich angiogenesis induction factor (cyr61), chromosome 7 RP11-52501 clone, G-protein, WAS familial ferritin I heavy chain, jumping translocation breakpoint, eukaryotic translation elongation, folate receptor and three ribosomal proteins. Among them, the G protein and ferritin I heavy chain genes contained mutations in the high metastasis cell line. The two novel gene sequences have been named ACC metastasis-associated RNH and ACC metastasis-associated suspected protein (GenBank # AF522024 and AF522025, respectively). Taken together, these results suggest that reduced expression and/or mutation of several genes in the tumor cell line Acc-M are associated with high tumor metastasis, providing important molecular biological materials for further study of metastasis control and possible targets for cancer gene therapy.

14.
As social network analysis gains popularity in modeling real-world problems, applying social network concepts and notions to biological data remains one of the most attractive research problems to be addressed. Accordingly, the work described in this paper focuses on a particular set of genes that reside on the community boundaries in gene co-expression networks. Stemming from the community mining problem in social networks, peripheries of communities (i.e., boundaries) can be used to aid certain biological analyses. The proposed method consists of three parts: 1) finding communities in gene co-expression networks through clustering; 2) analyzing the stability of community structures with the Monte Carlo method; 3) designing dynamic adoption of boundaries using geometric convexity. We validated our findings using breast cancer gene expression data from various studies. Our approach contributes to the new branch of applying social network mechanisms in biological data analysis, leading to new data mining strategies implied by witnessing social behaviors in gene expression analysis.

15.
It's wonderful to be here today. I would like to start with the most important part, by saying thank you. First of all, I want to thank Andy Fire for being such a tremendous colleague, friend, and collaborator going back over the years. Without Andy I definitely wouldn't be here today. I need to thank the University of Massachusetts for providing for my laboratory, for believing in me, and for giving me not only a place and money, but great colleagues with whom to pursue my research. Without UMass and the great environment provided for me there, I probably would not be here today. And, of course, my family; I'm not going to spend time now thanking them individually, but they know how important they are.

16.
Many human diseases arise from the over- or under-expression of genes, which can be corrected by silencing or over-expression, respectively, through transformation with a specific nucleic acid (NA). NA transformation to alter cellular gene expression for medical purposes is challenging because NA cannot efficiently cross the cellular biomembrane. Viral vectors are one option but are risky for patients, while non-viral vectors have lower transformation efficiency. Over the past few years, nanoparticles (NPs) have been studied extensively as vectors for delivering NA. They are sub-micron in size, have a large surface area and rapid absorption ability, and can reach the inside of cells. These properties make them suitable gene carriers. The NP types (organic, inorganic, organic/inorganic hybrid, and polymeric) have different properties that can be used to deliver NA. They offer biocompatibility, targeted gene delivery, and controlled release of NA, which makes them suitable for different uses. In this review, we describe and compare methods for synthesizing various kinds of NPs and for conjugating them with NA. A series of modifications of NPs to form the polyplex is also discussed, along with the varying outcomes in terms of changes in gene expression and cytotoxicity towards different cell lines. This review should help nano-scientists decide which method to follow for a specific need in controlling gene expression.

17.
18.
H-bonding-mediated molecular recognition between substrate and ligand -COOH groups orients the substrate so that remote, catalyzed oxygenation of an alkyl C-H bond by a Mn-oxo active site can occur with very high (>98%) regio- and stereoselectivity. This paper identifies steric exclusion (the exclusion of non-H-bonded substrate molecules from the active site) as one requirement for high selectivity, along with the entropic advantage of intramolecularity. If unbound substrate molecules were able to reach the active site, they would react unselectively, degrading the observed selectivity. Both faces of the catalyst are blocked by two ligand molecules, each bearing a -COOH group. The acid p-tBuC6H4COOH binds to the ligand -COOH recognition site but is not oxidized; it merely blocks the approach of the substrate and therefore acts as an effective inhibitor of ibuprofen oxidation in both the free acid and the ibuprofen ester form. Dixon plots show that inhibition is competitive for the free acid ibuprofen substrate, no doubt because this substrate can compete with the inhibitor for binding to the recognition site. In contrast, inhibition is uncompetitive for the ibuprofen ester substrate, consistent with this ester substrate no longer being able to bind to the recognition site. Inhibition can be reversed with MeCOOH, an acid that can competitively bind to the recognition site but, being sterically small, no longer blocks access to the active site.

19.
Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. A consensus approach that combines multiple classifiers has attributes that mitigate this difficulty compared to a single classifier. A new classification method, named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN), was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each of which was built from distinct, non-repeated genes that were selected for effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets for prostate cancer and leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.
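The core idea, several classifiers built on non-overlapping gene sets and combined by vote, can be sketched as follows. This is an illustration under assumptions, not the published CAMCUN algorithm: the ranking by class-mean gap, the nearest-centroid members, the majority vote, and all function names are hypothetical, and the cross-validation and randomization test are omitted. The sketch assumes exactly two classes.

```python
from statistics import mean

def rank_genes(X, y):
    """Order gene indices by the absolute gap between the two class means."""
    labels = sorted(set(y))
    def gap(j):
        a = mean(r[j] for r, l in zip(X, y) if l == labels[0])
        b = mean(r[j] for r, l in zip(X, y) if l == labels[1])
        return abs(a - b)
    return sorted(range(len(X[0])), key=gap, reverse=True)

def fit_consensus(X, y, n_classifiers, genes_per_classifier):
    """Build several nearest-centroid classifiers on disjoint gene sets."""
    ranked = rank_genes(X, y)
    members = []
    for i in range(n_classifiers):
        # Consecutive slices of the ranking, so no gene is used twice.
        genes = ranked[i * genes_per_classifier:(i + 1) * genes_per_classifier]
        centroids = {
            label: [mean(r[j] for r, l in zip(X, y) if l == label) for j in genes]
            for label in set(y)
        }
        members.append((genes, centroids))
    return members

def predict_consensus(members, x):
    """Each member votes for its nearest centroid; the majority label wins."""
    votes = {}
    for genes, centroids in members:
        label = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(genes, centroids[c])),
        )
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Using an odd number of member classifiers avoids tied votes in the two-class case.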

20.
This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 µM for 24 hours, and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to those outside the cluster. 1-Methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes (such as cell cycle, metabolism, and protein binding) and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
