Similar Articles
20 similar articles found.
1.
Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, with the goal of integrating the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To evaluate the classification accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC), as well as with combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). We also compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. These results indicate that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

2.
We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions output by a model can be assessed with Kullback–Leibler (KL) divergence: a widely used information theoretic measure of the distance between two probability distributions. We have assessed a range of different machine learning algorithms and error estimation methods for producing predictive distributions with an analysis against three of AstraZeneca's global DMPK datasets. Using the KL-divergence framework, we have identified a few combinations of algorithms that produce accurate and valid compound-specific predictive distributions. These methods use reliability indices to assign predictive distributions to the predictions output by QSAR models so that reliable predictions have tight distributions and vice versa. Finally we show how valid predictive distributions can be used to estimate the probability that a test compound has properties that hit single- or multi-objective target profiles.
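As a rough illustration of the KL-divergence framework described in this abstract (not the authors' code), the sketch below treats a prediction as a Gaussian predictive distribution, scores it against a Gaussian experimental-error distribution with the closed-form KL divergence, and reads off the probability of hitting a single-objective target profile; all numbers are invented.

```python
import numpy as np
from scipy.stats import norm

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for two univariate Gaussians (closed form)."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
            - 0.5)

# Hypothetical example: an experimental logD measurement treated as N(2.1, 0.3),
# and a model prediction whose reliability index gives it a width of 0.5 log units.
experimental = (2.1, 0.3)
prediction   = (1.8, 0.5)
print("KL(prediction || experiment):", kl_gaussian(*prediction, *experimental))

# Probability that the compound hits a target profile, e.g. logD < 2.5,
# estimated directly from the predictive distribution.
print("P(logD < 2.5):", norm.cdf(2.5, loc=prediction[0], scale=prediction[1]))
```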

3.
With the rapid development of DNA microarray technology and next-generation sequencing technology, large amounts of genomic data have been generated, so extracting differentially expressed genes from genomic data has become a matter of urgency. Because Low-Rank Representation (LRR) performs well in recovering low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take into consideration the intrinsic geometric structure of the data. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR) is proposed and applied to genomic data; it introduces graph regularization into LRR. By taking full advantage of the graph regularization, the LLRR method can capture the intrinsic non-linear geometric information among the data. The LLRR method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because significant genes can be considered sparse signals, the differentially expressed genes are viewed as sparse perturbation signals and can therefore be selected according to the sparse matrix. Finally, we use the GO tool to analyze the selected genes and compare the P-values with other methods. The results on simulated data and two real genomic datasets illustrate that this method outperforms other methods in differentially expressed gene selection.
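The following is only a minimal sketch of the low-rank-plus-sparse idea behind LLRR: it uses plain principal component pursuit solved by ADMM, omits the Laplacian graph-regularization term, and ranks genes by the energy of their rows in the sparse component. The data, thresholds and parameter heuristics are invented for illustration.

```python
import numpy as np

def low_rank_sparse(X, lam=None, mu=None, n_iter=200):
    """Decompose X into a low-rank part L and a sparse part S (principal
    component pursuit via ADMM). This omits the graph-regularization term
    of LLRR and only illustrates the low-rank + sparse decomposition."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else X.size / (4.0 * np.abs(X).sum())
    S = np.zeros_like(X)
    Y = np.zeros_like(X)
    shrink = lambda M, t: np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
    for _ in range(n_iter):
        # singular-value thresholding for the low-rank part
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(sig, 1.0 / mu)) @ Vt
        # soft thresholding for the sparse part
        S = shrink(X - L + Y / mu, lam / mu)
        Y = Y + mu * (X - L - S)
    return L, S

# Toy genomic-style matrix: rows = genes, columns = samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
X[:10, 20:] += 4.0                    # a few "differentially expressed" genes
L, S = low_rank_sparse(X)
scores = np.linalg.norm(S, axis=1)    # genes ranked by sparse-perturbation energy
print(np.argsort(scores)[::-1][:10])
```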

4.
The widespread popularity of replica exchange and expanded ensemble algorithms for simulating complex molecular systems in chemistry and biophysics has generated much interest in discovering new ways to enhance the phase space mixing of these protocols in order to improve sampling of uncorrelated configurations. Here, we demonstrate how both of these classes of algorithms can be considered as special cases of Gibbs sampling within a Markov chain Monte Carlo framework. Gibbs sampling is a well-studied scheme in the field of statistical inference in which different random variables are alternately updated from conditional distributions. While the update of the conformational degrees of freedom by Metropolis Monte Carlo or molecular dynamics unavoidably generates correlated samples, we show how judicious updating of the thermodynamic state indices--corresponding to thermodynamic parameters such as temperature or alchemical coupling variables--can substantially increase mixing while still sampling from the desired distributions. We show how state update methods in common use can lead to suboptimal mixing, and present some simple, inexpensive alternatives that can increase mixing of the overall Markov chain, reducing simulation times necessary to obtain estimates of the desired precision. These improved schemes are demonstrated for several common applications, including an alchemical expanded ensemble simulation, parallel tempering, and multidimensional replica exchange umbrella sampling.
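A minimal, hedged sketch of the kind of state-index update discussed here: an expanded-ensemble-style Gibbs (independence) draw of the thermodynamic state index from its conditional distribution given the current configuration's energy. The energy, temperature ladder and units are invented, and this is not the authors' implementation.

```python
import numpy as np

def gibbs_state_update(u_k, g_k, rng):
    """Draw a new thermodynamic state index from its full conditional.

    u_k : reduced potentials u_k(x) = beta_k * U(x) of the *current*
          configuration, evaluated in every candidate state k
    g_k : log-weights (e.g. expanded-ensemble biases); zeros correspond to
          unweighted sampling over the ladder
    """
    log_p = -u_k + g_k
    log_p -= log_p.max()              # stabilise the normalisation
    p = np.exp(log_p)
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Hypothetical example: one configuration with potential energy U and a
# temperature ladder, with no expanded-ensemble biases.
rng = np.random.default_rng(1)
U = -120.0                             # kJ/mol, invented
kB = 0.008314                          # kJ/(mol K)
temps = np.array([300.0, 320.0, 342.0, 366.0, 392.0])
u_k = U / (kB * temps)
print("new state index:", gibbs_state_update(u_k, np.zeros_like(u_k), rng))
```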

5.
Despite recent progress, our understanding of enzymes remains limited: predicting the changes that should be introduced to alter their properties or catalytic activities in a desired direction remains difficult. An alternative to rational design is selection of mutants endowed with the anticipated properties from a large collection of possible solutions generated by random mutagenesis. We describe here a new technique of in vitro selection of genes on the basis of the catalytic activity of the encoded enzymes. The gene coding for the enzyme to be engineered is cloned into the genome of a filamentous phage, while the enzyme itself is displayed on its surface, creating a phage enzyme. A bifunctional organic label containing a suicide inhibitor of the enzyme and a ligand with high affinity for an immobilized receptor is constructed. On incubation of a mixture of phage enzymes, those phages displaying activity on the inhibitor under the conditions of the experiment are labeled, and these phages can be recovered by affinity chromatography. The design of the label and the factors controlling the selectivity of the selection are analyzed. The advantages of the technique and its scope in terms of the enzymes that can be engineered are discussed.

6.
Qi Shen, Wei-Min Shi, Bao-Xian Ye. Talanta, 2007, 71(4): 1679-1683
In the analysis of gene expression profiles, the number of tissue samples with expression levels available is usually small compared with the number of genes. This can lead to overfitting or even to complete failure of the microarray data analysis. The selection of genes that are truly indicative of the tissue classification concerned is therefore becoming one of the key steps in microarray studies. In the present paper, we combine a modified discrete particle swarm optimization (PSO) and support vector machines (SVM) for tumor classification. The modified discrete PSO is applied to select genes, while the SVM is used as the classifier, or evaluator. The proposed approach was applied to microarray data of 22 normal and 40 colon tumor tissues and showed good prediction performance. This demonstrates that the modified PSO is a useful tool for gene selection and for mining high-dimensional data.
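A compact, scikit-learn-based sketch of the binary-PSO-plus-SVM idea (sigmoid velocity-to-bit mapping, cross-validated SVM accuracy as the fitness). The synthetic 62-sample data, particle counts and PSO coefficients are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Cross-validated SVM accuracy on the currently selected genes."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask.astype(bool)], y, cv=3).mean()

def binary_pso(X, y, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    pos = (rng.random((n_particles, n_genes)) < 0.1).astype(int)   # start with sparse subsets
    vel = rng.normal(0.0, 0.1, (n_particles, n_genes))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # discrete PSO: a bit is switched on with probability sigmoid(velocity)
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest

# Toy stand-in for the 62-sample colon data: 62 samples x 200 genes.
rng = np.random.default_rng(42)
X = rng.normal(size=(62, 200))
y = np.array([0] * 22 + [1] * 40)
X[y == 1, :5] += 1.5                      # make a few genes informative
print("selected genes:", np.flatnonzero(binary_pso(X, y)))
```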

7.
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contain a small number of samples with a large number of gene expression levels as features. Selecting the relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper to discuss both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
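A small illustrative sketch of a correlation-based filter followed by a standard classifier. It ranks genes by absolute Pearson correlation with the class label rather than implementing the full CFS subset search, and the data are synthetic.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def correlation_filter(X, y, k=20):
    """Rank features by absolute Pearson correlation with the class label."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum()) + 1e-12)
    return np.argsort(np.abs(r))[::-1][:k]

# Toy leukemia-style data: 72 samples x 1000 genes (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(72, 1000))
y = (rng.random(72) < 0.35).astype(int)
X[y == 1, :10] += 2.0

top = correlation_filter(X, y, k=20)
# Note: for an unbiased estimate the filter should be refit inside each CV fold.
print(cross_val_score(GaussianNB(), X[:, top], y, cv=5).mean())
```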

8.
Single nano-objects display strong fluctuations of their fluorescence signals. These random and irreproducible variations must be subject to statistical analysis to provide microscopic information. We review the main evaluation methods used so far by experimentalists in the field of single-molecule spectroscopy: time traces, correlation functions, distributions of "on" and "off" times, higher-order correlations. We compare their advantages and weaknesses from a theoretical point of view, illustrating our main conclusions with simple numerical simulations. We then review experiments on different types of single nano-objects, the phenomena which are observed and the statistical analyses applied to them.
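A toy numerical illustration (invented rates, no real data) of two of the statistical tools surveyed: the intensity autocorrelation function and the distribution of "on" dwell times, computed from a simulated two-state blinking trace.

```python
import numpy as np

rng = np.random.default_rng(3)

def blinking_trace(n_bins=20000, k_on=0.02, k_off=0.05, dt=1.0):
    """Two-state ("on"/"off") trace with exponentially distributed dwell times."""
    state, t, trace = 1, 0.0, np.zeros(n_bins)
    while t < n_bins * dt:
        dwell = rng.exponential(1.0 / (k_off if state else k_on))
        i0, i1 = int(t / dt), min(int((t + dwell) / dt), n_bins)
        trace[i0:i1] = state
        state, t = 1 - state, t + dwell
    return trace

def autocorr(I, max_lag=500):
    """Normalised intensity autocorrelation g(tau) = <I(t) I(t+tau)> / <I>^2."""
    mean = I.mean()
    return np.array([(I[:len(I) - lag] * I[lag:]).mean() / mean**2
                     for lag in range(1, max_lag)])

trace = blinking_trace()
g = autocorr(trace)

# Distribution of "on" dwell times from a simple run-length analysis.
on = trace > 0.5
change = np.flatnonzero(np.diff(on.astype(int)) != 0) + 1
runs = np.split(on, change)
on_durations = np.array([len(r) for r in runs if r[0]])
print(g[:5], on_durations.mean())
```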

9.
Producing good low-dimensional representations of high-dimensional data is a common and important task in many data mining applications. Two methods that have been particularly useful in this regard are multidimensional scaling and nonlinear mapping. These methods attempt to visualize a set of objects, described by means of a dissimilarity or distance matrix, on a low-dimensional display plane in a way that preserves the proximities of the objects to whatever extent is possible. Unfortunately, most known algorithms are of quadratic order, and their use has been limited to relatively small data sets. We recently demonstrated that nonlinear maps derived from a small random sample of a large data set exhibit the same structure and characteristics as those of the entire collection, and that this structure can be easily extracted by a neural network, making possible the scaling of data sets orders of magnitude larger than those accessible with conventional methodologies. Here, we present a variant of this algorithm based on local learning. The method employs a fuzzy clustering methodology to partition the data space into a set of Voronoi polyhedra, and uses a separate neural network to perform the nonlinear mapping within each cell. We find that this local approach offers a number of advantages, and produces maps that are virtually indistinguishable from those derived with conventional algorithms. These advantages are discussed using examples from the fields of combinatorial chemistry and optical character recognition. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 373-386, 2001
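A minimal scikit-learn sketch of the sample-then-learn idea underlying this family of methods: embed a small random sample with (quadratic-cost) MDS, train a neural network to reproduce those 2-D coordinates, and then map the full set in linear time. It uses a single global network rather than the fuzzy-clustered local networks described above, and the data are synthetic.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 16))           # a large high-dimensional data set (toy)

# 1) Embed only a small random sample with classical (quadratic-cost) MDS.
sample = rng.choice(len(X), size=300, replace=False)
Y_sample = MDS(n_components=2, random_state=0).fit_transform(X[sample])

# 2) Train a neural network to reproduce the sample's 2-D coordinates ...
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X[sample], Y_sample)

# 3) ... and map the entire collection in linear time.
Y_full = net.predict(X)
print(Y_full.shape)
```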

10.
11.
RNA-seq data challenge existing omics data analytics with their volume and complexity. Although quite a few computational models have been proposed from different standpoints for differential expression (D.E.) analysis, almost none of these methods provides rigorous feature selection for high-dimensional RNA-seq count data. Instead, most or even all genes enter the differential calls regardless of whether they make real contributions to data variation. This inevitably affects the robustness of D.E. analysis and inflates false positive rates. In this study, we present a novel feature selection method, nonnegative singular value approximation (NSVA), to enhance RNA-seq differential expression analysis by taking advantage of the non-negativity of RNA-seq count data. As a variance-based feature selection method, it selects genes according to their contribution to the first singular value direction of the input data in a data-driven manner. It demonstrates robustness to sequencing-depth bias and gene-length bias in feature selection in comparison with its five peer methods. Combined with state-of-the-art RNA-seq differential expression analysis, it enhances differential expression analysis by lowering the false discovery rates caused by these biases. Furthermore, we demonstrate the effectiveness of the proposed feature selection by proposing a data-driven differential expression analysis, NSVA-seq, in addition to conducting network marker discovery.
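A hedged, minimal reading of the NSVA idea as stated in the abstract: rank genes by the magnitude of their entries in the first singular-value direction of the nonnegative count matrix. The toy counts are invented and this is not the authors' code.

```python
import numpy as np

def nsva_scores(counts):
    """Score genes by their contribution to the first singular-value direction
    of a nonnegative count matrix (rows = genes, columns = samples)."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # For a nonnegative matrix the leading singular vector can be chosen
    # nonnegative (Perron-Frobenius); its entries are the per-gene weights.
    return np.abs(U[:, 0])

# Toy RNA-seq-like count matrix: 1000 genes x 12 samples.
rng = np.random.default_rng(7)
counts = rng.poisson(lam=20, size=(1000, 12)).astype(float)
counts[:15, 6:] *= 5                      # a handful of strongly varying genes
scores = nsva_scores(counts)
print(np.argsort(scores)[::-1][:15])      # top-ranked genes
```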

12.
This study aimed to validate common bootstrap algorithms for reference interval calculation. We simulated 1500 random sets of 50-120 results originating from eight different statistical distributions. The 97.5th percentile reference limits were estimated by bootstrapping 5000 replicates, with confidence limits obtained by (a) normal, (b) standard-error, (c) bootstrap percentile (as in RefVal), (d) BCa, (e) basic, or (f) student methods. Reference interval estimates obtained with ordinary bootstrapping and confidence intervals by the percentile method were accurate for distributions close to normality and devoid of outliers, but not for log-normal distributions with outliers. Outlier removal and transformation to normality improved reference interval estimation, and the basic method was superior in such cases. In conclusion, if the neighborhood of the relevant percentile contains non-normally distributed results, bootstrapping fails. The distribution of bootstrap estimates should be plotted, and a non-normal distribution should warrant transformation or outlier removal.
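A minimal sketch of one of the validated procedures: an ordinary bootstrap of the 97.5th percentile reference limit with a percentile-method confidence interval. The simulated log-normal sample and its size are illustrative, not the study's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
results = rng.lognormal(mean=1.0, sigma=0.25, size=80)   # a simulated reference sample

B = 5000
boot_limits = np.empty(B)
for b in range(B):
    resample = rng.choice(results, size=len(results), replace=True)
    boot_limits[b] = np.percentile(resample, 97.5)        # upper reference limit

point_estimate = np.percentile(results, 97.5)
ci_lower, ci_upper = np.percentile(boot_limits, [2.5, 97.5])   # percentile-method CI
print(point_estimate, (ci_lower, ci_upper))
```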

13.
14.
Non-SELEX selection of aptamers
Aptamers are typically selected from libraries of random DNA (or RNA) sequences by SELEX, which involves multiple rounds of alternating steps of partitioning and PCR amplification. Here we report, for the first time, non-SELEX selection of aptamers: a process that involves repetitive steps of partitioning with no amplification between them. A highly efficient affinity method, non-equilibrium capillary electrophoresis of equilibrium mixtures (NECEEM), was used for partitioning. We found that three steps of NECEEM-based partitioning in the non-SELEX approach were sufficient to improve the affinity of a DNA library to a target protein by more than 4 orders of magnitude. The resulting affinity was higher than that of the enriched library obtained in three rounds of NECEEM-based SELEX. Remarkably, NECEEM-based non-SELEX selection took only 1 h in contrast to several days or several weeks required for a typical SELEX procedure by conventional partitioning methods. In addition, NECEEM-based non-SELEX allowed us to accurately measure the abundance of aptamers in the library. Not only does this work introduce an extremely fast and economical method for aptamer selection, but it also suggests that aptamers may be much more abundant than they are thought to be. Finally, this work opens the opportunity for selection of drug candidates from libraries of small molecules, which cannot be PCR-amplified and thus are not approachable by SELEX.

15.
Depth distributions of minor elements in systems in which diffusion takes place are discussed, together with methods of determining these distributions by GD-OES depth profiling. The quantification method is illustrated using as an example Si, Fe, Zn and Sn diffusion/segregation phenomena in a Ti-thin-film/Al-substrate system with an Al3Ti interface diffusion layer grown at elevated temperatures.

16.
We have implemented the serial replica exchange method (SREM) and simulated tempering (ST) enhanced sampling algorithms in a global distributed computing environment. Here we examine the helix-coil transition of a 21-residue alpha-helical peptide in explicit solvent. For ST, we demonstrate the efficacy of a new method for determining, from short trial simulations, initial weights that allow the system to perform a random walk in temperature space. These weights are updated throughout the production simulation by an adaptive weighting method. We give a detailed comparison of SREM, ST, and standard MD, and find that SREM and ST give equivalent results in reasonable agreement with experimental data. In addition, we find that both enhanced sampling methods are much more efficient than standard MD simulations. The melting temperature of the Fs peptide with the AMBER99phi potential was calculated to be about 310 K, which is in reasonable agreement with the experimental value of 334 K. We also discuss other temperature-dependent properties of the helix-coil transition. Although ST has certain advantages over SREM, both SREM and ST are shown to be powerful methods via distributed computing and will be applied extensively in future studies of complex biomolecular systems.
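The abstract does not spell out the weighting scheme, so the sketch below shows one common way (not necessarily the authors' exact method) to seed simulated-tempering weights from short trial runs: trapezoidal, thermodynamic-integration-style estimation of the dimensionless free energies along the temperature ladder. The ladder and average energies are invented.

```python
import numpy as np

def initial_st_weights(betas, mean_energies):
    """Estimate dimensionless free energies f_k (usable as simulated-tempering
    weights g_k = f_k) by trapezoidal integration of <U> over beta:
        f_{k+1} - f_k  ~=  (beta_{k+1} - beta_k) * (<U>_k + <U>_{k+1}) / 2
    where <U>_k comes from a short trial simulation at temperature T_k."""
    f = np.zeros(len(betas))
    for k in range(len(betas) - 1):
        f[k + 1] = f[k] + (betas[k + 1] - betas[k]) * (
            mean_energies[k] + mean_energies[k + 1]) / 2.0
    return f

# Hypothetical ladder and trial-run average potential energies (kJ/mol).
kB = 0.008314
temps = np.array([300.0, 320.0, 342.0, 366.0, 392.0, 420.0])
betas = 1.0 / (kB * temps)
mean_U = np.array([-5800.0, -5700.0, -5590.0, -5470.0, -5340.0, -5200.0])
print(initial_st_weights(betas, mean_U))
```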

17.
Over the past decade, we have witnessed a bloom in the field of evolutive protein engineering, fueled by advances in molecular biology techniques and high-throughput screening technology. Directed protein evolution is a powerful algorithm using iterative cycles of random mutagenesis and screening for tailoring protein properties to our needs in industrial applications and for elucidating proteins' structure-function relationships. This review summarizes, categorizes and discusses the advantages and disadvantages of random mutagenesis methods used for generating genetic diversity. These methods have been classified into four main categories depending on the method employed for nucleotide substitution: enzyme-based methods (Category I), synthetic-chemistry-based methods (Category II), whole-cell methods (Category III) and combined methods (Categories I-II, I-III and II-III). The basic principle of each method is discussed, and the various mutagenic conditions are summarized in tables and benchmarked against each other in terms of mutational bias, controllable mutation frequency, ability to generate consecutive nucleotide substitutions and subset diversity, dependency on gene length, technical simplicity/robustness and cost-effectiveness. The latter comparison shows how highly biased and limited current diversity-creating methods are. Based on these limitations, strategies for generating diverse mutant libraries are proposed and discussed (RaMuS-Flowchart; KISS principle). We hope that this review provides, especially for researchers just entering the field of directed evolution, a guide for developing successful directed evolution strategies by selecting complementary methods for generating diverse mutant libraries.

18.
We apply a Bayesian parameter estimation technique to a chemical kinetic mechanism for n-propylbenzene oxidation in a shock tube to propagate errors in experimental data to errors in Arrhenius parameters and predicted species concentrations. We find that, to apply the methodology successfully, conventional optimization is required as a preliminary step. This is carried out in two stages: First, a quasi-random global search using a Sobol low-discrepancy sequence is conducted, followed by a local optimization by means of a hybrid gradient-descent/Newton iteration method. The concentrations of 37 species at a variety of temperatures, pressures, and equivalence ratios are optimized against a total of 2378 experimental observations. We then apply the Bayesian methodology to study the influence of uncertainties in the experimental measurements on some of the Arrhenius parameters in the model as well as some of the predicted species concentrations. Markov chain Monte Carlo algorithms are employed to sample from the posterior probability densities, making use of polynomial surrogates of higher order fitted to the model responses. We conclude that the methodology provides a useful tool for the analysis of distributions of model parameters and responses, in particular their uncertainties and correlations. Limitations of the method are discussed. For example, we find that using second-order response surfaces and assuming normal distributions for propagated errors is largely adequate, but not always.
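A SciPy sketch of the two-stage preliminary optimization described above: a quasi-random global screen with a Sobol low-discrepancy sequence followed by local gradient-based refinement. It runs on a stand-in quadratic objective, since evaluating the actual n-propylbenzene mechanism is far more involved; all bounds and targets are invented.

```python
import numpy as np
from scipy.stats import qmc
from scipy.optimize import minimize

# Stand-in for the model-vs-experiment objective (sum of squared residuals);
# the real objective would simulate the kinetic mechanism for each parameter set.
def objective(theta):
    return np.sum((theta - np.array([1.7, -0.3, 2.4]))**2 * [1.0, 4.0, 0.5])

lower = np.array([-5.0, -5.0, -5.0])
upper = np.array([5.0, 5.0, 5.0])

# Stage 1: quasi-random global screen with a Sobol low-discrepancy sequence.
sampler = qmc.Sobol(d=3, scramble=True, seed=0)
candidates = qmc.scale(sampler.random_base2(m=10), lower, upper)   # 2**10 points
best = candidates[np.argmin([objective(c) for c in candidates])]

# Stage 2: local gradient-based refinement from the best global candidate.
result = minimize(objective, best, method="L-BFGS-B",
                  bounds=list(zip(lower, upper)))
print(result.x, result.fun)
```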

19.
Protein–protein interactions (PPIs) perform various functions and regulate processes throughout cells. Knowledge of the full network of PPIs is vital to biomedical research, but most PPIs are still unknown. As it is infeasible to discover all of them experimentally due to technical and resource limitations, computational prediction of PPIs is essential, and accurately assessing the performance of algorithms is required before further application or translation. However, many published methods compose their evaluation datasets incorrectly, using a higher proportion of positive-class data than occurs naturally, leading to exaggerated performance. We re-implemented various published algorithms and evaluated them on datasets with realistic data compositions and found that their performance is overstated in the original publications, with several methods outperformed by our control models built on 'illogical' and random-number features. We conclude that these methods are influenced by the over-characterization of some proteins in the literature and by the scale-free nature of the PPI network, and that they fail when tested on all possible protein pairs. Additionally, we found that sequence-only algorithms performed worse than those that employ functional and expression features. We present a benchmark evaluation of many published algorithms for PPI prediction. The source code of our implementations and the benchmark datasets created here are made available in open source.
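A small synthetic illustration of the dataset-composition point made above: the same thresholded classifier scores are evaluated on a balanced set and on a more realistic 1:100 positive:negative composition, and precision collapses in the latter. The score distributions, threshold and ratios are invented.

```python
import numpy as np
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)

def simulate_scores(n_pos, n_neg):
    """Synthetic classifier output: positives score a bit higher on average."""
    y = np.r_[np.ones(n_pos, int), np.zeros(n_neg, int)]
    s = np.r_[rng.normal(0.65, 0.15, n_pos), rng.normal(0.45, 0.15, n_neg)]
    return y, (s > 0.6).astype(int)        # fixed decision threshold

# Balanced evaluation set (common in PPI papers) vs a 1:100 composition
# closer to the true prevalence of interacting pairs.
for n_pos, n_neg in [(1000, 1000), (1000, 100000)]:
    y, pred = simulate_scores(n_pos, n_neg)
    print(f"1:{n_neg // n_pos:<4d} precision = {precision_score(y, pred):.3f}")
```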

20.
Many commercially available software programs claim similar efficiency and accuracy as variable selection tools. Genetic algorithms are commonly used variable selection methods in which the most relevant variables can be differentiated from 'less important' variables using evolutionary computing techniques. However, different vendors offer several algorithms, and the puzzling question is: which one is the appropriate method of choice? In this study, several genetic algorithm tools (e.g. GFA from Cerius2, QuaSAR-Evolution from MOE and Partek's genetic algorithm) were compared. Stepwise multiple linear regression models were generated using the most relevant variables identified by the above genetic algorithms. This procedure led to the successful generation of Quantitative Structure–Activity Relationship (QSAR) models for (a) proprietary datasets and (b) the Selwood dataset.
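A generic sketch (not any of the named commercial tools) of GA-driven variable selection scored by a cross-validated linear model with a simple parsimony penalty. The synthetic descriptor matrix, GA settings and penalty weight are all illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))                       # 50 candidate descriptors
y = X[:, 3] - 2 * X[:, 17] + 0.5 * X[:, 30] + rng.normal(0, 0.3, 100)

def fitness(mask):
    """Cross-validated R^2 of a linear model on the selected descriptors,
    lightly penalised by the number of variables (parsimony)."""
    if mask.sum() == 0:
        return -np.inf
    score = cross_val_score(LinearRegression(), X[:, mask.astype(bool)], y,
                            cv=5, scoring="r2").mean()
    return score - 0.01 * mask.sum()

def ga_select(pop_size=30, n_gen=40, p_mut=0.02):
    pop = (rng.random((pop_size, X.shape[1])) < 0.1).astype(int)
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]      # truncation selection
        cut = rng.integers(1, X.shape[1], size=pop_size // 2)
        kids = np.array([np.r_[parents[i][:c], parents[(i + 1) % len(parents)][c:]]
                         for i, c in enumerate(cut)])             # one-point crossover
        kids ^= (rng.random(kids.shape) < p_mut).astype(int)      # bit-flip mutation
        pop = np.vstack([parents, kids])
    fit = np.array([fitness(ind) for ind in pop])
    return pop[fit.argmax()]

print("selected descriptors:", np.flatnonzero(ga_select()))
```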

