Similar literature
 Found 20 similar articles; search time 0 ms
1.
2.
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples. This can lead to overfitting and the curse of dimensionality, or even to a complete failure in the analysis of microarray data. Gene selection is therefore an important component of gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swarm optimization (PSO) and tabu search (HPSOTS) approach to gene selection for tumor classification. Incorporating tabu search (TS) as a local improvement procedure enables HPSOTS to escape local optima and show satisfactory performance. The proposed approach is applied to three different microarray data sets, and its performance is compared with that of stepwise selection and of the pure TS and PSO algorithms. The results demonstrate that HPSOTS is a useful tool for gene selection and for mining high-dimensional data.

3.
Li-Juan Tang  Hai-Long Wu 《Talanta》2009,79(2):260-1694
One problem with discriminant analysis of microarray data is that each sample is represented by a large number of genes, many of which may be irrelevant, insignificant or redundant. Methods of variable selection are therefore of great significance in microarray data analysis. To circumvent the problem, a new gene mining approach is proposed, based on the similarity between the probability density functions of each gene for the class of interest and for the other classes. This method identifies significant genes that are informative for discriminating each individual class, rather than maximizing the separability of all classes, so that one can select genes carrying important information about particular disease subtypes. Based on the significant genes mined for the individual classes, a support vector machine with a local kernel transform is constructed for the classification of different diseases. The combination of the gene mining approach with the support vector machine is demonstrated for cancer classification using two public data sets. The results reveal that significant genes are identified for each cancer, and that the classification model shows satisfactory performance in training and prediction for both data sets.

4.
Protein methylation is involved in dozens of biological processes and plays an important role in adjusting protein physicochemical properties, conformation and function. However, with the rapid increase in protein sequences entering databanks, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore important to develop a computational method for quick and accurate identification of methylation sites. In this study, a novel predictor (Methy_SVMIACO) based on a support vector machine (SVM) and an improved ant colony optimization algorithm (IACO) is developed to identify methylation sites. The IACO is used to find the optimal feature subset and SVM parameters, while the SVM performs the identification of methylation sites. Comparison of the IACO with conventional ACO shows that the IACO converges quickly toward the global optimal solution and is a more useful tool for feature selection and SVM parameter optimization. In a 10-fold cross-validation test, Methy_SVMIACO achieves a sensitivity of 85.71%, a specificity of 86.67%, an accuracy of 86.19% and a Matthews correlation coefficient (MCC) of 0.7238 for lysine, and a sensitivity of 89.08%, a specificity of 94.07%, an accuracy of 91.56% and an MCC of 0.8323 for arginine. Analysis of the optimal feature subset shows that some upstream and downstream residues play an important role in the methylation of arginine and lysine. Compared with other existing methods, Methy_SVMIACO provides higher accuracy, sensitivity and specificity, indicating that it may serve as a powerful complementary tool to existing approaches in this area. Methy_SVMIACO can be obtained freely on request from the authors.
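The four reported figures are linked through the confusion matrix, and a short sketch makes the link concrete. The function name `mcc_from_rates` and the `positive_fraction` parameter are illustrative, and the balanced-class setting (0.5) is an assumption suggested by the lysine accuracy being the mean of sensitivity and specificity; it is not stated in the abstract.

```python
import math


def mcc_from_rates(sensitivity, specificity, positive_fraction=0.5):
    """Matthews correlation coefficient from sensitivity and specificity.

    positive_fraction is the assumed share of positive samples; the default
    of 0.5 (balanced classes) is an assumption, not a reported value.
    """
    # Confusion-matrix cells expressed as fractions of all samples.
    tp = sensitivity * positive_fraction
    fn = (1.0 - sensitivity) * positive_fraction
    tn = specificity * (1.0 - positive_fraction)
    fp = (1.0 - specificity) * (1.0 - positive_fraction)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Lysine figures from the abstract, under the balanced-class assumption.
lysine_mcc = mcc_from_rates(0.8571, 0.8667)
```

Under that assumption the lysine value comes out at roughly 0.724, in line with the reported 0.7238.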

5.
Multivariate spectral analysis has been widely applied in chemistry and other fields. Spectral data consisting of measurements at hundreds or even thousands of analytical channels can now be obtained in a few seconds. It is widely accepted that well-performed variable selection before building a multivariate regression model can improve the predictive ability of the model. In this paper, the concept of traditional wavelength variable selection is extended and the idea of variable weighting is incorporated into the least-squares support vector machine (LS-SVM). A recently proposed global optimization method, the particle swarm optimization (PSO) algorithm, is used to search for the variable weights and the LS-SVM hyper-parameters, optimizing both the training on a calibration set and the prediction of an independent validation set. The whole computation is automatic. Two real data sets are investigated, and the results are compared with those of PLS, uninformative variable elimination-PLS (UVE-PLS) and LS-SVM models to demonstrate the advantages of the proposed method.

6.
Improved binary PSO for feature selection using gene expression data   Total citations: 2, self-citations: 0, citations by others: 2
Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. In cancer type classification, the available training data sets generally have a fairly small sample size compared to the number of genes involved. These training data limitations pose a challenge to certain classification methodologies. A reliable method for selecting the genes relevant to sample classification is needed in order to speed up processing, decrease the predictive error rate, and avoid the incomprehensibility that comes with the large number of genes investigated. In this study, improved binary particle swarm optimization (IBPSO) is used to implement feature selection, and the K-nearest neighbor (K-NN) method serves as the evaluator of the IBPSO for gene expression data classification problems. Experimental results show that this method effectively simplifies feature selection and reduces the total number of features needed. Compared with the best results previously published, the proposed method obtains the highest classification accuracy in nine of the 11 gene expression data test problems and comparable accuracy in the other two.
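As a rough illustration of how a binary PSO searches over feature masks, the sketch below implements the standard sigmoid-based binary PSO update, not the improved variant (IBPSO) of the abstract, whose details are not given here. The K-NN evaluator is replaced by a hypothetical `toy_fitness`, and the names `binary_pso`, `INFORMATIVE` and all parameter values are assumptions made for the example.

```python
import math
import random

INFORMATIVE = {0, 3}  # hypothetical indices of the useful features


def toy_fitness(mask):
    """Stand-in for a classifier score: reward useful features, penalize size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def binary_pso(fitness, n_bits, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness over 0/1 masks with the standard binary PSO rule."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp so the sigmoid stays away from 0 and 1
                vel[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

A call such as `binary_pso(toy_fitness, n_bits=8)` returns the best mask found and its score; in a real setting the fitness would be a cross-validated classification accuracy on the selected genes.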

7.
8.
Parameter estimation plays a significant role in developing the mathematical models used to understand and analyze physical, chemical and biological systems. Parameter estimation problems in vapor-liquid equilibrium (VLE) data modeling are very challenging because of the high non-linearity of thermodynamic models. Recently, stochastic global optimization methods and their application to these problems have attracted growing interest. Of the many stochastic global optimization methods, bare-bones particle swarm optimization (BBPSO) is attractive because it has no parameters to be tuned by the user. In this study, the original BBPSO is modified by using the mutation and crossover operators of the differential evolution algorithm to update certain particles in the population. The resulting algorithm is tested on 10 benchmark functions and applied to 16 VLE problems. Its performance in VLE modeling is compared with that of other stochastic algorithms, namely differential evolution (DE), DE with tabu list, genetic algorithm and simulated annealing, in order to identify their relative strengths for VLE data modeling. The proposed BBPSO is shown to be better than or comparable to the other stochastic global optimization algorithms tested for parameter estimation in modeling VLE data.
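The reason BBPSO needs no tuning is visible in its update rule: each coordinate is resampled from a Gaussian centered midway between the particle's personal best and the global best, with their distance as the standard deviation. The sketch below shows the original bare-bones rule only, without the DE mutation and crossover modifications described above; the `sphere` objective, the bounds and the population settings are assumptions for illustration.

```python
import random


def sphere(x):
    """Simple benchmark objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def bare_bones_pso(objective, dim, bounds=(-5.0, 5.0),
                   n_particles=20, n_iter=100, seed=0):
    """Minimize objective with the original bare-bones PSO sampling rule."""
    rng = random.Random(seed)
    lo, hi = bounds
    pbest = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest_val = [objective(p) for p in pbest]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            # No velocity, inertia or acceleration terms: sample around the
            # midpoint of personal and global best, spread by their distance.
            cand = [rng.gauss((pbest[i][d] + gbest[d]) / 2.0,
                              abs(pbest[i][d] - gbest[d]))
                    for d in range(dim)]
            val = objective(cand)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = cand, val
                if val < gbest_val:
                    gbest, gbest_val = cand[:], val
    return gbest, gbest_val
```

For parameter estimation, `objective` would be the misfit between model predictions and measured VLE data rather than a benchmark function.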

9.
10.
Particle swarm optimization is a novel evolutionary stochastic global optimization method that has gained popularity in the chemical engineering community. This optimization strategy has been successfully used for several applications including thermodynamic calculations. To the best of our knowledge, the performance of PSO in phase stability and equilibrium calculations for both multicomponent reactive and non-reactive mixtures has not yet been reported. This study introduces the application of particle swarm optimization and several of its variants for solving phase stability and equilibrium problems in multicomponent systems with or without chemical equilibrium. The reliability and efficiency of a number of particle swarm optimization algorithms are tested and compared using multicomponent systems with vapor–liquid and liquid–liquid equilibrium. Our results indicate that the classical particle swarm optimization with constant cognitive and social parameters is a reliable method and offers the best performance for global minimization of the tangent plane distance function and the Gibbs energy function in both reactive and non-reactive systems.

11.
In this paper, a support vector machine was trained to capture the relationship between the pair-coupled amino acid composition and the content of protein secondary structural elements, including α-helix, 3₁₀-helix, π-helix, β-strand, β-bridge, turn, bend and the remaining random coil. Self-consistency and cross-validation tests were performed to assess the performance of our method. The results are superior to or competitive with those of popular theoretical and experimental methods.
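To make the input representation concrete, the sketch below computes a 400-dimensional vector of adjacent-residue pair frequencies. This reading of "pair-coupled amino acid composition" (frequencies of the 20 × 20 neighboring residue pairs) is an assumption, as are the function name and the choice to skip pairs containing non-standard residues; the paper's exact definition may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pair_coupled_composition(sequence):
    """Return the 400 adjacent-pair frequencies of a protein sequence."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        if a + b in counts:  # skip pairs with non-standard residues
            counts[a + b] += 1
            total += 1
    # Normalize so the vector sums to 1 (all zeros if no valid pair exists).
    return [counts[p] / total if total else 0.0 for p in pairs]
```

Each protein then becomes one fixed-length vector regardless of its sequence length, which is what allows an SVM to be trained on it.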

12.
Experimental design methodology was used to optimize linear gradient elution chromatography. The effects of the initial mobile phase composition (φ_in), the initial isocratic time (t_in), and the gradient time (t_G) on the retention times of phenylthiohydantoin amino acids (PTH-amino acids) were investigated. The experiments were performed according to a Box-Behnken experimental design to map the chromatographic response surface. Multiple linear regression and support vector machine (SVM) methods were then used to fit the retention times of the solutes. The results show that the SVM models predict retention time better, and they were therefore used for a grid search in factor space. The SVM models were verified by the good agreement observed between the predicted and experimental retention times under the optimal conditions.
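For three factors, a Box-Behnken design places runs at the midpoints of the edges of the coded factor cube and adds replicated center points. The sketch below generates that layout in coded units (-1, 0, +1); the function name and the number of center points are assumptions, since the abstract does not state how many center runs were used.

```python
from itertools import combinations, product


def box_behnken_3(n_center=3):
    """Coded runs of a three-factor Box-Behnken design.

    Columns can be read as (phi_in, t_in, t_G) in coded units; n_center is
    an assumed number of replicated center points.
    """
    runs = []
    for i, j in combinations(range(3), 2):
        # Vary two factors over all +/-1 combinations, hold the third at 0.
        for a, b in product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0, 0, 0)] * n_center)
    return runs
```

The 12 edge runs plus the center points give enough data to fit a full quadratic response surface without running any extreme corner of the factor space.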

13.
Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, with the goal of integrating the advantages of both. It is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. Extensive experiments were conducted to test the accuracy of the proposed algorithm. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compare against the combination of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compare the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This indicates that the GBC algorithm is a promising approach to the gene selection problem in both binary and multi-class cancer classification.

14.
15.
In this paper, we propose a wavelength selection method based on random decision particle swarm optimization with attractor for quantitative analysis of near-infrared (NIR) spectra. The proposed method is combined with partial least squares (PLS) to construct a prediction model. It chooses either the particle's own current best or the current global best to calculate the attractor. The particle then updates its flight velocity using the attractor, and the particle state is updated by the random decision with the new velocity. The root-mean-square error of cross-validation is adopted as the fitness function. To demonstrate the usefulness of the proposed method, it is compared with PLS using all wavelengths, uninformative variable elimination by PLS, elastic net, a genetic algorithm combined with PLS, discrete particle swarm optimization combined with PLS, modified particle swarm optimization combined with PLS, and neighboring particle swarm optimization combined with PLS; all of these are used to build quantitative component analysis models of NIR spectral datasets, and the effectiveness of the models is compared. Two application studies are presented, involving NIR data obtained from a meat content determination experiment and from a combustion procedure. The results verify that the proposed method has higher predictive ability for NIR spectral data while selecting fewer wavelengths. The proposed method also converges faster and can overcome the premature convergence problem. Furthermore, although improving the prediction precision may come at the cost of model complexity to a certain extent, the proposed method overfits only slightly. Copyright © 2015 John Wiley & Sons, Ltd.
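The fitness function named above, the root-mean-square error of cross-validation (RMSECV), can be sketched independently of the regression model. In the example below, `fit` is a hypothetical callable that trains on the given samples and returns a predictor; `mean_model` is a deliberately trivial stand-in for PLS, and the interleaved fold assignment is an assumption.

```python
import math


def rmsecv(xs, ys, fit, k=5):
    """Root-mean-square error of k-fold cross-validation.

    fit(train_x, train_y) must return a function mapping x to a prediction.
    Samples are assigned to folds by index modulo k (an assumed scheme).
    """
    n = len(ys)
    sq_err = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Accumulate squared errors on the held-out samples only.
        sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return math.sqrt(sq_err / n)


def mean_model(train_x, train_y):
    """Trivial stand-in for PLS: always predict the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean
```

In a wavelength selection run, each particle's mask would pick the columns of the spectra passed to `fit`, and the swarm would minimize the resulting RMSECV.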

16.
17.
18.
19.
20.
DNA microarray data have been widely used in cancer research because they help to distinguish successfully between tumor classes. However, typical gene expression data are both high-dimensional and imbalanced, which makes it very difficult for traditional machine learning methods to construct a robust classifier that performs well on both the minority and the majority classes. Relief, one of the most successful feature weighting techniques, is considered particularly well suited to high-dimensional problems. Unfortunately, almost no Relief-based methods take the class imbalance into account. This study shows that existing Relief-based algorithms may underestimate features that discriminate the minority classes and ignore the distribution characteristics of minority class samples. As a result, they can introduce an additional bias towards classification into the majority classes. To address this, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief corrects the bias towards the majority classes and takes the scattered distribution of minority class samples into account when estimating feature weights. In this way, imRelief rewards the features that perform well at separating the minority classes from the other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection.
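For orientation, the basic two-class Relief weight update that imRelief builds on can be sketched as follows. This is the plain algorithm, with none of the imbalance corrections of imRelief; it assumes features already scaled to [0, 1], at least two samples per class, and a pass over every sample rather than a random subset. The function name is illustrative.

```python
def relief_weights(X, y):
    """Basic two-class Relief feature weights (no imbalance handling)."""
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features

    def dist(a, b):
        # Manhattan distance between two samples.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(n_features):
            # Reward features that differ on the nearest miss and
            # penalize features that differ on the nearest hit.
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights
```

Because every sample contributes equally, a majority class dominates the sum when classes are imbalanced, which is the bias the abstract describes.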
