Similar literature
1.
The presence of less relevant or highly correlated features often decreases classification accuracy. Feature selection, in which the most informative variables are selected for model generation, is an important step in data-driven modeling. In feature selection, one often tries to satisfy multiple criteria such as feature discriminating power, model performance, or subset cardinality. Therefore, a multi-objective formulation of the feature selection problem is more appropriate. In this paper, we propose to use fuzzy criteria in feature selection by using a fuzzy decision making framework. This formulation allows for a more flexible definition of the goals in feature selection, and avoids the problem of weighting different goals in classical multi-objective optimization. The optimization problem is solved using an ant colony optimization algorithm proposed in our previous work. We illustrate the added value of the approach by applying our proposed fuzzy feature selection algorithm to eight benchmark problems.

2.
The feature selection problem is an interesting and important topic which is relevant for a variety of database applications. This paper utilizes the Tabu Search metaheuristic algorithm to implement a feature subset selection procedure, while the nearest neighbor classification method is used for the classification task. Tabu Search is a general metaheuristic procedure that guides the search toward good solutions in complex solution spaces. Several metrics are used in the nearest neighbor classification method, such as the Euclidean distance, the Standardized Euclidean distance, the Mahalanobis distance, the City block metric, the Cosine distance and the Correlation distance, in order to identify the most significant metric for the nearest neighbor classifier. The performance of the proposed algorithms is tested using various benchmark datasets from the UCI Machine Learning Repository.
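As a rough illustration of how the choice of feature subset and distance metric interacts with a nearest neighbor classifier, the sketch below implements three of the metrics named in the abstract and a 1-nearest-neighbor rule restricted to selected feature indices. The function names and the toy data are assumptions made for this example; the Tabu Search procedure itself is not reproduced.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Sum of absolute coordinate differences (Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the vectors.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbor_label(query, samples, labels, features, metric):
    # Hypothetical helper: 1-NN using only the feature indices in `features`.
    def project(v):
        return [v[i] for i in features]
    q = project(query)
    best = min(range(len(samples)),
               key=lambda i: metric(q, project(samples[i])))
    return labels[best]

# Toy data (illustrative only): two training samples with two features.
samples = [[0.0, 10.0], [5.0, 0.0]]
labels = ["a", "b"]
query = [4.0, 9.0]
label_all = nearest_neighbor_label(query, samples, labels, [0, 1], euclidean)
label_first = nearest_neighbor_label(query, samples, labels, [0], euclidean)
```

In this toy case the predicted label changes when only the first feature is kept, which is the kind of effect a wrapper-style subset search would evaluate for each candidate subset.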

3.
This paper presents a design methodology for IP networks under end-to-end Quality-of-Service (QoS) constraints. In particular, we consider a more realistic problem formulation in which the link capacities of a general-topology packet network are discrete variables. This Discrete Capacity Assignment (DCA) problem can be classified as a constrained combinatorial optimization problem. A refined TCP/IP traffic modeling technique is also considered in order to estimate performance metrics for networks loaded by realistic traffic patterns. We propose a discrete variable Particle Swarm Optimization (PSO) procedure to find solutions for the problem. A simple approach called Bottleneck Link Heuristic (BLH) is also proposed to obtain admissible solutions quickly. The PSO performance, compared to that of an exhaustive search (ES) procedure, suggests that the PSO algorithm provides an efficient approach to obtain (near) optimal solutions with small computational effort.

4.
A minimax feature selection problem for constructing a classifier using support vector machines is considered. Properties of the solutions of this problem are analyzed. An improvement of the saddle point search algorithm based on extending the bound for the step parameter is proposed. A new nondifferentiable optimization algorithm is developed that, together with the saddle point search algorithm, forms a hybrid feature selection algorithm. The efficiency of the algorithm for computing Dykstra’s projections as applied to the feature selection problem is experimentally estimated.

5.
Feature reduction based on rough set theory is an effective feature selection method in pattern recognition applications. Finding a minimal subset of the original features is inherent in the rough set approach to feature selection. As feature reduction is a Nondeterministic Polynomial‐time‐hard problem, it is necessary to develop fast optimal or near‐optimal feature selection algorithms. This article aims to propose an exact rough set feature selection algorithm that is efficient in terms of computation time. The proposed algorithm begins the examination of a solution tree by a breadth‐first strategy. The pruned nodes are held in a version of the trie data structure. Based on the monotonic property of the dependency degree, no subset of a pruned node can be an optimal solution. Thus, by detecting these subsets in the trie, it is not necessary to calculate their dependency degree. The search on the tree continues until the optimal solution is found. This algorithm is improved by selecting an initial search level determined by the hill‐climbing method instead of searching the tree from the level below the root. The length of the minimal reduct and the size of the data set can influence which starting search level is more efficient. The experimental results using some of the standard UCI data sets demonstrate that the proposed algorithm is effective and efficient for data sets with more than 30 features. © 2014 Wiley Periodicals, Inc. Complexity 20: 50–62, 2015
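The pruning argument rests on the dependency degree being monotonic in the attribute subset. A minimal sketch of the standard rough set definition, gamma_B(D) = |POS_B(D)| / |U|, is given below; the function name and the toy decision table are assumptions for illustration, and the trie-based search from the article is not reproduced.

```python
from collections import defaultdict

def dependency_degree(rows, decisions, attrs):
    # Group objects into equivalence classes by their values on `attrs`.
    classes = defaultdict(list)
    for row, decision in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        classes[key].append(decision)
    # The positive region holds the objects whose equivalence class
    # carries a single decision value.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)

# Toy decision table (illustrative only): decision = attribute 0 OR attribute 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = [0, 1, 1, 1]

gamma_full = dependency_degree(rows, decisions, [0, 1])
gamma_first = dependency_degree(rows, decisions, [0])
gamma_empty = dependency_degree(rows, decisions, [])
```

Here the full attribute set gives a dependency degree of 1.0, the single attribute gives 0.5, and the empty set gives 0.0. Because removing attributes can never raise the value, a subset that already falls short rules out all of its own subsets, which is what allows them to be skipped without evaluation.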

6.
The curse of dimensionality is based on the fact that high-dimensional data is often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and might give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on feature selectors’ reliability assessment. It aims at providing a unique and stable feature selection without ignoring the predictive accuracy aspect. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on their associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets.

7.
Feature Selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. All the traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is the unavailability of the entire feature set before learning starts. Several efforts have been made to address the OSFS problem; however, they all need some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this consideration is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses the classical significance analysis concepts in RS theory to control the unknown feature space in OSFS problems. This algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run-time, and robustness against noise. Experimental results demonstrate that the algorithm achieves better results than existing OSFS algorithms in every way.

8.
Feature selection consists of choosing a subset of available features that capture the relevant properties of the data. In supervised pattern classification, a good choice of features is fundamental for building compact and accurate classifiers. In this paper, we develop an efficient feature selection method using the zero-norm \(\ell_0\) in the context of support vector machines (SVMs). The discontinuity of \(\ell_0\) at the origin makes the corresponding optimization problem difficult to solve. To overcome this drawback, we use a robust DC (difference of convex functions) programming approach, which is a general framework for non-convex continuous optimisation. We consider an appropriate continuous approximation to \(\ell_0\) such that the resulting problem can be formulated as a DC program. Our DC algorithm (DCA) has finite convergence and requires solving one linear program at each iteration. Computational experiments on standard datasets, including challenging problems from the NIPS 2003 feature selection challenge and gene selection for cancer classification, show that the proposed method is promising: while it suppresses more than 99% of the features, it can still provide a good classification. Moreover, the comparative results illustrate the superiority of the proposed approach over standard methods such as classical SVMs and the feature selection concave method.
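The abstract does not say which continuous approximation of the zero-norm is used. As an illustration only, one common choice is the concave exponential approximation with a parameter \(\alpha > 0\) (an assumption here, not necessarily the authors' choice), which admits an explicit DC decomposition:

```latex
\[
\|w\|_0 \;=\; \sum_{i=1}^{n} \mathbf{1}[w_i \neq 0]
\;\approx\; \sum_{i=1}^{n} \bigl(1 - e^{-\alpha |w_i|}\bigr),
\qquad \alpha > 0,
\]
\[
1 - e^{-\alpha |w_i|}
\;=\; \underbrace{\alpha |w_i|}_{g(w_i)}
\;-\; \underbrace{\bigl(\alpha |w_i| - 1 + e^{-\alpha |w_i|}\bigr)}_{h(w_i)} .
\]
```

Both \(g\) and \(h\) are convex: \(g\) is a scaled absolute value, and \(h\) is the composition of \(|w_i|\) with \(t \mapsto \alpha t - 1 + e^{-\alpha t}\), which is convex and nondecreasing for \(t \ge 0\). The approximation is therefore a difference of convex functions, which is the structure a DC algorithm needs.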

9.
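This paper studies a portfolio model with transaction costs, using value at risk (VaR) and the Sharpe ratio (SR) as the risk and return evaluation measures of the portfolio, respectively. To solve the model effectively, a hybrid optimization algorithm (IN-GSA-PSO) is proposed based on the gravitational search algorithm (GSA) and particle swarm optimization (PSO). It combines the global best and personal best positions of PSO with the acceleration operator of GSA, so that the hybrid algorithm makes full use of the exploitation and exploration abilities of the individual algorithms. With suitable parameter settings, the algorithm balances global and local search and converges quickly to the optimal solution of the model. Using data for the SSE 50 stocks over the 126 trading days of the second half of 2014, simulation experiments are carried out in Matlab. The results show that the portfolio model with transaction costs gives investors a higher rate of return. The study also shows that the hybrid algorithm based on PSO and GSA performs better than either single algorithm in solving the portfolio model and yields satisfactory optimization results.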

10.
We propose a novel cooperative swarm intelligence algorithm to solve multi-objective discrete optimization problems (MODP). Our algorithm combines a firefly algorithm (FA) and a particle swarm optimization (PSO). We address three main points: the effect of FA and PSO cooperation on the exploration of the search space, the discretization of the two algorithms using a transfer function, and finally, the use of the epsilon dominance relation to manage the size of the external archive and to guarantee the convergence and the diversity of Pareto optimal solutions. We compared the results of our algorithm with the results of five well-known meta-heuristics on nine multi-objective knapsack problem benchmarks. The experiments clearly show the ability of our algorithm to provide a better spread of solutions with better convergence behavior.

11.
The increasing demand for high reliability, safety and availability of technical systems calls for innovative maintenance strategies. The use of the prognostic health management (PHM) approach, where maintenance action is taken based on the current and future health state of a component or system, is rapidly gaining popularity in the maintenance industry. The multiclass support vector machine (MC-SVM) has been identified as a promising algorithm in PHM applications due to its high classification accuracy. However, it requires parameter tuning for each application, with the objective of minimizing the classification error. This is a single objective optimization problem which requires the use of optimization algorithms that are capable of exhaustively searching for the global optimum parameters. This work proposes the use of hybrid differential evolution (DE) and particle swarm optimization (PSO) in optimally tuning the MC-SVM parameters. DE identifies the search limit of the parameters while PSO finds the global optimum within the search limit. The feasibility of the approach is verified using bearing run-to-failure data and the results show that the proposed method significantly increases health state classification accuracy. (© 2014 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

12.
This paper addresses the design of a network of observation locations in a spatial domain that will be used to estimate unknown parameters of a distributed parameter system. We consider a setting where we are given a finite number of possible sites at which to locate a sensor, but cost constraints allow only some proper subset of them to be selected. We formulate this problem as the selection of the gauged sites so as to maximize the log-determinant of the Fisher information matrix associated with the estimated parameters. The search for the optimal solution is performed using the branch-and-bound method, in which an extremely simple and efficient technique is employed to produce an upper bound to the maximum objective function. The idea is to solve a relaxed problem through the application of a simplicial decomposition algorithm in which the restricted master problem is solved using a multiplicative algorithm for optimal design. The use of the proposed approach is illustrated by a numerical example involving sensor selection for a two-dimensional convective diffusion process.

13.
We present the design of more effective and efficient genetic-algorithm-based data mining techniques that use the concepts of feature selection. Explicit feature selection is traditionally done as a wrapper approach where every candidate feature subset is evaluated by executing the data mining algorithm on that subset. In this article we present a GA that performs mining and feature selection simultaneously by evolving a binary code alongside the chromosome structure used for evolving the rules. We then present a wrapper approach to feature selection based on the Hausdorff distance measure. Results from applying the above techniques to a real-world data mining problem show that combining the two feature selection methods provides the best performance in terms of prediction accuracy and computational efficiency.

14.
Particle swarm optimization (PSO) is characterized by fast convergence, which can lead the algorithms of this class to stagnate in local optima. In this paper, a variant of the standard PSO algorithm is presented, called PSO-2S, based on several initializations in different zones of the search space, using charged particles. This algorithm uses two kinds of swarms: a main swarm that gathers the best particles of the auxiliary swarms, which are initialized several times. The auxiliary swarms are initialized in different areas, then an electrostatic repulsion heuristic is applied in each area to increase its diversity. We analyse the performance of the proposed approach on a testbed made of unimodal and multimodal test functions with and without coordinate rotation and shift. The Lennard-Jones potential problem is also used. The proposed algorithm is compared to several other PSO algorithms on this benchmark. The obtained results show the efficiency of the proposed algorithm.
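For readers unfamiliar with the baseline that these variants modify, the following is a minimal sketch of a standard global-best PSO, not the PSO-2S variant described above. The function names, the parameter values (inertia `w`, acceleration coefficients `c1` and `c2`), the position clamping, and the sphere test function are all assumptions chosen for illustration.

```python
import random

def pso(objective, dim, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the search domain, zero velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp positions to the search domain.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    # Unimodal test function with its minimum of 0 at the origin.
    return sum(v * v for v in x)

best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

Because every particle is drawn toward the same global best, the swarm contracts quickly, which is the fast convergence and the stagnation risk that the abstract refers to.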

15.
A new modification to the particle swarm optimization (PSO) algorithm is proposed, aiming to make the algorithm less sensitive to the selection of the initial search domain. To achieve this goal, we release the boundaries of the search domain and enable each boundary to drift independently, guided by the number of collisions with particles involved in the optimization process. The gradual modification of the active search domain range enables us to prevent particles from revisiting less promising regions of the search domain and also to explore the areas located outside the initial search domain. With time, the search domain shrinks around a region holding a global extremum. This helps improve the quality of the final solution obtained. It also makes the algorithm less sensitive to the initial choice of the search domain ranges. The effectiveness of the proposed Floating Boundary PSO (FBPSO) is demonstrated using a set of standard test functions. To control the performance of the algorithm, new parameters are introduced. Their optimal values are determined through numerical examples.

16.
We analyse a new optimization-based approach for feature selection that uses the nested partitions method for combinatorial optimization as a heuristic search procedure to identify good feature subsets. In particular, we show how to improve the performance of the nested partitions method using random sampling of instances. The new approach uses a two-stage sampling scheme that determines the required sample size to guarantee convergence to a near-optimal solution. This approach therefore also has attractive theoretical characteristics. In particular, when the algorithm terminates in finite time, rigorous statements can be made concerning the quality of the final feature subset. Numerical results are reported to illustrate the key results, and show that the new approach is considerably faster than the original nested partitions method and other feature selection methods.

17.
This paper presents a co-evolutionary particle swarm optimization (PSO) algorithm, hybridized with noising metaheuristics, for solving the delay constrained least cost (DCLC) path problem, i.e., shortest-path problem with a delay constraint on the total “cost” of the optimal path. The proposed algorithm uses the principle of a Lagrangian-relaxation-based aggregated cost. It essentially consists of two concurrent PSOs for solving the resulting minimization-maximization problem. The main PSO is designed as a hybrid PSO-noising metaheuristics algorithm for efficient global search to solve the minimization part of the DCLC-Lagrangian relaxation by finding multiple shortest paths between a source-destination pair. The auxiliary/second PSO is a co-evolutionary PSO to obtain the optimal Lagrangian multiplier for solving the maximization part of the Lagrangian relaxation problem. For the main PSO, a novel heuristics-based path encoding/decoding scheme has been devised for representation of network paths as particles. The simulation results on several networks with random topologies illustrate the efficiency of the proposed hybrid algorithm for constrained shortest path computation problems.

18.
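To address the premature convergence and low efficiency of the standard particle swarm algorithm for the traveling salesman problem, this paper proposes a hybrid particle swarm algorithm (SKHPSO) that combines a Greedy Heuristic initial solution with the particle swarm. The algorithm uses a Kruskal-like algorithm given in the paper as the concrete implementation of the Greedy Heuristic to generate a good initial feasible solution, which becomes a member of the swarm, and then performs a heuristic search with an improved hybrid particle swarm algorithm. The local search of SKHPSO draws on Lin-Kernighan neighborhood search, while the global search incorporates the crossover and permutation operations of genetic algorithms. The algorithm is tested on typical instances from TSPLIB, and the results show that SKHPSO clearly improves solution quality and efficiency.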

19.
The feature selection problem aims to choose a subset of a given set of features that best represents the whole in a particular aspect, preserving the original semantics of the variables on the given samples and classes. In 2004, a new approach to perform feature selection was proposed. It was based on an NP-complete combinatorial optimisation problem called the (\(\alpha ,\beta \))-k-feature set problem. Although effective for many practical cases, which made the approach an important feature selection tool, the only existing solution method, proposed in the original paper, was found not to work well for several instances. Our work aims to cover this gap in the literature, quickly obtaining high quality solutions for the instances that the existing approach cannot solve. This work proposes a heuristic based on the greedy randomised adaptive search procedure and tabu search to address this problem, and benchmark instances to evaluate its performance. The computational results show that our method can obtain high quality solutions for both real and the proposed artificial instances and requires only a fraction of the computational resources required by the state of the art exact and heuristic approaches which use mixed integer programming models.

20.
Several meta-heuristic algorithms, such as evolutionary algorithms (EAs) and genetic algorithms (GAs), have been developed for solving feature selection problems due to their efficiency in searching feature subset spaces. Recently, hybrid GAs have been proposed to improve the performance of conventional GAs by embedding a local search operation, or sequential forward floating search mutation, into the GA. Existing hybrid algorithms may damage individuals’ genetic information obtained from genetic operations during the local improvement procedure because of a sequential process of the mutation operation and the local improvement operation. Another issue with a local search operation used in the existing hybrid algorithms is its inappropriateness for large-scale problems. Therefore, we propose a novel approach for solving large-sized feature selection problems, namely, an EA with a partial sequential forward floating search mutation (EAwPS). The proposed approach integrates a local search technique, that is, the partial sequential forward floating search mutation, into an EA method. Two algorithms, EAwPS-binary representation (EAwPS-BR) for medium-sized problems and EAwPS-integer representation (EAwPS-IR) for large-sized problems, have been developed. The adaptation of a local improvement method into the EA speeds up the search and directs the search into promising solution areas. We compare the performance of the proposed algorithms with other popular meta-heuristic algorithms using medium- and large-sized data sets. Experimental results demonstrate that the proposed EAwPS extracts better features within reasonable computational times.
