Similar Documents
1.
We investigate constrained first-order techniques for training support vector machines (SVMs) for online classification tasks. The methods exploit the structure of the SVM training problem and combine ideas from incremental gradient techniques, gradient acceleration, and successive simple calculations of Lagrange multipliers. Both primal and dual formulations are studied and compared. Experiments show that the constrained incremental algorithms working in the dual space achieve the best trade-off between prediction accuracy and training time. We perform comparisons with an unconstrained large-scale learning algorithm (the Pegasos stochastic gradient method) to emphasize that our choice remains competitive for large-scale learning due to the very special structure of the training problem.
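A minimal sketch of the kind of constrained dual update discussed here, assuming a linear kernel and no bias term; the per-coordinate rule is the generic dual coordinate ascent technique, not the authors' exact algorithm, and all names are illustrative:

```python
import numpy as np

def dual_coordinate_svm(X, y, C=1.0, epochs=10, seed=0):
    """Coordinate ascent on the SVM dual with box constraints
    0 <= alpha_i <= C; each step touches one training example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)                               # w = sum_i alpha_i y_i x_i
    q = np.einsum('ij,ij->i', X, X) + 1e-12       # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in rng.permutation(n):              # one example at a time
            g = y[i] * X[i].dot(w) - 1.0          # partial gradient
            a_new = np.clip(alpha[i] - g / q[i], 0.0, C)  # box constraint
            w += (a_new - alpha[i]) * y[i] * X[i]  # incremental update of w
            alpha[i] = a_new
    return w, alpha
```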

2.
Feature selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is the unavailability of the entire feature set before learning starts. Several efforts have been made to address the OSFS problem; however, they all need some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this approach is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses classical significance analysis concepts from RS theory to control the unknown feature space in OSFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run-time, and robustness against noise. Experimental results demonstrate that it outperforms existing OSFS algorithms in all these respects.
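A hedged skeleton of the OSFS loop the paper addresses; `is_significant` and `prune_redundant` are placeholder callables that an algorithm like OS-NRRSAR-SA would instantiate with rough-set significance analysis:

```python
def osfs_skeleton(feature_stream, is_significant, prune_redundant):
    """Generic online streaming feature selection: features arrive one at a
    time, each is tested against the current subset, and the subset is
    re-pruned after every acceptance. No knowledge of the full feature
    space is needed."""
    selected = []
    for f in feature_stream:               # features arrive one at a time
        if is_significant(f, selected):    # does f add discriminative power?
            selected.append(f)
            selected = prune_redundant(selected)  # re-check older features
    return selected
```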

3.
Pegasos: primal estimated sub-gradient solver for SVM
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with $1/\lambda$, where $\lambda$ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where $d$ is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
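For concreteness, a minimal Python sketch of the Pegasos update described above (single-example variant, with the optional projection step; function and parameter names are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.01, T=1000, seed=0):
    """Stochastic sub-gradient descent on the primal SVM objective
    (lam/2)*||w||^2 + mean(max(0, 1 - y_i * w.x_i)), one example per step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)               # pick one training example
        eta = 1.0 / (lam * t)             # step size 1/(lambda * t)
        if y[i] * X[i].dot(w) < 1.0:      # hinge loss active: full sub-gradient
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                             # only the regularizer contributes
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)          # optional projection onto the ball
        if norm > 1.0 / np.sqrt(lam):     # of radius 1/sqrt(lambda)
            w *= 1.0 / (np.sqrt(lam) * norm)
    return w
```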

4.
A convergent decomposition algorithm for support vector machines
In this work we consider nonlinear minimization problems with a single linear equality constraint and box constraints. In particular we are interested in solving problems where the number of variables is so huge that traditional optimization methods cannot be directly applied. Many interesting real-world problems lead to the solution of large-scale constrained problems with this structure. For example, the special subclass of problems with a convex quadratic objective function plays a fundamental role in the training of Support Vector Machines, a technique for machine learning problems. For this particular subclass of convex quadratic problems, some convergent decomposition methods, based on the solution of a sequence of smaller subproblems, have been proposed. In this paper we define a new globally convergent decomposition algorithm that differs from the previous methods in the rule for the choice of the subproblem variables and in the presence of a proximal point modification in the objective function of the subproblems. In particular, the new rule for sequentially selecting the subproblems appears to be suited to tackling large-scale problems, while the introduction of the proximal point term allows us to ensure the global convergence of the algorithm for the general case of a nonconvex objective function. Furthermore, we report some preliminary numerical results on support vector classification problems with up to 100,000 variables.
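A small sketch of the proximal-point modification, assuming `f` is the (possibly nonconvex) objective, `x_k` the current iterate, and `working` the index set chosen by the selection rule; all names are illustrative:

```python
import numpy as np

def proximal_subproblem(f, x_k, working, tau=1e-2):
    """Build the subproblem objective: optimize f only over the working-set
    variables, plus a proximal term (tau/2)*||x_w - x_k[working]||^2 that
    keeps the iterate near x_k and secures global convergence."""
    def obj(x_w):
        x = x_k.copy()
        x[working] = x_w                  # free variables; the rest stay fixed
        prox = 0.5 * tau * np.sum((x_w - x_k[working]) ** 2)
        return f(x) + prox
    return obj
```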

5.
Higher order finite element discretizations, although providing higher accuracy, are considered to be computationally expensive and of limited use for large-scale problems. In this paper, we have developed an efficient iterative solver for solving large-scale quadratic finite element problems. The proposed approach shares some common features with geometric multigrid methods but does not need structured grids to create the coarse problem. This leads to a robust method applicable to finite element problems discretized by unstructured meshes such as those from adaptive remeshing strategies. The method is based on specific properties of hierarchical quadratic bases. It can be combined with an algebraic multigrid (AMG) preconditioner or with other algebraic multilevel block factorizations. The algorithm can be accelerated by flexible Krylov subspace methods. We present some numerical results on convection-diffusion and linear elasticity problems to illustrate the efficiency and the robustness of the presented algorithm. In these experiments, the performance of the proposed method is compared with that of an AMG preconditioner and other iterative solvers. Our approach requires less computing time and less memory storage. Copyright © 2010 John Wiley & Sons, Ltd.

6.
7.
In this paper, we propose an algorithm for solving a linear program with an additional rank-two reverse convex constraint. Unlike the existing methods which generate approximately optimal solutions, the algorithm provides a rigorous optimal solution to this nonconvex problem by a finite number of dual pivot operations. Computational results indicate that the algorithm is practical and can solve fairly large-scale problems.

8.
This paper proposes a two-step algorithm for solving a large-scale semi-definite logit model, which is regarded as a powerful model in failure discriminant analysis. This problem has been successfully solved by a cutting plane (outer approximation) algorithm; however, that approach requires much more computation time than the corresponding linear logit model. The two-step algorithm proposed in this paper reduces the computation time by eliminating a certain portion of the data based on the information obtained by solving an associated linear logit model. It will be shown that this algorithm can generate a solution of almost the same quality as the solution obtained by solving the original large-scale semi-definite model, in a fraction of the computation time.
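A hedged skeleton of the two-step idea, under the assumption that the linear logit stage exposes a real-valued confidence score per example; all names and the scoring interface are illustrative, not the paper's exact elimination rule:

```python
import numpy as np

def two_step_fit(X, y, fit_linear_logit, fit_semidefinite_logit, keep=0.3):
    """Step 1: solve the cheap linear logit, which returns a model and a
    real-valued score per example. Step 2: keep only the examples the cheap
    model is least confident about and solve the expensive semi-definite
    model on that reduced set."""
    linear_model, scores = fit_linear_logit(X, y)
    order = np.argsort(np.abs(scores))       # least confident examples first
    idx = order[: int(keep * len(X))]        # the retained portion of the data
    return fit_semidefinite_logit(X[idx], y[idx])
```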

9.
Feature selection plays an important role in the successful application of machine learning techniques to large real-world datasets. Avoiding model overfitting, especially when the number of features far exceeds the number of observations, requires selecting informative features and/or eliminating irrelevant ones. Searching for an optimal subset of features can be computationally expensive. Functional magnetic resonance imaging (fMRI) produces datasets with exactly these characteristics, creating challenges for applying machine learning techniques to classify cognitive states from fMRI data. In this study, we present an embedded feature selection framework that integrates sparse optimization for regularization (sparse regularization) and classification. This optimization approach maximizes training accuracy while simultaneously enforcing sparsity by penalizing the objective function for the magnitudes of the feature coefficients. This allows many coefficients to become exactly zero, which effectively eliminates the corresponding features from the classification model. To demonstrate the utility of the approach, we apply our framework to three different real-world fMRI datasets. The results show that regularized classifiers yield better classification accuracy, especially when the number of initial features is large. The results further show that sparse regularization is key to achieving scientifically relevant generalizability and functional localization of classifier features. The approach is thus highly suited for analysis of fMRI data.
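A minimal sketch of the embedded sparse-regularization mechanism, using an L1-penalized logistic regression as a stand-in for the paper's exact formulation:

```python
from sklearn.linear_model import LogisticRegression

def sparse_logit_select(X, y, C=0.1):
    """L1-penalized logistic regression as an embedded feature selector:
    the penalty drives many coefficients to exactly zero, removing those
    features from the classifier. Smaller C means a stronger penalty and
    a sparser model."""
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C).fit(X, y)
    mask = (clf.coef_ != 0).any(axis=0)    # boolean mask of surviving features
    return clf, mask
```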

10.
Solving the imbalanced data classification problem is of far-reaching practical significance. The Mahalanobis-Taguchi System (MTS) constructs a benchmark space and a measurement scale from the normal class alone, and builds a classification model on that basis, which makes it well suited to imbalanced data classification. Building on the traditional MTS method, this paper combines the signal-to-noise ratio with classification measures such as F-value and G-mean to establish a genetic-algorithm-based benchmark space optimization model, and applies the Bagging ensemble algorithm to construct an improved MTS algorithm, GBMTS. Experimental analysis of different classification methods on the relevant datasets shows that GBMTS handles imbalanced data classification more effectively than other classification algorithms.
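A sketch of the Mahalanobis-Taguchi measurement scale underlying GBMTS: the benchmark space here is the plain normal-class Mahalanobis space, before any GA optimization or Bagging, and the function name is illustrative:

```python
import numpy as np

def mts_distance(X_normal, x):
    """Score a sample by its scaled Mahalanobis distance to the benchmark
    space built from the normal class alone; large values flag abnormality.
    Assumes no constant (zero-variance) features."""
    mu = X_normal.mean(axis=0)
    sd = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mu) / sd                       # standardized benchmark space
    S_inv = np.linalg.pinv(np.cov(Z, rowvar=False))
    z = (x - mu) / sd
    return float(z @ S_inv @ z) / X_normal.shape[1]  # MD scaled by feature count
```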

11.
Each clustering algorithm optimizes some quality metric as it runs. In conventional clustering algorithms this metric treats all features as equally important; in other words, each feature participates in the clustering process with equal weight. Clearly, some features carry more information than others in a dataset, so some features should arguably receive lower importance during clustering or classification, for instance because they are less informative or have higher variance. It is therefore a long-standing goal in artificial intelligence to introduce a weighting mechanism into any task that uses a set of features uniformly to make a decision. The question is how features can participate in the clustering process in a weighted manner. Recently, this problem has been addressed by locally adaptive clustering (LAC). However, like its traditional competitors, LAC performs poorly on data with imbalanced clusters. This paper solves the problem by proposing a weighted locally adaptive clustering (WLAC) algorithm based on LAC. WLAC, however, is sensitive to two parameters that must be tuned manually, and its performance depends on tuning them well. The paper proposes two solutions: the first is based on a simple clustering ensemble framework that addresses the sensitivity of WLAC to its manual tuning; the second is based on a cluster selection method.
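A sketch of the LAC-style per-cluster feature weighting that WLAC builds on, assuming cluster labels and centers are already available and every cluster is non-empty:

```python
import numpy as np

def lac_weights(X, labels, centers, h=1.0):
    """LAC-style weights: for cluster k and dimension j, the weight is
    proportional to exp(-spread_kj / h), where spread_kj is the average
    squared distance of the cluster's points to its center along dimension j.
    Smaller spread means a more reliable dimension and a larger weight;
    weights are normalized so that sum_j w_kj^2 = 1."""
    K, d = centers.shape
    W = np.zeros((K, d))
    for k in range(K):
        pts = X[labels == k]
        spread = ((pts - centers[k]) ** 2).mean(axis=0)
        w = np.exp(-spread / h)
        W[k] = w / np.linalg.norm(w)
    return W
```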

12.
Kernelized support vector machines (SVMs) belong to the most widely used classification methods. However, in contrast to linear SVMs, the computation time required to train such a machine becomes a bottleneck when facing large data sets. In order to mitigate this shortcoming of kernel SVMs, many approximate training algorithms were developed. While most of these methods claim to be much faster than the state-of-the-art solver LIBSVM, a thorough comparative study is missing. We aim to fill this gap. We choose several well-known approximate SVM solvers and compare their performance on a number of large benchmark data sets. Our focus is to analyze the trade-off between prediction error and runtime for different learning and accuracy parameter settings. This includes simple subsampling of the data, the poor-man’s approach to handling large scale problems. We employ model-based multi-objective optimization, which allows us to tune the parameters of learning machine and solver over the full range of accuracy/runtime trade-offs. We analyze (differences between) solvers by studying and comparing the Pareto fronts formed by the two objectives classification error and training time. Unsurprisingly, given more runtime most solvers are able to find more accurate solutions, i.e., achieve a higher prediction accuracy. It turns out that LIBSVM with subsampling of the data is a strong baseline. Some solvers systematically outperform others, which allows us to give concrete recommendations of when to use which solver.
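A minimal sketch of the subsampling baseline, using scikit-learn's SVC (a LIBSVM wrapper); `n_sub` and the function name are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def subsampled_svm(X, y, n_sub=5000, seed=0, **svc_kwargs):
    """The 'poor man's' baseline: train an exact kernel SVM on a uniform
    random subsample of the data, trading some accuracy for a large cut
    in training time."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_sub, len(X)), replace=False)
    return SVC(**svc_kwargs).fit(X[idx], y[idx])
```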

13.
This paper proposes a novel ant colony optimisation (ACO) algorithm tailored for the hierarchical multi-label classification problem of protein function prediction. This problem is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve the current biological knowledge. Since it is known that a protein can perform more than one function and many protein functional-definition schemes are organised in a hierarchical structure, the classification problem in this case is an instance of a hierarchical multi-label problem. In this type of problem, each example may belong to multiple class labels and class labels are organised in a hierarchical structure—either a tree or a directed acyclic graph structure. It presents a more complex problem than conventional flat classification, given that the classification algorithm has to take into account hierarchical relationships between class labels and be able to predict multiple class labels for the same example. The proposed ACO algorithm discovers an ordered list of hierarchical multi-label classification rules. It is evaluated on sixteen challenging bioinformatics data sets involving hundreds or thousands of class labels to be predicted and compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification.

14.
A Frisch-Newton Algorithm for Sparse Quantile Regression
Recent experience has shown that interior-point methods using a log-barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data, where the parametric dimension of the model can be quite large but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra, we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker [28]. The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
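For reference, the asymmetric "check" (pinball) loss whose minimization defines the quantile regression linear program that the Frisch-Newton interior-point method solves:

```python
import numpy as np

def pinball_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), summed over residuals
    u = y - X @ beta. Minimizing this sum over beta for a quantile level
    tau in (0, 1) yields the tau-th conditional quantile fit."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u * (tau - (u < 0))))
```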

15.
The curse of dimensionality refers to the fact that high-dimensional data is often difficult to work with. A large number of features can increase the noise in the data and thus the error of a learning algorithm. Feature selection is a solution for such problems, where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the reliability of the feature selectors. It aims to provide a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by ensemble members, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets.
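A hedged sketch of confidence-weighted aggregation in this spirit; the scoring rule is illustrative, not the paper's exact reliability assessment:

```python
from collections import defaultdict

def ensemble_select(member_subsets, member_accuracies, k):
    """Accuracy-weighted ensemble feature selection: each member's selected
    features are credited with the classification accuracy the evaluator
    measured for that member, and the k highest-scoring features are kept."""
    score = defaultdict(float)
    for feats, acc in zip(member_subsets, member_accuracies):
        for f in feats:
            score[f] += acc                # confidence from the evaluator
    return sorted(score, key=score.get, reverse=True)[:k]
```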

16.
A Feature Selection Newton Method for Support Vector Machine Classification
A fast Newton method, that suppresses input space features, is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages but merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier that it generates.
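A sketch of the underlying 1-norm (linear programming) SVM, solved here with a generic LP solver rather than the paper's Newton method; the L1 objective is what suppresses input features, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_lp(X, y, C=1.0):
    """LP-SVM: minimize ||w||_1 + C * sum(xi) subject to
    y_i * (w . x_i + b) >= 1 - xi_i, xi >= 0. Split w = u - v with
    u, v >= 0 so the objective is linear; the L1 penalty drives many
    weights to exactly zero, which is the feature selection."""
    n, d = X.shape
    # variable layout: [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Yx = y[:, None] * X
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])   # margin constraints
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')
    u, v = res.x[:d], res.x[d:2 * d]
    return u - v, res.x[2 * d]             # weight vector w and bias b
```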

17.
In this paper a model and a solution algorithm are reported that simultaneously deal with the processes of machine duplication and part subcontracting in the presence of two significant design issues in cellular manufacturing systems: (i) alternative cell locations; and (ii) the maximum number of machines assigned to a cell. As the problem, formulated as a polynomial programming model, is shown to be NP-hard in the strong sense, a higher-level heuristic algorithm based on the concept known as 'tabu search' is presented. An example (small) problem is solved to demonstrate the functionality of the algorithm. Additionally, the small problem is solved to optimality under different scenarios, with both linear single-row and linear double-row layout arrangements, and the solutions obtained are shown to match those obtained with the heuristic algorithm. A comparison of six different versions of tabu search-based heuristics (TSH 1–TSH 6) is performed to investigate the impact of long-term memory and the use of fixed versus variable tabu-list sizes. A carefully constructed statistical experiment, based on a randomised complete-block design, is used to test the performance on three different problem structures, classified as small, medium and large. The results show that TSH 6, with variable tabu-list size and long-term memory based on minimal frequencies, is preferred for the single-row layout, while TSH 4, with variable tabu-list size and no long-term memory, is preferred for the double-row layout. When subject to budgetary restrictions, the proposed approach can be used by parts manufacturing companies to determine which of three actions should be undertaken for each bottleneck part: leave the part as in the initial solution, duplicate all the bottleneck machines connected to it, or subcontract the part.
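A generic tabu-search skeleton for orientation (fixed-size tabu list; the paper's TSH variants add variable list sizes and long-term memory), with illustrative names and hashable solutions assumed:

```python
from collections import deque

def tabu_search(initial, neighbors, cost, tabu_size=10, iters=200):
    """Short-term memory forbids revisiting recent solutions, and the best
    non-tabu neighbor is taken even when it worsens the cost, which lets
    the search climb out of local optima."""
    best = current = initial
    tabu = deque([initial], maxlen=tabu_size)    # fixed-size tabu list
    for _ in range(iters):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)      # best admissible move
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best
```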

18.
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring a much richer information context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. We address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, iteratively reweighted partial least squares (IRWPLS). We compare our results with two-stage PLS and with other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying Firth's procedure to avoid (quasi)separation, we often obtain lower classification error rates.
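A sketch of the two-stage PLS baseline mentioned above; IRWPLS instead interleaves the PLS step with the iteratively reweighted least-squares iterations of the generalized linear model. Names are illustrative:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression

def two_stage_pls_classify(X, y, n_components=3):
    """Stage 1: reduce the many-features, few-samples data to a handful of
    PLS latent scores. Stage 2: fit a logistic model on those scores."""
    pls = PLSRegression(n_components=n_components).fit(X, y)
    scores = pls.transform(X)                  # n_samples x n_components
    return pls, LogisticRegression().fit(scores, y)
```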

19.
In this paper we describe a proximal Support Vector Machine algorithm for the multiclass classification problem using a one-vs-all scheme. The computational requirement of the new algorithm is almost the same as that of training one of its element binary proximal Support Vector Machines. A low-rank approximation is used to reduce computational costs when the kernel matrix is too large, and an error bound for the approximated solution is given, which serves as a stopping criterion for the low-rank approximation. A post-processing strategy is developed to overcome the difficulty arising from unbalanced data and to improve classification accuracy. A parallel implementation of the algorithm using standard MPI communication routines is provided to handle large-scale problems and to accelerate the training process. Experimental results on several public datasets validate the effectiveness of the proposed algorithm.
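A sketch of a linear proximal SVM solve in the Fung-Mangasarian style, which reduces training to one symmetric positive-definite linear system; this is the binary building block, and a one-vs-all scheme trains one such machine per class. Names are illustrative:

```python
import numpy as np

def proximal_svm(X, y, nu=1.0):
    """Proximal SVM: the inequality constraints of the standard SVM become
    an equality fit, so training is a single (d+1) x (d+1) linear solve.
    Labels y must be in {-1, +1}. Predict with sign(X @ w - gamma)."""
    n, d = X.shape
    H = np.hstack([X, -np.ones((n, 1))])   # augmented data [X, -e]
    M = np.eye(d + 1) / nu + H.T @ H       # symmetric positive definite
    w_gamma = np.linalg.solve(M, H.T @ y)
    return w_gamma[:d], w_gamma[d]         # weights w and offset gamma
```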

20.
In this study, we present Incremental Learning and Decremented Characterization of Regularized Generalized Eigenvalue Classification (ILDC-ReGEC), a novel algorithm to train a generalized eigenvalue classifier with a substantially smaller subset of the points and features of the original data. The proposed method provides a constructive way to understand the influence of new training data on an existing classification model and the grouping of features that determine the class of samples. We show through numerical experiments that this technique has accuracy comparable to that of other methods. Furthermore, experiments show that it is possible to obtain a classification model with about 30% of the training samples and less than 5% of the initial features. A Matlab implementation of the ILDC-ReGEC algorithm is freely available from the authors. Research partially supported by NSF and Air Force grants.
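A sketch of the regularized generalized eigenvalue classifier that ILDC-ReGEC trains incrementally, shown here in its batch form with illustrative names:

```python
import numpy as np
from scipy.linalg import eigh

def regec_planes(A, B, delta=1e-3):
    """ReGEC-style classifier: each class gets a hyperplane close to its own
    points and far from the other class's, read off the extreme eigenvectors
    of a regularized generalized eigenproblem. Classify a new point x by the
    nearer plane, i.e. the smaller |x @ w - gamma| / ||w||."""
    Ae = np.hstack([A, -np.ones((len(A), 1))])
    Be = np.hstack([B, -np.ones((len(B), 1))])
    G = Ae.T @ Ae + delta * np.eye(Ae.shape[1])   # Tikhonov regularization
    H = Be.T @ Be + delta * np.eye(Be.shape[1])
    _, vecs = eigh(G, H)                 # generalized symmetric eigenproblem
    z_a, z_b = vecs[:, 0], vecs[:, -1]   # min / max eigenvalue eigenvectors
    return z_a, z_b                      # each z = [w; gamma], plane x @ w = gamma
```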
