Similar Literature (20 results)
1.
Existing support vector machines (SVMs) all assume that every feature of the training samples contributes equally to constructing the optimal separating hyperplane. However, in a given real-world data set, some features may be highly relevant to the classification while others are less relevant. In this paper, the linear feature-weighted support vector machine (LFWSVM) is proposed to deal with this problem. The model is constructed in two phases. First, a mutual information (MI) based approach is used to assign an appropriate weight to each feature of the given data set. Second, the model is trained on samples whose features are scaled by the obtained feature weight vector. The feature weights are embedded in the quadratic programming through a detailed theoretical deduction, yielding the dual solution to the original optimization problem. Although computing the feature weights adds an extra computational cost, the proposed model generally exhibits better generalization performance than the traditional SVM with a linear kernel. Experimental results on one synthetic data set and several benchmark data sets confirm the benefits of the proposed method. The experiments also show that the proposed MI-based approach to determining feature weights is superior to the two other most commonly used methods.
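The two-phase recipe can be prototyped in a few lines. The sketch below, using scikit-learn, illustrates the idea rather than the paper's exact algorithm: mutual_info_classif stands in for the paper's MI estimator, and the synthetic data set is arbitrary.

    # Sketch of the two-phase LFWSVM idea: (1) estimate mutual-information-based
    # feature weights, (2) train a linear SVM on the re-weighted features.
    # mutual_info_classif is a stand-in for the paper's MI estimator.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Phase 1: MI between each feature and the labels, normalised to weights.
    mi = mutual_info_classif(X_tr, y_tr, random_state=0)
    w = mi / mi.sum()

    # Phase 2: scaling each column by its weight is equivalent to embedding the
    # weights in the quadratic programme for a linear kernel.
    clf = LinearSVC(dual=False).fit(X_tr * w, y_tr)
    print("weighted-feature accuracy:", clf.score(X_te * w, y_te))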

2.
李璐  刘万荣 《经济数学》2004,21(4):347-354
This paper discusses the relationship between the admissibility of parameter estimators under vector loss functions and their admissibility under commonly used loss functions. It further studies the admissibility of parameter estimators, both within specific classes of estimators and within the class of all estimators, for univariate and multivariate linear models, and gives several necessary and sufficient conditions, as well as sufficient conditions, for an estimator to be admissible.

3.
Gaussians are important tools for learning from high-dimensional data. The variance of a Gaussian kernel measures the frequency range of the function components, or features, retrieved by learning algorithms induced by the Gaussian: learning ability and approximation power increase as the variance decreases. It is therefore natural to use Gaussians with decreasing variances in online algorithms, where samples arrive one by one. In this paper, we consider fully online classification algorithms associated with a general loss function and varying Gaussians, which are closely related to regularization schemes in reproducing kernel Hilbert spaces. Learning rates are derived in terms of the smoothness of a target function associated with the probability measure controlling sampling and with the loss function. A critical estimate is given for the norm of the difference of regularized target functions as the variance of the Gaussian changes. Concrete learning rates are presented for the online learning algorithm with the least squares loss function.
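A toy version of such a varying-Gaussian online scheme with the least squares loss is sketched below; the polynomial schedules for the variance sigma_t and step size eta_t are illustrative choices, not the rates derived in the paper.

    # Toy online least-squares learning with Gaussian kernels whose variance
    # shrinks as samples arrive: f <- f - eta_t * (f(x_t) - y_t) * K_{sigma_t}(x_t, .)
    # The schedules for sigma_t and eta_t below are illustrative, not the paper's.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 500
    X = rng.uniform(-1, 1, size=(T, 2))
    y = np.sign(X[:, 0] * X[:, 1])            # a nonlinear target

    centers, coefs, sigmas = [], [], []

    def f(x):
        if not centers:
            return 0.0
        C = np.array(centers)
        d2 = ((C - x) ** 2).sum(axis=1)
        return float(np.array(coefs) @ np.exp(-d2 / (2 * np.array(sigmas) ** 2)))

    for t in range(1, T + 1):
        sigma_t = t ** -0.2          # slowly shrinking variance
        eta_t = 0.5 * t ** -0.5      # decaying step size
        err = f(X[t - 1]) - y[t - 1]             # least-squares gradient
        centers.append(X[t - 1]); sigmas.append(sigma_t); coefs.append(-eta_t * err)

    pred = np.sign([f(x) for x in X])
    print("training sign agreement:", (pred == y).mean())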

4.
In this paper, we consider unregularized online learning algorithms in a reproducing kernel Hilbert space (RKHS). First, we derive explicit convergence rates of unregularized online learning algorithms for classification associated with a general α-activating loss (see Definition 1 of the paper). Our results extend and refine the results in [30] for the least squares loss and the recent result [3] for loss functions with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Second, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate the main results. The main techniques are tools from convex analysis, refined inequalities for Gaussian averages [5], and an induction approach.
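The pairwise part of the result can be illustrated with a linear special case. The sketch below runs unregularized online pairwise learning with an AUC-style hinge loss on opposite-label pairs and polynomially decaying step sizes; the linear kernel, the subsampling of past points and all constants are illustrative simplifications.

    # Sketch of unregularized online *pairwise* learning (AUC-style hinge loss
    # on pairs with opposite labels) with polynomially decaying steps t^-theta.
    # A linear kernel is used for brevity; the paper's analysis is in an RKHS.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 400, 5
    X = rng.normal(size=(n, d))
    y = np.sign(X @ np.array([1.5, -2.0, 0.0, 0.0, 1.0]) + 0.3 * rng.normal(size=n))

    w = np.zeros(d)
    for t in range(1, n):
        eta = 0.5 * t ** -0.75                    # polynomial step-size decay
        xt, yt = X[t], y[t]
        for j in rng.choice(t, size=min(t, 10), replace=False):  # past points
            if y[j] == yt:
                continue
            diff = (yt - y[j]) / 2 * (xt - X[j])  # signed pair difference
            if 1 - w @ diff > 0:                  # hinge is active
                w += eta * diff / 10              # averaged gradient step

    # last iterate should rank positives above negatives: empirical AUC
    scores = X @ w
    pos, neg = scores[y > 0], scores[y < 0]
    auc = (pos[:, None] > neg[None, :]).mean()
    print("empirical AUC of last iterate:", round(float(auc), 3))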

5.
Supervised learning methods are powerful techniques to learn a function from a given set of labeled data, the so-called training data. In this paper the support vector machines approach is applied to an image classification task. Starting with the corresponding Tikhonov regularization problem, reformulated as a convex optimization problem, we introduce a conjugate dual problem to it and prove that, whenever strong duality holds, the function to be learned can be expressed via the dual optimal solutions. Corresponding dual problems are then derived for different loss functions. The theoretical results are applied by numerically solving a classification task using high dimensional real-world data in order to obtain optimal classifiers. The results demonstrate the excellent performance of support vector classification for this particular problem.
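The hinge-loss instance of this dual construction is what standard SVM solvers implement. A minimal run on scikit-learn's bundled digits images (standing in for the paper's high-dimensional image data, which is not available here) exposes the dual optimal solution directly:

    # The C-SVM is solved in exactly this dual form by libsvm; scikit-learn's
    # bundled digits images stand in for the paper's image-classification task.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
    print("dual coefficients shape:", clf.dual_coef_.shape)  # dual solution
    print("test accuracy:", clf.score(X_te, y_te))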

6.
Classification of high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features, or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study of the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. Cases in which overfitting occurred are analysed via the distribution of features, and further analysis shows why the constructed features can achieve promising classification performance.
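A sketch of tree-based GP feature construction, assuming the third-party gplearn package (an assumption of this sketch, not a tool used in the paper); a handful of constructed features is fed to an ordinary classifier, mirroring the "constructed features" pipelines studied here.

    # GP feature construction with gplearn's SymbolicTransformer; the evolved
    # trees become new high-level features for a downstream classifier.
    from gplearn.genetic import SymbolicTransformer
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    gp = SymbolicTransformer(generations=10, n_components=5,
                             function_set=("add", "sub", "mul", "div"),
                             random_state=0)
    F_tr = gp.fit_transform(X_tr, y_tr)      # 5 constructed high-level features
    F_te = gp.transform(X_te)

    clf = LogisticRegression(max_iter=5000).fit(F_tr, y_tr)
    print("accuracy on 5 constructed features:", clf.score(F_te, y_te))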

7.
Selecting important features in nonlinear kernel spaces is a difficult challenge in both classification and regression problems. This article proposes to achieve feature selection by optimizing a simple criterion: a feature-regularized loss function. Features within the kernel are weighted, and a lasso penalty is placed on these weights to encourage sparsity. This feature-regularized loss function is minimized by estimating the weights in conjunction with the coefficients of the original classification or regression problem, thereby automatically procuring a subset of important features. The algorithm, KerNel Iterative Feature Extraction (KNIFE), is applicable to a wide variety of kernels and high-dimensional kernel problems. In addition, a modification of KNIFE gives a computationally attractive method for graphically depicting nonlinear relationships between features by estimating their feature weights over a range of regularization parameters. The utility of KNIFE in selecting features is demonstrated through simulations and examples for both kernel regression and support vector machines. Feature path realizations also give graphical representations of important features and the nonlinear relationships among variables. Supplementary materials with computer code and an appendix on convergence analysis are available online.
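A minimal sketch of the KNIFE idea follows: features are weighted inside a Gaussian kernel, an L1 (lasso) penalty on the weights encourages sparsity, and the weights and kernel-ridge coefficients are updated alternately. The finite-difference gradient and all constants are simplifications of this sketch, not the published algorithm.

    # KNIFE-style sketch: feature weights inside a Gaussian kernel with a lasso
    # penalty, optimized by proximal gradient steps with soft-thresholding.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 120, 6
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # 2 true features

    def K(X1, X2, w):
        D = ((X1[:, None, :] - X2[None, :, :]) * w) ** 2
        return np.exp(-D.sum(axis=2))

    def loss(w, lam=1e-2):
        G = K(X, X, w) + n * lam * np.eye(n)
        alpha = np.linalg.solve(G, y)            # kernel ridge, closed form
        return float(np.mean((K(X, X, w) @ alpha - y) ** 2))

    w, mu, step = np.ones(d), 0.05, 0.2          # mu = lasso penalty on weights
    for _ in range(30):
        g = np.array([(loss(w + 1e-4 * e) - loss(w - 1e-4 * e)) / 2e-4
                      for e in np.eye(d)])       # finite-difference gradient
        w = w - step * g
        w = np.sign(w) * np.maximum(np.abs(w) - step * mu, 0.0)  # soft threshold

    print("feature weights:", np.round(w, 2))    # noise-feature weights tend to 0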

8.
Order decomposable set problems are introduced as a general class of problems concerning sets of multidimensional objects. We give a method of structuring sets such that the answer to an order decomposable set problem can be maintained with low worst-case time bounds, while objects are inserted and deleted in the set. Examples include the maintenance of the two-dimensional Voronoi diagram of a set of points, and of the convex hull of a three-dimensional point set in linear time. We show that there is a strong connection between the general dynamization method given and the well-known technique of divide-and-conquer used for solving many problems concerning static sets of objects. Although the upper bounds obtained are low in order of magnitude, the results do not necessarily imply the existence of fast feasible update routines. Hence the results merely assess theoretical bounds for the various set problems considered.

9.
Bounds on convergence are given for a general class of nonlinear programming algorithms. Methods in this class generate at each iteration both constraint multipliers and approximate solutions such that, under certain specified assumptions, accumulation points of the multiplier and solution sequences satisfy the Fritz John or the Kuhn-Tucker optimality conditions. Under stronger assumptions, convergence bounds are derived for the sequences of approximate solution, multiplier and objective function values. The theory is applied to an interior-exterior penalty function algorithm modified to allow for inexact subproblem solutions. An entirely new convergence bound in terms of the square root of the penalty controlling parameter is given for this algorithm.

10.
Combining multiple classifiers, known as ensembling, can give a substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data. We propose an ensemble of subsets of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based upon their individual performance measured by out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets, with their original features and with added non-informative features, for the evaluation of our method. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forests and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forests and support vector machines.
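A compressed sketch of the two-step construction follows, with the paper's sequential second stage simplified to a fixed top-k cut; subset sizes and counts are illustrative.

    # ESkNN-style sketch: train kNN classifiers on random feature subsets, keep
    # the individually best ones by out-of-sample accuracy, then majority-vote.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                               random_state=0)  # 15 non-informative features
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

    rng = np.random.default_rng(0)
    models = []
    for _ in range(50):                               # step 1: build and score
        feats = rng.choice(X.shape[1], size=5, replace=False)
        knn = KNeighborsClassifier(5).fit(X_tr[:, feats], y_tr)
        acc = knn.score(X_val[:, feats], y_val)       # out-of-sample accuracy
        models.append((acc, feats, knn))

    best = sorted(models, key=lambda m: -m[0])[:15]   # step 2: keep the best 15
    votes = np.mean([knn.predict(X_val[:, f]) for _, f, knn in best], axis=0)
    print("ensemble accuracy:", ((votes > 0.5).astype(int) == y_val).mean())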

11.
In the Knowledge Discovery Process, classification algorithms are often used to build models from training data that can predict the classes of untested data instances. While several factors in classification algorithms can influence the results, such as the node-splitting measures used when building decision trees, feature selection is often used as a pre-classification step on large data sets to eliminate irrelevant or redundant attributes, increasing computational efficiency and possibly classification accuracy. One important factor common to both feature selection and classification with decision trees is attribute discretization, the process of dividing attribute values into a smaller number of discrete values. In this paper, we present and explore a new hybrid approach, ChiBlur, which combines concepts from the blurring and χ2-based approaches to feature selection with concepts from multi-objective optimization. We compare this new algorithm with algorithms based on the blurring and χ2-based approaches.
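Two of the ingredients ChiBlur builds on, attribute discretization followed by χ2 scoring of features, can be reproduced with standard tools; the blurring and multi-objective components are not reproduced in this sketch.

    # Attribute discretization followed by chi-squared feature relevance scores;
    # bin counts and k are illustrative, and ChiBlur itself is not implemented.
    from sklearn.datasets import load_wine
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.preprocessing import KBinsDiscretizer

    X, y = load_wine(return_X_y=True)
    Xd = KBinsDiscretizer(n_bins=5, encode="ordinal",
                          strategy="uniform").fit_transform(X)  # discretize

    sel = SelectKBest(chi2, k=5).fit(Xd, y)        # chi2-based feature scores
    print("chi2 scores:", sel.scores_.round(1))
    print("selected feature indices:", sel.get_support(indices=True))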

12.
RNA-sample pooling is sometimes inevitable, but should be avoided in classification tasks such as biomarker studies. Our simulation framework investigates a two-class classification study based on gene expression profiles to show how strongly the outcomes of single-sample designs differ from those of pooling designs. The results show how the effects of pooling depend on pool size, the discriminating pattern, the number of informative features and the statistical learning method used (support vector machines with linear and radial kernels, random forest (RF), linear discriminant analysis, powered partial least squares discriminant analysis (PPLS-DA) and partial least squares discriminant analysis (PLS-DA)). As measures of the pooling effect, we consider the prediction error (PE) and the coincidence of the important feature sets for classification based on PLS-DA, PPLS-DA and RF. In general, PPLS-DA and PLS-DA show constant PE with increasing pool size, and low PE for patterns in which the convex hull of one class is not a cover of the other class. The coincidence of important feature sets is larger for PLS-DA and PPLS-DA than for RF. RF shows the best results for patterns in which the convex hull of one class is a cover of the other class, but these depend strongly on the pool size. We complement the PE results with experimental data which we pool artificially. The PEs of PPLS-DA and PLS-DA are again the least influenced by pooling and remain low. Additionally, we show under which assumption the PLS-DA loading weights, as a measure of feature importance for classification, are equal across the different designs.
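A toy version of the pooling simulation is sketched below: pooling is modelled as averaging groups of same-class profiles, and PLSRegression with a sign threshold stands in for PLS-DA. Dimensions, effect sizes and pool sizes are illustrative, not the paper's design.

    # Toy pooling simulation: average groups of `pool_size` same-class profiles
    # and track the prediction error of a PLS-DA-style classifier.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    d, n_per_class, n_test = 200, 120, 200

    def sample(n, cls):
        mu = np.zeros(d); mu[:10] = 1.0 * cls     # 10 informative "genes"
        return rng.normal(mu, 1.0, size=(n, d))

    X_te = np.vstack([sample(n_test, +1), sample(n_test, -1)])
    y_te = np.r_[np.ones(n_test), -np.ones(n_test)]

    for pool_size in (1, 2, 4, 8):
        Xp, yp = [], []
        for cls in (+1, -1):
            S = sample(n_per_class, cls)
            pooled = S.reshape(-1, pool_size, d).mean(axis=1)  # pooling = averaging
            Xp.append(pooled); yp.append(cls * np.ones(len(pooled)))
        pls = PLSRegression(n_components=2).fit(np.vstack(Xp), np.hstack(yp))
        pe = (np.sign(pls.predict(X_te).ravel()) != y_te).mean()
        print(f"pool size {pool_size}: prediction error {pe:.3f}")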

13.
pth Power Lagrangian Method for Integer Programming
When does there exist an optimal generating Lagrangian multiplier vector (one that generates an optimal solution of an integer programming problem in a Lagrangian relaxation formulation), and in cases of nonexistence, can we produce existence in some other, equivalent representation space? Under what conditions does there exist an optimal primal-dual pair in integer programming? This paper considers both questions. A theoretical characterization of the perturbation function in integer programming yields new insight into the existence of an optimal generating Lagrangian multiplier vector, the existence of an optimal primal-dual pair, and the duality gap. The proposed pth power Lagrangian method convexifies the perturbation function and guarantees the existence of an optimal generating Lagrangian multiplier vector. A condition for the existence of an optimal primal-dual pair is given under which the Lagrangian relaxation method succeeds in identifying an optimal solution of the primal problem via maximization of the Lagrangian dual. The existence of an optimal primal-dual pair is assured for cases with a single Lagrangian constraint when the pth power Lagrangian method is adopted. The paper then shows that an integer programming problem with multiple constraints can always be converted into an equivalent form with a single surrogate constraint. Therefore, success of the dual search is guaranteed for a general class of finite integer programming problems, with the prominent feature of a one-dimensional dual search.
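A tiny numeric illustration on a three-variable covering-type integer program, min c·x subject to a·x ≥ b with binary x: for p = 1 and p = 2 the Lagrangian dual keeps a gap, while a sufficiently large p closes it. The instance and the ternary-search dual maximization are illustrative only, not the paper's procedure.

    # p-th power Lagrangian demo on min c.x s.t. a.x >= b, x binary.
    # dual(p)^(1/p) is the dual bound on opt; it reaches opt for large p.
    from itertools import product

    c, a, b = [5, 4, 3], [4, 3, 2], 5
    xs = list(product([0, 1], repeat=3))
    opt = min(sum(ci * xi for ci, xi in zip(c, x)) for x in xs
              if sum(ai * xi for ai, xi in zip(a, x)) >= b)      # opt = 7

    def d(lam, p):   # Lagrangian dual function for the p-th power objective
        return min(sum(ci * xi for ci, xi in zip(c, x)) ** p
                   + lam * (b - sum(ai * xi for ai, xi in zip(a, x)))
                   for x in xs)

    def dual(p, lo=0.0, hi=1e7):
        for _ in range(200):                 # ternary search (d is concave)
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            lo, hi = (m1, hi) if d(m1, p) < d(m2, p) else (lo, m2)
        return d(lo, p)

    for p in (1, 2, 6):
        print(f"p={p}: opt={opt}, dual bound on opt = {dual(p) ** (1/p):.3f}")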

14.
In this paper, an improved Feature Extraction Method (FEM), which selects discriminative feature sets able to yield high classification rates in pattern recognition tasks, is presented. The resulting features are the wavelet coefficients of an improved compressed signal consisting of the Zernike moment amplitudes. By applying a straightforward methodology, we aim to construct feature vectors that are optimal, in the sense of vector dimensionality and information content, for classification purposes. The resulting surrogate feature vector is of lower dimensionality than the original Zernike moment feature vector and thus more appropriate for pattern recognition tasks. Appropriate validation tests have been arranged to investigate the performance of the proposed algorithm by measuring the discriminative power of the new feature vectors despite the information loss.

15.
Neyman-Pearson classification has been studied in several articles before, but they all worked in classes of indicator functions with the indicator function as the loss function, which makes the computation difficult. This paper investigates Neyman-Pearson classification with a convex loss function over an arbitrary class of real measurable functions. A general condition is given under which Neyman-Pearson classification with a convex loss function yields the same classifier as with the indicator loss function. We analyse NP-ERM with a convex loss function and prove its performance guarantees. An example of a complexity penalty pair for the convex-loss risk, in terms of Rademacher averages, is studied; it produces a tight PAC bound for NP-ERM with a convex loss function.
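A common practical rendering of NP classification with a convex surrogate is sketched below: fit under the (convex) logistic loss, then calibrate the decision threshold on held-out class-0 scores so the false-positive rate stays below α. The split sizes and α are illustrative, and calibration and evaluation share a set here for brevity.

    # Neyman-Pearson-style classification with a convex surrogate loss:
    # logistic-loss ERM plus a threshold calibrated to an allowed FPR alpha.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.4,
                                                random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # convex-loss ERM
    scores0 = clf.predict_proba(X_cal[y_cal == 0])[:, 1]      # class-0 scores

    alpha = 0.05                                              # allowed FPR
    thresh = np.quantile(scores0, 1 - alpha)                  # NP threshold
    y_hat = (clf.predict_proba(X_cal)[:, 1] > thresh).astype(int)

    fpr = y_hat[y_cal == 0].mean()
    tpr = y_hat[y_cal == 1].mean()
    print(f"false-positive rate {fpr:.3f} (target {alpha}), "
          f"detection rate {tpr:.3f}")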

16.
We consider a class of vector optimization problems with linear restrictions in which each objective function is a sum of a linear function and of a norm of a linear vector function. Under some conditions we prove weak, direct and converse duality statements. In comparison with former papers the considered class is more general and our results are sharper.

17.
刘小茂  张钧 《应用数学》1998,11(4):63-66
For the general linear model under the squared loss function, we obtain a necessary and sufficient condition for a linear estimator of a one-dimensional non-estimable parametric function to be admissible, as well as two necessary and sufficient conditions for a linear estimator of the model's parameter vector (not linearly estimable) to be admissible. We further obtain a sufficient condition for a linear estimator of a multi-dimensional parametric function (estimable or non-estimable) to be admissible, together with a necessary and sufficient condition in a special case.

18.
19.
In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.
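The unifying view behind such spectral algorithms, that ridge regression, principal component regression and gradient methods differ only in the filter applied to the kernel matrix's eigenvalues, can be made concrete in a few lines; the data, kernel width and filter parameters below are illustrative choices.

    # Spectral-algorithm viewpoint: three estimators as eigenvalue filters of
    # the empirical kernel operator K/n. All constants are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 80
    X = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)

    K = np.exp(-((X - X.T) ** 2) / 0.5)          # Gaussian kernel matrix
    s, U = np.linalg.eigh(K / n)                 # spectrum of K/n

    def estimate(filt):
        alpha = U @ (filt(s) * (U.T @ y)) / n    # alpha = filter(K/n) y / n
        return K @ alpha

    lam, t = 1e-3, 300
    filters = {
        "ridge":       lambda s: 1 / (s + lam),
        "pcr":         lambda s: np.where(s > lam, 1 / np.maximum(s, 1e-12), 0.0),
        "grad. desc.": lambda s: (1 - (1 - 0.5 * s) ** t) / np.maximum(s, 1e-12),
    }
    for name, filt in filters.items():
        mse = np.mean((estimate(filt) - np.sin(3 * X[:, 0])) ** 2)
        print(f"{name:11s} filter: MSE to target {mse:.4f}")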

20.
The Hierarchical Chinese Postman Problem asks for a shortest traversal of all edges of a graph respecting precedence constraints given by a partial order on classes of edges. We show that the special case with connected classes is NP-hard even on orders decomposable into a chain and an incomparable class. For the case with linearly ordered (possibly disconnected) classes, we obtain 5/3-approximations and fixed-parameter algorithms by transferring results from the Rural Postman Problem.
