Similar Articles
1.
We consider linear programming approaches for support vector machines (SVM). The linear programming problems are introduced as an approximation of the quadratic programming problems commonly used in SVM. When we consider kernel-based nonlinear discriminators, the approximation can be viewed as kernel principal component analysis, which generates an important subspace from the feature space characterized by the kernel function. We show that any data points nonlinearly, and implicitly, projected into the feature space by kernel functions can be approximately expressed as points lying explicitly in a low-dimensional Euclidean space, which enables us to develop linear programming formulations for nonlinear discriminators. We also introduce linear programming formulations for multicategory classification problems. We show that the same maximal margin principle exploited in SVM can be incorporated into the linear programming formulations. Moreover, considering the low-dimensional feature subspace extraction, we can generate nonlinear multicategory discriminators by solving linear programming problems. Numerical experiments on real-world datasets are presented. We show that a fairly low-dimensional feature subspace can achieve reasonable accuracy, and that the linear programming formulations compute discriminators efficiently. We also discuss a sampling strategy which might be crucial for huge datasets.
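As a hedged illustration of the LP idea, here is a minimal sketch of a soft-margin linear SVM recast as a linear program: the quadratic margin term is replaced by the L1 norm of the weight vector, split into nonnegative parts so the objective and constraints are linear. For the nonlinear case described above, the same LP would be run on low-dimensional coordinates extracted from the kernel feature space. The function name lp_svm and the use of SciPy are illustrative, not the paper's implementation; labels are assumed to be +/-1.

    import numpy as np
    from scipy.optimize import linprog

    def lp_svm(X, y, C=1.0):
        """Sketch: minimize ||w||_1 + C*sum(xi) s.t. y_i(w.x_i + b) >= 1 - xi_i."""
        n, d = X.shape
        # variables: [u (d), v (d), bp, bn, xi (n)], with w = u - v, b = bp - bn
        c = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
        Yx = y[:, None] * X
        # y_i(w.x_i + b) + xi_i >= 1, rewritten in the A_ub z <= b_ub form
        A_ub = np.hstack([-Yx, Yx, -y[:, None], y[:, None], -np.eye(n)])
        res = linprog(c, A_ub=A_ub, b_ub=-np.ones(n),
                      bounds=[(0, None)] * (2 * d + 2 + n))
        z = res.x
        return z[:d] - z[d:2 * d], z[2 * d] - z[2 * d + 1]  # w, b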

2.
Sliced inverse regression (SIR) is an important method for reducing the dimensionality of input variables. Its goal is to estimate the effective dimension reduction directions. In classification settings, SIR is closely related to Fisher discriminant analysis. Motivated by reproducing kernel theory, we propose a notion of nonlinear effective dimension reduction and develop a nonlinear extension of SIR called kernel SIR (KSIR). Both SIR and KSIR are based on principal component analysis. Alternatively, based on principal coordinate analysis, we propose dual versions of SIR and KSIR, which we refer to as sliced coordinate analysis (SCA) and kernel sliced coordinate analysis (KSCA), respectively. In the classification setting, we also call them discriminant coordinate analysis and kernel discriminant coordinate analysis. The computational complexities of SIR and KSIR depend on the dimensionality of the input vectors and on the number of input vectors, respectively, while those of SCA and KSCA both depend on the number of slices of the output. Thus, SCA and KSCA are very efficient dimension reduction methods.
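For concreteness, a compact sketch of classical (linear) SIR under standard assumptions: whiten the inputs, average them within slices of the sorted response, and take the leading eigenvectors of the between-slice covariance. KSIR applies the same slicing after a kernel mapping, which this sketch omits; the function name and the equal-size slicing scheme are illustrative.

    import numpy as np

    def sir_directions(X, y, n_slices=10, n_dirs=2):
        n, d = X.shape
        mu = X.mean(axis=0)
        L = np.linalg.cholesky(np.cov(X, rowvar=False))
        Wc = np.linalg.inv(L)                  # whitening transform
        Z = (X - mu) @ Wc.T
        M = np.zeros((d, d))
        for idx in np.array_split(np.argsort(y), n_slices):
            m = Z[idx].mean(axis=0)            # slice mean of whitened inputs
            M += (len(idx) / n) * np.outer(m, m)
        _, vecs = np.linalg.eigh(M)            # eigenvectors, ascending order
        B = vecs[:, ::-1][:, :n_dirs]          # top directions (whitened scale)
        return Wc.T @ B                        # e.d.r. directions, original scale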

3.
We consider local polynomial fitting for estimating a regression function and its derivatives nonparametrically. This method possesses many nice features, among which are automatic adaptation at the boundary and adaptation to various designs. A first contribution of this paper is the derivation of an optimal kernel for local polynomial regression, revealing that there is a universal optimal weighting scheme. Fan (1993, Ann. Statist., 21, 196-216) showed that the univariate local linear regression estimator is the best linear smoother, meaning that it attains the asymptotic linear minimax risk. Moreover, this smoother has high minimax efficiency among all possible estimators. We show that this property also holds for the multivariate local linear regression estimator. In the univariate case we investigate the minimax efficiency of local polynomial regression estimators, and find that the asymptotic minimax efficiency for commonly used orders of fit is 100% among the class of all linear smoothers. Further, we quantify the loss in efficiency when going beyond this class.
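A small sketch of the method at a single point x0: fit a weighted least squares line in (1, x - x0), so the intercept estimates m(x0) and the slope estimates m'(x0). A Gaussian kernel is used here purely for simplicity; which kernel is optimal is exactly the paper's question.

    import numpy as np

    def local_linear(x0, x, y, h):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)       # kernel weights, bandwidth h
        Xd = np.column_stack([np.ones_like(x), x - x0])
        WX = Xd * w[:, None]
        beta = np.linalg.solve(Xd.T @ WX, WX.T @ y)  # (X'WX)^{-1} X'Wy
        return beta[0], beta[1]                      # estimates of m(x0), m'(x0)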

4.
Quantile regression provides a more complete statistical analysis of the stochastic relationships among random variables. However, quantile regression functions estimated at different orders can cross each other. We propose a new non-crossing quantile regression method using a doubly penalized kernel machine (DPKM), which takes a heteroscedastic location-scale model as the underlying model and estimates both the location and scale functions simultaneously by kernel machines. The DPKM provides a satisfactory solution to estimating non-crossing quantile regression functions when multiple quantiles are needed for high-dimensional data. We also present a model selection method that employs cross-validation for choosing the parameters which affect the performance of the DPKM. One real example and two synthetic examples are provided to show the usefulness of the DPKM.
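The non-crossing property of the location-scale form is easy to see: if Q_tau(x) = mu(x) + sigma(x) * z_tau with sigma(x) > 0, the estimated quantile curves are ordered in tau by construction. The sketch below is a simple two-stage stand-in (kernel ridge fits for location and scale, empirical quantiles of standardized residuals), not the DPKM's doubly penalized estimator; all hyperparameters are illustrative.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def noncrossing_quantiles(X, y, taus, gamma=1.0, alpha=1.0):
        loc = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X, y)
        m = loc.predict(X)
        resid = y - m
        scale = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X, np.abs(resid))
        sigma = np.maximum(scale.predict(X), 1e-6)   # keep the scale positive
        z = np.quantile(resid / sigma, taus)         # standardized error quantiles
        return {t: m + sigma * zt for t, zt in zip(taus, z)}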

5.
Support Vector Machines (SVMs) are now very popular as a powerful method for pattern classification problems. One of the main features of SVMs is to produce a separating hyperplane which maximizes the margin in the feature space induced by a nonlinear mapping defined through a kernel function. As a result, SVMs can handle not only linear separation but also nonlinear separation. While the soft margin method of SVMs considers only the distance between the separating hyperplane and misclassified data, we propose in this paper a multi-objective programming formulation that also considers surplus variables. A similar formulation was extensively researched in linear discriminant analysis, mostly in the 1980s, using Goal Programming (GP). This paper compares these conventional methods, such as SVMs and GP, with our proposed formulation through several examples.

6.
When a radial basis function network (RBFN) is used for identification of a nonlinear multi-input multi-output (MIMO) system, the number of hidden layer nodes, the initial parameters of the kernel, and the initial weights of the network must be determined first. For this purpose, a systematic approach that integrates support vector regression (SVR) and least squares regression (LSR) is proposed to construct the initial structure of the RBFN. The first step of the proposed method is to determine the number of hidden layer nodes and the initial parameters of the kernel by the SVR method. The weights of the RBFN are then determined by solving a simple minimization problem based on the concept of LSR. After initialization, an annealing robust learning algorithm (ARLA) is applied to train the RBFN. With the proposed initialization approach, the designed RBFN has few hidden layer nodes while maintaining good performance. To show the feasibility and superiority of the annealing robust radial basis function networks (ARRBFNs) for identification of MIMO systems, several illustrative examples are included.
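A hedged sketch of the initialization step: run SVR so that the support vectors become the RBF centers (their count fixes the hidden layer size), then solve a linear least squares problem for the output weights. The ARLA training stage that follows is omitted, and all hyperparameters are illustrative.

    import numpy as np
    from sklearn.svm import SVR

    def init_rbfn(X, y, C=10.0, epsilon=0.1, gamma=1.0):
        svr = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma).fit(X, y)
        centers = svr.support_vectors_               # hidden nodes = support vectors
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        Phi = np.exp(-gamma * d2)                    # Gaussian activations
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # LSR output weights
        return centers, w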

7.
Existing support vector machines (SVMs) all assume that every feature of the training samples contributes equally to constructing the optimal separating hyperplane. However, for a given real-world data set, some features may be more relevant to the classification information, while others may be less relevant. In this paper, the linear feature-weighted support vector machine (LFWSVM) is proposed to deal with this problem. The proposed model is constructed in two phases. First, a mutual information (MI) based approach is used to assign an appropriate weight to each feature of the given data set. Second, the model is trained on the samples with their features weighted by the obtained feature weight vector. Meanwhile, the feature weights are embedded in the quadratic programming through detailed theoretical deduction to obtain the dual solution to the original optimization problem. Although the calculation of feature weights adds an extra computational cost, the proposed model generally exhibits better generalization performance than the traditional support vector machine (SVM) with a linear kernel function. Experimental results on one synthetic data set and several benchmark data sets confirm the benefits of the proposed method. Moreover, experiments also show that the proposed MI-based approach to determining feature weights is superior to two other widely used methods.
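A minimal sketch of the weighting idea: score each feature by its mutual information with the labels, rescale the inputs by the normalized weights, and train a linear SVM. For a linear kernel, scaling the features is equivalent (up to how the weights enter the kernel) to embedding the weights in the QP as the paper does; the normalization and scikit-learn usage here are assumptions of this sketch.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.svm import SVC

    def mi_weighted_svm(X, y, C=1.0):
        w = mutual_info_classif(X, y)       # per-feature relevance
        w = w / (w.sum() + 1e-12)           # normalized feature weight vector
        clf = SVC(kernel="linear", C=C).fit(X * w, y)
        return clf, w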

8.
We describe adaptive Markov chain Monte Carlo (MCMC) methods for sampling posterior distributions arising from Bayesian variable selection problems. Point-mass mixture priors are commonly used in Bayesian variable selection problems in regression. However, for generalized linear and nonlinear models, where the conditional densities cannot be obtained directly, the resulting mixture posterior may be difficult to sample using standard MCMC methods due to multimodality. We introduce an adaptive MCMC scheme that automatically tunes the parameters of a family of mixture proposal distributions during simulation. The resulting chain adapts to sample efficiently from multimodal target distributions. For variable selection problems, point-mass components are included in the mixture, and the associated weights adapt to approximate marginal posterior variable inclusion probabilities, while the remaining components approximate the posterior over nonzero values. The resulting sampler transitions efficiently between models, performing parameter estimation and variable selection simultaneously. Ergodicity and convergence are guaranteed by limiting the adaptation, based on recent theoretical results. The algorithm is demonstrated on a logistic regression model, a sparse kernel regression, and a random field model from statistical biophysics; in each case the adaptive algorithm dramatically outperforms traditional Metropolis-Hastings algorithms. Supplementary materials for this article are available online.

9.
Constrained Quasi-likelihood in Nonlinear Regression Models
Han Yucong (韩郁葱), 《大学数学》 (College Mathematics), 2005, 21(3): 45-51
In nonlinear regression models, the quasi-score function is optimal within a class of linear unbiased estimating functions (Godambe and Heyde (1987); Zhu Zhongyi (1996)), and the quasi-likelihood estimator derived from the quasi-score function is asymptotically optimal among the estimators obtained from linear unbiased estimating functions (Lin Lu (1999)). This paper studies the theory of biased estimating functions in nonlinear regression models, constructs a constrained quasi-likelihood estimator of the parameters, establishes the local optimality of the constrained quasi-likelihood, and locally improves on the quasi-likelihood estimator, thereby extending the biased-estimation theory of linear models.

10.
The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with: a large number of features can increase the noise in the data and thus the error of a learning algorithm. Feature selection is a solution for such problems, reducing the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on assessing the reliability of the feature selectors. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by ensemble members, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets.
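As a hedged sketch of the reliability idea: each selector proposes a subset, a classifier's cross-validated accuracy on that subset serves as the selector's confidence, and features are ranked by the summed confidence of the selectors that chose them. The particular selectors, evaluator, and vote aggregation below are illustrative, not the paper's exact scheme.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def ensemble_select(X, y, k=10):
        selectors = [SelectKBest(f_classif, k=k),
                     SelectKBest(mutual_info_classif, k=k)]
        votes = np.zeros(X.shape[1])
        for sel in selectors:
            mask = sel.fit(X, y).get_support()
            acc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, mask], y, cv=5).mean()
            votes[mask] += acc              # reliability-weighted vote
        return np.argsort(votes)[::-1][:k]  # indices of the top-k features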

11.
Learning gradients is one approach to variable selection and feature covariation estimation when dealing with large data sets of many variables or coordinates. In a classification setting involving a convex loss function, gradient learning can be implemented by solving convex quadratic programming problems induced by regularization schemes in reproducing kernel Hilbert spaces. The complexity of such an algorithm may be very high when the number of variables or samples is large. We introduce a gradient descent algorithm for gradient learning in classification. The implementation of this algorithm is simple, and its convergence is studied in detail. Explicit learning rates are presented in terms of the regularization parameter and the step size. A careful analysis of approximation by reproducing kernel Hilbert spaces, under mild conditions on the probability measure used for sampling, allows us to deal with a general class of convex loss functions.

12.
The presence of less relevant or highly correlated features often decreases classification accuracy. Feature selection, in which the most informative variables are selected for model generation, is therefore an important step in data-driven modeling. In feature selection one often tries to satisfy multiple criteria, such as feature discriminating power, model performance, or subset cardinality, so a multi-objective formulation of the feature selection problem is more appropriate. In this paper, we propose to use fuzzy criteria in feature selection within a fuzzy decision-making framework. This formulation allows a more flexible definition of the goals of feature selection, and avoids the problem of weighting different goals that arises in classical multi-objective optimization. The optimization problem is solved using an ant colony optimization algorithm proposed in our previous work; the fuzzy aggregation step is sketched below. We illustrate the added value of the approach by applying our proposed fuzzy feature selection algorithm to eight benchmark problems.
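A small sketch of the fuzzy aggregation step, under assumed membership functions: each criterion (here, classifier accuracy and subset compactness) is mapped to a degree in [0, 1], and a candidate subset is scored by the minimum membership, so no explicit goal weights are needed. The ant colony search that proposes candidate subsets is omitted, and the membership shapes are illustrative.

    import numpy as np

    def fuzzy_score(accuracy, n_selected, n_total):
        mu_acc = np.clip((accuracy - 0.5) / 0.5, 0.0, 1.0)  # 0.5 -> 0, 1.0 -> 1
        mu_size = 1.0 - n_selected / n_total                # fewer features -> higher
        return min(mu_acc, mu_size)                         # conservative aggregation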

13.
In this paper, we propose a new optimization framework for improving feature selection in medical data classification, which we call the Support Feature Machine (SFM). SFM seeks the optimal group of features that shows strong separability between two classes, where separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified data samples in the training set, i.e., those whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest neighbor rule for unbalanced data. In addition, a variation of SFM that provides feature weights (prioritization) is also presented. The proposed SFM framework and its extensions were tested on five real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with those of support vector machine (SVM) classification and Logical Data Analysis (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results, yet uses far fewer features to make its decisions than SVM and LAD. This has significant implications for diagnostic practice, and the outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings.
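A sketch of the separability criterion, restricted to a candidate feature subset: a training point counts as correctly classified when its average distance to its own class is smaller than its average distance to the other class; the paper's optimization model then searches for the subset (or feature weights) maximizing this count. The brute-force scoring below is illustrative and assumes numpy-array inputs with at least two samples per class.

    import numpy as np

    def separability_score(X, y, subset):
        Xs = X[:, subset]
        score = 0
        for i in range(len(y)):
            same = (y == y[i])
            same[i] = False                       # exclude the point itself
            d = np.linalg.norm(Xs - Xs[i], axis=1)
            if d[same].mean() < d[~same].mean():  # intra-class < inter-class
                score += 1
        return score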

14.
Analysis of Support Vector Machines Regression
Support vector machines regression (SVMR) is a regularized learning algorithm in reproducing kernel Hilbert spaces with a loss function called the ε-insensitive loss. Compared with the well-understood least squares regression, the study of SVMR is less complete, especially regarding quantitative estimates of the convergence of the algorithm. This paper provides an error analysis for SVMR, and introduces some recently developed methods for the analysis of classification algorithms, such as the projection operator and the iteration technique. The main result is an explicit learning rate for the SVMR algorithm under certain assumptions. Research supported by NNSF of China No. 10471002, No. 10571010 and RFDP of China No. 20060001010.
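For reference, the ε-insensitive loss is ℓ_ε(y, f(x)) = max(|y − f(x)| − ε, 0): residuals within ε of the target cost nothing, and larger ones are penalized linearly. A one-line sketch:

    import numpy as np

    def eps_insensitive(y_true, y_pred, eps=0.1):
        # zero inside the epsilon tube, linear outside it
        return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)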

15.
Feature Selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is the unavailability of the entire feature set before learning starts. Several efforts have been made to address the OSFS problem; however, they all need some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this choice is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses classical significance analysis concepts from RS theory to control the unknown feature space in OSFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run time, and robustness against noise. Experimental results demonstrate that the algorithm achieves better results than existing OSFS algorithms in all these respects.

16.
Dimensionality reduction is an important technique in surrogate modeling and machine learning. In this article, we propose a supervised dimensionality reduction method, "least squares regression principal component analysis" (LSR-PCA), applicable to both classification and regression problems. To show the efficacy of this method, we present examples in visualization, classification, and regression, comparing it with several state-of-the-art dimensionality reduction methods. Finally, we present a kernel version of LSR-PCA for problems where the inputs are correlated nonlinearly. The examples demonstrate that LSR-PCA can be a competitive dimensionality reduction method.

17.
Multiclass classification and probability estimation have important applications in data analytics. Support vector machines (SVMs) have shown great success in various real-world problems due to their high classification accuracy. However, one main limitation of standard SVMs is that they do not provide class probability estimates, and thus fail to offer an uncertainty measure for class prediction. In this article, we propose a simple yet effective framework to endow kernel SVMs with the feature of multiclass probability estimation. The new probability estimator does not rely on any parametric assumption about the data distribution and is therefore flexible and robust. Theoretically, we show that the proposed estimator is asymptotically consistent. Computationally, the new procedure can be conveniently implemented using standard SVM software. Our extensive numerical studies demonstrate the competitive performance of the new estimator compared with existing methods such as multiple logistic regression, linear discriminant analysis, tree-based methods, and random forests, under various classification settings. Supplementary materials for this article are available online.

18.
This paper investigates the feature subset selection problem for binary classification using the logistic regression model. We develop a modified discrete particle swarm optimization (PSO) algorithm for the feature subset selection problem. This approach embodies an adaptive feature selection procedure that dynamically accounts for the relevance and dependence of the features included in the feature subset. We compare the proposed methodology with tabu search and scatter search algorithms using publicly available datasets. The results show that the proposed discrete PSO algorithm is competitive in terms of both classification accuracy and computational performance.
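A minimal binary PSO sketch for this wrapper setting, under assumed PSO constants: velocities pass through a sigmoid to give per-bit selection probabilities, and cross-validated logistic regression accuracy is the fitness. The paper's adaptive relevance/dependence mechanism is omitted, and the function name and constants are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def bpso_select(X, y, n_particles=10, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        pos = (rng.random((n_particles, d)) < 0.5).astype(float)  # 0/1 bit strings
        vel = np.zeros((n_particles, d))

        def fitness(bits):
            mask = bits.astype(bool)
            if not mask.any():
                return 0.0
            clf = LogisticRegression(max_iter=1000)
            return cross_val_score(clf, X[:, mask], y, cv=3).mean()

        pbest = pos.copy()
        pbest_fit = np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_fit.argmax()].copy()
        for _ in range(n_iter):
            r1 = rng.random((n_particles, d))
            r2 = rng.random((n_particles, d))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
            fit = np.array([fitness(p) for p in pos])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = pos[better], fit[better]
            gbest = pbest[pbest_fit.argmax()].copy()
        return gbest.astype(bool)                 # final feature mask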

19.
For early tumor diagnosis, we propose a feature extraction method based on the lifting wavelet transform to analyze and discriminate tumor samples. The method applies the lifting wavelet transform to gene expression microarray data from 190 liver cancer samples (including controls) and 107 lung cancer samples (including controls), extracts the low-frequency information of the signal, and trains a support vector machine on it to build a classifier that distinguishes cancer from non-cancer samples. Experimental results show that the feature genes extracted by the lifting wavelet transform yield a high classification rate when fed into the classifier, and that both a linear kernel and a radial basis function kernel in the support vector machine achieve good classification performance. The model was tested on 20 randomly selected microarray samples with very good results; the proposed method therefore has practical value for tumor diagnosis.
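A hedged sketch of the pipeline: take the low-frequency (approximation) coefficients of a wavelet decomposition of each expression profile as features, then train an SVM. A standard discrete wavelet transform (PyWavelets) stands in here for the lifting-scheme transform used in the paper; wavelet choice, decomposition level, and kernel are illustrative.

    import numpy as np
    import pywt
    from sklearn.svm import SVC

    def wavelet_svm(X, y, wavelet="db4", level=3, kernel="rbf"):
        # keep only the deepest approximation coefficients of each sample
        feats = np.array([pywt.wavedec(row, wavelet, level=level)[0] for row in X])
        return SVC(kernel=kernel).fit(feats, y)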

20.
We discuss wavelet estimation of a regression function in the setting of strongly dependent data, and give an asymptotic expansion for the mean squared error of the estimator. The derived approximation is essential for assessing the quality of the estimator. For ordinary kernel estimators of the regression function, this mean-squared-error expansion fails when the regression function is not sufficiently smooth; for the wavelet estimator, however, the expansion remains valid even when the regression function is only piecewise continuous. Hence the wavelet estimator converges faster than the kernel estimator, and thus improves, to some extent, on existing kernel estimators.
