Similar Articles
1.
Learning gradients is one approach to variable selection and feature-covariation estimation when dealing with large data involving many variables or coordinates. In a classification setting with a convex loss function, gradient learning can be implemented by solving the convex quadratic programming problems induced by regularization schemes in reproducing kernel Hilbert spaces. The complexity of such an algorithm can be very high when the number of variables or samples is large. We introduce a gradient descent algorithm for gradient learning in classification. The algorithm is simple to implement, and we study its convergence, presenting explicit learning rates in terms of the regularization parameter and the step size. A detailed analysis of approximation in reproducing kernel Hilbert spaces, under mild conditions on the sampling probability measure, allows us to handle a general class of convex loss functions.
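To make the optimization concrete, here is a minimal sketch of gradient descent on a kernel-regularized convex classification objective (logistic loss, RBF kernel), with the representer expansion f = Σ_j α_j K(x_j, ·). It illustrates the descent mechanics only, not the abstract's gradient-learning scheme, which estimates the gradient of the target function; all names and parameter values are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_gradient_descent(X, y, lam=0.1, step=0.05, iters=1000):
    """Minimize (1/n) sum_i log(1 + exp(-y_i f(x_i))) + lam * ||f||_K^2
    over f = sum_j alpha_j K(x_j, .), by plain gradient descent on alpha."""
    n = len(y)
    K = rbf_kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(iters):
        margins = y * (K @ alpha)
        loss_grad = -y / (1.0 + np.exp(margins)) / n   # d loss / d f(x_i)
        alpha -= step * (K @ loss_grad + 2 * lam * (K @ alpha))
    return alpha

# toy usage: labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=80) > 0, 1.0, -1.0)
alpha = kernel_gradient_descent(X, y)
print("training accuracy:", np.mean(np.sign(rbf_kernel(X, X) @ alpha) == y))
```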

2.
The support vector machine (SVM) is a popular learning method for binary classification. Standard SVMs treat all data points equally, but in some practical problems it is more natural to assign different weights to observations from different classes. This leads to a broader class of learning methods, the so-called weighted SVMs (WSVMs); one of their important applications is to estimate class probabilities in addition to learning the classification boundary. Two parameters are associated with the WSVM optimization problem: the regularization parameter and the weight parameter. In this article, we first establish that the WSVM solutions are jointly piecewise-linear with respect to both the regularization and the weight parameters. We then develop an algorithm that can compute the entire trajectory of WSVM solutions over every pair of the two parameters at a feasible computational cost. The resulting two-dimensional solution surface provides theoretical insight into the behavior of the WSVM solutions. Numerically, the algorithm greatly facilitates the implementation of the WSVM and automates the selection of the optimal regularization parameter. We illustrate the new algorithm on various examples. This article has online supplementary materials.
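The exact solution-surface algorithm is the article's contribution; as a crude stand-in, the probability-estimation idea behind the WSVM can be imitated by brute force. With class weights (1−π, π), the weighted SVM targets sign(p(x) − π), so p(x) is bracketed by the weight at which the predicted label flips. A hedged sketch assuming scikit-learn; grid refitting here replaces the exact path computation, and all names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def wsvm_probability(X, y, x_new, C=1.0, grid=np.linspace(0.05, 0.95, 19)):
    """Estimate P(Y = +1 | x_new) by refitting weighted SVMs over a grid of
    weight parameters pi and locating where the predicted label flips.
    Labels are assumed to be in {-1, +1}."""
    preds = []
    for pi in grid:
        clf = SVC(kernel="linear", C=C, class_weight={1: 1 - pi, -1: pi})
        clf.fit(X, y)
        preds.append(clf.predict(x_new.reshape(1, -1))[0])
    pos = grid[np.array(preds) == 1]
    # the WSVM predicts +1 while pi < p(x), so p(x) is about the largest such pi
    return float(pos.max()) if pos.size else float(grid[0]) / 2

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=100) > 0, 1, -1)
print("estimated class probability:", wsvm_probability(X, y, X[0]))
```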

3.
Classical robust statistical methods for noisy data are often based on modifications of convex loss functions. In recent years, nonconvex loss-based robust methods have become increasingly popular, since a nonconvex loss can provide robust estimation for data contaminated with outliers. The main challenge is that a nonconvex loss can be numerically difficult to optimize. This article proposes a quadratic majorization algorithm for nonconvex losses (QManc), which decomposes a nonconvex loss into a sequence of simpler optimization problems. We then apply QManc to a powerful machine learning method, obtaining the quadratic majorization boosting algorithm (QMBA), which we develop for robust classification (binary and multi-category) and regression. In high-dimensional cancer genetics data and in simulations, QMBA is comparable with convex loss-based boosting algorithms on clean data and outperforms them on data contaminated with outliers. QMBA is also superior to boosting algorithms that directly optimize the nonconvex loss. Supplementary material for this article is available online.
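The majorization mechanism is simple to sketch for robust regression. Take the nonconvex Cauchy loss ℓ(r) = (c²/2)·log(1 + (r/c)²), whose second derivative is bounded by 1; then ℓ is majorized at the current residual by a quadratic with curvature 1, and each majorize-minimize (MM) step is an ordinary least-squares solve on a shifted working response. This is a generic quadratic-majorization sketch, not the paper's QManc/QMBA; names and values are illustrative.

```python
import numpy as np

def cauchy_grad(r, c=1.0):
    # derivative of the nonconvex Cauchy loss (c^2 / 2) * log(1 + (r/c)^2)
    return r / (1.0 + (r / c) ** 2)

def qm_robust_regression(X, y, c=1.0, iters=50):
    """Majorize-minimize for robust linear regression: the quadratic
    majorizer (curvature bound M = 1) turns every step into least squares
    on the working response z = X beta + loss'(residual) / M."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS warm start
    M = 1.0
    for _ in range(iters):
        r = y - X @ beta
        z = X @ beta + cauchy_grad(r, c) / M
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta

# toy usage: OLS is pulled by outliers, the MM fit much less so
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=200)
y[:10] += 15.0                                    # gross outliers
print("robust fit:", qm_robust_regression(X, y))
```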

4.
The multinomial logit model is the most widely used model for unordered multi-category responses. However, applications are typically restricted to few predictors, because in the high-dimensional case maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables rather than parameters. The method can also distinguish between mandatory and optional predictors, and it adapts to metric, binary, nominal, and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables with many categories; for ordinal predictors the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error, and the identification of relevant variables is investigated in a simulation study. The method is applied to the national Indonesia contraceptive prevalence survey and to a glass identification problem, and the results are compared with the Lasso approach, which selects parameters.

5.
The boosting algorithm is one of the most successful binary classification techniques, owing to its relative immunity to overfitting and its flexible implementation. Several attempts have been made to extend binary boosting to multiclass classification. In this article, a novel cost-sensitive multiclass boosting algorithm is proposed that naturally extends the popular binary AdaBoost algorithm and admits unequal misclassification costs. The proposed algorithm achieves superior classification performance by combining weak candidate models that only need to be better than random guessing. More importantly, it achieves a large margin separation of the training sample while satisfying an L1-norm constraint on the model complexity. Finally, the effectiveness of the proposed algorithm is demonstrated in a number of simulated and real experiments. The supplementary files are available online, including the technical proofs, the implemented R code, and the real datasets.
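The flavor of such a loop can be sketched in SAMME style (multiclass AdaBoost), with the weight update scaled by the cost of the specific mistake. This is one plausible cost-sensitive variant written for illustration, assuming scikit-learn decision stumps and integer labels 0..K−1; it is not the authors' exact construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_boost(X, y, cost, n_rounds=50):
    """SAMME-style multiclass boosting; cost[i, j] is the cost of predicting
    class j when the truth is class i (labels are 0..K-1)."""
    n, K = len(y), cost.shape[0]
    w = np.ones(n) / n
    models, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        miss = pred != y
        err = float(np.sum(w * miss))
        if err >= 1.0 - 1.0 / K:     # must beat random guessing among K classes
            break
        alpha = np.log((1 - err) / max(err, 1e-12)) + np.log(K - 1)
        # up-weight mistakes in proportion to their misclassification cost
        w *= np.exp(alpha * miss * cost[y, pred])
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        votes = np.zeros((len(X_new), K))
        for m, a in zip(models, alphas):
            votes[np.arange(len(X_new)), m.predict(X_new)] += a
        return votes.argmax(axis=1)

    return predict
```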

6.
We study feature extraction and variable selection when a dataset is partitioned and stored on different processors that are connected through some network structure. We propose a distributed L1/2 regularization method, derive a distributed L1/2 regularization algorithm based on ADMM, and prove its convergence. The algorithm exchanges information only between neighboring processors, yet its variable selection results coincide with those obtained by applying L1/2 regularization to the undivided dataset. Experiments show that the proposed algorithm is effective and practical, and well suited to processing data in distributed storage.
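The distributed structure can be sketched with consensus ADMM. For concreteness, the consensus step below uses the L1 soft-threshold (i.e., a distributed lasso); in the paper this step is replaced by the proximal map of the L1/2 penalty (a half-thresholding operator), which is omitted here. Only the small vectors x_i + u_i are exchanged between workers, never the raw data blocks; all names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def distributed_admm(blocks, lam=0.1, rho=1.0, iters=200):
    """Consensus ADMM for sum_i 0.5 * ||A_i x - b_i||^2 + lam * penalty(x),
    with each data block (A_i, b_i) held by a separate worker."""
    p = blocks[0][0].shape[1]
    N = len(blocks)
    xs = [np.zeros(p) for _ in range(N)]
    us = [np.zeros(p) for _ in range(N)]
    z = np.zeros(p)
    for _ in range(iters):
        for i, (A, b) in enumerate(blocks):          # local solves (in parallel)
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(p),
                                    A.T @ b + rho * (z - us[i]))
        # consensus step: proximal map of the penalty at the averaged iterate
        # (soft-threshold here; the L1/2 half-threshold in the paper)
        avg = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        z = soft_threshold(avg, lam / (rho * N))
        for i in range(N):                           # dual ascent
            us[i] += xs[i] - z
    return z
```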

7.
Common approaches to monotonic regression focus on the case of a unidimensional covariate and a continuous response variable. Here a general approach is proposed that allows for additive structures in which one or more variables have a monotone influence on the response. In addition, the approach allows for response variables from an exponential family, including binary and Poisson-distributed responses. Flexibility of the smooth estimate is gained by expanding the unknown function in monotonic basis functions. For the estimation of coefficients and the selection of basis functions, a likelihood-based boosting algorithm is proposed that is simple to implement. Stopping criteria and inference are based on AIC-type measures. The method is applied to several datasets.
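The basis-expansion idea is easy to picture in the simplest case: a monotone increasing fit is a nonnegative combination of increasing basis functions, so constrained least squares already produces a monotone estimate. A stripped-down stand-in (ramp basis, Gaussian response, no boosting), assuming scipy; the paper's likelihood-based boosting generalizes this to exponential-family responses and additive structures.

```python
import numpy as np
from scipy.optimize import nnls

def monotone_ls_fit(x, y, n_knots=10):
    """Monotone increasing least-squares fit: expand in increasing ramp basis
    functions max(x - knot, 0) and force nonnegative coefficients, so the
    fitted combination is increasing as well."""
    knots = np.quantile(x, np.linspace(0.0, 0.9, n_knots))

    def design(t):
        # +/- intercept columns let nnls choose an intercept of either sign
        cols = [np.ones_like(t), -np.ones_like(t)]
        cols += [np.maximum(t - k, 0.0) for k in knots]
        return np.column_stack(cols)

    coef, _ = nnls(design(x), y)
    return lambda t: design(t) @ coef

# toy usage
x = np.linspace(0, 1, 100)
y = np.sqrt(x) + 0.1 * np.random.default_rng(3).normal(size=100)
f = monotone_ls_fit(x, y)
print(np.all(np.diff(f(np.linspace(0, 1, 50))) >= -1e-9))  # True: monotone
```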

8.
We consider a subproblem in parameter estimation for NURBS curve fitting using the Gauss-Newton algorithm with regularization. The NURBS curve is fitted to a set of data points in the least-squares sense, minimizing the sum of squared orthogonal distances. Control points and weights are estimated, while the knot vector and the degree of the NURBS curve are kept fixed. In the Gauss-Newton algorithm, a search direction is obtained from a linear overdetermined system involving a Jacobian and a residual vector. Because of the structure of our problem, the Jacobian has a particular sparsity pattern that is well suited to a splitting of variables. We address the resulting computational problems and report the accuracy achieved by the different methods as well as the elapsed computation time. Variable splitting is about twice as fast as using the plain normal equations.
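In a regularized Gauss-Newton iteration, the search direction solves a damped normal-equations system built from the Jacobian and the residual. A generic sketch of that step follows (not the NURBS-specific variable splitting, which exploits the Jacobian's sparsity rather than forming the full system); names and tolerances are illustrative.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, mu=1e-3, iters=50):
    """Regularized Gauss-Newton for min ||r(x)||^2: at each iterate solve
    (J^T J + mu I) s = -J^T r and step to x + s.  `residual` and `jacobian`
    are callables returning r(x) and J(x)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        s = np.linalg.solve(J.T @ J + mu * np.eye(len(x)), -J.T @ r)
        x += s
        if np.linalg.norm(s) < 1e-10:
            break
    return x

# toy usage: fit y = exp(a * t) to data, unknown x = [a]
t = np.linspace(0, 1, 30)
y = np.exp(0.7 * t)
res = lambda x: np.exp(x[0] * t) - y
jac = lambda x: (t * np.exp(x[0] * t)).reshape(-1, 1)
print(gauss_newton(res, jac, np.array([0.0])))   # converges to ~0.7
```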

9.
The area under the ROC curve (AUC) is a performance measure for classification models. We propose new distributionally robust AUC models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function, and we derive convex reformulations using duality. The DR-AUC models outperform deterministic AUC and support vector machine models and have superior worst-case out-of-sample performance, demonstrating their robustness. The results are encouraging, since the numerical experiments are conducted with small training sets, a regime that typically leads to poor out-of-sample performance.
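The hinge approximation of the AUC can be made explicit: the AUC counts correctly ordered (positive, negative) score pairs, and each pair contributes a convex hinge term to the surrogate. A minimal linear-score sketch with plain gradient descent; the distributionally robust layer of the paper is not shown, and all names are illustrative.

```python
import numpy as np

def hinge_auc_loss_grad(w, Xpos, Xneg):
    """Pairwise hinge surrogate for 1 - AUC:
    mean over all (+, -) pairs of max(0, 1 - (score_pos - score_neg))."""
    margins = (Xpos @ w)[:, None] - (Xneg @ w)[None, :]
    active = margins < 1.0                        # pairs with nonzero hinge
    loss = np.maximum(0.0, 1.0 - margins).mean()
    gpos = -(active.sum(axis=1)[:, None] * Xpos).sum(axis=0)
    gneg = (active.sum(axis=0)[:, None] * Xneg).sum(axis=0)
    return loss, (gpos + gneg) / active.size

def fit_linear_auc(Xpos, Xneg, step=0.1, iters=300):
    w = np.zeros(Xpos.shape[1])
    for _ in range(iters):
        _, g = hinge_auc_loss_grad(w, Xpos, Xneg)
        w -= step * g
    return w

# toy usage: empirical AUC of the learned linear score
rng = np.random.default_rng(4)
Xpos = rng.normal(loc=1.0, size=(60, 3))
Xneg = rng.normal(loc=0.0, size=(80, 3))
w = fit_linear_auc(Xpos, Xneg)
print("AUC:", np.mean((Xpos @ w)[:, None] > (Xneg @ w)[None, :]))
```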

10.
In receiver operating characteristic (ROC) analysis, the area under the ROC curve (AUC) is a popular summary index of the discriminatory accuracy of a diagnostic test. Incorporating covariates into ROC analysis can improve the diagnostic accuracy of the test, and a regression model for the AUC is a tool for evaluating the effects of covariates on diagnostic accuracy. In this paper, an empirical likelihood (EL) method is proposed for the AUC regression model. For the regression parameter vector, we show that the asymptotic distribution of its EL ratio statistic is a weighted sum of independent chi-square distributions. Confidence regions are constructed for the parameter vector based on the newly developed empirical likelihood theorem, as well as for the covariate-specific AUC. Simulation studies compare the relative performance of the proposed EL-based methods with an existing method for AUC regression. Finally, the proposed methods are illustrated with a real data set.

11.
The main challenge in working with gene expression microarrays is that the sample size is small compared with the large number of variables (genes). In many studies, the main focus is on finding a small subset of genes that are the most important for differentiating between types of cancer, enabling simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in the probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector is able to distinguish models of the same size. The performance of the proposed method is demonstrated on one simulated dataset and two well-known real datasets, and the results show that our method is comparable with existing methods in variable selection and classification.

12.
Boosting is a successful method for high-dimensional classification with independent data. However, existing variants do not address the correlations that arise in longitudinal or clustered study designs, where measurements are collected across two or more time points or within clusters. This article presents two new variants of boosting for high-dimensional classification problems with matched-pair binary responses or, more generally, any correlated binary responses. The first method is based on the generic functional gradient descent algorithm and the second on a direct likelihood optimization approach. Their performance and computational requirements were evaluated in simulations: while the two methods perform similarly, the computational efficiency of the functional-gradient-descent-based algorithm far exceeds that of the direct-likelihood-optimization-based algorithm. The former method is illustrated using data on gene expression changes in de novo and relapsed childhood acute lymphoblastic leukemia. Computer code implementing the algorithms and the relevant dataset are available online as supplemental materials.

13.
This article presents a likelihood-based boosting approach for fitting binary and ordinal mixed models. In contrast to common procedures, this approach can be used in high-dimensional settings where a large number of potentially influential explanatory variables are available. Constructed as a componentwise boosting method, it is able to perform variable selection, with the complexity of the resulting estimator determined by information criteria. The method is investigated in simulation studies for both cumulative and sequential models and is illustrated on real datasets. The supplementary materials for the article are available online.
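The componentwise principle is easy to illustrate for a plain logistic model (no random effects): in each boosting step, every single covariate is fitted to the current negative gradient, and only the best one receives a small coefficient update, so variable selection falls out of early stopping. A simplified sketch with illustrative names; the article's version handles mixed-model likelihoods and ordinal responses.

```python
import numpy as np

def componentwise_logit_boost(X, y, step=0.1, n_steps=200):
    """Componentwise gradient boosting for logistic regression, y in {0, 1}.
    Only one coordinate of beta moves per step, giving sparse solutions
    when boosting is stopped early."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = np.log(y.mean() / (1.0 - y.mean()))
    for _ in range(n_steps):
        eta = intercept + X @ beta
        resid = y - 1.0 / (1.0 + np.exp(-eta))   # negative gradient of the loss
        # score each column by its univariate least-squares fit to the residual
        scores = (X.T @ resid) ** 2 / (X ** 2).sum(axis=0)
        j = int(np.argmax(scores))
        beta[j] += step * (X[:, j] @ resid) / (X[:, j] ** 2).sum()
    return intercept, beta
```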

14.
In developing a classification model that assigns observations of unknown class to one of a number of specified classes using a set of features, it is often desirable to base the classifier on a limited number of features. Mathematical programming discriminant analysis methods for developing classification models can be extended for feature selection. Classification accuracy can be used as the feature selection criterion through a mixed integer programming (MIP) model in which a binary variable is associated with each training observation, but these binary variable requirements limit the size of problems to which the approach can be applied. This paper develops heuristic feature selection methods for problems with large numbers of observations. These heuristics, based on the MIP model for maximizing classification accuracy, are applied to three credit scoring data sets.
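The underlying MIP is worth writing out: a binary indicator d_i marks observation i as misclassified, a big-M constraint deactivates its margin requirement when d_i = 1, and the objective minimizes the number of misclassifications. A small sketch assuming the PuLP modeling library; the big-M value, variable bounds, and names are illustrative choices, not the paper's formulation.

```python
import numpy as np
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

def mip_classifier(X, y, M=100.0):
    """Minimize training misclassifications of a linear rule sign(w.x + b):
    y_i (w.x_i + b) >= 1 - M d_i with binary d_i; labels in {-1, +1}."""
    n, p = X.shape
    prob = LpProblem("max_accuracy", LpMinimize)
    w = [LpVariable(f"w{j}", -10, 10) for j in range(p)]
    b = LpVariable("b", -10, 10)
    d = [LpVariable(f"d{i}", cat="Binary") for i in range(n)]
    for i in range(n):
        margin = lpSum(float(X[i, j]) * w[j] for j in range(p)) + b
        prob += float(y[i]) * margin >= 1 - M * d[i]
    prob += lpSum(d)                # objective: number of misclassified points
    prob.solve()
    return np.array([v.value() for v in w]), b.value()
```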

15.
This paper proposes a new approach to variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions. We apply the SCAD penalty to simultaneously select significant variables and estimate the unknown parameters. The rate of convergence and the asymptotic normality of the resulting estimators are established. Furthermore, with a proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure. A new algorithm is proposed for solving the penalized estimating equations, and the asymptotic results are augmented by a simulation study.
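For reference, the SCAD penalty used for selection here is, in the standard Fan-Li form with a > 2 (commonly a = 3.7):

```latex
p_\lambda(\theta) =
\begin{cases}
\lambda\,|\theta|, & |\theta| \le \lambda,\\
\dfrac{2a\lambda|\theta| - \theta^2 - \lambda^2}{2(a-1)}, & \lambda < |\theta| \le a\lambda,\\
\dfrac{(a+1)\lambda^2}{2}, & |\theta| > a\lambda.
\end{cases}
```

The penalty agrees with the lasso near zero and flattens for large coefficients, which is what permits the near-unbiased, oracle-type behavior cited in the abstract.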

16.
Distance weighted discrimination (DWD) was originally proposed to handle the data-piling issue in the support vector machine. In this article, we consider sparse penalized DWD for high-dimensional classification. The state-of-the-art algorithm for solving the standard DWD is based on second-order cone programming; however, such an algorithm does not work well for sparse penalized DWD with high-dimensional data. To overcome this computational difficulty, we develop a very efficient algorithm to compute the solution path of the sparse DWD on a fine grid of regularization parameters. We implement the algorithm in a publicly available R package, sdwd, and conduct extensive numerical experiments to demonstrate the computational efficiency and classification performance of our method.

17.
We consider prediction of a scalar variable from both a function-valued variable and a finite number of real-valued variables. For the estimation of the regression parameters, which include an infinite-dimensional function as well as slope parameters for the real-valued variables, some form of regularization is unavoidable. We consider two different approaches, which are shown to achieve the same convergence rate of the mean squared prediction error under their respective assumptions: one is based on functional principal components regression (FPCR), and the alternative is functional ridge regression (FRR) based on Tikhonov regularization. Numerical studies are carried out on simulated and real data.
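The FPCR route is straightforward to sketch: discretize the curves, extract leading principal component scores, and regress the scalar response on the scores together with the real-valued covariates. A minimal sketch with illustrative names; the FRR/Tikhonov alternative would replace the truncation by a ridge penalty on the functional coefficient.

```python
import numpy as np

def fpcr(curves, scalars, y, n_comp=3):
    """Functional principal components regression: project centered curves
    (rows = subjects, columns = grid points) onto the leading eigenfunctions,
    then run OLS on [1, FPC scores, scalar covariates]."""
    Xc = curves - curves.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_comp].T
    Z = np.column_stack([np.ones(len(y)), scores, scalars])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, Vt[:n_comp]         # regression coefficients, eigenfunctions
```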

18.
The phenomenon of the limit of detection (LoD) arises in many practical situations because of technical and instrumental limitations. Reports in the literature show that applying conventional methods to evaluate the diagnostic power of variables while ignoring the LoD can be seriously biased. Although the area under the receiver operating characteristic (ROC) curve can be estimated consistently when the distribution of the variables is known, such information is usually unavailable in practice; and without distributional assumptions, the estimated area under the ROC curve of a variable subject to a LoD is usually biased no matter what replacement strategy is used. However, similar studies are lacking for the partial area under the ROC curve (pAUC), and because this measure is often preferred in practice, it is of interest to examine whether the estimate of the pAUC of a variable measured with a LoD behaves the same way. In this study, we found that for some LoD scenarios, even without distributional assumptions, a consistent estimate of the pAUC can be constructed. When a consistent estimate cannot be obtained, the bias can be negligible in practical situations, and the proposed estimator can be a good approximation of the pAUC. Numerical studies using simulated data sets and real data examples are reported.

19.
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for datasets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation dataset (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.
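The regression-based fitting idea can be sketched node by node: regress each variable on all the others with an l1 penalty (lasso for continuous nodes, l1-penalized logistic regression for binary nodes) and read edges off the nonzero coefficients. This is plain neighborhood selection assuming scikit-learn, with illustrative names and tuning values; the paper's group-lasso penalty and its weighted-lasso approximation are richer than this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def neighborhood_graph(X, is_binary, alpha=0.1):
    """Estimate a conditional-independence graph on mixed data via penalized
    node-wise regressions; is_binary[j] flags the discrete columns."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        if is_binary[j]:
            m = LogisticRegression(penalty="l1", solver="liblinear",
                                   C=1.0 / (n * alpha))
            m.fit(X[:, others], X[:, j].astype(int))
            coef = m.coef_.ravel()
        else:
            coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        adj[j, others] = np.abs(coef) > 1e-8
    return adj | adj.T               # symmetrize with the "OR" rule
```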

20.
This paper investigates the feature subset selection problem for binary classification using a logistic regression model. We develop a modified discrete particle swarm optimization (PSO) algorithm for the feature subset selection problem. The approach embodies an adaptive feature selection procedure that dynamically accounts for the relevance and dependence of the features included in the subset. We compare the proposed methodology with tabu search and scatter search algorithms on publicly available datasets. The results show that the proposed discrete PSO algorithm is competitive in terms of both classification accuracy and computational performance.
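The discrete-PSO mechanics can be sketched with the classic sigmoid-transfer binary PSO: real-valued velocities are squashed into bit-flip probabilities, and the fitness of a bit string is the cross-validated accuracy of a logistic model on the selected features. A generic sketch assuming scikit-learn; the adaptive relevance/dependence bookkeeping of the proposed method is not reproduced, and all names and constants are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(bits, X, y):
    mask = bits.astype(bool)
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, mask], y, cv=3).mean()

def binary_pso(X, y, n_particles=20, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sigmoid-transfer binary PSO over feature-inclusion bit strings."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pos = (rng.random((n_particles, p)) < 0.5).astype(int)
    vel = rng.normal(scale=0.1, size=(n_particles, p))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(b, X, y) for b in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, p))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob_one = 1.0 / (1.0 + np.exp(-vel))        # sigmoid transfer
        pos = (rng.random((n_particles, p)) < prob_one).astype(int)
        for i, b in enumerate(pos):
            f = fitness(b, X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = b.copy(), f
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest.astype(bool), float(pbest_fit.max())
```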
