Similar Documents
20 similar documents found.
1.
Many least-squares problems involve affine equality and inequality constraints. Although there are a variety of methods for solving such problems, most statisticians find constrained estimation challenging. The current article proposes a new path-following algorithm for quadratic programming that replaces hard constraints by what are called exact penalties. Similar penalties arise in ℓ1 regularization in model selection. In the regularization setting, penalties encapsulate prior knowledge, and penalized parameter estimates represent a trade-off between the observed data and the prior knowledge. Classical penalty methods of optimization, such as the quadratic penalty method, solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to ∞, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. The exact path-following method starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. Path following in Lasso penalized regression, in contrast, starts with a large value of the penalty constant and works its way downward. In both settings, inspection of the entire solution path is revealing. Just as with the Lasso and generalized Lasso, it is possible to plot the effective degrees of freedom along the solution path. For a strictly convex quadratic program, the exact penalty algorithm can be framed entirely in terms of the sweep operator of regression analysis. A few well-chosen examples illustrate the mechanics and potential of path following. This article has supplementary materials available online.
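As a rough illustration of the exact penalty idea described in this abstract (not the authors' sweep-operator path algorithm), the sketch below evaluates an exact-penalty objective for a least-squares problem with affine equality constraints Vβ = d and inequality constraints Wβ ≤ e. The array names and the penalty constant rho are illustrative assumptions; for a large but finite rho the minimizer coincides with the constrained solution, and the path algorithm tracks that minimizer as rho grows from zero.

```python
import numpy as np

def exact_penalty_objective(beta, X, y, V, d, W, e, rho):
    """Exact (absolute-value / hinge) penalty version of a constrained
    least-squares objective: equality violations enter through |V beta - d|,
    inequality violations through max(0, W beta - e)."""
    fit = 0.5 * np.sum((y - X @ beta) ** 2)           # least-squares loss
    eq_pen = np.sum(np.abs(V @ beta - d))             # equality constraints V beta = d
    ineq_pen = np.sum(np.maximum(0.0, W @ beta - e))  # inequality constraints W beta <= e
    return fit + rho * (eq_pen + ineq_pen)
```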

2.
Bridge regression, a special family of penalized regressions with penalty function Σ|βj|^γ with γ ≥ 1, is considered. A general approach for solving for the bridge estimator is developed. A new algorithm for the lasso (γ = 1) is obtained by studying the structure of the bridge estimators. The shrinkage parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). A comparison between the bridge model (γ ≥ 1) and several other shrinkage models, namely ordinary least squares regression (λ = 0), the lasso (γ = 1), and ridge regression (γ = 2), is made through a simulation study. It is shown that bridge regression performs well compared to the lasso and ridge regression. These methods are demonstrated through an analysis of prostate cancer data. Some computational advantages and limitations are discussed.
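A minimal sketch of the bridge family mentioned above: the function below simply evaluates the penalized residual sum of squares for a given γ, so that γ = 1 gives the lasso objective and γ = 2 the ridge objective. Variable names are illustrative; this is not the estimation algorithm developed in the article.

```python
import numpy as np

def bridge_objective(beta, X, y, lam, gamma):
    """Residual sum of squares plus the bridge penalty lam * sum(|beta_j|^gamma).
    gamma = 1 recovers the lasso penalty, gamma = 2 the ridge penalty."""
    rss = np.sum((y - X @ beta) ** 2)
    penalty = lam * np.sum(np.abs(beta) ** gamma)
    return rss + penalty
```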

3.
High angular resolution diffusion imaging (HARDI) has recently been of great interest in mapping the orientation of intravoxel crossing fibers, and such orientation information allows one to infer the connectivity patterns prevalent among different brain regions and possible changes in such connectivity over time for various neurodegenerative and neuropsychiatric diseases. The aim of this article is to propose a penalized multiscale adaptive regression model (PMARM) framework to spatially and adaptively infer the orientation distribution function (ODF) of water diffusion in regions with complex fiber configurations. In PMARM, we reformulate the HARDI imaging reconstruction as a weighted regularized least-square regression (WRLSR) problem. Similarity and distance weights are introduced to account for spatial smoothness of HARDI, while preserving the unknown discontinuities (e.g., edges between white matter and gray matter) of HARDI. The L1 penalty function is introduced to ensure sparse solutions for the ODFs, while a scaled L1 weighted estimator is calculated to correct the bias introduced by the L1 penalty at each voxel. In PMARM, we integrate the multiscale adaptive regression models, the propagation-separation method, and the Lasso (least absolute shrinkage and selection operator) to adaptively estimate ODFs across voxels. Experimental results indicate that PMARM can reduce angle detection errors in fiber crossing areas and provide more accurate reconstruction than standard voxel-wise methods. Supplementary materials for this article are available online.

4.
The Lasso is a very well-known penalized regression model, which adds an L1 penalty with parameter λ1 on the coefficients to the squared error loss function. The Fused Lasso extends this model by also putting an L1 penalty with parameter λ2 on the difference of neighboring coefficients, assuming there is a natural ordering. In this article, we develop a path algorithm for solving the Fused Lasso Signal Approximator that computes the solutions for all values of λ1 and λ2. We also present an approximate algorithm that has considerable speed advantages for a moderate trade-off in accuracy. In the Online Supplement for this article, we provide proofs and further details for the methods developed in the article.
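For orientation, here is a sketch of the Fused Lasso Signal Approximator objective that the path algorithm traces: a squared-error fit to the signal plus an ℓ1 penalty on the coefficients (weight λ1) and on differences of neighboring coefficients (weight λ2). The code only evaluates the objective; it is not the authors' path algorithm.

```python
import numpy as np

def flsa_objective(beta, y, lam1, lam2):
    """Fused Lasso Signal Approximator objective:
    0.5 * sum((y_i - beta_i)^2) + lam1 * sum(|beta_i|)
    + lam2 * sum(|beta_i - beta_{i-1}|)."""
    fit = 0.5 * np.sum((y - beta) ** 2)
    sparsity = lam1 * np.sum(np.abs(beta))          # L1 penalty on coefficients
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))   # L1 penalty on neighboring differences
    return fit + sparsity + fusion
```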

5.
In this paper, we consider the issue of variable selection in partial linear single-index models under the assumption that the vector of regression coefficients is sparse. We apply penalized splines to estimate the nonparametric function and the SCAD penalty to achieve sparse estimates of the regression parameters in both the linear and single-index parts of the model. Under some mild conditions, it is shown that the penalized estimators have the oracle property, in the sense that they are asymptotically normal with the same mean and covariance that they would have if the zero coefficients were known in advance. Our model admits a least-squares representation, so standard least-squares programming algorithms can be implemented without extra programming effort. At the same time, parametric estimation, variable selection, and nonparametric estimation can be realized in one step, which greatly improves computational stability. The finite sample performance of the penalized estimators is evaluated through Monte Carlo studies and illustrated with a real data set.

6.
We consider a new method for sparse covariance matrix estimation which is motivated by previous results for the so-called Stein-type estimators. Stein proposed a method for regularizing the sample covariance matrix by shrinking together the eigenvalues; the amount of shrinkage is chosen to minimize an unbiased estimate of the risk (UBEOR) under the entropy loss function. The resulting estimator has been shown in simulations to yield significant risk reductions over the maximum likelihood estimator. Our method extends the UBEOR minimization problem by adding an ℓ1 penalty on the entries of the estimated covariance matrix, which encourages a sparse estimate. For a multivariate Gaussian distribution, zeros in the covariance matrix correspond to marginal independences between variables. Unlike the ℓ1-penalized Gaussian likelihood function, our penalized UBEOR objective is convex and can be minimized via a simple block coordinate descent procedure. We demonstrate via numerical simulations and an analysis of microarray data from breast cancer patients that our proposed method generally outperforms other methods for sparse covariance matrix estimation and can be computed efficiently even in high dimensions.

7.
In this article, we present a fast and stable algorithm for solving a class of optimization problems that arise in many statistical estimation procedures, such as the sparse fused lasso over a graph, convex clustering, and trend filtering, among others. We propose a so-called augmented alternating direction method of multipliers (ADMM) algorithm to solve this class of problems. Compared to a standard ADMM algorithm, our proposal significantly reduces the computational cost at each iteration while maintaining roughly the same overall convergence speed. We also consider a new varying penalty scheme for the ADMM algorithm, which could further accelerate the convergence, especially when solving a sequence of problems with tuning parameters of different scales. Extensive numerical experiments on the sparse fused lasso problem show that the proposed algorithm is more efficient than the standard ADMM and two other existing state-of-the-art specialized algorithms. Finally, we discuss a possible extension and some interesting connections to two well-known algorithms. Supplementary materials for the article are available online.
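To make the problem class concrete, here is a textbook scaled-form ADMM for one representative member: the generalized-lasso-type problem minimize 0.5*||y − β||² + λ||Dβ||₁, which covers the fused lasso signal approximator when D takes adjacent differences (stacking an identity block into D adds the sparsity penalty as well). This is only the standard ADMM baseline that the article improves upon, not the proposed augmented variant; all names, the fixed penalty rho, and the iteration count are illustrative.

```python
import numpy as np

def admm_gen_lasso(y, D, lam, rho=1.0, n_iter=200):
    """Standard scaled ADMM for minimize 0.5*||y - beta||^2 + lam*||D beta||_1.
    Splitting: z = D beta, with dual variable u."""
    n, m = y.shape[0], D.shape[0]
    beta, z, u = np.zeros(n), np.zeros(m), np.zeros(m)
    A = np.eye(n) + rho * D.T @ D       # fixed matrix for the beta-update
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * D.T @ (z - u))   # ridge-type update
        Db = D @ beta
        # z-update: soft-thresholding (proximal operator of the l1 norm)
        z = np.sign(Db + u) * np.maximum(np.abs(Db + u) - lam / rho, 0.0)
        u = u + Db - z                   # dual ascent step
    return beta
```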

8.
We propose and study a new iterative coordinate descent algorithm (QICD) for solving nonconvex penalized quantile regression in high dimension. By permitting different subsets of covariates to be relevant for modeling the response variable at different quantiles, nonconvex penalized quantile regression provides a flexible approach for modeling high-dimensional data with heterogeneity. Although its theory has been investigated recently, its computation remains highly challenging when p is large due to the nonsmoothness of the quantile loss function and the nonconvexity of the penalty function. Existing coordinate descent algorithms for penalized least-squares regression cannot be directly applied. We establish the convergence property of the proposed algorithm under some regularity conditions for a general class of nonconvex penalty functions including popular choices such as SCAD (smoothly clipped absolute deviation) and MCP (minimax concave penalty). Our Monte Carlo study confirms that QICD substantially improves the computational speed in the p ≫ n setting. We illustrate the application by analyzing a microarray dataset.
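For reference, the two nonconvex penalties named in this abstract have simple closed forms; the sketch below evaluates them elementwise. The defaults a = 3.7 (SCAD) and γ = 3 (MCP) are conventional choices in the literature, not values taken from the article.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |beta| = t: linear near zero, quadratic
    transition, then constant beyond a*lam."""
    t = np.abs(t)
    small = lam * t
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP): quadratic up to gamma*lam, then constant."""
    t = np.abs(t)
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), gamma * lam**2 / 2)
```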

9.
We investigate a robust penalized logistic regression algorithm based on a minimum distance criterion. Influential outliers are often associated with the explosion of parameter vector estimates, but in the context of standard logistic regression, the bias due to outliers always causes the parameter vector to implode, that is, shrink toward the zero vector. Thus, using LASSO-like penalties to perform variable selection in the presence of outliers can result in missed detections of relevant covariates. We show that by choosing a minimum distance criterion together with an elastic net penalty, we can simultaneously find a parsimonious model and avoid estimation implosion even in the presence of many outliers in the important small n large p situation. Minimizing the penalized minimum distance criterion is a challenging problem due to its nonconvexity. To meet the challenge, we develop a simple and efficient MM (majorization–minimization) algorithm that can be adapted gracefully to the small n large p context. Performance of our algorithm is evaluated on simulated and real datasets. This article has supplementary materials available online.

10.
Selecting important features in nonlinear kernel spaces is a difficult challenge in both classification and regression problems. This article proposes to achieve feature selection by optimizing a simple criterion: a feature-regularized loss function. Features within the kernel are weighted, and a lasso penalty is placed on these weights to encourage sparsity. This feature-regularized loss function is minimized by estimating the weights in conjunction with the coefficients of the original classification or regression problem, thereby automatically procuring a subset of important features. The algorithm, KerNel Iterative Feature Extraction (KNIFE), is applicable to a wide variety of kernels and high-dimensional kernel problems. In addition, a modification of KNIFE gives a computationally attractive method for graphically depicting nonlinear relationships between features by estimating their feature weights over a range of regularization parameters. The utility of KNIFE in selecting features is demonstrated through simulations and examples for both kernel regression and support vector machines. Feature path realizations also give graphical representations of important features and the nonlinear relationships among variables. Supplementary materials with computer code and an appendix on convergence analysis are available online.
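To illustrate what "weighting features within the kernel" can look like, here is a feature-weighted Gaussian kernel: each feature j carries a nonnegative weight w_j, and driving w_j to zero removes that feature from the kernel, which is what an ℓ1 penalty on the weights encourages. This particular parameterization is a common choice and is only an assumption for illustration, not necessarily the exact form used by KNIFE.

```python
import numpy as np

def weighted_rbf_kernel(X, Z, w):
    """Feature-weighted Gaussian kernel K(x, z) = exp(-sum_j w_j * (x_j - z_j)^2)
    between the rows of X (n1 x p) and Z (n2 x p), with nonnegative weights w (p,)."""
    diff = X[:, None, :] - Z[None, :, :]              # pairwise feature differences
    return np.exp(-np.einsum('ijk,k->ij', diff**2, w))  # weighted squared distances
```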

11.
We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEPs). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant coordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace, when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive, and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials for this article are available online.

12.
We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be “relaxed” by regularizing the squared error with a convex penalty function like the ℓ1 norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.

13.
The EM algorithm is a widely used methodology for penalized likelihood estimation. Provable monotonicity and convergence are the hallmarks of the EM algorithm and these properties are well established for smooth likelihood and smooth penalty functions. However, many relaxed versions of variable selection penalties are not smooth. In this paper, we introduce a new class of space alternating penalized Kullback proximal extensions of the EM algorithm for nonsmooth likelihood inference. We show that the cluster points of the new method are stationary points even when they lie on the boundary of the parameter set. We illustrate the new class of algorithms for the problems of model selection for finite mixtures of regression and of sparse image reconstruction.

14.
We derive a sharp nonasymptotic bound on parameter estimation for the L1/2 regularization. The bound shows that the solutions of the L1/2 regularization can achieve a loss within a logarithmic factor of an ideal mean squared error, and it therefore underlies the feasibility and effectiveness of the L1/2 regularization. Interestingly, when applied to compressive sensing, the L1/2 regularization scheme has exhibited a very promising capability of complete recovery from much less sampling information. Compared with the Lp (0 < p < 1) penalty, it appears that the L1/2 penalty always yields the sparsest solution among all Lp penalties with 1/2 ≤ p < 1, while for 0 < p < 1/2 the Lp penalty exhibits similar properties to the L1/2 penalty. This suggests that the L1/2 regularization scheme can be accepted as the best, and therefore the representative, of all the Lp (0 < p < 1) regularization schemes.

15.
Boosting in the context of linear regression has become more attractive with the invention of least angle regression (LARS), where the connection between the lasso and forward stagewise fitting (boosting) has been established. Earlier it was found that boosting is a functional gradient optimization. Instead of the gradient, we propose a conjugate direction method (CDBoost). As a result, we obtain a fast forward stepwise variable selection algorithm. The conjugate direction of CDBoost is analogous to the constrained gradient in boosting. Using this analogy, we generalize CDBoost to: (1) include small step sizes (shrinkage), which often improves prediction accuracy; and (2) the nonparametric setting with fitting methods such as trees or splines, where least angle regression and the lasso seem to be infeasible. The step size in CDBoost tends to govern the degree of interpolation between L0- and L1-penalization. This makes CDBoost surprisingly flexible. We compare the different methods on simulated and real datasets. CDBoost achieves the best predictions mainly in complicated settings with correlated covariates, where it is difficult to determine the contribution of a given covariate to the response. The gain of CDBoost over boosting is especially high in sparse cases with a high signal-to-noise ratio and few effective covariates.

16.
The hybrid Huberized support vector machine (HHSVM) has proved its advantages over the ℓ1 support vector machine (SVM) in terms of classification and variable selection. Similar to the ℓ1 SVM, the HHSVM enjoys a piecewise linear path property and can be computed by a least-angle regression (LARS)-type piecewise linear solution path algorithm. In this article, we propose a generalized coordinate descent (GCD) algorithm for computing the solution path of the HHSVM. The GCD algorithm takes advantage of a majorization–minimization trick to make each coordinatewise update simple and efficient. Extensive numerical experiments show that the GCD algorithm is much faster than the LARS-type path algorithm. We further extend the GCD algorithm to solve a class of elastic net penalized large margin classifiers, demonstrating the generality of the GCD algorithm. We have implemented the GCD algorithm in a publicly available R package gcdnet.
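For context, here is a sketch of the huberized hinge loss that gives the HHSVM its name, under its usual piecewise parameterization with a smoothing parameter δ (the default value below is illustrative). Its piecewise quadratic/linear form, smoother than the plain hinge, is what makes majorization-based coordinatewise updates simple.

```python
import numpy as np

def huberized_hinge(t, delta=2.0):
    """Huberized hinge loss as a function of the margin t = y * f(x):
    zero for t > 1, quadratic on (1 - delta, 1], linear for t <= 1 - delta."""
    t = np.asarray(t, dtype=float)
    quad = (1 - t) ** 2 / (2 * delta)
    lin = 1 - t - delta / 2
    return np.where(t > 1, 0.0, np.where(t > 1 - delta, quad, lin))
```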

17.
We propose a new binary classification and variable selection technique especially designed for high-dimensional predictors. Among many predictors, typically, only a small fraction of them have significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve for all the parameters involved in the penalized problem jointly. When the data dimension is very high, the joint optimization problem is very complex and involves a lot of memory allocation. In this article, we propose a new penalized forward search technique that reduces high-dimensional optimization problems to one-dimensional optimization by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum solution can be obtained with an intelligent search that exploits convexity and the piecewise linear or quadratic structure of the criterion function. In each step, the predictor that best predicts the outcome is chosen to enter the model. The search is then repeated in an iterative fashion until convergence occurs. Comparison of our new classification rule with the ℓ1-SVM and other common methods shows very promising performance, in that the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.

18.
We investigate the difference between using an ℓ1 penalty and an ℓ1 constraint in generalized eigenvalue problems arising in multivariate analysis. Our main finding is that the ℓ1 penalty may fail to provide very sparse solutions, a severe disadvantage for variable selection that can be remedied by using an ℓ1 constraint. Our claims are supported both by empirical evidence and theoretical analysis. Finally, we illustrate the advantages of the ℓ1 constraint in the context of discriminant analysis and principal component analysis. Supplementary materials for this article are available online.

19.
In this paper, distributed estimation of a high-dimensional sparse precision matrix is proposed for transelliptical graphical models, based on the debiased D-trace loss penalized lasso and the hard threshold method, when samples are distributed across different machines. At a certain level of sparseness, this method not only achieves correct selection of the non-zero elements of the sparse precision matrix, but also attains an error rate comparable to that of the estimator in a non-distributed setting. The numerical results further show that the proposed distributed method is more effective than the usual averaging method.

20.
The least angle regression (LAR) was proposed by Efron, Hastie, Johnstone, and Tibshirani in 2004 for continuous model selection in linear regression. It is motivated by a geometric argument and tracks a path along which the predictors enter successively and the active predictors always maintain the same absolute correlation (angle) with the residual vector. Although it quickly gained popularity, its extensions seem rare compared with penalty methods. In this expository article, we show that the powerful geometric idea of LAR can be generalized in a fruitful way. We propose a ConvexLAR algorithm that works for any convex loss function and naturally extends to group selection and data-adaptive variable selection. After simple modification, it also yields new exact path algorithms for certain penalty methods, such as a convex loss function with a lasso or group lasso penalty. Variable selection in recurrent event and panel count data analysis, AdaBoost, and Gaussian graphical models is reconsidered from the ConvexLAR angle. Supplementary materials for this article are available online.
