首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Learning gradients is one approach for variable selection and feature covariation estimation when dealing with large data of many variables or coordinates. In a classification setting involving a convex loss function, a possible algorithm for gradient learning is implemented by solving convex quadratic programming optimization problems induced by regularization schemes in reproducing kernel Hilbert spaces. The complexity for such an algorithm might be very high when the number of variables or samples is huge. We introduce a gradient descent algorithm for gradient learning in classification. The implementation of this algorithm is simple and its convergence is elegantly studied. Explicit learning rates are presented in terms of the regularization parameter and the step size. Deep analysis for approximation by reproducing kernel Hilbert spaces under some mild conditions on the probability measure for sampling allows us to deal with a general class of convex loss functions.  相似文献   

2.
Two quadratically convergent gradient methods for minimizing an unconstrained function of several variables are examined. The heart of the Fletcher and Powell reformulation of Davidon's method is a variableH-matrix. The eigenvalues and eigenvectors of this matrix for a quadratic function are explored, leading to a proof that the gradient vectors at each step are mutually orthogonal. From this, a geometric interpretation of theH-matrix in terms of the projection of the gradient into a solution subspace is derived. These properties are then used to arrive at the main result, which states that, for a quadratic function, the direction vectors generated by the Davidon algorithm and the conjugate-gradient algorithm of Hestenes and Stiefel are scalar multiples of each other, provided the initial step each takes is in the direction of steepest descent. If one assumes no round-off error and a perfect one-dimensional search, the methods generate identical steps leading to the minimum.It is also shown that, for a quadratic function, the Davidon algorithm has a simplified version which searches in the same directions. However, the unique advantage of the original scheme, that it yields the curvature of the function at the minimum, is sacrificed for simplicity.Although these results apply to the case of a quadratic function, a comparative study of the same algorithms for a general function can be found in a companion paper.This research was carried out under Contract No. NAS 9-4036, NASA-Manned Spacecraft Center, Houston, Texas. The author is indebted to Dr. H. J. Kelley, whose suggestions and encouragement provided the inspiration for this paper.  相似文献   

3.
The gradient sampling (GS) algorithm for minimizing a nonconvex, nonsmooth function was proposed by Burke et al. (SIAM J Optim 15:751–779, 2005), whose most interesting feature is the use of randomly sampled gradients instead of subgradients. In this paper, combining the GS technique with the sequential quadratic programming (SQP) method, we present a feasible SQP-GS algorithm that extends the GS algorithm to nonconvex, nonsmooth constrained optimization. The proposed algorithm generates a sequence of feasible iterates, and guarantees that the objective function is monotonically decreasing. Global convergence is proved in the sense that, with probability one, every cluster point of the iterative sequence is stationary for the improvement function. Finally, some preliminary numerical results show that the proposed algorithm is effective.  相似文献   

4.
本文研究球面上的$\ell_1$正则优化问题,其目标函数由一般光滑函数项和非光滑$\ell_1$正则项构成,且假设光滑函数的随机梯度可由随机一阶oracle估计.这类优化问题被广泛应用在机器学习,图像、信号处理和统计等领域.根据流形临近梯度法和随机梯度估计技术,提出一种球面随机临近梯度算法.基于非光滑函数的全局隐函数定理,分析了子问题解关于参数的Lipschtiz连续性,进而证明了算法的全局收敛性.在基于随机数据集和实际数据集的球面$\ell_1$正则二次规划问题、有限和SPCA问题和球面$\ell_1$正则逻辑回归问题上数值实验结果显示所提出的算法与流形临近梯度法、黎曼随机临近梯度法相比CPU时间上具有一定的优越性.  相似文献   

5.
An algorithm for univariate optimization using a linear lower bounding function is extended to a nonsmooth case by using the generalized gradient instead of the derivative. A convergence theorem is proved under the condition of semismoothness. This approach gives a globally superlinear convergence of algorithm, which is a generalized Newton-type method.   相似文献   

6.
Based on the gradient sampling technique, we present a subgradient algorithm to solve the nondifferentiable convex optimization problem with an extended real-valued objective function. A feature of our algorithm is the approximation of subgradient at a point via random sampling of (relative) gradients at nearby points, and then taking convex combinations of these (relative) gradients. We prove that our algorithm converges to an optimal solution with probability 1. Numerical results demonstrate that our algorithm performs favorably compared with existing subgradient algorithms on applications considered.  相似文献   

7.
The family of feasible methods for minimization with nonlinear constraints includes the nonlinear projected gradient method, the generalized reduced gradient method (GRG), and many variants of the sequential gradient restoration algorithm (SGRA). Generally speaking, a particular iteration of any of these methods proceeds in two phases. In the restoration phase, feasibility is restored by means of the resolution of an auxiliary nonlinear problem, generally a nonlinear system of equations. In the minimization phase, optimality is improved by means of the consideration of the objective function, or its Lagrangian, on the tangent subspace to the constraints. In this paper, minimal assumptions are stated on the restoration phase and the minimization phase that ensure that the resulting algorithm is globally convergent. The key point is the possibility of comparing two successive nonfeasible iterates by means of a suitable merit function that combines feasibility and optimality. The merit function allows one to work with a high degree of infeasibility at the first iterations of the algorithm. Global convergence is proved and a particular implementation of the model algorithm is described.  相似文献   

8.
A stochastic steepest-descent algorithm for function minimization under noisy observations is presented. Function evaluation is done by performing a number of random experiments on a suitable probability space. The number of experiments performed at a point generated by the algorithm reflects a balance between the conflicting requirements of accuracy and computational complexity. The algorithm uses an adaptive precision scheme to determine the number of random experiments at a point; this number tends to increase whenever a stationary point is approached and to decrease otherwise. Two rules are used to determine the number of random experiments at a point; one, in the inner loop of the algorithm, uses the magnitude of the observed gradient of the function to be minimized; and the other, in the outer-loop, uses a measure of accumulated errors in function evaluations at past points generated by the algorithm. Once a stochastic approximation of the function to be minimized is obtained at a point, the algorithm proceeds to generate the next point by using the steepest-descent deterministic methods of Armijo and Polak (Refs. 3, 4). Convergence of the algorithm to stationary points is demonstrated under suitable assumptions.  相似文献   

9.
In the present paper, we propose a novel convergence analysis of the alternating direction method of multipliers, based on its equivalence with the overrelaxed primal–dual hybrid gradient algorithm. We consider the smooth case, where the objective function can be decomposed into one differentiable with Lipschitz continuous gradient part and one strongly convex part. Under these hypotheses, a convergence proof with an optimal parameter choice is given for the primal–dual method, which leads to convergence results for the alternating direction method of multipliers. An accelerated variant of the latter, based on a parameter relaxation, is also proposed, which is shown to converge linearly with same asymptotic rate as the primal–dual algorithm.  相似文献   

10.
In this paper, we study the problem of sampling from a given probability density function that is known to be smooth and strongly log-concave. We analyze several methods of approximate sampling based on discretizations of the (highly overdamped) Langevin diffusion and establish guarantees on its error measured in the Wasserstein-2 distance. Our guarantees improve or extend the state-of-the-art results in three directions. First, we provide an upper bound on the error of the first-order Langevin Monte Carlo (LMC) algorithm with optimized varying step-size. This result has the advantage of being horizon free (we do not need to know in advance the target precision) and to improve by a logarithmic factor the corresponding result for the constant step-size. Second, we study the case where accurate evaluations of the gradient of the log-density are unavailable, but one can have access to approximations of the aforementioned gradient. In such a situation, we consider both deterministic and stochastic approximations of the gradient and provide an upper bound on the sampling error of the first-order LMC that quantifies the impact of the gradient evaluation inaccuracies. Third, we establish upper bounds for two versions of the second-order LMC, which leverage the Hessian of the log-density. We provide non asymptotic guarantees on the sampling error of these second-order LMCs. These guarantees reveal that the second-order LMC algorithms improve on the first-order LMC in ill-conditioned settings.  相似文献   

11.
A fast descent algorithm, resorting to a “stretching” function technique and built on one hybrid method (GRSA) which combines simulated annealing (SA) algorithm and gradient based methods for large scale global optimizations, is proposed. Unlike the previously proposed method in which the original objective functions remain unchanged during the whole course of optimization, the new method firstly constructs an auxiliary function on one local minimizer obtained by gradient based methods and then SA is executed on this constructed auxiliary function instead of on the original objective function in order that we can improve the jumping ability of SA algorithm to escape from the currently discovered local minimum to a better one from which the gradient based methods restart a new local search. The above procedure is repeated until a global minimum is detected. In addition, corresponding to the adopted “stretching” technique, a new next trial point generating scheme is designed. It is verified by simulation especially on large scale problems that the convergence speed is greatly accelerated, which is its main difference from many other reported methods that mostly cope with functions with less than 50 variables and does not apply to large scale optimization problems. Furthermore, the new algorithm functions as a global optimization procedure with a high success probability and high solution precision.  相似文献   

12.
A Conic Trust-Region Method for Nonlinearly Constrained Optimization   总被引:5,自引:0,他引:5  
Trust-region methods are powerful optimization methods. The conic model method is a new type of method with more information available at each iteration than standard quadratic-based methods. Can we combine their advantages to form a more powerful method for constrained optimization? In this paper we give a positive answer and present a conic trust-region algorithm for non-linearly constrained optimization problems. The trust-region subproblem of our method is to minimize a conic function subject to the linearized constraints and the trust region bound. The use of conic functions allows the model to interpolate function values and gradient values of the Lagrange function at both the current point and previous iterate point. Since conic functions are the extension of quadratic functions, they approximate general nonlinear functions better than quadratic functions. At the same time, the new algorithm possesses robust global properties. In this paper we establish the global convergence of the new algorithm under standard conditions.  相似文献   

13.
This paper proposes an algorithm for the unconstrained minimization of a class of nonsmooth and nonconvex functions that can be written as finite-max functions. A gradient and function-based sampling method is proposed which, under special circumstances, either moves superlinearly to a minimizer of the problem of interest or improves the optimality certificate. Global and local convergence analysis are presented, as well as examples that illustrate the obtained theoretical results.  相似文献   

14.
First-order methods with momentum, such as Nesterov’s fast gradient method, are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive restarting scheme can improve the convergence rate of the fast gradient method, when the parameter of a strongly convex cost function is unknown or when the iterates of the algorithm enter a locally strongly convex region. Recently, we introduced the optimized gradient method, a first-order algorithm that has an inexpensive per-iteration computational cost similar to that of the fast gradient method, yet has a worst-case cost function rate that is twice faster than that of the fast gradient method and that is optimal for large-dimensional smooth convex problems. Building upon the success of accelerating the fast gradient method using adaptive restart, this paper investigates similar heuristic acceleration of the optimized gradient method. We first derive a new first-order method that resembles the optimized gradient method for strongly convex quadratic problems with known function parameters, yielding a linear convergence rate that is faster than that of the analogous version of the fast gradient method. We then provide a heuristic analysis and numerical experiments that illustrate that adaptive restart can accelerate the convergence of the optimized gradient method. Numerical results also illustrate that adaptive restart is helpful for a proximal version of the optimized gradient method for nonsmooth composite convex functions.  相似文献   

15.
We propose a stochastic gradient descent algorithm for learning the gradient of a regression function from random samples of function values. This is a learning algorithm involving Mercer kernels. By a detailed analysis in reproducing kernel Hilbert spaces, we provide some error bounds to show that the gradient estimated by the algorithm converges to the true gradient, under some natural conditions on the regression function and suitable choices of the step size and regularization parameters.  相似文献   

16.
In this paper, we proposed an implementation of stochastic perturbation of reduced gradient and bisection (SPRGB) method for optimizing a non-convex differentiable function subject to linear equality constraints and non-negativity bounds on the variables. In particular, at each iteration, we compute a search direction by reduced gradient, and optimal line search by bisection algorithm along this direction yields a decrease in the objective value. SPRGB method is desired to establish the global convergence of the algorithm. An implementation and tests of SPRGB algorithm are given, and some numerical results of large-scale problems are presented, which show the efficient of this approach.  相似文献   

17.
In this paper, we propose conjugate gradient path method for solving derivative-free unconstrained optimization. The iterative direction is obtained by constructing and solving quadratic interpolation model of the objective function with conjugate gradient methods. The global convergence and local superlinear convergence rate of the proposed algorithm are established under some reasonable conditions. Finally, the numerical results are reported to show the effectiveness of the proposed algorithm.  相似文献   

18.
Mathematical programming is a rich and well-developed area in operations research. Nevertheless, there remain many challenging problems in this area, one of which is the large-scale optimization problem. In this article, a modified Hestenes and Stiefel (HS) conjugate gradient (CG) algorithm with a nonmonotone line search technique is presented. This algorithm possesses information about not only the gradient value but also the function value. Moreover, the sufficient descent condition holds without any line search. The global convergence is established for nonconvex functions under suitable conditions. Numerical results show that the proposed algorithm is advantageous to existing CG methods for large-scale optimization problems.  相似文献   

19.
In this paper, based on a merit function of the split feasibility problem (SFP), we present a Newton projection method for solving it and analyze the convergence properties of the method. The merit function is differentiable and convex. But its gradient is a linear composite function of the projection operator, so it is nonsmooth in general. We prove that the sequence of iterates converges globally to a solution of the SFP as long as the regularization parameter matrix in the algorithm is chosen properly. Especially, under some local assumptions which are necessary for the case where the projection operator is nonsmooth, we prove that the sequence of iterates generated by the algorithm superlinearly converges to a regular solution of the SFP. Finally, some numerical results are presented.  相似文献   

20.
An algorithm is presented that minimizes a nonlinear function in many variables under equality constraints by generating a monotonically improving sequence of feasible points along curvilinear search paths obeying an initialvalue system of differential equations. The derivation of the differential equations is based on the idea of a steepest descent curve for the objective function on the feasible region. Our method for small stepsize behaves as the generalized reduced gradient algorithm, whereas for large enough stepsize the constrained equivalent of Newton's method for unconstrained minimization is obtained.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号