
1.

Performance of first-order methods for smooth convex minimization: a novel approach

Yoel Drori Marc Teboulle 《Mathematical Programming》2014,145(1-2):451-482

We introduce a novel approach for analyzing the worst-case performance of first-order black-box optimization methods. We focus on smooth unconstrained convex minimization over the Euclidean space. Our approach relies on the observation that by definition, the worst-case behavior of a black-box optimization method is by itself an optimization problem, which we call the performance estimation problem (PEP). We formulate and analyze the PEP for two classes of first-order algorithms. We first apply this approach to the classical gradient method and derive a new and tight analytical bound on its performance. We then consider a broader class of first-order black-box methods, which, among others, includes the so-called heavy-ball method and the fast gradient schemes. We show that for this broader class, it is possible to derive new bounds on the performance of these methods by solving an adequately relaxed convex semidefinite PEP. Finally, we show an efficient procedure for finding optimal step sizes which results in a first-order black-box method that achieves best worst-case performance.
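For the gradient method with step size 1/L, the tight bound from this line of work is commonly quoted as f(x_N) - f* ≤ L R² / (4N + 2), where R bounds the distance from the starting point to a minimizer. The sketch below is only a numerical sanity check of that inequality on one random convex quadratic, not a reproduction of the PEP analysis; the test function, the seed and the helper names are illustrative assumptions.

```python
import numpy as np

def gradient_descent_gap(A, x0, num_steps):
    """Run x_{k+1} = x_k - (1/L) * grad f(x_k) on f(x) = 0.5 * x^T A x.

    Returns f(x_N) - f*, where f* = 0 is attained at x* = 0.
    """
    L = np.linalg.eigvalsh(A)[-1]  # largest eigenvalue = Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(num_steps):
        x = x - (A @ x) / L
    return 0.5 * x @ A @ x

def worst_case_bound(L, R, num_steps):
    """Bound L * R^2 / (4N + 2) for the gradient method with step size 1/L."""
    return L * R**2 / (4 * num_steps + 2)

# Hypothetical test problem: a random positive semidefinite quadratic.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M
x0 = rng.standard_normal(5)

N = 10
L = np.linalg.eigvalsh(A)[-1]
R = np.linalg.norm(x0)  # distance from x0 to the minimizer x* = 0
gap = gradient_descent_gap(A, x0, N)
bound = worst_case_bound(L, R, N)
print(f"f(x_N) - f* = {gap:.3e}, bound = {bound:.3e}")
```

A quadratic is typically far from the worst case, so the observed gap should sit well below the bound.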

2.

Exact Worst-Case Convergence Rates of the Proximal Gradient Method for Composite Convex Minimization

Adrien B. Taylor Julien M. Hendrickx François Glineur 《Journal of Optimization Theory and Applications》2018,178(2):455-476

We study the worst-case convergence rates of the proximal gradient method for minimizing the sum of a smooth strongly convex function and a non-smooth convex function, whose proximal operator is available. We establish the exact worst-case convergence rates of the proximal gradient method in this setting for any step size and for different standard performance measures: objective function accuracy, distance to optimality and residual gradient norm. The proof methodology relies on recent developments in performance estimation of first-order methods, based on semidefinite programming. In the case of the proximal gradient method, this methodology allows obtaining exact and non-asymptotic worst-case guarantees that are conceptually very simple, although apparently new. On the way, we discuss how strong convexity can be replaced by weaker assumptions, while preserving the corresponding convergence rates. We also establish that the same fixed step size policy is optimal for all three performance measures. Finally, we extend recent results on the worst-case behavior of gradient descent with exact line search to the proximal case.
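As a concrete instance of the setting described above, the following minimal sketch runs the proximal gradient method on an l1-regularized least-squares problem, where the proximal operator is soft-thresholding. The problem data, the regularization weight and the function names are assumptions chosen for illustration; the step size 1/L is one standard choice, not the optimal step size policy established in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, step, num_steps):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(num_steps):
        grad = A.T @ (A @ x - b)  # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
        history.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, history

# Hypothetical data: a tall random matrix, so the smooth part is strongly convex.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
L = np.linalg.eigvalsh(A.T @ A)[-1]  # Lipschitz constant of the gradient

x_hat, history = proximal_gradient(A, b, lam=0.1, step=1.0 / L, num_steps=50)
print(history[0], history[-1])
```

With step size 1/L the objective values are non-increasing, which gives a quick way to check an implementation.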

3.

On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions

Etienne de Klerk François Glineur Adrien B. Taylor 《Optimization Letters》2017,11(7):1185-1199

We consider the gradient (or steepest) descent method with exact line search applied to a strongly convex function with Lipschitz continuous gradient. We establish the exact worst-case rate of convergence of this scheme, and show that this worst-case behavior is exhibited by a certain convex quadratic function. We also give the tight worst-case complexity bound for a noisy variant of the gradient descent method, where exact line search is performed in a search direction that differs from the negative gradient by at most a prescribed relative tolerance. The proofs are computer-assisted, and rely on solving semidefinite programming performance estimation problems as introduced in the paper (Drori and Teboulle, Math Progr 145(1–2):451–482, 2014).
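The worst-case rate in question is usually stated as a per-iteration contraction of the objective gap by the factor ((L - μ)/(L + μ))², where μ and L are the strong convexity and gradient Lipschitz constants. The sketch below checks that factor on a two-dimensional quadratic with a starting point chosen to be unfavorable; the specific values of μ and L and the helper name are assumptions for illustration.

```python
import numpy as np

def exact_line_search_step(A, x):
    """One steepest descent step on f(x) = 0.5 * x^T A x with exact line search."""
    g = A @ x                  # gradient at x
    t = (g @ g) / (g @ A @ g)  # step length minimizing f(x - t * g) over t
    return x - t * g

mu, L = 1.0, 10.0
A = np.diag([mu, L])

def f(x):
    return 0.5 * x @ A @ x

# Unfavorable starting point for this quadratic: proportional to (1/mu, 1/L).
x0 = np.array([1.0 / mu, 1.0 / L])
x1 = exact_line_search_step(A, x0)

observed = f(x1) / f(x0)
predicted = ((L - mu) / (L + mu)) ** 2
print(observed, predicted)
```

Starting from other points, the observed ratio on this quadratic is no larger than the predicted factor.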

4.

Adaptive Restart of the Optimized Gradient Method for Convex Optimization

Donghwan Kim Jeffrey A. Fessler 《Journal of Optimization Theory and Applications》2018,178(1):240-263

First-order methods with momentum, such as Nesterov’s fast gradient method, are very useful for convex optimization problems, but can exhibit undesirable oscillations yielding slow convergence rates for some applications. An adaptive restarting scheme can improve the convergence rate of the fast gradient method, when the parameter of a strongly convex cost function is unknown or when the iterates of the algorithm enter a locally strongly convex region. Recently, we introduced the optimized gradient method, a first-order algorithm that has an inexpensive per-iteration computational cost similar to that of the fast gradient method, yet has a worst-case cost function rate that is twice as fast as that of the fast gradient method and that is optimal for large-dimensional smooth convex problems. Building upon the success of accelerating the fast gradient method using adaptive restart, this paper investigates similar heuristic acceleration of the optimized gradient method. We first derive a new first-order method that resembles the optimized gradient method for strongly convex quadratic problems with known function parameters, yielding a linear convergence rate that is faster than that of the analogous version of the fast gradient method. We then provide a heuristic analysis and numerical experiments that illustrate that adaptive restart can accelerate the convergence of the optimized gradient method. Numerical results also illustrate that adaptive restart is helpful for a proximal version of the optimized gradient method for nonsmooth composite convex functions.
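To illustrate the idea of adaptive restart, the sketch below applies a gradient-based restart test to Nesterov's fast gradient method, not to the optimized gradient method studied in the paper. The restart condition (reset the momentum when the gradient at the extrapolated point forms an acute angle with the last step) is one commonly used heuristic; the quadratic test problem and all names are assumptions.

```python
import numpy as np

def fast_gradient(grad, L, x0, num_steps, restart=False):
    """Nesterov's fast gradient method with an optional gradient-based restart."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(num_steps):
        g = grad(y)
        x = y - g / L  # gradient step from the extrapolated point
        if restart and g @ (x - x_prev) > 0:
            # The step points uphill with respect to the gradient: drop the momentum.
            t = 1.0
            y = x.copy()
        else:
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x + ((t - 1.0) / t_next) * (x - x_prev)
            t = t_next
        x_prev = x
    return x_prev

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * x^T A x with minimizer 0.
A = np.diag([1.0, 100.0])

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x0 = np.ones(2)
N = 100
x_plain = fast_gradient(grad, 100.0, x0, N)
x_restart = fast_gradient(grad, 100.0, x0, N, restart=True)
print(f(x_plain), f(x_restart))
```

Comparing the two printed values on problems of varying conditioning is a simple way to see when the restart heuristic helps.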

5.

Inexact proximal stochastic gradient method for convex composite optimization

Xiao Wang Shuxiong Wang Hongchao Zhang 《Computational Optimization and Applications》2017,68(3):579-618

We study an inexact proximal stochastic gradient (IPSG) method for convex composite optimization, whose objective function is the sum of an average of a large number of smooth convex functions and a convex, but possibly nonsmooth, function. Variance reduction techniques are incorporated in the method to reduce the stochastic gradient variance. The main feature of this IPSG algorithm is that it allows solving the proximal subproblems inexactly while still keeping global convergence with desirable complexity bounds. Different subproblem stopping criteria are proposed. Global convergence and the component gradient complexity bounds are derived for both cases, when the objective function is strongly convex and when it is just generally convex. Preliminary numerical experiments show the overall efficiency of the IPSG algorithm.

6.

An optimal gradient method for smooth strongly convex minimization

Taylor Adrien Drori Yoel 《Mathematical Programming》2023,199(1-2):557-594

We present an optimal gradient method for smooth strongly convex optimization. The method is optimal in the sense that its worst-case bound on the distance to an optimal...

7.

First-order methods of smooth convex optimization with inexact oracle

Olivier Devolder François Glineur Yurii Nesterov 《Mathematical Programming》2014,146(1-2):37-75

We introduce the notion of inexact first-order oracle and analyze the behavior of several first-order methods of smooth convex optimization used with such an oracle. This notion of inexact oracle naturally appears in the context of smoothing techniques, Moreau–Yosida regularization, Augmented Lagrangians and many other situations. We derive complexity estimates for primal, dual and fast gradient methods, and study in particular their dependence on the accuracy of the oracle and the desired accuracy of the objective function. We observe that the superiority of fast gradient methods over the classical ones is no longer absolute when an inexact oracle is used. We prove that, contrary to simple gradient schemes, fast gradient methods must necessarily suffer from error accumulation. Finally, we show that the notion of inexact oracle allows the application of first-order methods of smooth convex optimization to solve non-smooth or weakly smooth convex problems.

8.

Compositions of convex functions and fully linear models

W. Hare 《Optimization Letters》2017,11(7):1217-1227

Derivative-free optimization (DFO) is the mathematical study of optimization algorithms that do not use derivatives. One branch of DFO focuses on model-based DFO methods, where an approximation of the objective function is used to guide the optimization algorithm. Proving convergence of such methods often relies on an assumption that the approximations form fully linear models, an assumption that requires the true objective function to be smooth. However, some recent methods have loosened this assumption and instead worked with functions that are compositions of smooth functions with simple convex functions (the max-function or the \(\ell_1\) norm). In this paper, we examine the error bounds resulting from the composition of a convex lower semi-continuous function with a smooth vector-valued function when it is possible to provide fully linear models for each component of the vector-valued function. We derive error bounds for the resulting function values and subgradient vectors.

9.

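A first-order algorithm for smooth convex minimization with Q-linearly convergent gradients

叶加青 陈倩竹 胡海平 《运筹学学报》 (Operations Research Transactions) 2021,25(1):96-106

Inspired by the performance estimation problem (PEP) approach, we optimize the step-size coefficients of first-order methods for smooth convex minimization in which the gradients at the iterates converge Q-linearly, by examining the convergence bound (i.e., the efficiency) on the worst-case function error. We introduce a new and efficient first-order method, called QGM, which has a computationally efficient form similar to that of the optimized gradient method (OGM).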

10.

Gap functions and error bounds for nonsmooth convex vector optimization problem

Joydeep Dutta Poonam Kesarwani Sanjeev Gupta 《Optimization》2017,66(11):1807-1836

In this article, our main aim is to develop gap functions and error bounds for a (non-smooth) convex vector optimization problem. We show that by focusing on convexity we are able to quite efficiently compute the gap functions and try to gain insight about the structure of the set of weak Pareto minimizers by viewing its graph. We will discuss several properties of gap functions and develop error bounds when the data are strongly convex. We also compare our results with some recent results on weak vector variational inequalities with set-valued maps, and also argue as to why we focus on the convex case.

11.

A block coordinate gradient descent method for regularized convex separable optimization and covariance selection

Sangwoon Yun Paul Tseng Kim-Chuan Toh 《Mathematical Programming》2011,129(2):331-355

We consider a class of unconstrained nonsmooth convex optimization problems, in which the objective function is the sum of a convex smooth function on an open subset of matrices and a separable convex function on a set of matrices. This problem includes the covariance selection problem that can be expressed as an \(\ell_1\)-penalized maximum likelihood estimation problem. In this paper, we propose a block coordinate gradient descent method (abbreviated as BCGD) for solving this class of nonsmooth separable problems with the coordinate block chosen by a Gauss-Seidel rule. The method is simple, highly parallelizable, and suited for large-scale problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear rate of convergence for this method. For the covariance selection problem, the method can terminate in \(O(n^3/\epsilon)\) iterations with an \(\epsilon\)-optimal solution. We compare the performance of the BCGD method with the first-order methods proposed by Lu (SIAM J Optim 19:1807–1827, 2009; SIAM J Matrix Anal Appl 31:2000–2016, 2010) for solving the covariance selection problem on randomly generated instances. Our numerical experience suggests that the BCGD method can be efficient for large-scale covariance selection problems with constraints.

12.

The effect of transformations on the approximation of univariate (convex) functions with applications to Pareto curves

A.Y.D. Siem D. den Hertog A.L. Hoffmann 《European Journal of Operational Research》2008

In the literature, methods for the construction of piecewise linear upper and lower bounds for the approximation of univariate convex functions have been proposed. We study the effect of the use of transformations on the approximation of univariate (convex) functions. In this paper, we show that these transformations can be used to construct upper and lower bounds for nonconvex functions. Moreover, we show that by using such transformations of the input variable or the output variable, we obtain tighter upper and lower bounds for the approximation of convex functions than without these approximations. We show that these transformations can be applied to the approximation of a (convex) Pareto curve that is associated with a (convex) bi-objective optimization problem.

13.

Comparison of Two Sets of First-Order Conditions as Bases of Interior-Point Newton Methods for Optimization with Simple Bounds

D.C. Jamrog R.A. Tapia Y. Zhang 《Journal of Optimization Theory and Applications》2002,113(1):21-40

In this paper, we compare the behavior of two Newton interior-point methods derived from two different first-order necessary conditions for the same nonlinear optimization problem with simple bounds. One set of conditions was proposed by Coleman and Li; the other is the standard KKT set of conditions. We discuss a perturbation of the CL conditions for problems with one-sided bounds and the difficulties involved in extending this to problems with general bounds. We study the numerical behavior of the Newton method applied to the systems of equations associated with the unperturbed and perturbed necessary conditions. Preliminary numerical results for convex quadratic objective functions indicate that, for this class of problems, the Newton method based on the perturbed KKT formulation appears to be the more robust.

14.

A coordinate gradient descent method for nonsmooth separable minimization

Paul Tseng Sangwoon Yun 《Mathematical Programming》2009,117(1-2):387-423

We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with \(\ell_1\)-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving the \(\ell_1\)-regularization of unconstrained optimization problems from Moré et al. in ACM Trans. Math. Softw. 7, 17–41, 1981 and from the CUTEr set (Gould and Orban in ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the \(\ell_1\)-regularized problem as a bound-constrained optimization problem, is also reported.

15.

Stochastic intermediate gradient method for convex optimization problems

A. V. Gasnikov P. E. Dvurechensky 《Doklady Mathematics》2016,93(2):148-151

New first-order methods are introduced for solving convex optimization problems from a fairly broad class. For composite optimization problems with an inexact stochastic oracle, a stochastic intermediate gradient method is proposed that allows using an arbitrary norm in the space of variables and a prox-function. The mean rate of convergence of this method and the probability of large deviations from this rate are estimated. For problems with a strongly convex objective function, a modification of this method is proposed and its rate of convergence is estimated. The resulting estimates coincide, up to a multiplicative constant, with lower complexity bounds for the class of composite optimization problems with an inexact stochastic oracle and for all usually considered subclasses of this class.

16.

Competitive online algorithms for resource allocation over the positive semidefinite cone

Reza Eghbali James Saunderson Maryam Fazel 《Mathematical Programming》2018,170(1):267-292

We consider a new and general online resource allocation problem, where the goal is to maximize a function of a positive semidefinite (PSD) matrix with a scalar budget constraint. The problem data arrives online, and the algorithm needs to make an irrevocable decision at each step. Of particular interest are classic experiment design problems in the online setting, with the algorithm deciding whether to allocate budget to each experiment as new experiments become available sequentially. We analyze two greedy primal-dual algorithms and provide bounds on their competitive ratios. Our analysis relies on a smooth surrogate of the objective function that needs to satisfy a new diminishing returns (PSD-DR) property (that its gradient is order-reversing with respect to the PSD cone). Using the representation for monotone maps on the PSD cone given by Löwner’s theorem, we obtain a convex parametrization of the family of functions satisfying PSD-DR. We then formulate a convex optimization problem to directly optimize our competitive ratio bound over this set. This design problem can be solved offline before the data start arriving. The online algorithm that uses the designed smoothing is tailored to the given cost function, and enjoys a competitive ratio at least as good as our optimized bound. We provide examples of computing the smooth surrogate for D-optimal and A-optimal experiment design, and demonstrate the performance of the custom-designed algorithm.

17.

Bundle methods for sum-functions with “easy” components: applications to multicommodity network design

Antonio Frangioni Enrico Gorgone 《Mathematical Programming》2014,145(1-2):133-161

We propose a version of the bundle scheme for convex nondifferentiable optimization suitable for the case of a sum-function where some of the components are “easy”, that is, they are Lagrangian functions of explicitly known compact convex programs. This corresponds to a stabilized partial Dantzig–Wolfe decomposition, where suitably modified representations of the “easy” convex subproblems are inserted in the master problem as an alternative to iteratively inner-approximating them by extreme points, thus providing the algorithm with exact information about a part of the dual objective function. The resulting master problems are potentially larger and less well-structured than the standard ones, ruling out the available specialized techniques and requiring the use of general-purpose solvers for their solution; this strongly favors piecewise-linear stabilizing terms, as opposed to the more usual quadratic ones, which in turn may have an adverse effect on the convergence speed of the algorithm, so that the overall performance may depend on appropriate tuning of all these aspects. Yet, very good computational results are obtained in at least one relevant application: the computation of tight lower bounds for Fixed-Charge Multicommodity Min-Cost Flow problems.

18.

Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Mengdi Wang Ethan X. Fang Han Liu 《Mathematical Programming》2017,161(1-2):419-449

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e., the problem \(\min_x \mathbf{E}_v\left[f_v\big(\mathbf{E}_w[g_w(x)]\big)\right]\). In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solutions based on noisy sample gradients of \(f_v, g_w\) and uses an auxiliary variable to track the unknown quantity \(\mathbf{E}_w\left[g_w(x)\right]\). We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of \(\mathcal{O}(k^{-1/4})\) in the general case and \(\mathcal{O}(k^{-2/3})\) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of \(\mathcal{O}(k^{-2/7})\) in the general case and \(\mathcal{O}(k^{-4/5})\) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide the convergence rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
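The two-time-scale structure mentioned in the abstract can be made concrete with a schematic version of the basic update. The form below is a sketch for the unconstrained case, written under the assumption that \(J g_w(x)\) denotes the Jacobian of \(g_w\) at \(x\) and that \(\alpha_k, \beta_k\) are step sizes with \(\alpha_k\) decaying faster than \(\beta_k\); the notation is illustrative and may differ from the paper's.

```latex
\begin{aligned}
y_{k+1} &= (1-\beta_k)\, y_k + \beta_k\, g_{w_k}(x_k)
  && \text{(running estimate of } \mathbf{E}_w[g_w(x_k)]\text{)} \\
x_{k+1} &= x_k - \alpha_k\, J g_{w_k}(x_k)^{\top}\, \nabla f_{v_k}(y_{k+1})
  && \text{(chain-rule gradient step using the estimate)}
\end{aligned}
```

The auxiliary sequence \(y_k\) moves on the faster time scale, so that it tracks the inner expectation while \(x_k\) changes slowly.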

19.

On the worst-case evaluation complexity of non-monotone line search algorithms

Geovani N. Grapiglia Ekkehard W. Sachs 《Computational Optimization and Applications》2017,68(3):555-577

A general class of non-monotone line search algorithms has been proposed by Sachs and Sachs (Control Cybern 40:1059–1075, 2011) for smooth unconstrained optimization, generalizing various non-monotone step size rules such as the modified Armijo rule of Zhang and Hager (SIAM J Optim 14:1043–1056, 2004). In this paper, the worst-case complexity of this class of non-monotone algorithms is studied. The analysis is carried out in the context of non-convex, convex and strongly convex objectives with Lipschitz continuous gradients. Despite the nonmonotonicity in the decrease of function values, the complexity bounds obtained agree in order with the bounds already established for monotone algorithms.

20.

Sparse trace norm regularization

Jianhui Chen Jieping Ye 《Computational Statistics》2014,29(3-4):623-639

We study the problem of estimating multiple predictive functions from a dictionary of basis functions in the nonparametric regression setting. Our estimation scheme assumes that each predictive function can be estimated in the form of a linear combination of the basis functions. By assuming that the coefficient matrix admits a sparse low-rank structure, we formulate the function estimation problem as a convex program regularized by the trace norm and the \(\ell_1\)-norm simultaneously. We propose to solve the convex program using the accelerated gradient (AG) method; we also develop efficient algorithms to solve the key components in AG. In addition, we conduct theoretical analysis on the proposed function estimation scheme: we derive a key property of the optimal solution to the convex program; based on an assumption on the basis functions, we establish a performance bound of the proposed function estimation scheme (via the composite regularization). Simulation studies demonstrate the effectiveness and efficiency of the proposed algorithms.