Similar Literature
20 similar records retrieved.
1.
We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values, the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from \(O(1/\sqrt{k})\) to O(1/k) in general, and when the sum is strongly convex the convergence rate is improved from the sub-linear O(1/k) to a linear rate of the form \(O(\rho^k)\) for \(\rho < 1\). Further, in many cases the convergence rate of the new method is also faster than that of black-box deterministic gradient methods, in terms of the number of gradient evaluations. This extends our earlier work (Le Roux et al., Adv Neural Inf Process Syst, 2012), which only led to a faster rate for well-conditioned strongly convex problems. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
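For illustration, a minimal sketch of the gradient-memory idea behind SAG, assuming per-term gradients are available through a hypothetical helper grad_i(i, x); the constant step size and the dense gradient table are illustrative simplifications, not the paper's recommended implementation:

```python
import numpy as np

def sag(grad_i, x0, n, alpha=0.01, iters=10000, rng=None):
    """Minimal SAG sketch for minimizing (1/n) * sum_i f_i(x).

    grad_i(i, x) returns the gradient of the i-th term at x.  The memory y
    stores the most recent gradient seen for each term, and d keeps their
    running sum, so each iteration costs a single gradient evaluation.
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    y = np.zeros((n, x.size))     # stored gradients, one row per term
    d = np.zeros_like(x)          # running sum of the stored gradients
    for _ in range(iters):
        j = rng.integers(n)
        g = grad_i(j, x)
        d += g - y[j]             # swap out the old stored gradient of term j
        y[j] = g
        x -= alpha * d / n        # step along the average of the stored gradients
    return x
```

For a least-squares objective with rows a_i and targets b_i, grad_i(i, x) would simply return a_i * (a_i @ x - b_i).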

2.
We consider gradient algorithms for minimizing a quadratic function in ${\mathbb{R}^n}$ with large n. We suggest a particular sequence of step-sizes and demonstrate that the resulting gradient algorithm has a convergence rate comparable to that of Conjugate Gradients and other methods based on the use of Krylov spaces. When the matrix is large and sparse, the proposed algorithm can be more efficient than the Conjugate Gradient algorithm in terms of computational cost, as k iterations of the proposed algorithm require the computation of only O(log k) inner products.
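The particular step-size sequence is the paper's contribution and is not reproduced here; the sketch below only shows the generic gradient iteration for a quadratic driven by a user-supplied sequence, with the classical constant step 2/(m+M) over an assumed spectral interval [m, M] as a placeholder:

```python
import numpy as np

def gradient_quadratic(A, b, steps, x0=None):
    """Generic gradient iteration x_{k+1} = x_k - gamma_k (A x_k - b)
    for f(x) = 0.5 (Ax, x) - (b, x), driven by a given step-size sequence."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for gamma in steps:
        x -= gamma * (A @ x - b)
    return x

# Placeholder sequence: the classical constant step 2/(m + M) for an assumed
# spectral interval [m, M]; the paper's special sequence is not reproduced here.
m, M, k = 0.1, 10.0, 200
steps = np.full(k, 2.0 / (m + M))
```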

3.
In this Note, we formulate a sparse Krylov-based algorithm for solving large-scale linear systems of algebraic equations arising from the discretization of randomly parametrized (or stochastic) elliptic partial differential equations (SPDEs). We analyze the proposed sparse conjugate gradient (CG) algorithm within the framework of inexact Krylov subspace methods, prove its convergence and study its abstract computational cost. Numerical studies conducted on stochastic diffusion models show that the proposed sparse CG algorithm outperforms the classical CG method when the sought solutions admit a sparse representation in a polynomial chaos basis. In such cases, the sparse CG algorithm recovers almost exactly the sparsity pattern of the exact solutions, which enables accelerated convergence. In the case when the SPDE solution does not admit a sparse representation, the convergence of the proposed algorithm is very similar to the classical CG method.

4.
An optimal method for stochastic composite optimization
This paper considers an important class of convex programming (CP) problems, namely stochastic composite optimization (SCO), whose objective function is given by the summation of general nonsmooth and smooth stochastic components. Since SCO covers nonsmooth, smooth and stochastic CP as certain special cases, a valid lower bound on the rate of convergence for solving these problems is known from the classic complexity theory of convex programming. Note, however, that optimization algorithms achieving this lower bound had never been developed. In this paper, we show that the simple mirror-descent stochastic approximation method exhibits the best-known rate of convergence for solving these problems. Our major contribution is to introduce the accelerated stochastic approximation (AC-SA) algorithm, based on Nesterov's optimal method for smooth CP (Nesterov in Doklady AN SSSR 269:543–547, 1983; Nesterov in Math Program 103:127–152, 2005), and to show that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for SCO. To the best of our knowledge, it is also the first universally optimal algorithm in the literature for solving nonsmooth, smooth and stochastic CP problems. We illustrate the significant advantages of the AC-SA algorithm over existing methods in the context of solving a special but broad class of stochastic programming problems.
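A simplified, unconstrained Euclidean sketch of an AC-SA-style accelerated stochastic approximation scheme; the parameter choices beta_t and gamma_t below are illustrative placeholders rather than the paper's optimal settings, and the prox-mapping and nonsmooth composite term are omitted:

```python
import numpy as np

def acsa_sketch(sgrad, x0, iters=1000, L=1.0):
    """Accelerated stochastic approximation sketch (unconstrained, Euclidean).

    sgrad(x) returns a stochastic gradient of the smooth part at x."""
    x = x0.copy()        # "search" point
    x_ag = x0.copy()     # aggregated point, returned at the end
    for t in range(1, iters + 1):
        beta = 2.0 / (t + 1)
        gamma = t / (4.0 * L)                  # placeholder step size
        x_md = beta * x + (1 - beta) * x_ag    # middle point where the gradient is sampled
        g = sgrad(x_md)
        x = x - gamma * g                      # stochastic gradient step from the search point
        x_ag = beta * x + (1 - beta) * x_ag    # aggregated update
    return x_ag
```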

5.
Regularization of ill-posed linear inverse problems via ℓ1 penalization has been proposed for cases where the solution is known to be (almost) sparse. One way to obtain the minimizer of such an ℓ1-penalized functional is via an iterative soft-thresholding algorithm. We propose an alternative implementation of ℓ1-constraints, using a gradient method with projection onto ℓ1-balls. The corresponding algorithm again uses iterative soft-thresholding, now with a variable thresholding parameter. We also propose accelerated versions of this iterative method, using ingredients of the (linear) steepest descent method. We prove convergence in norm for one of these projected gradient methods, with and without acceleration.
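A sketch of the projection step and the resulting projected gradient loop for min ||Kx - y||^2 subject to ||x||_1 <= R, using the standard sort-based ℓ1-ball projection; the fixed step size (e.g. 1/||K||^2) is an illustrative choice, and the accelerated variants are not reproduced:

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the l1-ball of the given radius
    (standard sort-based scheme); the result is a soft-thresholding of v
    with a data-dependent threshold."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(K, y, radius, step, iters=500):
    """Projected gradient for min ||K x - y||^2 subject to ||x||_1 <= radius."""
    x = np.zeros(K.shape[1])
    for _ in range(iters):
        x = project_l1_ball(x - step * K.T @ (K @ x - y), radius)
    return x
```

Note that the projection indeed acts as soft-thresholding with a variable (data-dependent) threshold theta, which is the mechanism described in the abstract.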

6.
Yang Minghan, Milzarek Andre, Wen Zaiwen, Zhang Tong. Mathematical Programming, 2022, 194(1-2): 257-303.

In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher-order steps, derived from an underlying proximal-type fixed-point equation, with additional stochastic proximal gradient steps to guarantee convergence. Based on suitable bounds on the step sizes, we establish global convergence to stationary points in expectation, and an extension of the approach using variance reduction techniques is discussed. Motivated by large-scale and big data applications, we investigate a stochastic coordinate-type quasi-Newton scheme that allows generating cheap and tractable stochastic higher-order directions. Finally, numerical results on large-scale logistic regression and deep learning problems show that our proposed algorithm compares favorably with other state-of-the-art methods.
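The full extra-step quasi-Newton scheme is too involved for a short sketch; shown below is only the plain stochastic proximal gradient step that such methods use as a safeguard, with an ℓ1 term assumed for the nonsmooth part of the objective:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_prox_grad(sgrad, x0, lam=0.1, step=0.01, iters=1000):
    """Stochastic proximal gradient iteration for min f(x) + lam * ||x||_1,
    where sgrad(x) returns a stochastic estimate of grad f(x).  This is only
    the safeguard-type step, not the paper's full extra-step quasi-Newton method."""
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * sgrad(x), step * lam)
    return x
```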


7.
We propose a multi-time scale quasi-Newton based smoothed functional (QN-SF) algorithm for stochastic optimization both with and without inequality constraints. The algorithm combines the smoothed functional (SF) scheme for estimating the gradient with the quasi-Newton method to solve the optimization problem. Newton algorithms typically update the Hessian at each instant and subsequently (a) project it onto the space of symmetric positive definite matrices, and (b) invert the projected Hessian. The latter operation is computationally expensive. In order to save computational effort, we propose in this paper a quasi-Newton SF (QN-SF) algorithm based on the Broyden–Fletcher–Goldfarb–Shanno (BFGS) update rule. In Bhatnagar (ACM Trans Model Comput Simul 18(1): 27–62, 2007), a Jacobi variant of Newton SF (JN-SF) was proposed and implemented to save computational effort. We compare our QN-SF algorithm with the gradient SF (G-SF) and JN-SF algorithms on two different problems: a simple stochastic function minimization problem and a problem of optimal routing in a queueing network. We observe from the experiments that the QN-SF algorithm performs significantly better than both G-SF and JN-SF on both problem settings. Next we extend the QN-SF algorithm to the case of constrained optimization. In this case too, the QN-SF algorithm performs much better than the JN-SF algorithm. Finally, we present the proof of convergence for the QN-SF algorithm in both unconstrained and constrained settings.
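A minimal sketch of the smoothed functional (SF) gradient estimate with Gaussian perturbations, which is the building block the QN-SF algorithm couples with a BFGS update; the smoothing parameter beta and the sample count are illustrative choices:

```python
import numpy as np

def sf_gradient(f, x, beta=0.1, samples=10, rng=None):
    """Two-sided smoothed-functional gradient estimate with Gaussian perturbations."""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(samples):
        eta = rng.standard_normal(x.size)
        g += eta * (f(x + beta * eta) - f(x - beta * eta)) / (2.0 * beta)
    return g / samples
```

The BFGS Hessian update and the multi-time scale step-size schedules of the paper are not shown here.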

8.
In this paper we describe an implementation of a cutting plane algorithm for the perfect matching problem which is based on the simplex method. The algorithm has the following features:
  • It works on very sparse subgraphs of K_n that are determined heuristically; global optimality is checked using the reduced-cost criterion.
  • Cutting plane recognition is usually accomplished by heuristics. Only if these fail is the Padberg–Rao procedure invoked to guarantee finite convergence.
  • Our computational study shows that, on average, very few variables and very few cutting planes suffice to find a globally optimal solution. In this way we could solve matching problems on complete graphs with up to 1000 nodes. Moreover, it turned out that our cutting plane algorithm is competitive with the fast combinatorial matching algorithms known to date.

    9.
This work develops a class of stochastic optimization algorithms. It aims to provide numerical procedures for solving threshold-type optimal control problems. The main motivation stems from applications involving optimal or suboptimal hedging policies, for example, production planning in manufacturing systems with random demand and stochastic machine capacity. The proposed algorithm is a constrained stochastic approximation procedure that uses random-direction finite-difference gradient estimates. Under fairly general conditions, the convergence of the algorithm is established and the rate of convergence is derived. A numerical example is reported to demonstrate the performance of the algorithm.
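A simplified sketch of a constrained stochastic approximation step with random-direction finite-difference gradient estimates; the gain sequences and the box constraint used for the projection are illustrative assumptions:

```python
import numpy as np

def constrained_rdsa(f_noisy, x0, lower, upper, a=0.1, c=0.1, iters=2000, rng=None):
    """Constrained stochastic approximation with random-direction
    finite-difference gradient estimates.  f_noisy(x) returns a noisy
    observation of the objective at x."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for k in range(1, iters + 1):
        a_k, c_k = a / k, c / k ** (1.0 / 6.0)        # typical decaying gain sequences
        delta = rng.choice([-1.0, 1.0], size=x.size)  # random perturbation direction
        g = (f_noisy(x + c_k * delta) - f_noisy(x - c_k * delta)) / (2.0 * c_k) * delta
        x = np.clip(x - a_k * g, lower, upper)        # project back onto the feasible box
    return x
```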

    10.
We consider the solution of linear systems of equations Ax=b, with A a symmetric positive-definite matrix in ℝ^{n×n}, through Richardson-type iterations or, equivalently, the minimization of convex quadratic functions (1/2)(Ax,x)-(b,x) with a gradient algorithm. The use of step-sizes asymptotically distributed with the arcsine distribution on the spectrum of A then yields an asymptotic rate of convergence after k<n iterations, k→∞, that coincides with that of the conjugate-gradient algorithm in the worst case. However, the spectral bounds m and M are generally unknown and thus need to be estimated to allow the construction of simple and cost-effective gradient algorithms with fast convergence. It is the purpose of this paper to analyse the properties of estimators of m and M based on moments of probability measures ν_k defined on the spectrum of A and generated by the algorithm on its way towards the optimal solution. A precise analysis of the behavior of the rate of convergence of the algorithm is also given. Two situations are considered: (i) the sequence of step-sizes corresponds to i.i.d. random variables; (ii) the step-sizes are generated through a dynamical system (fractional parts of the golden ratio) producing a low-discrepancy sequence. In the first case, properties of random walks can be used to prove the convergence of simple spectral bound estimators based on the first moment of ν_k. The second option requires a more careful choice of spectral bound estimators but is shown to produce much smaller fluctuations in the rate of convergence of the algorithm.
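A sketch of case (i), assuming the spectral bounds m and M are already known (estimating them from the iterates is precisely the subject of the paper): the step sizes are reciprocals of i.i.d. arcsine-distributed points on [m, M]:

```python
import numpy as np

def arcsine_gradient(A, b, m, M, iters=500, rng=None):
    """Gradient algorithm for (1/2)(Ax, x) - (b, x) with step sizes 1/lambda_k,
    where the lambda_k are i.i.d. arcsine-distributed on [m, M]."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros_like(b)
    for _ in range(iters):
        u = rng.random()
        lam = m + (M - m) * np.sin(np.pi * u / 2.0) ** 2   # arcsine sample on [m, M]
        x -= (A @ x - b) / lam                             # gradient step with gamma = 1/lambda
    return x
```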

    11.
In this paper we introduce an iterative algorithm for finding a common element of the fixed-point set of an asymptotically strict pseudocontractive mapping S in the intermediate sense and the solution set of the minimization problem (MP) for a convex, continuously Fréchet-differentiable functional in a Hilbert space. The iterative algorithm is based on several well-known methods, including the extragradient method, the CQ method, the Mann-type iterative method, and the hybrid gradient projection algorithm with regularization. We obtain a strong convergence theorem for the three sequences generated by our iterative algorithm. In addition, we prove a new weak convergence theorem for a modified extragradient method with regularization for the MP and the mapping S.

    12.
In this paper, we develop a parameterized proximal point algorithm (P-PPA) for solving a class of separable convex programming problems subject to linear and convex constraints. The proposed algorithm is provably globally convergent with a worst-case O(1/t) convergence rate, where t denotes the iteration number. With properly chosen algorithm parameters, numerical experiments on a sparse optimization problem arising from statistical learning show that our P-PPA can perform significantly better than other state-of-the-art methods, such as the alternating direction method of multipliers and the relaxed proximal point algorithm.

    13.
We present an algorithm for solving stochastic heat equations, whose key ingredient is a non-uniform time discretization of the driving Brownian motion W. For this algorithm we derive an error bound in terms of its number of evaluations of one-dimensional components of W. The rate of convergence depends on the spatial dimension of the heat equation and on the decay of the eigenfunctions of the covariance of W. According to known lower bounds, our algorithm is optimal, up to a constant, and this optimality cannot be achieved by uniform time discretizations. AMS subject classification (2000): 60H15, 60H35, 65C30.

    14.
This paper studies $\ell_1$-regularized optimization problems on the sphere, whose objective function consists of a general smooth term and a nonsmooth $\ell_1$ regularization term, under the assumption that stochastic gradients of the smooth term can be estimated by a stochastic first-order oracle. Such optimization problems are widely used in machine learning, image and signal processing, and statistics. Based on the manifold proximal gradient method and stochastic gradient estimation techniques, a spherical stochastic proximal gradient algorithm is proposed. Using a global implicit function theorem for nonsmooth functions, the Lipschitz continuity of the subproblem solution with respect to its parameter is analyzed, and the global convergence of the algorithm is then established. Numerical experiments on spherical $\ell_1$-regularized quadratic programming, finite-sum sparse PCA, and spherical $\ell_1$-regularized logistic regression problems, on both synthetic and real data sets, show that the proposed algorithm has an advantage in CPU time over the manifold proximal gradient method and the Riemannian stochastic proximal gradient method.

    15.
In a previous paper we gave a new, natural extension of the calculus of variations/optimal control theory to a (strong) stochastic setting. We now extend the theory of this most fundamental chapter of optimal control in several directions. Most importantly, we present a new method of stochastic control, adding Brownian motion which makes the problem "noisy." Secondly, we show how to obtain efficient solutions: direct stochastic integration for simpler problems and/or efficient and accurate numerical methods with a global a priori error of O(h^{3/2}) for more complex problems. Finally, we include "quiet" constraints, i.e. deterministic relationships between the state and control variables. Our theory and results can be immediately restricted to the non-"noisy" (deterministic) case, yielding efficient numerical solution techniques and an a priori error of O(h^2). In this event we obtain the most efficient method of solving the (constrained) classical Linear Regulator Problem. Our methods differ from the standard theory of stochastic control. In some cases the solutions coincide, or at least are closely related; however, our methods have many advantages, including those mentioned above. In addition, our methods more directly follow the motivation and theory of classical (deterministic) optimization, which is perhaps the most important area of physical and engineering science. Our results follow from related ideas in the deterministic theory. Thus, our approximation methods follow by guessing at an algorithm, but the proof of global convergence uses stochastic techniques because our trajectories are not differentiable. Along these lines, a general drift term in the trajectory equation is properly viewed as an added constraint and extends ideas given in the deterministic case by the first author.

    16.
We consider interest rate swaps which pay a fixed rate against a floating rate in the presence of bid-ask spread costs. Even for simple models of bid-ask spread costs, there is no explicit strategy optimizing an expected function of the hedging error. We propose an efficient algorithm based on the stochastic gradient method to compute an approximate optimal strategy without solving a stochastic control problem. We validate our algorithm by numerical experiments. We also develop several variants of the algorithm and discuss their performance in terms of the numerical parameters and the liquidity cost.
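The swap and bid-ask cost model are not reproduced here; the sketch below only shows the stochastic-gradient loop over simulated hedging errors, with a user-supplied simulator sim_loss_grad as an assumed interface:

```python
import numpy as np

def sgd_hedging(sim_loss_grad, theta0, step=0.01, iters=5000, rng=None):
    """Stochastic gradient descent on an expected hedging-error criterion.

    sim_loss_grad(theta, rng) simulates one market scenario and returns the
    gradient of the hedging loss with respect to the strategy parameters theta.
    The step-size rule and iteration count are illustrative."""
    rng = rng or np.random.default_rng(0)
    theta = theta0.copy()
    for k in range(1, iters + 1):
        g = sim_loss_grad(theta, rng)
        theta -= (step / np.sqrt(k)) * g   # decaying step, a standard Robbins-Monro choice
    return theta
```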

    17.
In this work, the optimal adjustment algorithm for p coordinates, which arose as a generalization of the optimal pair adjustment algorithm, is used to accelerate the convergence of interior point methods that use a hybrid iterative approach for solving the linear systems arising at each iteration. Its main advantages are simplicity and fast initial convergence. At each interior point iteration, the preconditioned conjugate gradient method is used to solve the normal equations. The controlled Cholesky factorization is adopted as the preconditioner in the early outer iterations, and the splitting preconditioner is adopted in the final outer iterations. The optimal adjustment algorithm is applied at the preconditioner transition in order to improve both speed and robustness. Numerical experiments on a set of linear programming problems show that this approach reduces the total number of interior point iterations and the running time for some classes of problems. Furthermore, some problems were solved only when the optimal adjustment algorithm for p coordinates was used at the change of preconditioners.
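A minimal sketch of the inner solve, preconditioned CG on the normal equations (A D A^T) y = r; a simple Jacobi (diagonal) preconditioner stands in for the controlled Cholesky and splitting preconditioners, and the optimal adjustment step at the preconditioner transition is not shown:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

def solve_normal_equations(A, d, r):
    """Solve the interior-point normal equations (A D A^T) y = r with
    preconditioned CG, using a Jacobi preconditioner as a placeholder.
    A is a sparse constraint matrix, d the diagonal scaling vector."""
    D = sp.diags(d)
    S = (A @ D @ A.T).tocsr()                               # normal-equations matrix
    diag = S.diagonal()
    M = LinearOperator(S.shape, matvec=lambda v: v / diag)  # Jacobi preconditioner
    y, info = cg(S, r, M=M)
    if info != 0:
        raise RuntimeError("PCG did not converge (info=%d)" % info)
    return y
```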

    18.
In this paper, we prove new complexity bounds for methods of convex optimization based only on computation of the function value. The search directions of our schemes are normally distributed random Gaussian vectors. It appears that such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables. This conclusion is true for both nonsmooth and smooth problems. For the latter class, we also present an accelerated scheme with the expected rate of convergence \(O\Big ({n^2 \over k^2}\Big )\), where k is the iteration counter. For stochastic optimization, we propose a zero-order scheme and justify its expected rate of convergence \(O\Big ({n \over k^{1/2}}\Big )\). We also give some bounds for the rate of convergence of the random gradient-free methods to stationary points of nonconvex functions, for both smooth and nonsmooth cases. Our theoretical results are supported by preliminary computational experiments.
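A sketch of the Gaussian gradient-free oracle and the resulting zero-order descent loop; the smoothing parameter mu and the fixed step size are illustrative placeholders rather than the paper's n-dependent choices:

```python
import numpy as np

def gaussian_free_gradient(f, x, mu=1e-4, rng=None):
    """Random gradient-free oracle with a Gaussian direction:
    g = (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal(x.size)
    return (f(x + mu * u) - f(x)) / mu * u

def gradient_free_descent(f, x0, step=1e-2, iters=5000, rng=None):
    """Plain zero-order scheme built on the oracle above."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(iters):
        x -= step * gaussian_free_gradient(f, x, rng=rng)
    return x
```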

    19.
We consider the numerical solution of the generalized Lyapunov and Stein equations in \(\mathbb{R}^{n}\), arising respectively from stochastic optimal control in continuous and discrete time. Generalizing the Smith method, our algorithms converge quadratically and have an O(n^3) computational complexity per iteration and an O(n^2) memory requirement. For large-scale problems, when the relevant matrix operators are "sparse", our algorithm for generalized Stein (or Lyapunov) equations may achieve a complexity and memory requirement of O(n) (or similar to that of the solution of the linear systems associated with the sparse matrix operators). These efficient algorithms can be applied within Newton's method for the solution of the rational Riccati equations. This contrasts favourably with the naive Newton algorithms of O(n^6) complexity or the slower modified Newton methods of O(n^3) complexity. Convergence and error analyses are considered and numerical examples provided.
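The generalized equations treated in the paper carry extra terms; shown below is only the classical squared Smith iteration for the standard Stein equation X - A X A^T = B (with spectral radius rho(A) < 1), the building block being generalized:

```python
import numpy as np

def smith_stein(A, B, tol=1e-12, max_iter=60):
    """Squared Smith iteration for the Stein equation X - A X A^T = B,
    assuming rho(A) < 1; converges quadratically to sum_i A^i B (A^T)^i."""
    X, Ak = B.copy(), A.copy()
    for _ in range(max_iter):
        incr = Ak @ X @ Ak.T
        X = X + incr
        if np.linalg.norm(incr, 'fro') <= tol * np.linalg.norm(X, 'fro'):
            break
        Ak = Ak @ Ak          # squaring of A gives the quadratic convergence
    return X
```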

    20.
A new decomposition optimization algorithm, called path-following gradient-based decomposition, is proposed to solve separable convex optimization problems. Unlike the path-following Newton methods considered in the literature, this algorithm does not require any smoothness assumption on the objective function, which allows us to handle more general classes of problems arising in many real applications than path-following Newton methods can. The new algorithm combines three techniques, namely smoothing, Lagrangian decomposition, and a path-following gradient framework. It decomposes the original problem into smaller subproblems by using dual decomposition and smoothing via self-concordant barriers, updates the dual variables using a path-following gradient method, and allows one to solve the subproblems in parallel. Moreover, compared to augmented Lagrangian approaches, our algorithmic parameters are updated automatically without any tuning strategy. We prove the global convergence of the new algorithm and analyze its convergence rate. We then modify the proposed algorithm by applying Nesterov's accelerating scheme to obtain a new variant with a better convergence rate than the first algorithm. Finally, we present preliminary numerical tests that confirm the theoretical development.

