Similar Documents
20 similar documents found (search time: 50 ms)
1.
We consider the problem of minimizing the weighted sum of a smooth function f and a convex function P of n real variables subject to m linear equality constraints. We propose a block-coordinate gradient descent method for solving this problem, with the coordinate block chosen by a Gauss-Southwell-q rule based on sufficient predicted descent. We establish global convergence to first-order stationarity for this method and, under a local error bound assumption, a linear rate of convergence. If f is convex with Lipschitz continuous gradient, then the method terminates in O(n²/ε) iterations with an ε-optimal solution. If P is separable, then the Gauss-Southwell-q rule is implementable in O(n) operations when m = 1 and in O(n²) operations when m > 1. In the special case of support vector machine training, for which f is convex quadratic, P is separable, and m = 1, this complexity bound is comparable to the best known bound for decomposition methods. If f is convex, then, by gradually reducing the weight on P to zero, the method can be adapted to solve the bilevel problem of minimizing P over the set of minima of f + δ_X, where X denotes the closure of the feasible set. This has application in the least 1-norm solution of maximum-likelihood estimation. This research was supported by the National Science Foundation, Grant No. DMS-0511283.
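To illustrate the coordinate-step structure, here is a minimal sketch of one Gauss-Southwell-style coordinate gradient step for the unconstrained case (m = 0) with P(x) = λ‖x‖₁; the paper's method additionally handles the linear equality constraints and block (rather than single-coordinate) updates. The function names, the curvature estimate H, and the use of |d_j| as a proxy for predicted descent are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def gauss_southwell_step(x, grad_f, lam, H=1.0):
    """One coordinate gradient step for min f(x) + lam*||x||_1 (m = 0 case).

    grad_f : gradient of the smooth part f at x
    H      : positive curvature estimate used in the quadratic model
    """
    # For every coordinate, minimize the model g_j*d + (H/2)*d^2 + lam*|x_j + d|
    # in d; the minimizer is a soft-thresholding step.
    d = soft_threshold(x - grad_f / H, lam / H) - x
    # Gauss-Southwell rule: update the coordinate with the largest
    # predicted progress (here measured simply by |d_j|).
    j = int(np.argmax(np.abs(d)))
    x_new = x.copy()
    x_new[j] += d[j]
    return x_new, j
```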

2.
We consider a class of unconstrained nonsmooth convex optimization problems, in which the objective function is the sum of a convex smooth function on an open subset of matrices and a separable convex function on a set of matrices. This class includes the covariance selection problem, which can be expressed as an ℓ1-penalized maximum-likelihood estimation problem. In this paper, we propose a block coordinate gradient descent method (abbreviated as BCGD) for solving this class of nonsmooth separable problems, with the coordinate block chosen by a Gauss-Seidel rule. The method is simple, highly parallelizable, and suited for large-scale problems. We establish global convergence and, under a local Lipschitzian error bound assumption, a linear rate of convergence for this method. For the covariance selection problem, the method can terminate in O(n³/ε) iterations with an ε-optimal solution. We compare the performance of the BCGD method with the first-order methods proposed by Lu (SIAM J Optim 19:1807–1827, 2009; SIAM J Matrix Anal Appl 31:2000–2016, 2010) for solving the covariance selection problem on randomly generated instances. Our numerical experience suggests that the BCGD method can be efficient for large-scale covariance selection problems with constraints.

3.
In this paper, we provide a unified iteration complexity analysis for a family of general block coordinate descent methods, covering popular methods such as block coordinate gradient descent and block coordinate proximal gradient, under various coordinate update rules. We unify these algorithms under the so-called block successive upper-bound minimization (BSUM) framework, and show that for a broad class of multi-block nonsmooth convex problems, all algorithms covered by the BSUM framework achieve a global sublinear iteration complexity of \(\mathcal{O}(1/r)\), where r is the iteration index. Moreover, for the case of block coordinate minimization, where each block is minimized exactly, we establish the sublinear convergence rate of \(\mathcal{O}(1/r)\) without a per-block strong convexity assumption.

4.
Pegasos: primal estimated sub-gradient solver for SVM
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
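The Pegasos update itself is short; a minimal sketch (with the optional projection step, and assuming dense NumPy arrays rather than the sparse features typically used in practice) looks roughly like this:

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=10_000, rng=np.random.default_rng(0)):
    """Stochastic sub-gradient descent for the SVM primal objective
    min_w  lam/2 ||w||^2 + (1/m) sum_i max(0, 1 - y_i <w, x_i>)."""
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)                  # single random training example
        eta = 1.0 / (lam * t)                # step size 1/(lam*t)
        if y[i] * (X[i] @ w) < 1.0:          # hinge loss is active
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1.0 - eta * lam) * w
        # Optional projection onto the ball of radius 1/sqrt(lam).
        nw = np.linalg.norm(w)
        if nw > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * nw))
    return w
```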

5.
We describe a procedure to reduce variable bounds in mixed integer nonlinear programming (MINLP) as well as mixed integer linear programming (MILP) problems. The procedure works by combining pairs of inequalities of a linear programming (LP) relaxation of the problem. This bound reduction procedure extends the feasibility-based bound reduction technique on linear functions used in MINLP and MILP. However, it can also be seen as a special case of optimality-based bound reduction, a method to infer variable bounds from an LP relaxation of the problem. For an LP relaxation with m constraints and n variables, there are O(m²) pairs of constraints, and a naïve implementation of our bound reduction scheme has complexity O(n³) for each pair. Therefore, its overall complexity O(m²n³) can be prohibitive for relatively large problems. We have developed a more efficient procedure that has complexity O(m²n²), and embedded it in two open-source solvers: one for MINLP and one for MILP. We provide computational results which substantiate the usefulness of this bound reduction technique for several instances.
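For context, the single-constraint feasibility-based reduction that the paper extends to pairs of inequalities can be sketched as follows: for one row a·x ≤ b, new bounds on each variable follow from the minimum activity of the remaining terms. This is a hypothetical helper illustrating the classical technique, not the authors' implementation.

```python
import numpy as np

def tighten_bounds(a, b, lo, up):
    """Feasibility-based bound tightening for a single row a @ x <= b,
    given current bounds lo <= x <= up. Returns possibly tighter bounds."""
    lo, up = lo.copy(), up.copy()
    # Minimum contribution of each term a_j * x_j over [lo_j, up_j].
    min_terms = np.minimum(a * lo, a * up)
    min_activity = min_terms.sum()
    for j in range(len(a)):
        if a[j] == 0.0:
            continue
        # Smallest possible activity of all terms except j.
        rest = min_activity - min_terms[j]
        if a[j] > 0.0:
            up[j] = min(up[j], (b - rest) / a[j])   # a_j x_j <= b - rest
        else:
            lo[j] = max(lo[j], (b - rest) / a[j])   # inequality flips for a_j < 0
    return lo, up
```

The paper's procedure applies the same inference to nonnegative combinations of two rows at a time, which is what makes the efficient O(m²n²) implementation nontrivial.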

6.
A coordinate gradient descent method for nonsmooth separable minimization
We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ1-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving the ℓ1-regularization of unconstrained optimization problems from Moré et al. (ACM Trans. Math. Softw. 7, 17–41, 1981) and from the CUTEr set (Gould and Orban in ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ1-regularized problem as a bound-constrained optimization problem, is also reported.

7.
In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as structured nonsmooth optimization problems, i.e., minimizing ℓ1-regularized linear least-squares problems. In this paper, we propose a block coordinate gradient descent method (abbreviated as CGD) to solve the more general ℓ1-regularized convex minimization problem, i.e., minimizing an ℓ1-regularized convex smooth function. We establish a Q-linear convergence rate for our method when the coordinate block is chosen by a Gauss-Southwell-type rule to ensure sufficient descent. We propose efficient implementations of the CGD method and report numerical results for solving large-scale ℓ1-regularized linear least-squares problems arising in compressed sensing and image deconvolution, as well as large-scale ℓ1-regularized logistic regression problems for feature selection in data classification. Comparison with several state-of-the-art algorithms specifically designed for solving large-scale ℓ1-regularized linear least-squares or logistic regression problems suggests that an efficiently implemented CGD method may outperform these algorithms, despite the fact that the CGD method is not specifically designed for these special classes of problems.
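For the ℓ1-regularized least-squares special case, the exact coordinate update that such methods build on has a closed form via soft-thresholding. A minimal cyclic-sweep sketch is below; the paper's CGD method instead selects blocks by a Gauss-Southwell-type rule, so this is only the underlying coordinate step, not the proposed algorithm.

```python
import numpy as np

def lasso_cd_sweep(A, b, x, lam):
    """One cyclic coordinate-descent sweep for min 0.5*||A x - b||^2 + lam*||x||_1."""
    r = b - A @ x                       # maintained residual
    for j in range(A.shape[1]):
        aj = A[:, j]
        denom = aj @ aj
        if denom == 0.0:
            continue
        rho = aj @ r + denom * x[j]     # correlation with x_j's contribution removed
        # Exact coordinate minimizer: soft-threshold rho, then rescale.
        new_xj = np.sign(rho) * max(abs(rho) - lam, 0.0) / denom
        r += aj * (x[j] - new_xj)       # update residual incrementally
        x[j] = new_xj
    return x
```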

8.
In this paper we develop a fast collocation method for boundary integral equations of the second kind using trigonometric polynomials. We propose a convenient way to compress the dense matrix representation of a compact integral operator with a smooth kernel under the Fourier basis and the corresponding collocation functionals. The compression leads to a sparse matrix with only O(n log² n) nonzero entries, where 2n+1 denotes the order of the matrix. Thus we develop a fast Fourier-collocation method. We prove that the fast Fourier-collocation method gives the optimal convergence order up to a logarithmic factor. Moreover, we design a fast scheme for solving the corresponding truncated linear system. We establish that this algorithm preserves the quasi-optimal convergence of the approximate solution while requiring only O(n log³ n) multiplications.

9.
We address the problem of recovering an n-vector from m linear measurements lacking sign or phase information. We show that lifting and semidefinite relaxation suffice by themselves for stable recovery in the setting of m = O(n log n) random sensing vectors, with high probability. The recovery method is optimizationless in the sense that trace minimization in the PhaseLift procedure is unnecessary; that is, PhaseLift reduces to a feasibility problem. The optimizationless perspective allows for a Douglas-Rachford numerical algorithm that is unavailable for PhaseLift. This method exhibits linear convergence with a favorable convergence rate and without any parameter tuning.
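A generic Douglas-Rachford iteration for a two-set feasibility problem, here alternating between the PSD cone and an abstract second set, can be sketched as below. The problem setup and projection operators are illustrative assumptions (the affine measurement-consistency projection is left as a user-supplied callable), not the authors' code.

```python
import numpy as np

def proj_psd(X):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def douglas_rachford_feasibility(proj_A, proj_B, Y0, iters=500):
    """Find a point in A ∩ B via Douglas-Rachford splitting:
    Y <- Y + P_A(2 P_B(Y) - Y) - P_B(Y)."""
    Y = Y0
    for _ in range(iters):
        Xb = proj_B(Y)
        Xa = proj_A(2 * Xb - Y)   # reflect through B, then project onto A
        Y = Y + Xa - Xb
    return proj_B(Y)              # candidate solution in B

# Usage sketch: proj_A = proj_psd, proj_B = projection onto the affine set
# {X : <A_i, X> = b_i} built from the measurement data.
```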

10.
We present the interpolation search B-tree (ISB-tree), a new cache-aware indexing scheme that supports update operations (insertions and deletions) in O(1) worst-case block transfers and search operations in O(log_B log n) expected block transfers, where B represents the disk block size and n denotes the number of stored elements. The expected search bound holds with high probability for a large class of (unknown) input distributions. The worst-case search bound of our indexing scheme is O(log_B n) block transfers. Our update and expected search bounds constitute a considerable improvement over the O(log_B n) worst-case block transfer bounds for search and update operations achieved by the B-tree and its numerous variants. This is also verified by an accompanying experimental study.
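The in-memory principle behind the ISB-tree's expected search bound is ordinary interpolation search, which probes where the key "should" lie under a roughly uniform key distribution. A minimal sketch over a sorted array (a simplification of the disk-block setting the paper addresses):

```python
def interpolation_search(a, key):
    """Return an index of key in sorted array a, or -1 if absent.
    Expected O(log log n) probes for (near-)uniformly distributed keys."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[hi] == a[lo]:                 # all remaining keys are equal
            return lo if a[lo] == key else -1
        # Probe proportionally to where key sits in the range [a[lo], a[hi]].
        pos = lo + int((hi - lo) * (key - a[lo]) / (a[hi] - a[lo]))
        if a[pos] == key:
            return pos
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```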

11.
The job-shop problem is one of the most difficult NP-hard scheduling problems. A 10×10 problem published in 1963 was solved only recently by Carlier and Pinson using a branch and bound method, and other branch and bound algorithms have been developed since. The efficiency of all these branch and bound methods relies on the concept of immediate selection, which allows one to introduce order relations on the set I of all operations to be processed on the same machine before branching. We present new algorithms for immediate selection. Among them are
  1. an O(max{n log n, f}) algorithm for fixing all disjunctions induced by cliques;
  2. an O(n²) algorithm based on concepts which are different from those used by Carlier and Pinson.
Here, n is the number of operations in I and f is the number of induced order relations.

12.
We develop a fast fully discrete Fourier-Galerkin method for solving a class of singular boundary integral equations. We prove that the number of multiplications used in generating the compressed matrix is O(n log³ n), and that the solution of the proposed method preserves the optimal convergence order O(n^{-t}), where n is the order of the Fourier basis functions used in the method and t denotes the degree of regularity of the exact solution. Moreover, we propose a preconditioner which ensures numerical stability when solving the preconditioned linear system. Numerical examples are presented to confirm the theoretical estimates and to demonstrate the approximation accuracy and computational efficiency of the proposed algorithm.

13.
We consider the numerical solution of the generalized Lyapunov and Stein equations in \(\mathbb{R}^{n}\), arising respectively from stochastic optimal control in continuous and discrete time. Generalizing the Smith method, our algorithms converge quadratically and have an O(n³) computational complexity per iteration and an O(n²) memory requirement. For large-scale problems, when the relevant matrix operators are "sparse", our algorithm for generalized Stein (or Lyapunov) equations may achieve a complexity and memory requirement of O(n) (or similar to that of the solution of the linear systems associated with the sparse matrix operators). These efficient algorithms can be applied to Newton's method for the solution of the rational Riccati equations. This contrasts favourably with the naive Newton algorithms of O(n⁶) complexity or the slower modified Newton methods of O(n³) complexity. Convergence and error analysis are considered and numerical examples provided.
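For orientation, the classical (squared) Smith iteration for the standard Stein equation X − A X Aᵀ = W, which the paper generalizes, converges quadratically when the spectral radius of A is below 1. A minimal NumPy sketch, not the authors' generalized algorithm:

```python
import numpy as np

def smith_stein(A, W, tol=1e-12, max_iter=60):
    """Solve X - A X A^T = W by the squared Smith iteration (needs rho(A) < 1).
    The solution is the series sum_k A^k W (A^T)^k; squaring the propagator
    doubles the number of summed terms per step, hence quadratic convergence."""
    X, Ak = W.copy(), A.copy()
    for _ in range(max_iter):
        X_new = X + Ak @ X @ Ak.T     # fold in the next block of series terms
        Ak = Ak @ Ak                  # square the propagator
        if np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X_new):
            return X_new
        X = X_new
    return X
```

Each step costs O(n³) in dense arithmetic, matching the per-iteration complexity quoted above.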

14.
Ma and Spinrad have shown that every transitive orientation of a chordal comparability graph is the intersection of four linear orders. That is, chordal comparability graphs are comparability graphs of posets of dimension four. Among other uses, this gives an implicit representation of a chordal comparability graph using O(n) integers so that, given two vertices, it can be determined in O(1) time whether they are adjacent, no matter how dense the graph is. We give a linear-time algorithm for finding the four linear orders, improving on their bound of O(n²).
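Given the four linear orders, the O(1) adjacency test mentioned above is immediate: two vertices are adjacent exactly when one precedes the other in all four orders simultaneously. A small sketch (the position-array representation is an assumption for illustration):

```python
def adjacent(u, v, pos):
    """pos[t][x] is the position of vertex x in linear order t (t = 0..3).
    Distinct u, v are adjacent iff they are comparable in the poset, i.e.
    one precedes the other in all four linear orders at once."""
    before = all(pos[t][u] < pos[t][v] for t in range(4))
    after = all(pos[t][u] > pos[t][v] for t in range(4))
    return before or after
```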

15.
The application of the fast gradient method to the dual quadratic programming (QP) problem leads to the dual fast projected gradient (DFPG) method. The DFPG converges at an O(k⁻²) rate, where k > 0 is the number of steps. Each step requires O(nm) operations. Therefore, for a given ε > 0, an ε-approximation to the optimal dual function value is achieved in O(nm/√ε) operations. We present numerical results which strongly corroborate the theory. In particular, we demonstrate the high efficiency of the DFPG for large-scale QP.
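A generic fast (Nesterov-accelerated) projected gradient iteration of the kind underlying DFPG is sketched below, written for an assumed dual of the form min ½yᵀQy − cᵀy over y ≥ 0 with gradient Lipschitz constant L; the objective form and constraint set are illustrative assumptions, not the paper's exact dual.

```python
import numpy as np

def fast_projected_gradient(Q, c, L, iters=1000):
    """Accelerated projected gradient for min 0.5*y^T Q y - c^T y, y >= 0;
    achieves O(k^-2) objective error, matching the DFPG rate quoted above."""
    n = len(c)
    y = z = np.zeros(n)
    t = 1.0
    for _ in range(iters):
        grad = Q @ z - c
        y_new = np.maximum(z - grad / L, 0.0)          # projected gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = y_new + ((t - 1.0) / t_new) * (y_new - y)  # momentum extrapolation
        y, t = y_new, t_new
    return y
```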

16.
We present a new logarithmic-quadratic proximal (LQP) alternating direction scheme for the separable constrained convex programming problem. The predictor is obtained by solving a series of related systems of nonlinear equations in parallel. The new iterate is obtained by searching for the optimal step size along a new descent direction, which is a linear combination of two descent directions. Global convergence of the proposed method is proved under certain assumptions. We show an O(1/t) convergence rate for the parallel LQP alternating direction method.

17.

For solving the large-scale linear least-squares problem, we propose a block version of the randomized extended Kaczmarz method, called the two-subspace randomized extended Kaczmarz method, which does not require any row or column paving. Theoretical analysis and numerical results show that the two-subspace randomized extended Kaczmarz method is much more efficient than the randomized extended Kaczmarz method. When the coefficient matrix is of full column rank, the two-subspace randomized extended Kaczmarz method can also outperform the randomized coordinate descent method. If the linear system is consistent, we remove one of the iteration sequences in the two-subspace randomized extended Kaczmarz method, namely the sequence that approximates the projection of the right-hand-side vector onto the orthogonal complement of the range space of the coefficient matrix, and obtain the generalized two-subspace randomized Kaczmarz method. This is a generalization of the two-subspace randomized Kaczmarz method that drops the assumptions of unit row norms and full column rank on the coefficient matrix. We give an upper bound for the convergence rate of the generalized two-subspace randomized Kaczmarz method, which also leads to a better upper bound for the convergence rate of the two-subspace randomized Kaczmarz method.
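The building block of all these variants is the randomized Kaczmarz row update; a minimal sketch for a consistent system Ax = b is below (the extended and two-subspace methods add column or block steps on top of this, which the sketch omits):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=10_000, rng=np.random.default_rng(0)):
    """Randomized Kaczmarz for a consistent system A x = b: at each step,
    project the iterate onto the hyperplane of one row, chosen with
    probability proportional to its squared norm (rows assumed nonzero)."""
    m, n = A.shape
    row_norms2 = np.einsum('ij,ij->i', A, A)
    probs = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += ((b[i] - A[i] @ x) / row_norms2[i]) * A[i]
    return x
```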


18.
In this paper we present algorithms which, given a circular arrangement of n uniquely numbered processes, determine the maximum number in a distributed manner. We begin with a simple unidirectional algorithm in which the number of messages passed is bounded by 2n log n + O(n). By making several improvements to the simple algorithm, we obtain a unidirectional algorithm in which the number of messages passed is bounded by 1.5n log n + O(n). These algorithms disprove Hirschberg and Sinclair's conjecture that O(n²) is a lower bound on the number of messages passed in unidirectional algorithms for this problem. At the end of the paper we indicate how our methods can be used to improve an algorithm due to Peterson, obtaining a unidirectional algorithm that uses at most 1.356n log n + O(n) messages. This is the best bound so far on the number of messages passed in both the bidirectional and unidirectional cases.

19.
In Demmel et al. (Numer. Math. 106(2), 199–224, 2007) we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and in fact that if multiplication of n-by-n matrices can be done by any algorithm in O(n^{ω+η}) operations for any η > 0, then it can be done stably in O(n^{ω+η}) operations for any η > 0. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least-squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in O(n^{ω+η}) operations. J. Demmel acknowledges support of NSF under grants CCF-0444486, ACI-00090127, CNS-0325873 and of DOE under grant DE-FC02-01ER25478.

20.
We propose a new Broyden method for solving systems of nonlinear equations, which uses first derivatives but is more efficient than the Newton method (measured by computational time) for larger dense systems. The new method updates QR or LU decompositions of nonsymmetric approximations of the Jacobian matrix, so it requires O(n²) arithmetic operations per iteration, in contrast with the Newton method, which requires O(n³) operations per iteration. Computational experiments confirm the high efficiency of the new method.
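For reference, the plain Broyden ("good") update that such methods build on is sketched below. This naive version re-solves a dense system at every step, costing O(n³); the paper's contribution is to update QR or LU factors of the approximate Jacobian instead, bringing each iteration down to O(n²).

```python
import numpy as np

def broyden(F, x0, J0, tol=1e-10, max_iter=100):
    """Broyden's 'good' method for F(x) = 0, starting from Jacobian guess J0."""
    x, B = x0.astype(float), J0.astype(float)
    fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) <= tol:
            break
        s = np.linalg.solve(B, -fx)        # quasi-Newton step (naive O(n^3) solve)
        x_new = x + s
        fx_new = F(x_new)
        yv = fx_new - fx
        # Rank-one secant update enforcing B_new @ s = yv.
        B += np.outer(yv - B @ s, s) / (s @ s)
        x, fx = x_new, fx_new
    return x
```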
