Optimization Online


First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems

Ching-pei Lee (ching-pei***at***cs.wisc.edu)
Stephen Wright (swright***at***cs.wisc.edu)

Abstract: It is well known that both gradient descent and stochastic coordinate descent achieve a global convergence rate of $O(1/k)$ in the objective value, when applied to a scheme for minimizing a Lipschitz-continuously differentiable, unconstrained convex function. In this work, we improve this rate to $o(1/k)$. We extend the result to proximal gradient and proximal coordinate descent on regularized problems to show similar $o(1/k)$ convergence rates. The result is tight in the sense that a rate of $O(1/k^{1+\epsilon})$ is not generally attainable for any $\epsilon>0$, for any of these methods.

Keywords: Gradient descent methods; Coordinate descent methods; Proximal gradient methods; Convex optimization; Complexity

Category 1: Convex and Nonsmooth Optimization

Citation: The 36th International Conference on Machine Learning

Download: [PDF]

Entry Submitted: 12/20/2018
Entry Accepted: 12/20/2018
Entry Last Modified: 05/13/2019

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society