Optimization Online


Adaptive Accelerated Gradient Converging Methods under Hölderian Error Bound Condition

Liu Mingrui (mingrui-liu***at***uiowa.edu)
Yang Tianbao (tianbao-yang***at***uiowa.edu)

Abstract: In this paper, we focus our study on the convergence of (proximal) gradient methods and accelerated (proximal) gradient methods for smooth (composite) optimization under a H\"{o}lderian error bound (HEB) condition. We first show that the proximal gradient (PG) method is automatically adaptive to HEB, while the accelerated proximal gradient (APG) method can be made adaptive to HEB by restarting, with an improved iteration complexity. In particular, PG and the adaptive APG method enjoy iteration complexities of $O(Lc^2\max(\frac{1}{\epsilon^{1-2\theta}}, \log(\frac{1}{\epsilon})))$ and $O(\sqrt{L}c\max(\frac{1}{\epsilon^{1/2-\theta}}, \log(\frac{1}{\epsilon})))$, respectively, for the convergence of the objective value, where $\theta\in(0,1/2]$ is the exponent and $c$ is the constant factor in the HEB condition. However, the number of iterations between restarts of APG hinges on the possibly unknown parameter $c$. To address this issue, we propose adaptive gradient converging methods, i.e., methods that use the magnitude of the gradient as the criterion for restart and termination. We develop adaptive accelerated gradient converging (adaAGC) methods for solving smooth (composite) optimization under the HEB condition with an explicit exponent $\theta$, and establish an iteration complexity of $O(\sqrt{L}c^{1/2(1-\theta)}\max(\frac{1}{\epsilon^{(1-2\theta)/2(1-\theta)}}, \log(\frac{1}{\epsilon})))$ for the convergence of the (proximal) gradient's norm. We also show that a slightly modified variant of PG has an iteration complexity of $O(Lc^{1/(1-\theta)}\max(\frac{1}{\epsilon^{(1-2\theta)/(1-\theta)}}, \log(\frac{1}{\epsilon})))$ for the convergence of the (proximal) gradient's norm.
Furthermore, we demonstrate that these results have important implications and applications in machine learning: (i) if the considered objective function is coercive and semi-algebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber-norm regularization and a convex smooth piecewise quadratic loss, which includes the least-squares loss, smoothed hinge loss, and Huber loss, the proposed adaAGC is parameter-free and enjoys a faster linear convergence than PG without any other assumptions (e.g., a restricted eigenvalue condition). It is notable that for all the aforementioned problems, our linear convergence results are global rather than local. To the best of our knowledge, these improved results are first established in this work.
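To make the restart-and-terminate-by-gradient-norm idea concrete, here is a minimal illustrative sketch (not the paper's exact adaAGC algorithm): Nesterov's accelerated gradient method for a smooth objective, where each accelerated stage resets the momentum and ends once the gradient norm has been halved, and the whole method terminates once the gradient norm falls below a tolerance. The function names and the halving factor are assumptions made for illustration.

```python
import numpy as np

def accelerated_gradient_restart(grad, x0, L, eps, max_iters=100_000):
    # Illustrative sketch of gradient-norm-based restart: each stage runs
    # Nesterov acceleration with fresh momentum and ends once ||grad|| has
    # been halved; the method stops when ||grad(x)|| <= eps.
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    it = 0
    while np.linalg.norm(g) > eps and it < max_iters:
        # Fresh stage: reset momentum and fix this stage's target gradient norm.
        y, x_prev, t = x.copy(), x.copy(), 1.0
        target = 0.5 * np.linalg.norm(g)
        while it < max_iters:
            it += 1
            x = y - grad(y) / L  # 1/L gradient step from the extrapolated point
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum extrapolation
            x_prev, t = x, t_next
            g = grad(x)
            if np.linalg.norm(g) <= target:  # criterion met: restart the stage
                break
    return x
```

On a smooth strongly convex problem such a restarted scheme converges linearly; the point of the gradient-norm criterion, as the abstract notes, is that it does not require knowing the HEB constant $c$.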

Keywords: Hölderian error bound; adaptive convergence; accelerated proximal gradient method; smooth composite optimization

Category 1: Convex and Nonsmooth Optimization (Convex Optimization )


Download: [PDF]

Entry Submitted: 11/22/2016
Entry Accepted: 11/22/2016
Entry Last Modified: 05/14/2017
