  


Adaptive Accelerated Gradient Converging Methods under Hölderian Error Bound Condition
Mingrui Liu (mingruiliu@uiowa.edu)

Abstract: In this paper, we study the convergence of (proximal) gradient methods and accelerated (proximal) gradient methods for smooth (composite) optimization under a Hölderian error bound (HEB) condition. We first show that the proximal gradient (PG) method is automatically adaptive to HEB, while the accelerated proximal gradient (APG) method can be made adaptive to HEB by restarting, with an improved iteration complexity. In particular, PG and the adaptive APG method enjoy iteration complexities of $O(Lc^2\max(\frac{1}{\epsilon^{1-2\theta}}, \log(\frac{1}{\epsilon})))$ and $O(\sqrt{L}c\max(\frac{1}{\epsilon^{1/2-\theta}}, \log(\frac{1}{\epsilon})))$, respectively, for the convergence of the objective value, where $\theta\in(0,1/2]$ is the exponent and $c$ is the constant factor in the HEB condition. However, the number of iterations before restarting APG hinges on the possibly unknown parameter $c$. To address this issue, we develop adaptive gradient converging methods, i.e., methods that use the magnitude of the gradient as a criterion for restart and termination. We develop adaptive accelerated gradient converging (adaAGC) methods for solving smooth (composite) optimization under the HEB condition with an explicit exponent $\theta$, and establish an iteration complexity of $O(\sqrt{L}c^{1/(2(1-\theta))}\max(\frac{1}{\epsilon^{(1-2\theta)/(2(1-\theta))}}, \log(\frac{1}{\epsilon})))$ for the convergence of the (proximal) gradient's norm. We also show that a slightly modified variant of PG has an iteration complexity of $O(Lc^{1/(1-\theta)}\max(\frac{1}{\epsilon^{(1-2\theta)/(1-\theta)}}, \log(\frac{1}{\epsilon})))$ for the convergence of the (proximal) gradient's norm.
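The restart-and-termination rule described above, which monitors the magnitude of the gradient, can be illustrated in a short self-contained way. The following is a minimal sketch for a smooth unconstrained objective and is not the paper's exact adaAGC scheme: the function name `restarted_agd`, the norm-halving restart rule, and the fixed step size $1/L$ are assumptions made purely for illustration.

```python
import numpy as np

def restarted_agd(grad, x0, L, eps=1e-6, max_iters=10000):
    """Accelerated gradient descent that restarts its momentum each time
    the gradient norm is halved, and terminates once it falls below eps.

    A minimal illustration of gradient-norm-based restart; the paper's
    adaAGC uses a more refined restart and termination rule.
    """
    x = x0.copy()
    target = np.linalg.norm(grad(x)) / 2.0  # restart once gradient norm halves
    y, t = x.copy(), 1.0                    # Nesterov momentum state
    for _ in range(max_iters):
        g = grad(y)
        x_new = y - g / L                   # gradient step from extrapolated point
        gnorm = np.linalg.norm(grad(x_new))
        if gnorm <= eps:
            return x_new                    # terminate: gradient small enough
        if gnorm <= target:
            y, t = x_new.copy(), 1.0        # milestone reached: reset momentum
            target = gnorm / 2.0
        else:
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
            y = x_new + ((t - 1) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x
```

Using the gradient norm rather than the objective gap as the progress measure is exactly what makes the scheme checkable at run time: the gradient is computed anyway, whereas the optimal objective value is unknown.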
Furthermore, we demonstrate that these results have important implications and applications in machine learning: (i) if the considered objective function is coercive and semialgebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber norm regularization and a convex smooth piecewise quadratic loss, which includes the least-squares loss, smoothed hinge loss, and Huber loss, the proposed adaAGC is parameter-free and enjoys faster linear convergence than PG without any other assumptions (e.g., a restricted eigenvalue condition). It is notable that for all the aforementioned problems, our linear convergence results are global rather than local. To the best of our knowledge, these improved results appear here for the first time.

Keywords: Hölderian error bound; adaptive convergence; accelerated proximal gradient method; smooth composite optimization
Category 1: Convex and Nonsmooth Optimization (Convex Optimization)
Entry Submitted: 11/22/2016
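To make the claim in (ii) concrete, here is a worked specialization (an illustration added here, not taken verbatim from the paper): the HEB condition in its standard form, instantiated for an $\ell_1$-regularized least-squares objective, which is convex piecewise quadratic and is known to satisfy HEB with $\theta = 1/2$.

```latex
% HEB condition with exponent \theta and constant c (standard form):
\mathrm{dist}(x, X_*) \le c\,\big(F(x) - F_*\big)^{\theta}, \qquad \theta \in (0, 1/2].
% Example from (ii): \ell_1-regularized least squares,
F(x) = \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1,
% satisfies HEB with \theta = 1/2. Plugging \theta = 1/2 into the adaAGC bound,
% c^{1/(2(1-\theta))} = c and (1-2\theta)/(2(1-\theta)) = 0, so the complexity
% specializes to O(\sqrt{L}\, c \log(1/\epsilon)), i.e., global linear convergence.
```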