Optimization Online


Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-{\L}ojasiewicz Property

J Zeng (jsh.zeng***at***gmail.com)
S Ouyang (oyskang***at***163.com)
T Lau (timlautk***at***gmail.com)
S Lin (sblin1983***at***gmail.com)
Y Yuan (yuany***at***ust.hk)

Abstract: Deep learning has recently attracted a significant amount of attention due to its great empirical success. However, why training deep neural networks (DNNs) is effective remains a mystery, since the associated optimization problems are nonconvex. In this paper, we aim to provide some theoretical understanding of such optimization problems. In particular, the Kurdyka-{\L}ojasiewicz (KL) property is established for DNN training with variable splitting schemes, which leads to the global convergence of block coordinate descent (BCD) type algorithms to a critical point of the objective function under natural conditions on the DNNs. Some existing BCD algorithms can be viewed as special cases of this framework.
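As a concrete illustration of the variable-splitting BCD framework the abstract describes, the following is a minimal NumPy sketch for a one-hidden-layer ReLU network. Auxiliary variables U1 = W1 X (pre-activation) and V1 = relu(U1) (post-activation) are split out and coupled by quadratic penalties; each BCD pass minimizes the penalized objective over one block at a time in closed form. The architecture, the penalty weight gamma, and all variable names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 5, 8, 50                       # input dim, hidden width, samples
X = rng.standard_normal((d, n))
Y = rng.standard_normal((1, n))
gamma = 1.0                              # illustrative penalty weight

def relu(z):
    return np.maximum(z, 0.0)

# Splitting variables: U1 tracks W1 @ X, V1 tracks relu(U1).
W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((1, h))
U1 = W1 @ X
V1 = relu(U1)

def objective(W1, W2, U1, V1):
    """Penalized (split) training objective."""
    fit = 0.5 * np.sum((W2 @ V1 - Y) ** 2)
    pen = 0.5 * gamma * (np.sum((U1 - W1 @ X) ** 2)
                         + np.sum((V1 - relu(U1)) ** 2))
    return fit + pen

obj0 = objective(W1, W2, U1, V1)

for _ in range(200):
    # W2-block: least squares in W2 (tiny ridge for numerical stability).
    W2 = Y @ V1.T @ np.linalg.inv(V1 @ V1.T + 1e-8 * np.eye(h))
    # V1-block: the objective is quadratic in V1, so solve the normal equations.
    A = W2.T @ W2 + gamma * np.eye(h)
    V1 = np.linalg.solve(A, W2.T @ Y + gamma * relu(U1))
    # U1-block: elementwise minimize (V1 - relu(u))^2 + (u - W1 X)^2
    # by comparing the minimizers of the two ReLU branches (u >= 0, u < 0).
    Wx = W1 @ X
    u_pos = np.maximum((V1 + Wx) / 2.0, 0.0)   # branch u >= 0: relu(u) = u
    u_neg = np.minimum(Wx, 0.0)                # branch u <  0: relu(u) = 0
    cost = lambda u: (V1 - relu(u)) ** 2 + (u - Wx) ** 2
    U1 = np.where(cost(u_pos) <= cost(u_neg), u_pos, u_neg)
    # W1-block: least squares in W1.
    W1 = U1 @ X.T @ np.linalg.inv(X @ X.T + 1e-8 * np.eye(d))

obj_final = objective(W1, W2, U1, V1)
```

Because every block update exactly (up to the tiny ridge) minimizes the penalized objective over its block, the objective value is monotonically nonincreasing along the iterates; under the KL property of the objective, the whole sequence converges to a critical point.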

Keywords: Deep learning, Kurdyka-{\L}ojasiewicz inequality, Block coordinate descent, Global convergence

Category 1: Applications -- Science and Engineering (Data-Mining )

Category 2: Convex and Nonsmooth Optimization (Other )



Entry Submitted: 10/22/2018
Entry Accepted: 10/22/2018
Entry Last Modified: 07/05/2019
