Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-Łojasiewicz Property

J Zeng (jsh.zeng***at***gmail.com)
S Ouyang (oyskang***at***163.com)
T Lau (timlautk***at***gmail.com)
S Lin (sblin1983***at***gmail.com)
Y Yuan (yuany***at***ust.hk)

Abstract: Deep learning has recently attracted significant attention due to its great empirical success. However, why training deep neural networks (DNNs) is so effective remains a mystery, given the nonconvex optimization problems involved. In this paper, we aim to provide some theoretical understanding of such optimization problems. In particular, we establish the Kurdyka-Łojasiewicz (KL) property for DNN training with variable splitting schemes, which leads to the global convergence of block coordinate descent (BCD) type algorithms to a critical point of the objective function under natural conditions on DNNs. Some existing BCD algorithms can be viewed as special cases of this framework.
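
To make the idea of BCD with variable splitting concrete, here is a minimal sketch for a one-hidden-layer ReLU network. It is not the paper's exact algorithm: the three-block splitting (auxiliary variables U for the pre-activation and V for the activation), the penalty weights gamma and rho, the proximal weight alpha, and all function names are assumptions chosen for illustration. Each block is updated by exact minimization with the other blocks held fixed, so the penalized objective cannot increase from one sweep to the next.

```python
import numpy as np


def objective(W1, W2, U, V, X, Y, gamma, rho):
    """Penalized objective of the (assumed) three-splitting formulation."""
    fit = 0.5 * np.sum((W2 @ V - Y) ** 2)                       # output loss
    act = 0.5 * gamma * np.sum((V - np.maximum(U, 0.0)) ** 2)   # V ~ relu(U)
    lin = 0.5 * rho * np.sum((U - W1 @ X) ** 2)                 # U ~ W1 X
    return fit + act + lin


def update_U(V, C, gamma, rho):
    """Elementwise global minimizer of
    gamma/2 * (V - relu(u))^2 + rho/2 * (u - C)^2 over u."""
    # Best point on the branch u >= 0, where relu(u) = u.
    u_pos = np.maximum((gamma * V + rho * C) / (gamma + rho), 0.0)
    f_pos = 0.5 * gamma * (V - u_pos) ** 2 + 0.5 * rho * (u_pos - C) ** 2
    # Best point on the branch u <= 0, where relu(u) = 0.
    u_neg = np.minimum(C, 0.0)
    f_neg = 0.5 * gamma * V ** 2 + 0.5 * rho * (u_neg - C) ** 2
    return np.where(f_pos <= f_neg, u_pos, u_neg)


def bcd_three_split(X, Y, hidden, gamma=1.0, rho=1.0, alpha=1e-2,
                    iters=50, seed=0):
    """Cyclic block coordinate descent over the blocks W2, V, U, W1."""
    rng = np.random.default_rng(seed)
    d, _ = X.shape
    k = Y.shape[0]
    W1 = 0.1 * rng.standard_normal((hidden, d))
    W2 = 0.1 * rng.standard_normal((k, hidden))
    U = W1 @ X
    V = np.maximum(U, 0.0)
    I_h, I_d = np.eye(hidden), np.eye(d)
    history = [objective(W1, W2, U, V, X, Y, gamma, rho)]

    for _ in range(iters):
        # W2 block: least squares plus a proximal term alpha/2 * ||W2 - W2_old||^2.
        A = V @ V.T + alpha * I_h
        B = Y @ V.T + alpha * W2
        W2 = np.linalg.solve(A, B.T).T

        # V block: strongly convex quadratic, solved exactly.
        V = np.linalg.solve(W2.T @ W2 + gamma * I_h,
                            W2.T @ Y + gamma * np.maximum(U, 0.0))

        # U block: closed-form elementwise update through the ReLU.
        U = update_U(V, W1 @ X, gamma, rho)

        # W1 block: least squares plus a proximal term alpha/2 * ||W1 - W1_old||^2.
        A = rho * (X @ X.T) + alpha * I_d
        B = rho * (U @ X.T) + alpha * W1
        W1 = np.linalg.solve(A, B.T).T

        history.append(objective(W1, W2, U, V, X, Y, gamma, rho))

    return W1, W2, U, V, history
```

Calling `bcd_three_split(X, Y, hidden=8)` with `X` of shape (features, samples) and `Y` of shape (outputs, samples) returns the final blocks together with the objective value after each sweep. The proximal terms on the weight blocks keep every linear system well conditioned while preserving the monotone decrease of the objective.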

Keywords: Deep learning, Kurdyka-Łojasiewicz inequality, Block coordinate descent, Global convergence

Category 1: Applications -- Science and Engineering (Data-Mining)

Category 2: Convex and Nonsmooth Optimization (Other)



Entry Submitted: 10/22/2018
Entry Accepted: 10/22/2018
Entry Last Modified: 07/05/2019

