  


Gradient Descent using Duality Structures
Thomas Flynn (tflynngradcenter.cuny.edu)

Abstract: Gradient descent is commonly used to solve optimization problems arising in machine learning, such as training neural networks. Although it appears to be effective across many neural network training problems, it is unclear whether that effectiveness can be explained by existing performance guarantees for the algorithm. We argue that existing analyses of gradient descent rely on assumptions that are too strong to apply to multilayer neural networks. To address this, we propose an algorithm, duality structure gradient descent (DSGD), that is amenable to a nonasymptotic performance analysis under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layerwise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The choice of layer is made greedily, based on a rigorous lower bound on the function decrease for each possible choice of layer. In the analysis, we bound the time required to reach approximate stationary points, in both the deterministic and stochastic settings. Convergence is measured in terms of a Finsler geometry that is derived from the network architecture and designed to confirm a Lipschitz-like property of the gradient of the training objective function. Numerical experiments in both the full-batch and minibatch settings suggest that the algorithm is a promising step towards methods for training neural networks that are both rigorous and efficient.

Keywords: optimization, neural networks

Category 1: Nonlinear Optimization (Unconstrained Optimization)

Entry Submitted: 08/01/2017
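The greedy layerwise selection described in the abstract can be illustrated on a toy problem. The sketch below is not DSGD itself: it is plain greedy block coordinate descent on a small quadratic, with index blocks standing in for layers, and it uses no Finsler geometry. The function names, the matrix A, the vector b, and the row-sum bound on each block's Lipschitz constant are all assumptions made for illustration.

```python
def block_lipschitz(A, blk):
    # Max absolute row sum of the diagonal block; for symmetric A this
    # upper-bounds the block's largest eigenvalue (assumed Lipschitz bound).
    return max(sum(abs(A[r][c]) for c in blk) for r in blk)

def gradient(A, b, x):
    # Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b.
    n = len(x)
    return [sum(A[r][c] * x[c] for c in range(n)) - b[r] for r in range(n)]

def greedy_block_descent(A, b, blocks, steps=500):
    x = [0.0] * len(b)
    lips = [block_lipschitz(A, blk) for blk in blocks]
    for _ in range(steps):
        g = gradient(A, b, x)
        # Lower bound on the decrease from updating block i with step 1/L_i:
        # ||g_i||^2 / (2 * L_i).
        bounds = [sum(g[j] ** 2 for j in blk) / (2.0 * L)
                  for blk, L in zip(blocks, lips)]
        # Greedy choice: update only the block with the largest bound.
        i = max(range(len(blocks)), key=bounds.__getitem__)
        for j in blocks[i]:
            x[j] -= g[j] / lips[i]
    return x

# Hypothetical symmetric, diagonally dominant (hence positive definite) matrix.
A = [[4.0, 1.0, 0.5, 0.0],
     [1.0, 3.0, 0.0, 0.5],
     [0.5, 0.0, 5.0, 1.0],
     [0.0, 0.5, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
blocks = [[0, 1], [2, 3]]  # two "layers" of two coordinates each

x = greedy_block_descent(A, b, blocks)
residual = max(abs(v) for v in gradient(A, b, x))
```

Because each per-block constant is an upper bound on the true one, the computed quantity remains a valid lower bound on the decrease, which is the property the greedy rule relies on. After the run, `residual` holds the largest gradient component in absolute value and should be close to zero for this problem.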