Optimization Online


Understanding Deep Neural Networks with Rectified Linear Units

Raman Arora(arora***at***cs.jhu.edu)
Amitabh Basu(basu.amitabh***at***jhu.edu)
Poorya Mianjy(mianjy***at***jhu.edu)
Anirbit Mukherjee(amukhe14***at***jhu.edu)

Abstract: In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give the first-ever polynomial time (in the size of data) algorithm to train a ReLU DNN with one hidden layer to {\em global optimality}. This follows from our complete characterization of the ReLU DNN function class whereby we show that a $\R^n \to \R$ function is representable by a ReLU DNN {\em if and only if} it is a continuous piecewise linear function. The main tool used to prove this characterization is an elegant result from tropical geometry. Further, for the $n=1$ case, we show that a single hidden layer suffices to express all piecewise linear functions, and we give tight bounds for the size of such a ReLU DNN. We follow up with gap results showing that there is a smoothly parameterized family of $\R\to \R$ ``hard'' functions that lead to an exponential blow-up in size, if the number of layers is decreased by a small amount. An example consequence of our gap theorem is that for every natural number $N$, there exists a function representable by a ReLU DNN with depth $N^2+1$ and total size $N^3$, such that any ReLU DNN with depth at most $N + 1$ will require at least $\frac12N^{N+1}-1$ total nodes. Finally, we construct a family of $\R^n\to \R$ functions for $n\geq 2$ (also smoothly parameterized), whose number of affine pieces scales exponentially with the dimension $n$ at any fixed size and depth. To the best of our knowledge, such a construction with exponential dependence on $n$ has not been achieved by previous families of ``hard'' functions in the neural nets literature. This construction utilizes the theory of zonotopes from polyhedral theory.

Keywords: deep neural networks; rectified linear units (ReLU); training neural nets; depth v/s size trade-off; zonotopes

Category 1: Other Topics (Other )

Category 2: Combinatorial Optimization (Polyhedra )

Category 3: Applications -- Science and Engineering (Other )


Download: [PDF]

Entry Submitted: 12/21/2016
Entry Accepted: 12/21/2016
Entry Last Modified: 12/21/2016

Modify/Update this entry

  Visitors Authors More about us Links
  Subscribe, Unsubscribe
Digest Archive
Search, Browse the Repository


Coordinator's Board
Classification Scheme
Give us feedback
Optimization Journals, Sites, Societies
Mathematical Optimization Society