ImageNet Classification with Deep Convolutional Neural Networks

ReLU Nonlinearity

Saturating non-linearities such as $f(x) = \tanh(x)$ are slow to train. The non-saturating non-linearity $f(x) = \max(0, x)$, introduced by Nair and Hinton and referred to as the ReLU (Rectified Linear Unit), trains several times faster than equivalent $\tanh$ units.
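A minimal NumPy sketch can illustrate why ReLU avoids saturation: the gradient of $\tanh$ vanishes for large inputs, while the ReLU gradient stays at 1 for any positive input. The function names here are illustrative, not from the paper.

```python
import numpy as np

def relu(x):
    """Non-saturating ReLU: f(x) = max(x, 0)."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """ReLU gradient: 1 for x > 0, else 0 -- never saturates on the positive side."""
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

def tanh_grad(x):
    """tanh gradient: 1 - tanh(x)^2, which vanishes for large |x|."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))          # negative inputs are clipped to zero
print(relu_grad(5.0))   # gradient stays 1 for large positive input
print(tanh_grad(5.0))   # gradient is nearly 0: tanh has saturated
```

The vanishing $\tanh$ gradient shrinks the error signal at each layer during backpropagation, which is the main reason saturating units train more slowly.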