arXiv 1606.08415
Gaussian Error Linear Units (GELUs)
By Dan Hendrycks and Kevin Gimpel
Published 2016-06-27
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find perf…
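
As a rough illustration of the definition in the abstract, the sketch below computes GELU as the input multiplied by the standard Gaussian CDF, alongside ReLU for comparison. It uses only the Python standard library and writes the CDF in terms of the error function. The function names `std_normal_cdf`, `gelu`, and `relu` are chosen here for illustration and are not taken from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Gaussian CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """GELU: the input weighted by the Gaussian CDF evaluated at the input."""
    return x * std_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU: the input gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs (illustrative values).
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, which outputs exactly zero for every negative input, this GELU sketch returns small negative values for moderately negative inputs and approaches ReLU as the input grows large in either direction.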