arXiv 1606.08415

Gaussian Error Linear Units (GELUs)

By Dan Hendrycks and Kevin Gimpel

Published 2016-06-27

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find perf…
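As an illustration of the definition, here is a minimal sketch in Python. It computes the exact GELU $x\Phi(x)$ using the standard identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$ via the standard library's `math.erf`, next to the ReLU $x\mathbf{1}_{x>0}$. The function names `gelu` and `relu` are illustrative choices, not taken from the paper.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard Gaussian CDF."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    # Weight the input by Phi(x) instead of gating it by its sign
    return x * phi


def relu(x: float) -> float:
    """ReLU: x * 1[x > 0], i.e. the input is gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

The printed table shows the contrast described above. ReLU maps every negative input to exactly zero. GELU scales each input by $\Phi(x)$, so a small negative input such as $-0.5$ yields a small negative output rather than zero, and large positive inputs pass through almost unchanged.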
