arXiv 1606.08415

Gaussian Error Linear Units (GELUs)

By Dan Hendrycks and Kevin Gimpel

Published 2016-06-27

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find perf…
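The definition in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `gelu` and `relu` are chosen here for illustration, and the standard Gaussian CDF is computed from the error function via the identity Φ(x) = 0.5 · (1 + erf(x / √2)).

```python
import math


def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), where Phi is the standard Gaussian CDF.

    Phi is written in terms of the error function:
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    """
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0]: the input is gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Print both activations side by side for a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.6f}  relu={relu(x):+.6f}")
```

Under this definition a negative input is scaled by a small weight instead of being set to zero, which is the sense in which GELU weights inputs by their value while ReLU gates them by their sign.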
