arXiv:1706.03762

Attention Is All You Need

By Ashish Vaswani, Noam Shazeer, et al.

Published 2017-06-12

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machin…
