arXiv 1706.03762
Attention Is All You Need
By Ashish Vaswani, Noam Shazeer, et al.
Published 2017-06-12
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machin…