Attention Is All You Need
Short summary
Until now, sequence transduction models have mostly been complex recurrent or convolutional neural networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism.
The paper proposes a new network architecture, the Transformer, that relies only on attention mechanisms and dispenses with recurrence and convolutions entirely.
More details
Model Architecture
The Transformer architecture has two main parts: the encoder and the decoder.
Scaled Dot-Product Attention
Transclude of Transformer-Model-Architecture.canvas
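The attention mechanism the paper introduces computes Attention(Q, K, V) = softmax(QKᵀ / √d_k) V: each query is compared against all keys by dot product, the scores are scaled by the square root of the key dimension d_k and normalized with softmax, and the result weights a sum over the values. A minimal pure-Python sketch of that formula (function names are my own, not from the paper):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of vectors (lists of floats); d_k is the key dimension.
    """
    d_k = len(K[0])
    output = []
    for q in Q:
        # Dot-product scores of this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output row is a weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output
```

The scaling by √d_k counteracts the growth of dot products for large key dimensions, which would otherwise push the softmax into regions of tiny gradients.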