Attention Is All You Need

Short summary

The dominant sequence transduction models have been complex Recurrent Neural Networks or Convolutional Neural Networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism.

The paper proposes a new network architecture, the Transformer, based solely on the attention mechanism, dispensing with recurrence and convolutions entirely.

More details

Model Architecture

The Transformer architecture has two main parts, the Encoder and the Decoder. Each is a stack of N = 6 identical layers: encoder layers combine multi-head self-attention with a position-wise feed-forward network, and decoder layers add a third sub-layer that attends over the encoder's output.

The Multi-Head Attention
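Multi-head attention projects the input into h separate subspaces, runs attention in each head in parallel, concatenates the results, and projects back to the model dimension. A minimal NumPy sketch, assuming single-sequence self-attention with random placeholder weight matrices (the names `W_q`, `W_k`, `W_v`, `W_o` follow the paper's projection matrices, but the dimensions here are toy values, not the paper's d_model = 512, h = 8):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    # X: (seq, d_model); each head works in a d_k = d_model / h subspace
    seq, d_model = X.shape
    d_k = d_model // h
    # Project, then split the feature dimension into h heads: (h, seq, d_k)
    Q = (X @ W_q).reshape(seq, h, d_k).transpose(1, 0, 2)
    K = (X @ W_k).reshape(seq, h, d_k).transpose(1, 0, 2)
    V = (X @ W_v).reshape(seq, h, d_k).transpose(1, 0, 2)
    # Scaled dot-product attention independently per head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    heads = softmax(scores) @ V                      # (h, seq, d_k)
    # Concatenate heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ W_o

# Toy usage: seq=5, d_model=8, h=2 heads, random placeholder weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h=2)
print(out.shape)  # (5, 8)
```

The split into subspaces is what lets each head attend to different positions or relations at once, which a single averaged attention distribution cannot.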

The Scaled Dot-Product Attention

Transclude of Transformer-Model-Architecture.canvas

Other References