Attention Is All You Need

Short summary

The dominant sequence transduction models have been complex Recurrent Neural Networks or Convolutional Neural Networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism.

The paper proposes a new network architecture, the Transformer, based solely on the attention mechanism, dispensing with recurrence and convolutions entirely.

More details

Model Architecture

The Transformer architecture has two main parts, the Encoder and the Decoder. Each is a stack of N = 6 identical layers: encoder layers combine multi-head self-attention with a position-wise feed-forward network, and decoder layers add a third sub-layer that attends over the encoder's output.

The Multi-Head Attention
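Multi-head attention projects the input into h separate subspaces, runs attention in each head in parallel, concatenates the results, and projects back to the model dimension. A minimal NumPy sketch, assuming single-sequence self-attention with random placeholder weight matrices (the names `W_q`, `W_k`, `W_v`, `W_o` follow the paper's projection matrices, but the dimensions here are toy values, not the paper's d_model = 512, h = 8):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    # X: (seq, d_model); each head works in a d_k = d_model / h subspace
    seq, d_model = X.shape
    d_k = d_model // h
    # Project, then split the feature dimension into h heads: (h, seq, d_k)
    Q = (X @ W_q).reshape(seq, h, d_k).transpose(1, 0, 2)
    K = (X @ W_k).reshape(seq, h, d_k).transpose(1, 0, 2)
    V = (X @ W_v).reshape(seq, h, d_k).transpose(1, 0, 2)
    # Scaled dot-product attention independently per head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    heads = softmax(scores) @ V                      # (h, seq, d_k)
    # Concatenate heads and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ W_o

# Toy usage: seq=5, d_model=8, h=2 heads, random placeholder weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h=2)
print(out.shape)  # (5, 8)
```

The split into subspaces is what lets each head attend to different positions or relations at once, which a single averaged attention distribution cannot.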

The Scaled Dot-Product Attention

Transclude of Transformer-Model-Architecture.canvas

Other References