Transformer Encoder

The Encoder takes the embedded text as input and passes it through a stack of identical layers, each of which applies Multi-Head Attention. Its output is then fed into the Transformer Decoder.

Each layer has two sublayers: a Multi-Head Self-Attention sublayer and a position-wise Feed-Forward Network.

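Each sublayer is wrapped in a Residual Connection that lets information bypass the sublayer: the sublayer's input is added to its output, and the sum is then normalized, i.e. LayerNorm(x + Sublayer(x)). To make this addition possible, all sublayers as well as the Word Embedding layers produce outputs of the same dimension, d_model (512 in the original Transformer paper).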
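
The structure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the attention sublayer is simplified to a single head, the weights are random, Layer Normalization has no learned scale or shift, and the sizes D_MODEL, D_FF and N_LAYERS are small hypothetical values chosen so that the example runs quickly. All function names are made up for this sketch.

```python
import numpy as np

# Hypothetical sizes for the sketch (much smaller than a real model).
D_MODEL = 64   # width shared by embeddings and every sublayer output
D_FF = 256     # hidden width of the feed-forward sublayer
N_LAYERS = 6   # number of identical layers in the stack


def layer_norm(x, eps=1e-5):
    """Normalize each position's feature vector to zero mean, unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def residual_sublayer(x, sublayer):
    """Add the sublayer's input to its output, then normalize the sum."""
    return layer_norm(x + sublayer(x))


def make_self_attention(rng):
    """Single-head self-attention (a simplification of Multi-Head Attention)."""
    wq, wk, wv = (rng.normal(scale=0.02, size=(D_MODEL, D_MODEL)) for _ in range(3))

    def attend(x):
        q, k, v = x @ wq, x @ wk, x @ wv
        weights = softmax(q @ k.T / np.sqrt(D_MODEL))  # (seq_len, seq_len)
        return weights @ v                             # (seq_len, D_MODEL)

    return attend


def make_feed_forward(rng):
    """Position-wise feed-forward network: linear, ReLU, linear."""
    w1 = rng.normal(scale=0.02, size=(D_MODEL, D_FF))
    w2 = rng.normal(scale=0.02, size=(D_FF, D_MODEL))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2


def make_encoder_layer(rng):
    """One encoder layer: two sublayers, each wrapped in residual + norm."""
    attention = make_self_attention(rng)
    feed_forward = make_feed_forward(rng)

    def layer(x):
        x = residual_sublayer(x, attention)
        return residual_sublayer(x, feed_forward)

    return layer


rng = np.random.default_rng(0)
layers = [make_encoder_layer(rng) for _ in range(N_LAYERS)]

x = rng.normal(size=(10, D_MODEL))  # 10 embedded tokens (random stand-ins)
out = x
for layer in layers:
    out = layer(out)

print(out.shape)  # same shape as the input
```

Because every sublayer returns a tensor of width D_MODEL, the addition inside residual_sublayer is always well defined, and the layers can be stacked without any reshaping between them.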