Attention

As seen in Transformer Architecture and Attention Is All You Need.

Multi-Head Attention

The Scaled Dot-Product Attention
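As a reference point, the mechanism from Attention Is All You Need computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, and multi-head attention runs this in parallel over several learned projections before recombining them. Below is a minimal NumPy sketch of both pieces; the function names, variable names, and the single-sequence (no batch, no mask) setup are illustrative assumptions, not details taken from the source.

```python
# Minimal sketch of scaled dot-product attention and a multi-head wrapper.
# Names and shapes here are assumptions for illustration only.
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Query-key similarity scores, scaled by sqrt(d_k) so the softmax
    # stays in a well-behaved range when d_k is large.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # Row-wise softmax over the key dimension (shifted for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output is a weighted sum of the values.
    return weights @ V


def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Project X into per-head Q, K, V, attend in each head, then recombine."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Split the model dimension into (num_heads, d_head) for per-head attention.
    def split(M):
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    heads = scaled_dot_product_attention(split(Q), split(K), split(V))
    # Concatenate the heads back to (seq_len, d_model) and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, num_heads = 5, 16, 4
    X = rng.standard_normal((seq_len, d_model))
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
    out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
    print(out.shape)  # (5, 16)
```

In practice each head attends over its own lower-dimensional projection (d_head = d_model / num_heads), so the total cost stays close to single-head attention at full width while letting different heads focus on different relations.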