Multi-head attention runs scaled dot-product attention in parallel across several heads, each operating on its own learned linear projection of the queries, keys, and values; the per-head outputs are concatenated and projected back to the model dimension.

![[Multi-Head-Attention.canvas]]
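
A minimal NumPy sketch of the idea, assuming illustrative names and sizes (`d_model`, `num_heads`) and random placeholder projection matrices in place of learned weights:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)          # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # softmax over keys
    return weights @ V                                        # (heads, seq, d_k)

def multi_head_attention(x, num_heads, rng):
    """Run scaled dot-product attention in parallel over num_heads heads."""
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    # Random projections for illustration only; in a real model these are learned.
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))

    def split_heads(t):
        # (seq, d_model) -> (heads, seq, d_k)
        return t.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)             # (heads, seq, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                       # (seq, d_model)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))   # 5 tokens, d_model = 8
out = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape)                  # (5, 8)
```

Splitting `d_model` into `num_heads` chunks of size `d_k` keeps the total computation roughly the same as a single head while letting each head attend to different positions and subspaces.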