Continuous Bag of Words (CBOW)
Architecture for a single word
![[CleanShot 2023-10-03 at 21.52.58@2x.png]]
The hidden layer has no activation function, so it is a purely linear projection of the input. The output layer applies a softmax over the vocabulary.
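A minimal NumPy sketch of the single-word forward pass described above may make this concrete. The vocabulary size `V`, embedding size `N`, the weight names `W_in` and `W_out`, and the random initialization are all assumptions chosen for illustration; they do not come from the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3                      # hypothetical vocabulary size and embedding size
W_in = rng.normal(size=(V, N))   # input-to-hidden weights (one row per word)
W_out = rng.normal(size=(N, V))  # hidden-to-output weights

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward_single(word_idx):
    x = np.zeros(V)
    x[word_idx] = 1.0            # one-hot encoding of the input word
    h = x @ W_in                 # linear hidden layer: no activation function
    return softmax(h @ W_out)    # probability distribution over the vocabulary

probs = forward_single(2)
```

Because the input is one-hot and the hidden layer is linear, `h` is simply row `word_idx` of `W_in`, which is why that row can be read as the word's embedding.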
Architecture for multiple words
You can interpret the middle (second) word vector here as the current word, and the first and last word vectors as its context, which influences the embedding.
![[CleanShot 2023-10-03 at 21.57.06@2x.png]]
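As a rough sketch of the multi-word case, the snippet below assumes the commonly used CBOW formulation in which the hidden layer is the average of the context words' input vectors. The figure may combine them differently, and the sizes, the weight names `W_in` and `W_out`, and the context indices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 6, 4                      # hypothetical vocabulary size and embedding size
W_in = rng.normal(size=(V, N))   # input-to-hidden weights (one row per word)
W_out = rng.normal(size=(N, V))  # hidden-to-output weights

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward_context(context_indices):
    # Assumption: context vectors are averaged; still no activation function.
    h = W_in[context_indices].mean(axis=0)
    return softmax(h @ W_out)    # distribution over candidate current words

context = [1, 3]                 # indices of the first and last (context) words
probs_ctx = forward_context(context)
predicted = int(np.argmax(probs_ctx))  # most likely current (middle) word
```

Averaging keeps the hidden layer the same size regardless of how many context words are used, so the output layer is unchanged from the single-word case.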