Tokenization
A tokenizer splits text into individual units called tokens (e.g. words or characters) and maps each token to an integer ID from a vocabulary of possible tokens, producing an Integer Encoding.
This encoding can then be used to create an Embedding.
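
As a rough illustration of the two steps described above, here is a minimal word-level sketch in Python. The function names (tokenize, build_vocab, encode), the lowercasing, the regular expression, and the "<unk>" entry for out-of-vocabulary tokens are all assumptions made for this example, not part of any particular library; real tokenizers often work at the subword level instead.

```python
import re

def tokenize(text):
    # Split into lowercase word tokens and single punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(corpus):
    # Assign each distinct token the next free integer ID.
    # ID 0 is reserved (by assumption) for unknown tokens.
    vocab = {"<unk>": 0}
    for text in corpus:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def encode(text, vocab):
    # Map each token to its ID, falling back to the unknown ID.
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]

vocab = build_vocab(["The cat sat.", "The dog sat."])
ids = encode("The bird sat.", vocab)
print(ids)  # [1, 0, 3, 4]  ("bird" is not in the vocabulary)
```

The resulting list of integers is the kind of input an embedding layer would use as row indices into its embedding table.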