Tokenization

A tokenizer splits text into individual units called tokens (e.g. words or characters) and maps each token to its integer ID in a fixed vocabulary of possible tokens, producing an Integer Encoding of the text.
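A minimal word-level sketch of the idea follows, assuming whitespace splitting, lowercasing, and a reserved `<unk>` token with ID 0 for words not in the vocabulary. The names `build_vocab` and `encode` are hypothetical, not taken from any particular library. Replacing `text.lower().split()` with `list(text)` would turn it into a character-level tokenizer.

```python
def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign each distinct lowercase word an integer ID; ID 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for text in corpus:
        for word in text.lower().split():  # assumption: tokens are whitespace-separated words
            if word not in vocab:
                vocab[word] = len(vocab)   # next free ID
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Split text into words and map each one to its ID (unknown words map to <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


vocab = build_vocab(["the cat sat", "the dog sat"])
print(vocab)                          # {'<unk>': 0, 'the': 1, 'cat': 2, 'sat': 3, 'dog': 4}
print(encode("The dog ran", vocab))   # [1, 4, 0]
```

Here "ran" never appeared in the corpus, so it falls back to the `<unk>` ID. Real tokenizers often use subword schemes to reduce how often this happens.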

This encoding can then be used to create an Embedding: each integer ID selects a vector that represents that token.
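A minimal sketch of that lookup, assuming an embedding table with one row per vocabulary entry. The random initialization is only a stand-in for learned weights, and `make_embedding_table`, `embed`, the dimension 4, and the example IDs `[1, 4, 0]` are all hypothetical choices for illustration.

```python
import random


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Create a vocab_size x dim table of random vectors (stand-in for trained weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Look up one row of the table per token ID, preserving token order."""
    return [table[i] for i in ids]


table = make_embedding_table(vocab_size=5, dim=4)
vectors = embed([1, 4, 0], table)
print(len(vectors), len(vectors[0]))  # 3 4
```

The lookup is just row indexing, so the same ID always yields the same vector. Deep-learning frameworks package this as a trainable layer (for example PyTorch's `torch.nn.Embedding`).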