Inverse Document Frequency

Similar to the normalized Term Frequency, we want to measure how widely a given word occurs across a collection of documents.

Intuition: This formula is near zero when a word is present in nearly all documents. For example, the word “the” will most likely have a value of zero. This ensures that “meaningless” words contribute little or nothing to Information Retrieval.
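
To make the intuition concrete, here is a minimal sketch. It assumes the common definition idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing the term t; the formula intended here may be a variant (for example, a smoothed one). The function name, the whitespace tokenization, and the sample documents are illustrative assumptions only.

```python
import math


def inverse_document_frequency(term, documents):
    """Return log(N / df) for `term`, or 0.0 if no document contains it."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [set(doc.lower().split()) for doc in documents]
    doc_count = len(tokenized)
    # Count the documents in which the term occurs at least once.
    containing = sum(1 for tokens in tokenized if term.lower() in tokens)
    if containing == 0:
        # Convention chosen for this sketch to avoid division by zero.
        return 0.0
    return math.log(doc_count / containing)


# Hypothetical toy collection.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

idf_the = inverse_document_frequency("the", documents)    # in 3 of 3 documents
idf_cat = inverse_document_frequency("cat", documents)    # in 2 of 3 documents
idf_bird = inverse_document_frequency("bird", documents)  # in 1 of 3 documents
```

Under these assumptions “the” scores log(3/3) = 0, while the rarer “bird” scores log(3), so words that appear everywhere carry no weight and rarer words carry more.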