Expected Information

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$, which can be estimated by $|C_{i,D}| / |D|$. It is the expectation of surprisal over every possible outcome.
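As a worked illustration, the sketch below computes expected information directly from raw class counts. The helper name `entropy_from_counts` and the example counts (9 tuples of one class, 5 of another) are hypothetical and chosen only for illustration.

```python
from math import log2

def entropy_from_counts(counts):
    """Expected information (in bits) of a class distribution given raw counts."""
    total = sum(counts)
    # Each count / total estimates the class probability.
    # Empty classes are skipped, since p * log2(p) tends to 0 as p -> 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy_from_counts([9, 5]), 3))  # 0.94
print(entropy_from_counts([7, 7]))            # 1.0
```

A perfectly balanced two-class split gives exactly 1 bit, the maximum for two classes; the more skewed the split, the lower the expected information.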

It is used to calculate Information Gain and Perplexity.
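A minimal sketch of both uses follows, assuming the common definitions: information gain as the entropy of the target minus the weighted entropy of the partitions induced by an attribute, and perplexity as 2 raised to the entropy. The function names and the toy `windy`/`play` data are hypothetical.

```python
from math import log2

import pandas as pd

def entropy(labels: pd.Series) -> float:
    # value_counts(normalize=True) returns the relative frequency of each class.
    p = labels.value_counts(normalize=True)
    return -sum(pi * log2(pi) for pi in p)

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    # Entropy of the target after splitting on `attribute`,
    # weighted by the size of each partition.
    weighted = sum(
        len(part) / len(df) * entropy(part[target])
        for _, part in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

def perplexity(labels: pd.Series) -> float:
    # Perplexity is 2 raised to the entropy measured in bits.
    return 2 ** entropy(labels)

toy = pd.DataFrame({
    "windy": [True, True, False, False],
    "play": ["no", "no", "yes", "yes"],
})
gain = information_gain(toy, "windy", "play")
ppl = perplexity(toy["play"])
```

In this toy data `windy` separates the classes perfectly, so the gain equals the full entropy of `play` (1 bit), and the perplexity of `play` is 2, matching two equally likely outcomes.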

Python Implementation

import pandas as pd
from math import log2

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
	p = dataset[target_attribute].value_counts(normalize=True)  # estimates of p_i
	return -sum(pi * log2(pi) for pi in p)