Information Gain
An Attribute Selection Method which selects the attribute with the highest information gain.
The gain of an attribute $A$ is $\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$, where $\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$ is the Expected Information, also called Entropy ($p_i$ is the proportion of tuples in $D$ belonging to class $C_i$).
So for a binary classification with positive-class proportion $p$ we have $\mathrm{Info}(D) = -p \log_2(p) - (1 - p) \log_2(1 - p)$.
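A quick numeric check of the binary formula (the 9-positive / 5-negative split is just an illustrative example, not from this note):

```python
import math

# Binary entropy for a dataset of 9 positive and 5 negative tuples (p = 9/14)
p = 9 / 14
info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(round(info, 3))  # ≈ 0.940 bits
```

An even 50/50 split would give the maximum of exactly 1 bit; a pure partition gives 0.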
Calculate the Expected Information for every attribute $A$ by partitioning the data $D$ into the subsets $D_j$ induced by $A$'s $v$ distinct values: $\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$.
import pandas as pd

def information_partitioned(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    # Weight of each partition D_j is its relative size |D_j| / |D|
    weights = dataset[partition_attribute].value_counts(normalize=True)
    return sum(
        weight * information(
            dataset[dataset[partition_attribute] == value],
            target_attribute,
        )
        for value, weight in weights.items()
    )
→ See Expected Information for information()
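The referenced information() is defined in that other note; a minimal sketch consistent with the signatures used here (the body is an assumption based on the entropy formula above, not the original implementation):

```python
import math

import pandas as pd

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
    # Info(D) = -sum(p_i * log2(p_i)) over the class proportions p_i
    proportions = dataset[target_attribute].value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in proportions if p > 0)
```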
def information_gain(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    # Gain(A) = Info(D) - Info_A(D)
    return information(dataset, target_attribute) - information_partitioned(
        dataset, target_attribute, partition_attribute
    )
Select attribute with the highest gain first.
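Putting the pieces together on a toy dataset (the column names and values are invented for illustration): an attribute whose values separate the classes perfectly gets the maximal gain, while an uninformative one gets zero.

```python
import math

import pandas as pd

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
    # Info(D) = -sum(p_i * log2(p_i)) over the class proportions p_i
    proportions = dataset[target_attribute].value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def information_partitioned(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    # Info_A(D): entropy of each partition, weighted by relative partition size
    weights = dataset[partition_attribute].value_counts(normalize=True)
    return sum(
        weight * information(dataset[dataset[partition_attribute] == value], target_attribute)
        for value, weight in weights.items()
    )

def information_gain(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    # Gain(A) = Info(D) - Info_A(D)
    return information(dataset, target_attribute) - information_partitioned(
        dataset, target_attribute, partition_attribute
    )

df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy":   ["no", "yes", "no", "yes"],
    "play":    ["no", "no", "yes", "yes"],
})
print(information_gain(df, "play", "outlook"))  # 1.0: "outlook" splits the classes perfectly
print(information_gain(df, "play", "windy"))    # 0.0: "windy" tells us nothing about "play"
```

Here the selection method would split on "outlook" first.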
Disadvantages
- Biased towards multi-valued attributes: it favors attributes with a large number of distinct values, since splitting on them produces many small, often pure partitions with low entropy. Splitting on a unique ID, for example, yields maximal gain but a useless partition.