Information Gain

An Attribute Selection Method which selects the attribute with the highest information gain:

$$Gain(A) = Info(D) - Info_A(D)$$

where $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$ is the Expected Information, also called Entropy ($p_i$ is the proportion of class $i$ in $D$).

So for a Binary Classification with class proportions $p$ and $1 - p$ we have

$$Info(D) = -p \log_2(p) - (1 - p) \log_2(1 - p)$$

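A quick check of the binary case (the function name is illustrative, not from the source):

```python
import math

def binary_information(p: float) -> float:
    """Entropy of a binary class distribution with proportions p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure partition carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_information(0.5))  # → 1.0  (50/50 split: maximal uncertainty)
print(binary_information(1.0))  # → 0.0  (pure partition)
```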
Calculate the Expected Information of each data partition $D_j$ produced by splitting on attribute $A$, weighted by partition size:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, Info(D_j)$$

import pandas as pd

def information_partitioned(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:

    # Relative size |D_j| / |D| of each partition induced by partition_attribute
    weights = dataset[partition_attribute].value_counts() / dataset.shape[0]
    # Weighted sum of the Expected Information of each partition
    return sum(
        weight * information(
            dataset[dataset[partition_attribute] == value],
            target_attribute,
        )
        for value, weight in weights.items()
    )

→ See Expected Information for information()
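For a self-contained read, a minimal sketch of what `information()` computes — assuming the standard entropy over the target attribute's class proportions (the authoritative definition lives in the Expected Information note):

```python
import numpy as np
import pandas as pd

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
    """Expected Information (entropy) of the target attribute's class distribution."""
    probabilities = dataset[target_attribute].value_counts(normalize=True)
    return float(-(probabilities * np.log2(probabilities)).sum())
```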

def information_gain(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:

    # Gain(A) = Info(D) - Info_A(D)
    return information(dataset, target_attribute) - information_partitioned(
        dataset, target_attribute, partition_attribute
    )

When growing the tree, select the attribute with the highest gain first.
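A sketch of that selection step on a toy dataset (illustrative data; the helpers are restated in compact form so the snippet runs standalone):

```python
import numpy as np
import pandas as pd

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
    probabilities = dataset[target_attribute].value_counts(normalize=True)
    return float(-(probabilities * np.log2(probabilities)).sum())

def information_gain(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    weights = dataset[partition_attribute].value_counts(normalize=True)
    info_partitioned = sum(
        weight * information(dataset[dataset[partition_attribute] == value], target_attribute)
        for value, weight in weights.items()
    )
    return information(dataset, target_attribute) - info_partitioned

# Toy data: "outlook" separates the classes perfectly, "windy" not at all.
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
    "windy": [True, False, True, False, True, False],
    "play": ["no", "no", "yes", "yes", "yes", "yes"],
})
gains = {a: information_gain(data, "play", a) for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(best)  # → outlook
```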

Disadvantages

  • Biased towards multi-valued attributes: an attribute with many distinct values (e.g. a unique ID) produces many small, pure partitions and therefore a high gain, even when the split generalizes poorly.
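A sketch of this bias, assuming a hypothetical unique-ID column: every ID value yields a single-row, pure partition, so $Info_{ID}(D) = 0$ and the gain equals the full entropy of the target — the maximum possible — even though the split predicts nothing.

```python
import numpy as np
import pandas as pd

def information(dataset: pd.DataFrame, target_attribute: str) -> float:
    probabilities = dataset[target_attribute].value_counts(normalize=True)
    return float(-(probabilities * np.log2(probabilities)).sum())

def information_gain(dataset: pd.DataFrame, target_attribute: str, partition_attribute: str) -> float:
    weights = dataset[partition_attribute].value_counts(normalize=True)
    return information(dataset, target_attribute) - sum(
        weight * information(dataset[dataset[partition_attribute] == value], target_attribute)
        for value, weight in weights.items()
    )

data = pd.DataFrame({
    "row_id": [1, 2, 3, 4],              # unique per row (hypothetical ID attribute)
    "windy": [True, False, True, False],
    "play": ["no", "yes", "yes", "no"],
})
# row_id wins with the maximum possible gain despite being useless for prediction.
print(information_gain(data, "play", "row_id"))  # → 1.0
print(information_gain(data, "play", "windy"))   # → 0.0
```

Gain Ratio style corrections address exactly this by penalizing splits with many partitions.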