Decision Tree — Entropy and Information Gain for 3 Outcomes

Calculate entropy and information gain using the base-3 logarithm (log3)

In a decision tree scenario where a parent node is split into three child nodes, you would calculate entropy and information gain using the base-3 logarithm (log3). This is because there are three possible outcomes (classes) at each split.

Here’s the formula and calculation for entropy and information gain:

Entropy Formula:

Entropy measures the impurity or disorder in a dataset.

For a multi-class problem with three classes, the entropy formula is:

Entropy = - Σ(p(i) * log3(p(i)))

where
- “p(i)” is the probability of a data point belonging to class “i”, and
- “i” ranges over the three classes.
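
As a quick illustration, here is a minimal Python sketch of the entropy formula above. The function name `entropy_base3` and the example class proportions (1/2, 1/3, 1/6) are illustrative choices, not part of any particular library.

```python
import math

def entropy_base3(probabilities):
    """Entropy of a class distribution, using log base 3 as in the formula above."""
    # Terms with p(i) = 0 contribute nothing, so they are skipped
    # (this also avoids taking log(0)).
    return -sum(p * math.log(p, 3) for p in probabilities if p > 0)

# Example: a node whose points are split 1/2, 1/3, 1/6 across the three classes
print(entropy_base3([1/2, 1/3, 1/6]))  # ≈ 0.921
```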

Information Gain Formula:

Information gain measures how much the entropy is reduced after a split. For a split with three outcomes, the formula for information gain is:

Information Gain = Entropy(parent) - Σ [ (Weight(child) / Total Weight) * Entropy(child) ]


Here,
"Entropy(parent)" is the impurity of the parent node before splitting,
"Weight(child)" is the number of data points in each child node,
"Entropy(child)" is the impurity of each child node after splitting, and
"Total Weight" is the total number of data points in the parent node.

You would use the base-3 logarithm (log3) because there are three classes (outcomes) being considered in the calculation. Matching the logarithm base to the number of classes scales the entropy so that a node containing the three classes in equal proportions has the maximum possible entropy of exactly 1.
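
As a quick numerical check of that statement, the short sketch below (plain Python, nothing beyond the `math` module) evaluates the entropy of a node with three equally likely classes under the base-3 logarithm.

```python
import math

# Three equally likely classes: p(i) = 1/3 for each class.
p = 1 / 3
max_entropy = -3 * (p * math.log(p, 3))
print(max_entropy)  # ≈ 1.0 (up to floating-point rounding): the maximum impurity
```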

Additional Blogs by Author

1. Python Function: Type of Arguments in a Function

2. Understanding Python’s init Method: Object Initialization in Depth

3. Python’s main: Setting the Stage for Your Code

4. Understanding Python’s Try-Except Statements: A Safety Net for Your Code

5. Exploring Python Classes and Object-Oriented Programming

6. Lambda Functions in Python

7. Python Pandas: Creative Data Manipulation and Analysis

8. Python OOP Concepts Made Simple
