weather forecasting example (2)

The document discusses the ID3 algorithm, which is used to build decision trees in machine learning by recursively partitioning data based on attributes. It includes a dataset on weather conditions and play outcomes, and demonstrates calculations of entropy and information gain for different attributes like outlook and wind. Key results show the entropy values for various conditions, which are essential for determining the best attribute to split the data.

In [1]: # In machine learning, "ID3" stands for "Iterative Dichotomiser 3", a decision tree algorithm.

# Key points about ID3:

# Function: It builds a decision tree by recursively partitioning data based on attributes.
# Developed by: Ross Quinlan

# Key concept: Uses entropy as a measure of uncertainty to calculate information gain.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
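The recursive partitioning described in the key points can be sketched as follows. This is a minimal illustration, not the notebook author's code: the helper names `entropy`, `information_gain`, and `id3` are hypothetical, the tree is returned as nested dictionaries for simplicity, and the data is typed in from the 14 rows displayed in Out[3] rather than read from `tennis.csv`.

```python
import numpy as np
import pandas as pd

# The 14 rows shown in Out[3], entered column by column (assumed to match tennis.csv).
play_data = pd.DataFrame({
    "outlook": "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split(),
    "temp": "hot hot hot mild cool cool cool mild cool mild mild mild hot mild".split(),
    "humidity": "high high high high normal normal normal high normal normal normal high normal high".split(),
    "windy": [False, True, False, False, False, True, True, False, False, False, True, True, False, True],
    "play": "no no yes yes yes no yes no yes yes yes yes yes no".split(),
})

def entropy(labels):
    """Shannon entropy (base 2) of a Series of class labels."""
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

def id3(df, attributes, target):
    """Return a nested dict {attribute: {value: subtree or class label}}."""
    labels = df[target]
    if labels.nunique() == 1:      # homogeneous subset: make a leaf
        return labels.iloc[0]
    if not attributes:             # nothing left to split on: majority class
        return labels.mode().iloc[0]
    best = max(attributes, key=lambda a: information_gain(df, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in df.groupby(best)}}

tree = id3(play_data, ["outlook", "temp", "humidity", "windy"], "play")
print(tree)
```

On these 14 rows the sketch picks outlook at the root, which agrees with the hand calculation of Gain(Play, Outlook) later in the notebook.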

In [2]: play_data = pd.read_csv('tennis.csv')

In [3]: play_data

Out[3]:
     outlook  temp humidity  windy play
0      sunny   hot     high  False   no
1      sunny   hot     high   True   no
2   overcast   hot     high  False  yes
3      rainy  mild     high  False  yes
4      rainy  cool   normal  False  yes
5      rainy  cool   normal   True   no
6   overcast  cool   normal   True  yes
7      sunny  mild     high  False   no
8      sunny  cool   normal  False  yes
9      rainy  mild   normal  False  yes
10     sunny  mild   normal   True  yes
11  overcast  mild     high   True  yes
12  overcast   hot   normal  False  yes
13     rainy  mild     high   True   no
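The file `tennis.csv` is not included with this export. Assuming it holds exactly the 14 rows displayed above, the same DataFrame can be rebuilt inline so the remaining cells run without the file. The column-by-column layout below is just one convenient way to type the table in.

```python
import pandas as pd

# Rebuild the table shown in Out[3]; assumes tennis.csv contains exactly these rows.
play_data = pd.DataFrame({
    "outlook": "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split(),
    "temp": "hot hot hot mild cool cool cool mild cool mild mild mild hot mild".split(),
    "humidity": "high high high high normal normal normal high normal normal normal high normal high".split(),
    "windy": [False, True, False, False, False, True, True, False, False, False, True, True, False, True],
    "play": "no no yes yes yes no yes no yes yes yes yes yes no".split(),
})

print(play_data.shape)
print(play_data.play.value_counts())
```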

In [4]: play_data.play.value_counts()

Out[4]: play
yes 9
no 5
Name: count, dtype: int64

In [5]: # Entropy(Play): 9 'yes' and 5 'no' out of 14 rows
Entropy_Play = -(9/14)*np.log2(9/14) -(5/14)*np.log2(5/14)

In [6]: Entropy_Play

Out[6]: 0.9402859586706311
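The hand-typed fractions above can also be derived from the label counts. The sketch below assumes a hypothetical helper named `entropy` (not part of the notebook) and rebuilds the play column from the counts in Out[4].

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy (base 2) of a Series of class labels."""
    # value_counts(normalize=True) gives the class proportions; only classes
    # that actually occur are listed, so log2 never sees a zero.
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

# 9 'yes' and 5 'no', as reported by play_data.play.value_counts()
play = pd.Series(["yes"] * 9 + ["no"] * 5)
Entropy_Play = entropy(play)
print(Entropy_Play)
```

With the DataFrame loaded, the same helper could be called as `entropy(play_data.play)`.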

In [7]: play_data[play_data.outlook == 'sunny']

Out[7]:
   outlook  temp humidity  windy play
0    sunny   hot     high  False   no
1    sunny   hot     high   True   no
7    sunny  mild     high  False   no
8    sunny  cool   normal  False  yes
10   sunny  mild   normal   True  yes

In [15]: # Entropy(Play|Outlook=Sunny)
Entropy_Play_Outlook_Sunny =-(3/5)*np.log2(3/5) -(2/5)*np.log2(2/5)

In [16]: Entropy_Play_Outlook_Sunny

Out[16]: 0.9709505944546686

In [17]: play_data[play_data.outlook == 'overcast']

Out[17]:
     outlook  temp humidity  windy play
2   overcast   hot     high  False  yes
6   overcast  cool   normal   True  yes
11  overcast  mild     high   True  yes
12  overcast   hot   normal  False  yes

In [18]: # Entropy(Play|Outlook=overcast)
# This subset is homogeneous (every row is 'yes'), so its entropy is 0

In [19]: play_data[play_data.outlook == 'rainy']

Out[19]:
   outlook  temp humidity  windy play
3    rainy  mild     high  False  yes
4    rainy  cool   normal  False  yes
5    rainy  cool   normal   True   no
9    rainy  mild   normal  False  yes
13   rainy  mild     high   True   no

In [20]: # Entropy(Play|Outlook=rainy)
Entropy_Play_Outlook_Rain = -(2/5)*np.log2(2/5) - (3/5)*np.log2(3/5)

In [21]: Entropy_Play_Outlook_Rain

Out[21]: 0.9709505944546686

In [22]: # Gain on splitting by attribute outlook

In [23]: # Gain(Play, Outlook) = Entropy(Play)
#     - p(Outlook=Sunny) * Entropy(Play|Outlook=Sunny)
#     - p(Outlook=Overcast) * Entropy(Play|Outlook=Overcast)
#     - p(Outlook=Rain) * Entropy(Play|Outlook=Rain)

Entropy_Play - (5/14)*Entropy_Play_Outlook_Sunny - (4/14)*0 - (5/14) * Entropy_Play_Outlook_Rain

Out[23]: 0.24674981977443933
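The notebook computes the gain for outlook only. To choose a split, ID3 compares the gain of every attribute, so the same calculation can be repeated for temp, humidity, and windy. The sketch below is one way to do that; the helpers `entropy` and `information_gain` are hypothetical names, and the data is typed in from Out[3] on the assumption that it matches `tennis.csv`.

```python
import numpy as np
import pandas as pd

# The 14 rows shown in Out[3] (assumed to match tennis.csv).
play_data = pd.DataFrame({
    "outlook": "sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy".split(),
    "temp": "hot hot hot mild cool cool cool mild cool mild mild mild hot mild".split(),
    "humidity": "high high high high normal normal normal high normal normal normal high normal high".split(),
    "windy": [False, True, False, False, False, True, True, False, False, False, True, True, False, True],
    "play": "no no yes yes yes no yes no yes yes yes yes yes no".split(),
})

def entropy(labels):
    """Shannon entropy (base 2) of a Series of class labels."""
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df, attribute, target="play"):
    """Entropy(target) minus the size-weighted entropy of each attribute value's subset."""
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

gains = {col: information_gain(play_data, col)
         for col in ["outlook", "temp", "humidity", "windy"]}
best = max(gains, key=gains.get)

for col, gain in gains.items():
    print(f"{col:10s} {gain:.4f}")
print("best split:", best)
```

For outlook this reproduces the value in Out[23], and outlook has the largest gain of the four attributes, so it would be chosen as the root split.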

In [24]: play_data[play_data.outlook == 'overcast']


Out[24]:
     outlook  temp humidity  windy play
2   overcast   hot     high  False  yes
6   overcast  cool   normal   True  yes
11  overcast  mild     high   True  yes
12  overcast   hot   normal  False  yes

In [25]: play_data[play_data.outlook == 'sunny']

Out[25]:
   outlook  temp humidity  windy play
0    sunny   hot     high  False   no
1    sunny   hot     high   True   no
7    sunny  mild     high  False   no
8    sunny  cool   normal  False  yes
10   sunny  mild   normal   True  yes

In [26]: # Entropy(Play|Outlook=Sunny)
Entropy_Play_Outlook_Sunny =-(3/5)*np.log2(3/5) -(2/5)*np.log2(2/5)

In [27]: Entropy_Play_Outlook_Sunny

Out[27]: 0.9709505944546686

In [28]: # Entropy(Play|Outlook=Sunny, Windy=False): rows 0, 7, 8 give 1 'yes' and 2 'no'
Entropy_Wind_False = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)

In [29]: Entropy_Wind_False

Out[29]: 0.9182958340544896
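The notebook stops after the windy == False branch of the sunny subset. A possible continuation, written in the same hand-calculated style, is sketched below. The names `Entropy_Wind_True` and `Gain_Sunny_Windy` are not from the original; the counts are read off the sunny rows displayed in Out[25].

```python
import numpy as np

# Values already computed in the notebook
Entropy_Play_Outlook_Sunny = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
Entropy_Wind_False = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)

# Entropy(Play|Outlook=Sunny, Windy=True): rows 1 ('no') and 10 ('yes')
Entropy_Wind_True = -(1/2)*np.log2(1/2) - (1/2)*np.log2(1/2)

# Gain from splitting the sunny subset on windy:
# 3 of the 5 sunny rows have windy == False, 2 have windy == True
Gain_Sunny_Windy = (Entropy_Play_Outlook_Sunny
                    - (3/5)*Entropy_Wind_False
                    - (2/5)*Entropy_Wind_True)

print(Entropy_Wind_True)
print(Gain_Sunny_Windy)
```

Under these counts the gain from splitting the sunny subset on windy is close to 0.02, small next to the subset's entropy of about 0.971, which suggests comparing it against temp and humidity before choosing the next split.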
