Midterm ML Regular Solution
Q.2 As part of efforts to improve students' performance in exams, you have been
given data showing the number of study hours spent by students, their gender, and
their final result (Pass or Fail). Using this sample dataset, apply the Naïve Bayes
classification technique to classify the test case {No. of study hours = 3.5,
Gender = "male"} as either "Pass" or "Fail". [5 Marks]
No. of study hours   Gender   Result
…                    …        …
8                    Female   Pass
9                    Male     Pass
Solution:
1. Priors: [1M]
   P(y = Pass) = 4/9 ≈ 0.444444
   P(y = Fail) = 5/9 ≈ 0.555556
2. No. of study hours (X1) is a continuous variable, so apply a Gaussian class-conditional PDF. [1M]
3. Class-conditional statistics of X1:

              Mean   Variance
   Pass class  7.2    2.945
   Fail class  3.9    4.64
4. Test case: X1 = 3.5, X2 = "male" [3M]

   Ŷ ← argmax_{y_k} P(Y = y_k) · ∏_i P(X_i | Y = y_k)

   p(X1 | y = Pass) = 0.105614     P(X2 = male | y = Pass) = 0.5
   p(X1 | y = Fail) = 0.184564     P(X2 = male | y = Fail) = 0.6

   Score(Pass) = P(y = Pass) · p(X1 | Pass) · P(male | Pass) = 0.02346969
   Score(Fail) = P(y = Fail) · p(X1 | Fail) · P(male | Fail) = 0.061521395

   Since Score(Fail) > Score(Pass), the class is: Fail
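A minimal Python sketch of the same scoring computation (illustrative only, not part of the marked solution). The P(male | ·) values are the ones implied by the table above; since the full dataset is not shown here, the X1 density this computes for the Pass class may differ from the hand-computed value, but the argmax decision, Fail, is unchanged.

# Gaussian Naive Bayes scoring for the test case {X1 = 3.5, X2 = "male"},
# using the class statistics and priors from the worked solution above.
import math

def gaussian_pdf(x, mean, var):
    # Class-conditional density for a continuous attribute
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

priors = {"Pass": 4 / 9, "Fail": 5 / 9}
stats = {"Pass": (7.2, 2.945), "Fail": (3.9, 4.64)}  # (mean, variance) of X1
p_male = {"Pass": 0.5, "Fail": 0.6}                   # implied by the table

x1 = 3.5
scores = {c: priors[c] * gaussian_pdf(x1, *stats[c]) * p_male[c]
          for c in priors}
print(scores)                       # Fail score exceeds Pass score
print(max(scores, key=scores.get))  # -> Fail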
Q.3 The 2-input AND gate is implemented using a logistic regression classifier with
the gradient descent optimization algorithm. The model parameters at time t are given
by w0 = 0, w1 = 0, and w2 = 0. Given binary inputs (x1, x2): [2+3 = 5 Marks]
a) Compute the model output ŷ and the cross-entropy loss term for each training
example. [2M]
Solution:
a)
   x1   x2   Actual output y   ŷ     y·ln(ŷ) + (1 − y)·ln(1 − ŷ)
   0    0    0                 0.5   0·ln(0.5) + 1·ln(1 − 0.5) = ln(0.5)
   0    1    0                 0.5   0·ln(0.5) + 1·ln(1 − 0.5) = ln(0.5)
   1    0    0                 0.5   0·ln(0.5) + 1·ln(1 − 0.5) = ln(0.5)
   1    1    1                 0.5   1·ln(0.5) = ln(0.5)

With all weights zero, ŷ = sigmoid(0) = 0.5 for every input, so each example
contributes ln(0.5) ≈ −0.6931 and the mean cross-entropy cost is
−(1/4)·4·ln(0.5) = ln 2 ≈ 0.6931.
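A minimal Python sketch of part (a) (illustrative, not part of the marked solution): with all weights zero the sigmoid output is 0.5 for every input, and the mean cross-entropy cost comes out to ln 2.

# Forward pass and cross-entropy loss of the all-zero logistic model on the AND gate
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.0, 0.0, 0.0]                                          # [w0, w1, w2] at time t
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND gate truth table

terms = []
for (x1, x2), y in data:
    y_hat = sigmoid(w[0] + w[1] * x1 + w[2] * x2)            # = 0.5 everywhere
    terms.append(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

cost = -sum(terms) / len(terms)
print(terms)  # four copies of ln(0.5) ~ -0.6931
print(cost)   # ln 2 ~ 0.6931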
b) What will be the values of w0, w1, and w2 at (t + 1), with learning rate η = 1 and L2
regularization constant λ = 1? [3M]
Solution:
Regularized cost function:
   J(w) = −(1/N) Σ [ y·ln(ŷ) + (1 − y)·ln(1 − ŷ) ] + (λ/2) Σ_j w_j²
Apply the gradient descent update rule:
   w_j ← w_j − η · [ (1/N) Σ (ŷ − y)·x_j + λ·w_j ]

   ŷ     y   ŷ − y   x0   (ŷ − y)·x0
   0.5   0    0.5    1     0.5
   0.5   0    0.5    1     0.5
   0.5   0    0.5    1     0.5
   0.5   1   −0.5    1    −0.5
                     Σ  =  1.0

   w0(t+1) = 0 − 1·[(1/4)·1.0 + 1·0] = −0.25
Likewise, Σ(ŷ − y)·x1 = 0.5 − 0.5 = 0 and Σ(ŷ − y)·x2 = 0.5 − 0.5 = 0, and the λ·w_j
terms vanish because w1 = w2 = 0, so w1(t+1) = 0 and w2(t+1) = 0.
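A minimal Python sketch of the single batch update (illustrative; it assumes the mean-gradient-plus-λw update written above, which is what reproduces w0 = −0.25):

# One gradient descent step with eta = 1 and L2 constant lambda = 1
eta, lam = 1.0, 1.0
w = [0.0, 0.0, 0.0]                       # [w0, w1, w2] at time t
# Each row: (x0 = 1 bias input, x1, x2), target y
data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
y_hat = 0.5                               # sigmoid(0) for all rows, from part (a)

n = len(data)
for j in range(3):
    # Mean data gradient plus L2 term (which is zero here since w = 0)
    grad = sum((y_hat - y) * x[j] for x, y in data) / n + lam * w[j]
    w[j] = w[j] - eta * grad
print(w)  # -> [-0.25, 0.0, 0.0]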
x y
0 1
1 3
2 7
5 31
Solution:
Polynomial regression: y = θ0 + θ1·x + θ2·x² [2M]
θ0 = 1, θ1 = 1, and θ2 = 1, i.e. y = 1 + x + x², which fits all four points exactly
(e.g., x = 5 gives 1 + 5 + 25 = 31).
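A minimal NumPy sketch (illustrative, not part of the marked solution): solving the least-squares problem for the design matrix [1, x, x²] recovers the coefficients exactly, since the four points lie on the curve.

# Degree-2 polynomial regression on the four (x, y) points
import numpy as np

x = np.array([0, 1, 2, 5], dtype=float)
y = np.array([1, 3, 7, 31], dtype=float)

# Design matrix [1, x, x^2]; lstsq solves the least-squares problem
X = np.column_stack([np.ones_like(x), x, x ** 2])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)      # -> [1. 1. 1.]
print(X @ theta)  # reproduces y exactly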
Q.6 Consider the dataset of binary values in terms of attribute–value pairs, where F is
the output value and A, B, C are attributes. What is the entropy of the dataset? Fill in
the columns for A and B, given that A has maximum information gain and B has minimum
information gain. Give mathematical justification for your answer. [4 Marks]
A   B   C   F
_   _   0   0
_   _   1   1
_   _   0   1
_   _   1   0
_   _   0   1
_   _   1   1
_   _   0   1
_   _   1   1
Solution:
For the entropy problem, column F is the output attribute. F contains six 1s and two 0s, so
   Entropy(S) = −(6/8)·log2(6/8) − (2/8)·log2(2/8) ≈ 0.8113.
2 Marks:
Let column A = column F, so that the i-th entries of the two columns match. The
information gain can be written as
   InformationGain(S, A) = Entropy(S) − Σ_v (|S_{A=v}| / |S|) · Entropy(S_{A=v}),
where the sum runs over the values v of attribute A.
Since the entries of A match those of F, the subset S_{A=0} contains only 0s and
S_{A=1} contains only 1s, so Entropy(S_{A=0}) = Entropy(S_{A=1}) = 0. Substituting into
the equation, InformationGain(S, A) = Entropy(S) ≈ 0.8113, the maximum information
gain possible.
2 Marks:
The information gain with respect to column B can be written as
   InformationGain(S, B) = Entropy(S) − Σ_v (|S_{B=v}| / |S|) · Entropy(S_{B=v}).
For minimum information gain, let column B be all 1s. Then S_{B=1} = S and S_{B=0} = ∅,
so the weighted entropy term equals Entropy(S) and the information gain with respect
to B is 0.
The same arguments give maximum information gain when A is taken to be the complement
of F, and minimum information gain when B is taken to be all 0s rather than all 1s.
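A minimal Python sketch (illustrative, not part of the marked solution) that computes the dataset entropy and checks the two extreme column choices argued above:

# Entropy of F and information gain for A = F (maximum) and B = all 1s (minimum)
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr, labels):
    n = len(labels)
    split = {}
    for a, f in zip(attr, labels):
        split.setdefault(a, []).append(f)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in split.values())

F = [0, 1, 1, 0, 1, 1, 1, 1]  # output column from the table
A = F[:]                      # A = F      -> maximum information gain
B = [1] * 8                   # B = all 1s -> minimum information gain

print(entropy(F))       # ~ 0.8113
print(info_gain(A, F))  # ~ 0.8113 (equals Entropy(S), the maximum possible)
print(info_gain(B, F))  # 0.0 (the minimum)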