midterm-2017

The document discusses the application of a naive Bayes classifier to predict binary output y based on input variables A, B, and C, providing calculations for probabilities. It also contrasts maximum likelihood and maximum a posteriori hypotheses, and evaluates mean squared errors for linear regression on given datasets. Additionally, it addresses decision tree learning for categorical attributes, including the XOR function and the relevance of attributes in decision trees.


Question 1

a) Suppose we are given the following dataset, where A, B, C are input binary random variables, and y is a binary output whose value we want to predict. How would a naive Bayes classifier predict y given this input: A = 0, B = 0, C = 1?

Marginal counts in the dataset (7 records; the full table is not reproduced here):
A: three 0s, four 1s
B: three 0s, four 1s
C: four 0s, three 1s
y: three 0s, four 1s

Answer: estimate the needed probabilities from the dataset:
P(A=0|y=0) = 2/3    P(A=0|y=1) = 1/4
P(B=0|y=0) = 1/3    P(B=0|y=1) = 2/4
P(C=1|y=0) = 1/3    P(C=1|y=1) = 2/4

Score for y = 0: P(A=0|y=0) * P(B=0|y=0) * P(C=1|y=0) * P(y=0) = (2/3)(1/3)(1/3)(3/7) ≈ 0.0317
Score for y = 1: P(A=0|y=1) * P(B=0|y=1) * P(C=1|y=1) * P(y=1) = (1/4)(2/4)(2/4)(4/7) ≈ 0.0357

Since 0.0357 > 0.0317, the classifier predicts y = 1.

2.5 grades
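The same computation can be checked programmatically. A minimal Python sketch with the probabilities above hard-coded (the variable names are ours, not the exam's):

```python
from math import prod

# Class priors from the counts: y has three 0s and four 1s out of 7.
prior = {0: 3 / 7, 1: 4 / 7}

# Conditionals for the query A=0, B=0, C=1, taken from the solution above.
cond = {
    0: [2 / 3, 1 / 3, 1 / 3],  # P(A=0|y=0), P(B=0|y=0), P(C=1|y=0)
    1: [1 / 4, 2 / 4, 2 / 4],  # P(A=0|y=1), P(B=0|y=1), P(C=1|y=1)
}

# Naive Bayes score: product of the conditionals times the prior.
scores = {y: prod(cond[y]) * prior[y] for y in (0, 1)}
print(scores)                       # {0: 0.0317..., 1: 0.0357...}
print(max(scores, key=scores.get))  # 1 -- the predicted class
```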

b) Briefly describe the difference between a maximum likelihood hypothesis and a maximum a posteriori hypothesis.

2 grades

Answer: the likelihood is P(x|c); the posterior is P(c|x). A maximum likelihood (ML) hypothesis is the class c that maximizes the likelihood P(x|c) alone, while a maximum a posteriori (MAP) hypothesis also weights each class by its prior P(c). Since P(x) does not depend on c, it can be dropped inside the argmax:

c_MAP = argmax_c P(c|x)
      = argmax_c P(x|c) * P(c) / P(x)
      = argmax_c P(x|c) * P(c)
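Part a) already illustrates the difference with this exam's own numbers: ML and MAP can disagree when the prior is not uniform. A short sketch reusing the Question 1 quantities:

```python
# ML vs. MAP on the Question 1 query (A=0, B=0, C=1).
likelihood = {0: (2/3) * (1/3) * (1/3),  # P(x|y=0) = 2/27 ~ 0.074
              1: (1/4) * (2/4) * (2/4)}  # P(x|y=1) = 1/16 ~ 0.0625
prior = {0: 3 / 7, 1: 4 / 7}

c_ml = max(likelihood, key=likelihood.get)                  # argmax_c P(x|c)
c_map = max(prior, key=lambda c: likelihood[c] * prior[c])  # argmax_c P(x|c)P(c)
print(c_ml, c_map)  # 0 1 -- the prior on y flips the decision
```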
Question 2
a) Consider the following data set with one input and one output (the plot is not reproduced here).

1) What is the mean squared training-set error of running linear regression on this data (using the model y = Θ0 + Θ1·x)?

Answer: zero.
0.5 grade

2) What is the mean squared test-set error of running linear regression on this data, assuming the rightmost three points are in the test set and the others are in the training data?

Answer: zero.
0.5 grade
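Both answers are zero because all the points lie exactly on one line. A hedged check, using hypothetical collinear points as a stand-in for the missing figure:

```python
# Least-squares fit of y = theta0 + theta1*x on collinear points.
# The points below are illustrative; the exam's figure is not reproduced.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # collinear by construction

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
theta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
theta0 = my - theta1 * mx

mse = sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys)) / n
print(theta0, theta1, mse)  # 1.0 2.0 0.0 -- a perfect fit gives zero MSE
```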

b) Consider the following data with one input and one output (the plot of the three points is not reproduced here).

1) What is the mean squared training-set error of running regression on this data (using the model h(x) = Θ0 + Θ1·x)?
(Hint: by symmetry it is clear that the best fit to the three data points is a horizontal line, i.e. Θ1 = slope = 0.)

2 grades

Answer: with Θ0 = 1, Θ1 = 0, so h(x) = 1:

MSE = (1/n) Σ_{i=1}^{n} (Y_i − h(x_i))² = (1/3)(0 + 1 + 0) = 1/3

Or with Θ0 = 1.5, Θ1 = 0, so h(x) = 1.5:

MSE = (1/3)(0.25 + 0.25 + 0.25) = 1/4
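A quick numeric check. The outputs (1, 2, 1) are inferred from the residuals above, since the figure itself is missing:

```python
# MSE of a horizontal fit h(x) = theta0 on the three points.
# Outputs (1, 2, 1) are reconstructed from the residuals in the solution.
ys = [1.0, 2.0, 1.0]

def mse(theta0):
    """Mean squared error of the constant model h(x) = theta0."""
    return sum((y - theta0) ** 2 for y in ys) / len(ys)

print(mse(1.0))  # 0.333... = 1/3, the first accepted answer
print(mse(1.5))  # 0.25    = 1/4, the second accepted answer
```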


c) Suppose we plan to do regression with the following basis functions. [The basis functions were shown in a figure that is not reproduced here, and no answer is recorded.]

1.5 grades
Question 3

a) Give the decision tree that represents the XOR function.

1.5 grades

A  B  Class
1  1  0
1  0  1
0  1  1
0  0  0

Answer: at the root the information gain of each feature is 0, so either attribute may be used as the root. Splitting on A and then on B (or vice versa) gives a depth-2 tree: the class is 0 when A = B and 1 when A ≠ B.
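To see why the gain is zero at the root, a small sketch that computes the information gain of each feature on the XOR table (the helper names are ours):

```python
from math import log2

# XOR truth table as (A, B, Class) rows.
rows = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = (labels.count(v) for v in set(labels))
    return -sum((c / n) * log2(c / n) for c in counts)

def info_gain(feature):
    """H(Class) - H(Class | feature), estimated from the rows."""
    labels = [r[2] for r in rows]
    gain = entropy(labels)
    for v in (0, 1):
        subset = [r[2] for r in rows if r[feature] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(info_gain(0), info_gain(1))  # 0.0 0.0 -- neither feature helps at the root
```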

b) Suppose that X1, ..., Xm are categorical input attributes and Y is a categorical output attribute. Suppose we plan to learn a decision tree. For each of the following statements, state whether it is true or false, with reason(s).

1.5 grades for each

1) If Xi and Y are independent in the distribution that generated this dataset, then Xi will not appear in the decision tree.

Answer: False, because the attribute may become relevant further down the tree when the records are restricted to some value of another attribute (e.g. XOR).

2) If the information gain IG(Y, Xi) = 0 according to the values of entropy and conditional entropy computed from the data, then Xi will not appear in the decision tree.

Answer: False, because the attribute may become relevant further down the tree when the records are restricted to some value of another attribute (e.g. XOR).

3) The maximum depth of the decision tree must be less than m + 1.

Answer: True, because the attributes are categorical and each can be split on at most once along any root-to-leaf path, so no path contains more than m splits.
