Final s05 Sols
• Write your answers legibly in the space provided on the examination sheet. If you use the
back of a sheet, indicate clearly that you have done so on the front.
• Write your name and Andrew ID on this page and your Andrew ID on the top of each
successive page in the space provided.
• There are 10 questions in this exam. Some of them are marked with the words “more difficult”.
Please budget your time using this information.
• Calculators are allowed, but laptops and PDAs are not allowed.
• Good luck!
(a) 1/2
(b) 1
(c) 2
(d) Not enough information / cannot be determined
2. Circle one (2 pts): What is the expected value of the node labeled Q?
Answer: (d)
(a) 1/2
(b) 1
(c) 3/2
(d) 2
(e) 3
(f) Not enough information / cannot be determined
3. Circle one (2 pts): What is the expected value of the node labeled R?
Answer: (b)
(a) 2
(b) 7/4
(c) 1
(d) 2/3
(e) 1/4
(f) Not enough information / cannot be determined
(a) 1
(b) 7/4
(c) 2
(d) 3
(e) Not enough information / cannot be determined
5. True or False (2 pts): You have been provided with enough information so that you could
modify the alpha-beta pruning algorithm to work on this game tree.
Answer: True
Iterative lengthening search is an iterative version of uniform cost search. The main idea of this
algorithm is that we use increasing limits on path cost.
• We start searching as though we were performing uniform cost search.
• If we generate a node with a path cost that is greater than the current limit, we immediately
discard that node.
• We set the limit for the current iteration to be equal to the lowest path cost of any node that
we discarded in the previous iteration.
• We initialize the path cost limit to be equal to the smallest step cost between any two nodes in
the tree. So, for a tree with unit step costs, we initialize the path cost limit to 1.
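A minimal Python sketch of this procedure (illustrative only; the graph representation, function name, and cost types are assumptions, not part of the exam):

    import heapq

    def iterative_lengthening(graph, start, goal, initial_limit):
        """Repeated uniform-cost search with an increasing path-cost limit."""
        limit = initial_limit
        while True:
            frontier = [(0.0, start)]        # (path cost, node), cheapest first
            next_limit = float("inf")        # cheapest cost among discarded nodes
            while frontier:
                cost, node = heapq.heappop(frontier)
                if node == goal:
                    return cost
                for child, step_cost in graph.get(node, []):
                    new_cost = cost + step_cost
                    if new_cost > limit:
                        # Discard, but remember it: it sets the next limit.
                        next_limit = min(next_limit, new_cost)
                    else:
                        heapq.heappush(frontier, (new_cost, child))
            if next_limit == float("inf"):
                return None                  # nothing was discarded: no solution
            limit = next_limit               # lowest discarded cost becomes the new limit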
1. True or False (3 pts): When this algorithm first encounters the goal, the goal will be on
the path with the cheapest cost.
Answer: True
2. Circle one (3 pts): Suppose we are searching through a uniform tree with branching factor
b, solution depth d, and unit step costs (that is, the cost to move between two nodes is always
one). In the worst case, how many iterations will iterative lengthening require?
Answer: (b)
(a) b iterations
(b) d iterations
(c) b^d iterations
(d) d^b iterations
(e) Not enough information / cannot be determined
3. (More Difficult) (4 pts) Now suppose that we are using the same tree as in the previous
question, but our step costs are drawn from the continuous range [0, 1] with a minimum
positive cost ε. Our path cost limit will be initialized to ε. Consider the following statements:
• I. Iterative lengthening will not find the optimal solution for this tree.
• II. In the worst case, running iterative lengthening on this tree will require less time
than running DFS on this tree.
• III. In the worst case, iterative deepening search would require less time to run on this
tree than iterative lengthening.
Circle one: Which of these statements are true?
(a) I only
(b) II only
(c) III only
(d) I and II
(e) I and III
(f) Neither I, II, nor III
1. Circle one (2 pts): Which of the following is the largest? (Note that we are not asking for
exact values. You may solve this problem by simply inspecting the table.)
Answer: (c)
What is IG(Alive|Color)?
Answer: 0.36
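For reference, IG(Alive | Color) = H(Alive) − H(Alive | Color). A small Python sketch of that computation; the counts below are made up for illustration, since the exam's data table is not reproduced here:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """H(Y) for a list of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(attribute, labels):
        """IG(Y | X) = H(Y) - sum over x of P(X = x) * H(Y | X = x)."""
        n = len(labels)
        conditional = 0.0
        for value in set(attribute):
            subset = [y for x, y in zip(attribute, labels) if x == value]
            conditional += len(subset) / n * entropy(subset)
        return entropy(labels) - conditional

    # Hypothetical records (Color, Alive), not the exam's table:
    color = ["red", "red", "black", "black"]
    alive = ["yes", "no", "no", "no"]
    print(information_gain(color, alive))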
4. Circle one (2 pts): Suppose we wanted to turn Number of eyes into a binary attribute for
the purpose of building a decision tree. Which of the following binary categorical splits results
in the larger value of IG(Alive | Number of eyes)? (Note that we are not asking for
exact values. You may solve this problem by simply inspecting the table.)
Answer: (b)
5. (6 pts) Suppose we were going to build a decision tree for this data:
• First, we split using the attribute you chose in the previous question.
• Second, we split on Color.
How would this tree classify the following objects? (In case of a tie at a leaf node, classify
the object as Not alive.) NOTE: This should not be a very complicated tree.
(a) Circle one (3 pts): (Alive or Not alive) A red object with 23 eyes
(b) Circle one (3 pts): (Alive or Not alive) A black object with 1.5 eyes
Answer:
Tom is a CMU student. Recently, his mood has been highly influenced by two factors: the weather
(W) and his studies (S). Naturally, he likes good weather and hates bad weather. More importantly,
Tom worries about his exams. Tom feels happy if he passes his exams and unhappy if he fails
them. Now Tom wants to predict his happiness according to these two factors using his previous
experience. Tables A and B show this data.
(a) Using Table A: If today’s situation is W=Good, S=Pass, and Tom uses a Naive Bayes classifier,
how would he predict his happiness? Please show your computations and the classifier’s prediction.
(2 pts)
Answer:
P (W = G|H = 0)P (S = P |H = 0)P (H = 0) = 3/40
P (W = G|H = 1)P (S = P |H = 1)P (H = 1) = 1/8
Since 1/8 > 3/40, predict Happy.
(b) Using Table A: If today’s situation is W=Bad, S=Fail, and Tom uses a Naive Bayes classifier,
how would he predict his happiness? Please show your computations and the classifier’s prediction.
(2 pts)
Answer:
P (W = B|H = 0)P (S = F |H = 0)P (H = 0) = 1/5
P (W = B|H = 1)P (S = F |H = 1)P (H = 1) = 0
Since 1/5 > 0, predict Unhappy.
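The decision rule in parts (a) and (b) just compares the two un-normalized posteriors; a minimal sketch using the values computed above (the function name is illustrative):

    def naive_bayes_predict(score_unhappy, score_happy):
        """Compare P(evidence | H=0)P(H=0) against P(evidence | H=1)P(H=1)."""
        return "Happy" if score_happy > score_unhappy else "Unhappy"

    # (a) W = Good, S = Pass: compares 3/40 and 1/8 -> Happy
    print(naive_bayes_predict(3 / 40, 1 / 8))
    # (b) W = Bad, S = Fail: compares 1/5 and 0 -> Unhappy
    print(naive_bayes_predict(1 / 5, 0))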
Tom also notices that his neighbor always goes for a walk if the weather is good and stays at home
if the weather is bad. Tom thinks it wouldn’t hurt to have more information, so he adds one more
factor, Neighbor (N), to the table. The new table is shown as Table B. You can see that whenever
W=Good, N=Out, and whenever W=Bad, N=Home.
(c) Using Table B: Now, if W=Good, S=Pass, N=Out, and Tom uses a Naive Bayes Classifier,
how would he predict his happiness? Please show your computations and the classifier’s prediction.
(2 pts)
Answer:
P (W = G|H = 0)P (S = P |H = 0)P (H = 0) = 9/200
P (W = G|H = 1)P (S = P |H = 1)P (H = 1) = 1/24
Since 9/200 > 1/24, predict Unhappy.
(d) Will the new factor improve the performance of the Naive Bayes classifier? Why or why not?
(2 pts)
Answer: No. Weather and Neighbor are not conditionally independent given Happiness, so the
Naive Bayes assumption no longer holds.
(e) Using Table B: Now, if Tom uses a Bayes Classifier instead of a Naive Bayes Classifier, and we
still assume W=Good, S=Pass, N=Out, how would he predict his happiness? Please show your
computations and the classifier’s prediction. (2 pts)
Answer:
P (W = G, S = P, N = O|H = 0)P (H = 0) = 0
P (W = G, S = P, N = O|H = 1)P (H = 1) = 1/8
Since 1/8 > 0, predict Happy.
5 K-Means (7 points)
Run K-means manually on the following dataset. Trace through the first six iterations of the K-
means algorithm or until convergence is reached. Circles are data points and squares are the initial
cluster centers. Draw the cluster centers and the approximate decision boundaries that define each
cluster. (NOTE: It is not necessary to draw the exact location of the squares, but it should be clear
from your placement of the squares that you understand how K-means performs qualitatively.)
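Each K-means iteration does two things: assign every point to its nearest center, then move each center to the mean of its assigned points. A minimal NumPy sketch of that loop (the data and initial centers would come from the figure, which is not reproduced here):

    import numpy as np

    def kmeans(points, centers, max_iters=6):
        """Run K-means for up to max_iters iterations or until the centers stop moving."""
        points = np.asarray(points, dtype=float)
        centers = np.asarray(centers, dtype=float)
        for _ in range(max_iters):
            # Assignment step: index of the nearest center for every point.
            dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: each center moves to the mean of its cluster.
            new_centers = np.array([
                points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(len(centers))
            ])
            if np.allclose(new_centers, centers):
                break                        # converged
            centers = new_centers
        return centers, labels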
• We compute the distance from q to every data point in the training dataset. We typically
use Euclidean distance.
• We classify q by giving it the same class as the majority of those K closest data points (q is
classified as the majority class).
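Parts (a)–(d) below can be checked mechanically: hide a point (or a fold), classify it from its K nearest remaining neighbors, and count the mistakes. A small leave-one-out sketch under that reading (the point coordinates would come from the figure, which is not reproduced here):

    import numpy as np
    from collections import Counter

    def knn_predict(train_points, train_labels, query, k):
        """Majority vote among the k training points closest to the query."""
        dists = np.linalg.norm(np.asarray(train_points) - np.asarray(query), axis=1)
        nearest = np.argsort(dists)[:k]
        return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

    def loocv_error(points, labels, k):
        """Leave-one-out cross-validation error as a fraction of the data set."""
        errors = 0
        for i in range(len(points)):
            rest_points = points[:i] + points[i + 1:]
            rest_labels = labels[:i] + labels[i + 1:]
            if knn_predict(rest_points, rest_labels, points[i], k) != labels[i]:
                errors += 1
        return errors / len(points)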
Now, suppose you are running a K-nearest-neighbor classifier on the following training set. The
training set is shown below. It consists of 9 data points. The black dots have class label 1, and the
white dots have class label 0.
[Figures: the 9 training points plotted in the x-y plane, shown twice (left and right); the right figure also shows the split into sets A and B used in parts (c) and (d).]
(a) Use the figure on the left: If we use 1-nearest-neighbor, what is the leave-one-out Cross-
Validation error? (Report the error as a ratio.) (2 pts)
Answer: 2/9
(b) Use the figure on the left: If we use 3-nearest-neighbor, what is the leave-one-out Cross-
Validation error? (Report the error as a ratio.) (2 pts)
Answer: 1/9
(c) Use the figure on the right: What is the two-fold Cross-Validation error for 1-nearest-neighbor?
Assume we separate the data into two sets, A and B, as shown in the figure. (Report the error as
a ratio.) (3 pts)
Answer: 4/9
(d) Use the figure on the right: What is the two-fold Cross-Validation error for 3-nearest-neighbor?
Assume we separate the data into two sets, A and B, as shown in the figure. (Report the error as
a ratio.) (3 pts)
Answer: 8/9
[Bayes network figure: nodes A, B, C, D, E with edges A → B, C → B, E → C, B → D, and C → D. The CPTs shown are P(A) = 0.5, P(E) = 0.5, P(C|E = +) = 0.3, P(C|E = −) = 0.8, plus tables for P(B|A, C) and P(D|B, C).]
Note: In the following, we’ll use the following notation: A ⊥ B means A is independent of B; A ⊥ B|C
means A is conditionally independent of B given C.
(a) Please judge if the following independence assumptions are correct or not:
1. (True or False) (1 pt): B ⊥ E|C
Answer: True
2. (True or False) (1 pt): A ⊥ D
Answer: False
3. (True or False) (2 pts): A ⊥ D|B
Answer: False
4. (True or False) (2 pts): A ⊥ D|B, C
Answer: True
(b) Compute the value of P (C) (2 pts)
Answer: P(C) = P(C|E = +)P(E = +) + P(C|E = −)P(E = −) = 0.3 × 0.5 + 0.8 × 0.5 = 0.55
[MDP diagram: three states S1 (R = +10), S2 (R = −10), and S3 (R = +20), each with actions a1 and a2. Most transitions have probability P = 1.0; one action has a stochastic transition with probabilities P = 1/3 and P = 2/3.]
(a) How many distinct policies are there in the above MDP? (1 pt)
Answer: 2^3 = 8 (two possible actions at each of the three states)
(b) U^π0(i) is the expected sum of discounted rewards if we start at state i and follow the policy
π0. The initial policy π0 is the one that assigns a1 to every state. Write down the numerical values
of the expected discounted rewards of each of the following states in the MDP. (3 pts)
1. U^π0(S1) = 100
2. U^π0(S2) = -100
3. U^π0(S3) = 200
(c) Continuing from part (b), suppose we run policy iteration with π0 as the initial policy. Define
π1 as the updated policy after one iteration of policy iteration, and write down the updated policy
for each of the states (2 pts):
1. π1 (S1 ) = a1
2. π1 (S2 ) = a2
3. π1 (S3 ) = a1
(d) Suppose we run value iteration, where U^k(i) is the expected sum of discounted rewards if we start at
state i after (k − 1) steps of value iteration. Starting from the initial value U^1(i), which is the
reward at state i, please write down the updated U^2(i) after one value iteration (3 pts).
1. U^2(S1) = 19
2. U^2(S2) = 8
3. U^2(S3) = 38
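Each value-iteration step applies the Bellman backup U^{k+1}(i) = R(i) + γ · max over a of Σ_j T(i, a, j) U^k(j). A sketch of one backup follows; the transition table and γ = 0.9 are guesses reconstructed from the solution values (for example U^2(S1) = 10 + 0.9 × 10 = 19 for the a1 self-loop), not a faithful copy of the figure:

    def bellman_backup(U, R, T, gamma):
        """One sweep of value iteration: U'(i) = R(i) + gamma * max_a sum_j T[i][a][j] * U(j)."""
        return {
            s: R[s] + gamma * max(
                sum(p * U[s2] for s2, p in outcomes.items())
                for outcomes in T[s].values()
            )
            for s in U
        }

    # Assumed transition model (a1 self-loops; a2 edges guessed from pi_1 and U^2 above).
    T = {
        "S1": {"a1": {"S1": 1.0}, "a2": {"S2": 1.0}},
        "S2": {"a1": {"S2": 1.0}, "a2": {"S3": 1.0}},
        "S3": {"a1": {"S3": 1.0}, "a2": {"S1": 1 / 3, "S2": 2 / 3}},
    }
    R = {"S1": 10, "S2": -10, "S3": 20}
    U1 = dict(R)                                  # U^1(i) = R(i)
    print(bellman_backup(U1, R, T, gamma=0.9))    # {'S1': 19.0, 'S2': 8.0, 'S3': 38.0}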
(a) Can the network distinguish the two classes in the cases illustrated below? Why/Why not in
each case? (2 pts)
[Figures: Case I and Case II]
Answer:
Yes for Case I and no for Case II. Without a constant (bias) term, the decision boundary must go through
the origin.
(b) Assume that we augment the network with an additional input unit with constant value 1 (we’ll
call the weight on the additional connection W). How does the answer to (a) change and why? (2 pts)
Answer:
With a constant term the decision boundary can go anywhere though it must still be linear. So
both cases I and II can now be discriminated correctly.
a(w0 + w1 x1 + w2 x2 + . . . wk xk ), where
0, if z < −1
a(z) = (z + 1)/2, if − 1 ≤ z ≤ 1 .
1, if z > 1
Pat is very pleased with the computational savings, but there is a serious problem with this idea.
Please explain the problem. If you wish, you may use the diagram of a(z) below to help you explain
the problem with Pat’s idea. (3 pts)
[Diagram of a(z)]
Answer:
The problem is that a(z) is not differentiable at two points (z = -1 and z = 1), which will hurt gradient descent. Worse,
it has zero derivative over a large range (|z| > 1), and so gradient descent will have no information about
which direction to alter the weights.
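A short sketch of a(z) and its derivative makes the issue concrete: outside [-1, 1] the derivative is exactly zero, so a gradient step computed there does not move the weights at all (illustrative code, not part of the exam):

    def a(z):
        """The piecewise-linear activation from the question."""
        if z < -1:
            return 0.0
        if z > 1:
            return 1.0
        return (z + 1) / 2

    def a_prime(z):
        """Its derivative: 1/2 strictly inside (-1, 1), zero elsewhere, undefined at z = -1 and z = 1."""
        return 0.5 if -1 < z < 1 else 0.0

    print(a(2.0), a_prime(2.0))   # 1.0 0.0 -> gradient descent gets no signal here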
X Y Output
0 0 1
1 0 0
0 1 0
1 1 1
(d) Please draw a simple diagram and use one or two sentences to illustrate that this function
cannot be learned by a single linear perceptron. (3 pts)
Answer:
[Diagram: the four points of the truth table in the X-Y plane, with output 1 at (0, 0) and (1, 1)
and output 0 at (1, 0) and (0, 1).]
There is no single straight line that separates the 1’s from the 0’s.
(e) (True/False) Would adding one hidden unit to the model allow it to learn the function
described above? (2 pts)
Answer: False
At least two hidden units are needed. One hidden unit would merely provide an extra nonlinearity
on the output.
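For illustration, two hidden threshold units are indeed enough to represent this function (it is the XNOR of X and Y): one hidden unit fires on (1, 1), the other on (0, 0), and the output unit fires if either hidden unit does. A sketch with one possible (hand-picked) set of weights:

    def step(z):
        return 1 if z > 0 else 0

    def xnor_net(x, y):
        """Two hidden threshold units feeding one output threshold unit."""
        h1 = step(x + y - 1.5)        # fires only on (1, 1)
        h2 = step(-x - y + 0.5)       # fires only on (0, 0)
        return step(h1 + h2 - 0.5)    # fires if either hidden unit fires

    for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(x, y, xnor_net(x, y))   # reproduces the table: 1, 0, 0, 1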
(a) Draw the (very simple) diagram corresponding to this MDP. Answer by inspection of the
diagram: What is the optimal policy? (2 pts)
[Diagram: two states, A and B; from A, action L can lead back to A or on to B.]
Answer: With the added conditions on the probabilities, the optimal policy is π(A) = L.
(b) Assume that the agent knows neither the world (transition probabilities) nor the utilities of the
states. Assume that the agent, for some reason, happens to follow the optimal policy. The rewards
received at states A and B are the same as described above. In the process of executing this policy,
the agent executes four trials and, in each trial, it stops after reaching state B. The following state
sequences are recorded during the trials: AAAB, AAB, AB, AB. What is the estimate of T (., ., .)?
What is the estimate of U (A), assuming a discount factor of γ = 0.5? (2 pts)
Answer:
T (A, L, A) = 3/7 and T (A, L, B) = 4/7
Note that T (A, S, A) cannot be computed from the data given in the text and it is not needed since
we assume that we follow the optimal policy.
U (A) = R(A) + γ(T (A, L, A)U (A) + T (A, L, B)U (B))
U (A) = −0.1 + 0.5 × (3/7 × U (A) + 4/7 × 1)
11/14 × U (A) = −0.1 + 4/14
U (A) = 26/110 = 0.2364
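The estimates above come from simple transition counting over the recorded sequences, followed by solving the single-unknown Bellman equation for U(A). A small sketch reproducing them (it uses R(A) = -0.1 and U(B) = 1, as in the solution):

    from collections import Counter

    trials = ["AAAB", "AAB", "AB", "AB"]

    # Count transitions out of A under the executed policy (action L).
    counts = Counter()
    for seq in trials:
        for s, s_next in zip(seq, seq[1:]):
            if s == "A":
                counts[s_next] += 1
    total = sum(counts.values())
    T_ALA = counts["A"] / total        # 3/7
    T_ALB = counts["B"] / total        # 4/7

    # Solve U(A) = R(A) + gamma * (T_ALA * U(A) + T_ALB * U(B)) for U(A).
    gamma, R_A, U_B = 0.5, -0.1, 1.0
    U_A = (R_A + gamma * T_ALB * U_B) / (1 - gamma * T_ALA)
    print(T_ALA, T_ALB, U_A)           # 0.4285..., 0.5714..., 0.2363...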
(c) Assume now that the agent is executing only one trial yielding the sequence of states AAB.
Compute the estimate of the utility U (A) using TD (temporal differencing). Use discount γ = 0.5,
and learning rate α = 0.5. (2 pts)
Answer:
Transition A to A:
U_new(A) = U_old(A) + α(R(A) + γ · U_old(A) − U_old(A))
U_new(A) = −0.1 + 0.5 × (−0.1 + 0.5 × (−0.1) − (−0.1)) = −0.125
Transition A to B:
U_new(A) = U_old(A) + α(R(A) + γ · U(B) − U_old(A))
U_new(A) = −0.125 + 0.5 × (−0.1 + 0.5 × 1 − (−0.125)) = 0.1375
Note that the question did not specify the starting values for U . Alternative solutions (e.g., with
U = 0) were also accepted as long as the formulas were correct.
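The same two TD updates written out in code, using the solution’s choice of initializing U to the immediate rewards (U(A) = -0.1, U(B) = 1):

    def td_update(U, s, s_next, reward, gamma=0.5, alpha=0.5):
        """TD(0): U(s) <- U(s) + alpha * (reward + gamma * U(s_next) - U(s))."""
        U[s] += alpha * (reward + gamma * U[s_next] - U[s])
        return U[s]

    U = {"A": -0.1, "B": 1.0}
    td_update(U, "A", "A", reward=-0.1)   # after A -> A: U(A) = -0.125
    td_update(U, "A", "B", reward=-0.1)   # after A -> B: U(A) = 0.1375
    print(U["A"])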
Part II.
We are using Q-learning to learn a policy in an MDP with two states S1 and S2 and two actions a
and b. Assume that γ = 0.8 and α = 0.2, and that the current values of Q are as given in the Q
table (not reproduced here; the computation below uses Q(S1, b) = 2 and max over Q(S2, ·) = 4).
Suppose that, when we were in state S1 , we took action b, received reward 1.0 and moved to state
S2 . Which item of the Q table will change and what is the new value? (2 pts)
Answer:
Q(S1 , b) is the affected entry.
Q_new(S1, b) = Q_old(S1, b) + α(R(S1) + γ · max_action Q(S2, action) − Q_old(S1, b))
Q_new(S1, b) = 2 + 0.2 × (1 + 0.8 × 4 − 2) = 2.44
Note: A common mistake is to forget the “Max” and to use 0.8×2 instead of the correct expression.
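The same Q-learning update in code. Only the two numbers the solution actually uses are taken from the (omitted) Q table, Q(S1, b) = 2 and max over Q(S2, ·) = 4; the other entries below are placeholders:

    def q_update(Q, s, a, reward, s_next, gamma=0.8, alpha=0.2):
        """Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(Q[s_next].values())
        Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
        return Q[s][a]

    # Placeholder table: only Q(S1, b) and the maximum over Q(S2, .) are from the solution.
    Q = {"S1": {"a": 0.0, "b": 2.0}, "S2": {"a": 4.0, "b": 0.0}}
    print(q_update(Q, "S1", "b", reward=1.0, s_next="S2"))   # 2.44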