CS 188
Spring 2014
Introduction to Artificial Intelligence Practice Final
To earn the extra credit, one of the following has to hold true. Please circle and sign.
A I spent 3 or more hours on the practice final.
B I spent fewer than 3 hours on the practice final, but I believe I have solved all the questions.
Signature:
Follow the directions on the website to submit the practice final and receive the extra credit. The normal instructions for the final follow on the next page.
Exam Instructions:
You have approximately 2 hours and 50 minutes.
The exam is closed book, closed notes except your one-page crib sheet.
Please use non-programmable calculators only.
Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a
brief explanation. All short answer sections can be successfully answered in a few sentences AT MOST.
First name
Last name
SID
edX username
First and last name of student to your left
First and last name of student to your right
For staff use only:
Q1. Search: Short Questions /4
Q2. MDPs: Short Questions /8
Q3. Other Short Questions /14
Q4. Bayes Nets: Conditional Independence /8
Q5. Elimination Sequence /6
Q6. Hidden Markov Models /13
Q7. Variable Elimination Ordering /22
Q8. Bidirectional A* Search /14
Q9. Classification and Separating Hyperplanes /11
Total /100
Q1. [4 pts] Search: Short Questions
(a)
[Figure: a search graph with start state S, goal state G, and several intermediate nodes; the edges are labeled with step costs 1, 2, 2, 2, 1, 1, 5, 5.]
Answer the following questions about the search problem shown above. S is the start-state, G is the (only) goal-state. Break any ties alphabetically. For the questions that ask for a path, please give your answers in the form S-A-D-G.
(i) [1 pt] What path would breadth-first graph search return for this search problem?
(ii) [1 pt] What path would uniform cost graph search return for this search problem?
(iii) [1 pt] What path would depth-first graph search return for this search problem?
(iv) [1 pt] What path would A* graph search, using a consistent heuristic, return for this search problem?
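All four questions instantiate the same graph-search skeleton and differ only in how the fringe orders nodes. As a minimal sketch (in Python, with a hypothetical graph standing in for the figure above), uniform cost search looks like this; BFS and DFS simply swap the priority queue for a FIFO queue or a stack, and A* orders the fringe by cost-so-far plus heuristic:

import heapq

def ucs(graph, start, goal):
    """Uniform cost graph search: the fringe is a priority queue on path cost."""
    fringe = [(0, [start])]              # entries are (cost so far, path)
    closed = set()
    while fringe:
        cost, path = heapq.heappop(fringe)
        node = path[-1]
        if node == goal:
            return path, cost
        if node not in closed:
            closed.add(node)
            for nxt, step in sorted(graph.get(node, [])):   # alphabetical tie-breaking
                heapq.heappush(fringe, (cost + step, path + [nxt]))
    return None

# Hypothetical example graph (not the exam's figure):
graph = {'S': [('A', 1), ('B', 2)], 'A': [('D', 5)], 'B': [('D', 1)], 'D': [('G', 1)]}
print(ucs(graph, 'S', 'G'))              # (['S', 'B', 'D', 'G'], 4)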
Q2. [8 pts] MDPs: Short Questions
(a) Each True/False question is worth 2 points. Leaving a question blank is worth 0 points. Answering incorrectly is worth −2 points.
For the questions that are not True/False, justify your answer concisely.
(i) [2 pts] [true or false] If the only difference between two MDPs is the value of the discount factor then they must have the same optimal policy.
(ii) [2 pts] [true or false] When using features to represent the Q-function it is guaranteed that this feature-based Q-learning finds the same Q-function, Q*...

[Several pages of the exam are missing at this point; the text resumes partway through a later question, apparently Q6 (Hidden Markov Models), which involves E[X_1 + 3X_2 | O_1 = A, O_2 = B] and the table of values below.]
a_1     a_2     a_3     a_4     a_5     a_6     a_7     a_8     a_9     a_10
0.134   0.847   0.764   0.255   0.495   0.449   0.652   0.789   0.094   0.028
(d) [2 pts] [ true false ] In the case that there is no evidence, particle filtering using a single particle is equivalent to rejection sampling. Explain your answer.
(e) [2 pts] [ true false ] Performing particle filtering twice, each time with 50 particles, is equivalent to performing particle filtering once with 100 particles. Explain your answer.
(f) [2 pts] [ true false ] Variable elimination is generally more accurate than the Forward algorithm. Explain your answer.
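For reference when reasoning about (d) and (e), here is a minimal bootstrap particle filter for a discrete HMM. This is our own sketch, and the two-state weather model at the bottom is hypothetical, not part of the exam:

import random

def particle_filter(num_particles, evidence, prior, transition, emission, seed=0):
    """Bootstrap particle filter: time-elapse by sampling the transition model,
    then weight by the evidence likelihood and resample."""
    rng = random.Random(seed)
    states = list(prior)
    particles = rng.choices(states, weights=[prior[s] for s in states], k=num_particles)
    for e in evidence:
        # Time elapse: move each particle according to the transition model.
        particles = [rng.choices(states, weights=[transition[p][s] for s in states])[0]
                     for p in particles]
        # Observe: weight each particle by P(e | state), then resample.
        weights = [emission[p][e] for p in particles]
        particles = rng.choices(particles, weights=weights, k=num_particles)
    # Belief estimate: the empirical distribution over states.
    return {s: particles.count(s) / num_particles for s in states}

# Hypothetical two-state model:
prior = {'rain': 0.5, 'sun': 0.5}
transition = {'rain': {'rain': 0.7, 'sun': 0.3}, 'sun': {'rain': 0.3, 'sun': 0.7}}
emission = {'rain': {'umbrella': 0.9, 'none': 0.1}, 'sun': {'umbrella': 0.2, 'none': 0.8}}
print(particle_filter(100, ['umbrella', 'umbrella'], prior, transition, emission))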
Q7. [22 pts] Variable Elimination Ordering
Assume all random variables are binary valued.
(a) [6 pts] The Ordering Matters. Consider the sequence of graphs below. For each, regardless of the elimination ordering, the largest factor produced in finding p(X) will have a table with 2^2 entries.
[Figure: a sequence of progressively larger graphs, each containing the query node X.]
Now draw a sequence of graphs such that, if you used the best elimination ordering for each graph, the largest
factor table produced in variable elimination would have a constant number of entries, but if you used the worst
elimination ordering for each graph, the number of entries in the largest factor table would grow exponentially
as you move down the sequence. Provide (i) the sequence of graphs, (ii) the sequence of queries for which
variable elimination is done, (iii) the best ordering, (iv) the worst ordering.
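As a reminder of how factor sizes are counted (an illustrative example of ours, not part of the question): in the chain A -> B -> X with query p(X), eliminating A joins P(A) and P(B|A) into a factor f_1(B) with 2^1 = 2 entries; eliminating B then joins f_1(B) and P(X|B) into a factor over X alone, again with 2 entries. In general, since all variables are binary, a factor over k variables has a table with 2^k entries, so a good ordering is one that keeps the scopes of the intermediate factors small.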
(b) Search Space Formulation for Finding an Ordering. Having established that ordering matters, let's investigate search methods that can find a good elimination ordering. The idea is to step through the process of variable elimination for various orderings of elimination of the hidden variables, and while doing so, only keep track of (i) which factors are present in each step, and (ii) for each factor, which variables participate; but not actually compute and store the tables corresponding to each factor. (It is the join and the summation that are the expensive steps in variable elimination; computing which variables would participate in the new factor formed after the join and summation is relatively cheap.) We will use the following search-space formulation (a small code sketch follows the list).
We assume the hidden variables are called H_1, H_2, ..., H_n, and that all variables are binary.
set of states S: a state s consists of the current set of factors, including the variables participating in each factor but not the corresponding tables, together with the subset of {H_1, H_2, ..., H_n} that has yet to be eliminated.
successor function: choose any of the not yet eliminated variables, and update the factors and the list of not yet eliminated variables to account for the new elimination.
cost function: the number of entries in the table representation of the new factor that is generated from the elimination of the current variable.
goal test: test whether the set of not yet eliminated hidden variables is empty.
start state: set of conditional probability tables and set of all hidden variables.
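The following is a minimal Python sketch of this formulation (ours, not part of the exam). A factor is represented only by the frozenset of variables in its scope; observed variables such as +c and +e are assumed to be already instantiated, so they do not appear in any scope, which matches the hint in part (i) below. Note that a full implementation would keep a multiset of factors, since two distinct factors can share a scope.

from itertools import chain

def successors(state):
    """state = (factors, remaining): factors is a frozenset of scopes (frozensets
    of variable names); remaining holds the hidden variables not yet eliminated.
    Yields (eliminated variable, successor state, cost) triples."""
    factors, remaining = state
    for h in remaining:
        touched = {f for f in factors if h in f}
        # Join all factors mentioning h, then sum h out: the new scope is every
        # variable they mention except h.
        new_scope = frozenset(chain.from_iterable(touched)) - {h}
        cost = 2 ** len(new_scope)                 # all variables are binary
        yield h, ((factors - touched) | {new_scope}, remaining - {h}), cost

# Start state for the query P(D | +e, +c): scopes of P(A), P(B|A), P(+c|B),
# P(D|B), P(+e|+c,D), with hidden variables {A, B} still to eliminate.
start = (frozenset({frozenset({'A'}), frozenset({'A', 'B'}), frozenset({'B'}),
                    frozenset({'B', 'D'}), frozenset({'D'})}),
         frozenset({'A', 'B'}))
for var, nxt, cost in successors(start):
    print('eliminate', var, '-> cost', cost)       # A costs 2^1 = 2; B costs 2^2 = 4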
(i) [4 pts] Complete Search Tree. Consider the query P(D | +e, +c). Draw the complete search tree for this problem. Annotate nodes with states, and annotate costs and actions on the edges. Hint: the start state is ({P(A), P(B|A), P(+c|B), P(D|B), P(+e|+c, D)}, {A, B}).
[Figure: the Bayes net for this query, with edges A → B, B → C, B → D, C → E, and D → E.]
(ii) [4 pts] Solving the Search for the Example Problem.
(a) Clearly mark all optimal plans in the search tree above.
(b) What is the cost of an optimal plan to a goal state?
(iii) [8 pts] Questions about this Search Formulation in General.
For each of the following heuristics state whether they are admissible or not. Justify your answer. (No credit if there is no justification.) Notation: H is the set of hidden variables not yet eliminated. Q is the set of query variables. #H is the number of hidden variables not yet eliminated. #Q is the number of query variables. Again we assume that all variables are binary-valued.
(a) h_1 : max_{H_i ∈ H} { size of factor generated when eliminating H_i next }
Admissible    Not Admissible
(b) h_2 : min_{H_i ∈ H} { size of factor generated when eliminating H_i next }
Admissible    Not Admissible
(c) h_3 : 2^(#H - 1)
Admissible    Not Admissible
(d) h_4 : if the current largest factor is of size 2^k and k > #Q, then 2^(k-1) + 2^(k-2) + ... + 2^(#Q); otherwise, 0.
Admissible    Not Admissible
Q8. [14 pts] Bidirectional A* Search
If a search problem has only a single goal state, it is common to perform bidirectional search. In bidirectional search you build two search trees at the same time: the forward search tree is the one we have always worked with in CS188; the backward search tree is one that starts from the goal state and calls a predecessor (rather than successor) function to work its way back to the start state. Both searches use the same cost function for transitioning between two states. There will now also be a backward heuristic, which for each state estimates the distance to the start state. Bidirectional search can result in significant computational advantages: the size of the search tree grows exponentially with the depth of the search. If we grow a tree from the start and one from the goal toward each other, the two trees can meet in the middle, and we end up with a computational complexity of just twice that of searching a tree of half the depth, which is a very significant saving.
Recall the pseudo-code for a standard A* graph search:
function Graph-Search(problem)
forward-closed <-- empty set
forward-priority-queue <-- Insert(Make-Node(Start-State(problem)), forward-priority-queue)
LOOP DO
IF forward-priority-queue is empty THEN return failure
IF forward-priority-queue is not empty THEN
node <-- pop(forward-priority-queue)
IF (State(node) == Goal-State(problem) ) THEN return node
IF State(node) is not in forward-closed THEN
add State(node) to forward-closed
forward-priority-queue <-- Insert-All(ExpandForward(node, problem), forward-priority-queue)
END // LOOP
Now consider the following tentative pseudo-code for bidirectional A* search. We assume a consistent forward
heuristic, and a consistent backward heuristic. Concatenation is a function that builds a plan that goes from start
state to goal state by combining a forward partial plan and a backward partial plan that end in the same state.
function Bidirectional-Graph-Search(problem)
forward-closed <-- empty set
backward-closed <-- empty set
forward-priority-queue <-- Insert(Make-Node(Start-State(problem)), forward-priority-queue)
backward-priority-queue <-- Insert(Make-Node(Goal-State(problem)), backward-priority-queue)
LOOP DO
IF forward-priority-queue is empty AND backward-priority-queue is empty THEN return failure
1 IF there exists a node n1 in forward-priority-queue and a node n2 in backward-priority-queue ...
1 such that State(n1) == State(n2) THEN
1 return Concatenation of n1 and n2
IF forward-priority-queue is not empty THEN
node <-- pop(forward-priority-queue)
IF ( State(node) == Goal-State(problem) ) THEN return node
2 IF ( State(node) is in backward-priority-queue ) THEN
2 return Concatenation of node and matching node in backward-priority-queue
3 IF ( State(node) is in backward-closed ) THEN
3 return Concatenation of node and matching node in backward-closed
IF State(node) is not in forward-closed THEN
add State(node) to forward-closed
forward-priority-queue <-- Insert-All(ExpandForward(node, problem), forward-priority-queue)
IF backward-priority-queue is not empty THEN
node <-- pop(backward-priority-queue)
IF ( State(node) == Start-State(problem) ) THEN return node
4 IF ( State(node) is in forward-priority-queue ) THEN
4 return Concatenation of node and matching node in forward-priority-queue
5 IF ( State(node) is in forward-closed ) THEN
5 return Concatenation of node and matching node in forward-closed
IF State(node) is not in backward-closed THEN
add State(node) to backward-closed
backward-priority-queue <-- Insert-All(ExpandBackward(node, problem), backward-priority-queue)
END // LOOP
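For readers who want to experiment, here is a compact Python rendering of the skeleton above, with all five numbered checks retained. It is a sketch under simplifying assumptions (one representative path per fringe state, rather than full duplicate bookkeeping), not the exam's reference code; parts (i)-(vii) below ask which of these early-exit checks preserve optimality.

import heapq
from itertools import count

def bidirectional_astar(start, goal, succ, pred, h_fwd, h_bwd):
    """succ/pred map a state to (neighbor, step-cost) pairs; h_fwd estimates
    distance to goal, h_bwd distance to start. Fringe entries are ordered by
    f = g + h; `tie` breaks ties without comparing paths."""
    tie = count()
    f_fringe = [(h_fwd(start), 0, next(tie), [start])]
    b_fringe = [(h_bwd(goal), 0, next(tie), [goal])]
    f_open, b_open = {start: [start]}, {goal: [goal]}   # state -> a fringe path
    f_closed, b_closed = {}, {}                         # state -> path as popped

    def join(fwd, bwd):
        return fwd + bwd[-2::-1]    # reverse backward part, drop the shared state

    while f_fringe or b_fringe:
        common = f_open.keys() & b_open.keys()          # check 1
        if common:
            s = next(iter(common))
            return join(f_open[s], b_open[s])
        if f_fringe:
            _, g, _, path = heapq.heappop(f_fringe)
            s = path[-1]
            f_open.pop(s, None)
            if s == goal:
                return path
            if s in b_open:                             # check 2
                return join(path, b_open[s])
            if s in b_closed:                           # check 3
                return join(path, b_closed[s])
            if s not in f_closed:
                f_closed[s] = path
                for n, c in succ(s):
                    heapq.heappush(f_fringe, (g + c + h_fwd(n), g + c, next(tie), path + [n]))
                    f_open.setdefault(n, path + [n])
        if b_fringe:
            _, g, _, path = heapq.heappop(b_fringe)
            s = path[-1]
            b_open.pop(s, None)
            if s == start:
                return path[::-1]
            if s in f_open:                             # check 4
                return join(f_open[s], path)
            if s in f_closed:                           # check 5
                return join(f_closed[s], path)
            if s not in b_closed:
                b_closed[s] = path
                for n, c in pred(s):
                    heapq.heappush(b_fringe, (g + c + h_bwd(n), g + c, next(tie), path + [n]))
                    b_open.setdefault(n, path + [n])
    return None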
(a) The IF statements labeled 1, 2, 3, 4, 5 are modifications to try to connect both search trees.
(i) [2 pts] If we cut out all lines of code labeled 1, 2, 3, 4, or 5, will Bidirectional-Graph-Search return an optimal solution? Briefly justify your answer.
(ii) [2 pts] If amongst the numbered lines of code we only retain 1, is Bidirectional-Graph-Search guaranteed to be optimal? Briefly justify your answer.
(iii) [2 pts] If amongst the numbered lines of code we only retain 2, is Bidirectional-Graph-Search guaranteed to be optimal? Briefly justify your answer.
(iv) [2 pts] If amongst the numbered lines of code we only retain 3, is Bidirectional-Graph-Search guaranteed to be optimal? Briefly justify your answer.
(v) [2 pts] If amongst the numbered lines of code we only retain 4, is Bidirectional-Graph-Search guaranteed to be optimal? Briefly justify your answer.
(vi) [2 pts] If amongst the numbered lines of code we only retain 5, is Bidirectional-Graph-Search guaranteed to be optimal? Briefly justify your answer.
(vii) [2 pts] Which numbered code section(s) should be retained to maximally benefit from the bidirectional search and at the same time retain optimality guarantees?
Q9. [11 pts] Classification and Separating Hyperplanes
For this first part, we will be deciding what makes a good feature-mapping for different datasets, as well as finding feature weights that make the data separable.
[Figure 1: two scatter plots, (a) and (b), each with axes x_1 (horizontal) and x_2 (vertical) ranging from -1 to 1.]
Figure 1: Sets of points separated into positive examples (x's) and negative examples (o's). In plot (a), the dotted line is given by f(x_1) = x_1^3 - x_1.
We begin with a series of true/false questions on what kernels can separate the datasets given. We always assume a point x is represented without a bias term, so that x = (x_1, x_2)^T.
A. The linear kernel K_lin(x, z) = x^T z = x · z.
B. The shifted linear kernel K_bias(x, z) = 1 + x^T z = 1 + x · z.
C. The quadratic kernel K_quad(x, z) = (1 + x^T z)^2 = (1 + x · z)^2.
D. The cubic kernel K_cub(x, z) = (1 + x^T z)^3 = (1 + x · z)^3.
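As a quick sanity check of these definitions (our own illustration, not part of the exam), the four kernels can be evaluated directly:

import numpy as np

def k_lin(x, z):  return x @ z               # linear kernel
def k_bias(x, z): return 1 + x @ z           # shifted linear kernel
def k_quad(x, z): return (1 + x @ z) ** 2    # quadratic kernel
def k_cub(x, z):  return (1 + x @ z) ** 3    # cubic kernel

x = np.array([0.5, -0.5])
z = np.array([1.0, 1.0])
print(k_lin(x, z), k_bias(x, z), k_quad(x, z), k_cub(x, z))   # 0.0 1.0 1.0 1.0

The quadratic kernel corresponds to a feature map containing all monomials of degree at most 2 (and the cubic kernel, degree at most 3), which is what lets them carve out curved decision boundaries that the linear kernel cannot.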
(a) (i) [1 pt] [ true false ] The kernel K_lin can separate the dataset in Fig. 1(b).
(ii) [1 pt] [ true false ] The kernel K_bias can separate the dataset in Fig. 1(b).
(iii) [1 pt] [ true false ] The kernel K_cub can separate the dataset in Fig. 1(b).
(iv) [1 pt] [ true false ] The kernel K_lin can separate the dataset in Fig. 1(a).
(v) [1 pt] [ true false ] The kernel K_quad can separate the dataset in Fig. 1(a).
(vi) [1 pt] [ true false ] The kernel K_cub can separate the dataset in Fig. 1(a).
(b) [2 pts] Now imagine that instead of simply using x ∈ R^2 as input to our learning algorithm, we use a feature mapping φ : x → φ(x) ∈ R^k, where k ≥ 2, so that we can learn more powerful classifiers. Specifically, suppose that we use the feature mapping

φ(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1^3, x_2^3)^T    (1)

so that φ(x) ∈ R^7. Give a weight vector w that separates the x points from the o points in Fig. 1(a), that is, w^T φ(x) = w · φ(x) should be > 0 for x points and < 0 for o points.
(c) [1 pt] Using the feature mapping (1), give a weight vector w that separates the x points from the o points in Fig. 1(b), assuming that the line given by f(x_1) = a x_1 + b lies completely between the two sets of points.
Now it's time to test your understanding of training error, test error, and the number of samples required to learn a classifier. Imagine you are learning a linear classifier of the form sign(w