
CS 188 Spring 2014
Introduction to Artificial Intelligence
Practice Final
To earn the extra credit, one of the following has to hold true. Please circle and sign.
A. I spent 3 or more hours on the practice final.
B. I spent fewer than 3 hours on the practice final, but I believe I have solved all the questions.
Signature:
Follow the directions on the website to submit the practice final and receive the extra credit. The normal instructions
for the final follow below.
Exam Instructions:
You have approximately 2 hours and 50 minutes.
The exam is closed book, closed notes except your one-page crib sheet.
Please use non-programmable calculators only.
Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a
brief explanation. All short answer sections can be successfully answered in a few sentences AT MOST.
First name
Last name
SID
edX username
First and last name of student to your left
First and last name of student to your right
For staff use only:
Q1. Search: Short Questions /4
Q2. MDPs: Short Questions /8
Q3. Other Short Questions /14
Q4. Bayes Nets: Conditional Independence /8
Q5. Elimination Sequence /6
Q6. Hidden Markov Models /13
Q7. Variable Elimination Ordering /22
Q8. Bidirectional A* Search /14
Q9. Classification and Separating Hyperplanes /11
Total /100
Q1. [4 pts] Search: Short Questions
(a)
[Figure: a search graph with states including the start state S and the goal state G, with labeled edge costs; the graph itself is not recoverable from the text extraction.]
Answer the following questions about the search problem shown above. S is the start-state, G is the (only)
goal-state. Break any ties alphabetically. For the questions that ask for a path, please give your answers in the
form S-A-D-G.
(i) [1 pt] What path would breadth-first graph search return for this search problem?
(ii) [1 pt] What path would uniform cost graph search return for this search problem?
(iii) [1 pt] What path would depth-first graph search return for this search problem?
(iv) [1 pt] What path would A* graph search, using a consistent heuristic, return for this search problem?
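As a quick reference (not part of the original exam), the four strategies above differ mainly in how the frontier orders nodes. A minimal Python sketch, assuming a graph given as a dict of state -> list of (edge cost, successor); passing h = 0 gives uniform cost search, and BFS/DFS would pop from a FIFO queue or stack instead of the priority queue. The example graph at the bottom is hypothetical, since the exam's figure is not recoverable here.

import heapq

def astar_graph_search(graph, start, goal, h=lambda s: 0):
    # graph: dict mapping state -> list of (edge_cost, successor) pairs.
    # With h = 0 this is uniform cost search; BFS/DFS would instead use a
    # FIFO queue / stack and ignore edge costs.
    frontier = [(h(start), 0, start, [start])]      # (f = g + h, g, state, path)
    closed = set()
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        if state in closed:
            continue
        closed.add(state)
        for cost, succ in graph.get(state, []):
            # ties in f break on g, then on state name (roughly alphabetical)
            heapq.heappush(frontier, (g + cost + h(succ), g + cost, succ, path + [succ]))
    return None, float("inf")

# Hypothetical graph (NOT the one in the figure above):
example = {"S": [(1, "A"), (5, "B")], "A": [(2, "G")], "B": [(1, "G")]}
print(astar_graph_search(example, "S", "G"))        # -> (['S', 'A', 'G'], 3)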
Q2. [8 pts] MDPs: Short Questions
(a) Each True/False question is worth 2 points. Leaving a question blank is worth 0 points. Answering incor-
rectly is worth -2 points.
For the questions that are not True/False, justify your answer concisely.
(i) [2 pts] [true or false] If the only difference between two MDPs is the value of the discount factor then
they must have the same optimal policy.
(ii) [2 pts] [true or false] When using features to represent the Q-function it is guaranteed that this feature-
based Q-learning finds the same Q-function, Q*, as would be found when using a tabular representation
for the Q-function.
(iii) [2 pts] [true or false] For an infinite horizon MDP with a finite number of states and actions and with a
discount factor γ, with 0 < γ < 1, value iteration is guaranteed to converge.
(iv) [2 pts] [true or false] When getting to act only for a finite number of steps in an MDP, the optimal policy
is stationary. (A stationary policy is a policy that takes the same action in a given state, independently of
when the agent is in that state.)
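For reference (not part of the exam), the value iteration update that (iii) refers to can be sketched in a few lines of Python; the two-state MDP at the bottom is invented purely to exercise the code, and gamma plays the role of the discount factor γ.

def value_iteration(states, actions, T, R, gamma, iters=100):
    # V_{k+1}(s) = max_a sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V_k(s')).
    # T and R are dicts keyed by (s, a, s'); for 0 < gamma < 1 each update is a
    # contraction, which is why the iterates converge.
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(sum(T.get((s, a, s2), 0.0) * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                        for s2 in states)
                    for a in actions)
             for s in states}
    return V

# Invented 2-state MDP, just to run the code:
states, actions = ["x", "y"], ["stay", "go"]
T = {("x", "stay", "x"): 1.0, ("x", "go", "y"): 1.0,
     ("y", "stay", "y"): 1.0, ("y", "go", "x"): 1.0}
R = {("x", "go", "y"): 1.0, ("y", "go", "x"): 1.0}
print(value_iteration(states, actions, T, R, gamma=0.9))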
Q3. [14 pts] Other Short Questions
(a) CSPs
(i) [4 pts] CSP Formulation.
PacStudent (S), PacBaby (B), PacMom (M), PacDad (D), GrandPac (P), and a friendly Ghost (G) are
lining up next to each other. The positions are numbered 1, 2, 3, 4, 5, 6, where 1 neighbors 2, 2 neighbors
1 and 3, 3 neighbors 2 and 4, 4 neighbors 3 and 5, 5 neighbors 4 and 6, and 6 neighbors 5. Each one of
them takes up exactly one spot. PacBaby (B) needs to be next to PacMom (M) on one side and PacDad
(D) on the other side. GrandPac (P) needs to be next to the Ghost (G). PacStudent (S) needs to be at
1 or 2. Formulate this problem as a CSP: list the variables, their domains, and the constraints. Encode
unary constraints as a constraint rather than pruning the domain. (No need to solve the problem, just
provide variables, domains and implicit constraints.)
Variables:
Domains:
Constraints:
(ii) [2 pts] Consider a CSP with variables X, Y with domains {1, 2, 3, 4, 5, 6} for X and {2, 4, 6} for Y, and
constraints X < Y and X + Y > 8. List the values that will remain in the domain of X after enforcing
arc consistency for the arc X → Y (recall arc consistency for a specific arc only prunes the domain of the
tail variable, in this case X).
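A minimal sketch (not part of the exam) of what enforcing arc consistency for a single arc does, matching the reminder above: only the tail variable's domain is pruned. The function name and the tiny instance below are invented for illustration; it is not the exam's instance.

def enforce_arc(domain_x, domain_y, constraint):
    # Arc X -> Y: keep a value x only if some y in Y's current domain satisfies
    # constraint(x, y). Y's domain is left untouched.
    return [x for x in domain_x if any(constraint(x, y) for y in domain_y)]

# Hypothetical example: X, Y in {1, 2, 3} with the constraint X < Y.
print(enforce_arc([1, 2, 3], [1, 2, 3], lambda x, y: x < y))   # -> [1, 2]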
(b) [2 pts] Pruning: Short Questions
Consider the game tree shown below. For what range of U will the indicated pruning take place?
[Figure: game tree with leaf values 5, U, and 2; the pruning to be analyzed is indicated in the figure, which is not recoverable from the text extraction.]
(c) [3 pts] Bayes Nets: Representation
Consider the joint distribution P(A, B, C, D) defined by the Bayes net below.
[Figure: Bayes net over A, B, C, D with structure A → B, B → C, B → D, together with the conditional probability tables P(A), P(B|A), P(C|B), P(D|B); the numeric table entries are not recoverable from the text extraction.]
Compute the following quantities:
P(A = +a) =
P(A = +a, B = -b, C = -c, D = +d) =
P(A = +a | B = -b, C = -c, D = +d) =
(d) [3 pts] Naive Bayes
Describe the naive Bayes bag-of-words model for document classification. Draw the Bayes net graph, annotate
the class label node and the feature nodes, describe what the domain of the features is, describe any properties
of the conditional probability tables that are specific to the bag-of-words (and not necessarily true in all naive
Bayes models). For simplicity it is OK to assume that every document in consideration has exactly N words
and that words come from a dictionary of D words.
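As an illustrative sketch (not the intended written answer), the bag-of-words naive Bayes classifier scores a document by log P(class) plus a sum of log P(word | class) terms, with one shared emission table for every word position. The function name and the tiny two-class example below are invented.

import math

def nb_bow_log_score(words, prior, word_probs):
    # Bag-of-words naive Bayes: log P(class) + sum_i log P(W_i = word_i | class).
    # Every position W_1..W_N shares the same table P(W | class), so only the
    # word's identity matters, not its position.
    return {c: math.log(prior[c]) + sum(math.log(word_probs[c][w]) for w in words)
            for c in prior}

# Invented two-class example with a 3-word dictionary:
prior = {"spam": 0.4, "ham": 0.6}
word_probs = {"spam": {"free": 0.6, "hello": 0.2, "report": 0.2},
              "ham":  {"free": 0.1, "hello": 0.4, "report": 0.5}}
print(nb_bow_log_score(["free", "free", "hello"], prior, word_probs))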
Q4. [8 pts] Bayes Nets: Conditional Independence
(a) [8 pts]
Based only on the structure of the (new) Bayes net given below, circle whether the following conditional
independence assertions are guaranteed to be true, guaranteed to be false, or cannot be determined by the
structure alone. Note: the ordering of the three answer columns might have been switched relative to previous
exams!

[Figure: Bayes net over variables including A, B, C, D, E, F, G, and K; the structure is not recoverable from the text extraction.]

1. A ⊥ C            Guaranteed false   Cannot be determined   Guaranteed true
2. A ⊥ C | E        Guaranteed false   Cannot be determined   Guaranteed true
3. A ⊥ C | G        Guaranteed false   Cannot be determined   Guaranteed true
4. A ⊥ K            Guaranteed false   Cannot be determined   Guaranteed true
5. A ⊥ G | D, E, F  Guaranteed false   Cannot be determined   Guaranteed true
6. A ⊥ B | D, E, F  Guaranteed false   Cannot be determined   Guaranteed true
7. A ⊥ C | D, F, K  Guaranteed false   Cannot be determined   Guaranteed true
8. A ⊥ G | D        Guaranteed false   Cannot be determined   Guaranteed true
Q5. [6 pts] Elimination Sequence
(a) For the Bayes net shown below, consider the query P(A|H = +h), and the variable elimination ordering
B, E, C, F, D.
(i) [4 pts] In the table below, fill in the factor generated at each step; we did the first row for you.

[Figure: Bayes net with arcs A → D, B → E, C → F, and D, E, F → H.]

Variable Eliminated            Factor Generated        Current Factors
(no variable eliminated yet)   (no factor generated)   P(A), P(B), P(C), P(D|A), P(E|B), P(F|C), P(+h|D, E, F)
B                              f_1(E)                  P(A), P(C), P(D|A), P(F|C), P(+h|D, E, F), f_1(E)
E
C
F
D
(ii) [2 pts] Which is the largest factor generated? Assuming all variables have binary-valued domains, how
many entries does the corresponding table have?
Q6. [13 pts] Hidden Markov Models
Consider the following Hidden Markov Model.
[Figure: HMM with hidden chain X_1 → X_2 and observations O_1, O_2 emitted from X_1 and X_2 respectively.]

X_1   Pr(X_1)
 0      0.3
 1      0.7

X_t   X_{t+1}   Pr(X_{t+1} | X_t)
 0       0            0.4
 0       1            0.6
 1       0            0.8
 1       1            0.2

X_t   O_t   Pr(O_t | X_t)
 0     A         0.9
 0     B         0.1
 1     A         0.5
 1     B         0.5

Suppose that O_1 = A and O_2 = B is observed.
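For reference, a minimal Python sketch of the forward recursion using the tables above; this is offered only as a way to check work, not as the expected written solution, and the dictionary-based representation is an arbitrary choice.

def forward(prior, trans, emit, observations):
    # alpha_t(x) = Pr(X_t = x, O_1..O_t = observations[:t])
    alpha = {x: prior[x] * emit[x][observations[0]] for x in prior}
    for obs in observations[1:]:
        alpha = {x2: emit[x2][obs] * sum(alpha[x] * trans[x][x2] for x in alpha)
                 for x2 in prior}
    return alpha

# Tables from the question:
prior = {0: 0.3, 1: 0.7}
trans = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.8, 1: 0.2}}
emit = {0: {"A": 0.9, "B": 0.1}, 1: {"A": 0.5, "B": 0.5}}
print(forward(prior, trans, emit, ["A", "B"]))   # Pr(X_2, O_1 = A, O_2 = B)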
(a) [2 pts] Use the Forward algorithm to compute the probability distribution Pr(X_2, O_1 = A, O_2 = B). Show your
work. You do not need to evaluate arithmetic expressions involving only numbers.
(b) [2 pts] Compute the probability Pr(X_1 = 1 | O_1 = A, O_2 = B). Show your work.
For the next two questions, use the specified sequence of random numbers {a_i}, generated independently and uniformly
at random from [0, 1), to perform sampling. Specifically, to obtain a sample from a distribution over a variable
Y ∈ {0, 1} using the random number a_i, pick Y = 0 if a_i < Pr(Y = 0), and pick Y = 1 if a_i ≥ Pr(Y = 0). Similarly,
to obtain a sample from a distribution over a variable Z ∈ {A, B} using the random number a_i, pick Z = A if
a_i < Pr(Z = A), and pick Z = B if a_i ≥ Pr(Z = A). Use the random numbers {a_i} in order starting from a_1, using
a new random number each time a sample needs to be obtained.
(c) [3 pts] Use likelihood-weighted sampling to obtain 2 samples from the distribution Pr(X_1, X_2 | O_1 = A, O_2 = B),
and then use these samples to estimate E[√X_1 + 3 X_2 | O_1 = A, O_2 = B].
 a_1     a_2     a_3     a_4     a_5     a_6     a_7     a_8     a_9     a_10
0.134   0.847   0.764   0.255   0.495   0.449   0.652   0.789   0.094   0.028
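A minimal sketch (for illustration only, not part of the exam) of how the sampling rule described above drives likelihood weighting for this two-step HMM: hidden variables consume random numbers in order, while evidence variables are fixed and contribute emission probabilities to the sample weight. The function name and representation are invented; a weighted estimate of any f(X_1, X_2) would then be the sum of weight * f over the samples divided by the sum of the weights.

def lw_sample(prior, trans, emit, evidence, rand):
    # Draw one likelihood-weighted sample of (X_1, X_2) given (O_1, O_2) = evidence.
    # Hidden variables use the next random number (value 0 if a_i < Pr(0));
    # evidence variables are not sampled, they multiply the weight.
    x1 = 0 if rand.pop(0) < prior[0] else 1
    weight = emit[x1][evidence[0]]
    x2 = 0 if rand.pop(0) < trans[x1][0] else 1
    weight *= emit[x2][evidence[1]]
    return (x1, x2), weight

prior = {0: 0.3, 1: 0.7}
trans = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.8, 1: 0.2}}
emit = {0: {"A": 0.9, "B": 0.1}, 1: {"A": 0.5, "B": 0.5}}
rand = [0.134, 0.847, 0.764, 0.255]              # a_1 .. a_4 from the table above
samples = [lw_sample(prior, trans, emit, ("A", "B"), rand) for _ in range(2)]
print(samples)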
(d) [2 pts] [ true false ] In the case that there is no evidence, particle filtering using a single particle is equivalent
to rejection sampling. Explain your answer.
(e) [2 pts] [ true false ] Performing particle filtering twice, each time with 50 particles, is equivalent to performing
particle filtering once with 100 particles. Explain your answer.
(f) [2 pts] [ true false ] Variable elimination is generally more accurate than the Forward algorithm. Explain
your answer.
Q7. [22 pts] Variable Elimination Ordering
Assume all random variables are binary valued.
(a) [6 pts] The Ordering Matters. Consider the sequence of graphs below. For each, regardless of the elimination
ordering, the largest factor produced in finding p(X) will have a table with 2^2 entries.

[Figure: a sequence of graphs, each containing the query node X, continuing with an ellipsis; the graph structures are not recoverable from the text extraction.]
Now draw a sequence of graphs such that, if you used the best elimination ordering for each graph, the largest
factor table produced in variable elimination would have a constant number of entries, but if you used the worst
elimination ordering for each graph, the number of entries in the largest factor table would grow exponentially
as you move down the sequence. Provide (i) the sequence of graphs, (ii) the sequence of queries for which
variable elimination is done, (iii) the best ordering, (iv) the worst ordering.
(b) Search Space Formulation for Finding an Ordering. Having established that ordering matters, let's
investigate search methods that can find a good elimination ordering. The idea is to step through the process
of variable elimination for various orderings of elimination of the hidden variables, and while doing so, only
keep track of (i) which factors are present in each step, and (ii) for each factor, which variables participate; but
not actually compute and store the tables corresponding to each factor. (It is the join and the summation that
are the expensive steps in variable elimination; computing which variables would participate in the new factor
formed after the join and summation is relatively cheap.) We will use the following search-space formulation.
We assume the hidden variables are called H_1, H_2, ..., H_n, and that all variables are binary.

- Set of states S: a state s consists of the current set of factors, including the variables participating in each
  factor but not the corresponding tables, and any subset of {H_1, H_2, ..., H_n} to track which variables have
  yet to be eliminated.
- Successor function: choose any of the not yet eliminated variables, and update the factors and the list of
  not yet eliminated variables to account for the new elimination.
- Cost function: the number of entries in the table representation of the new factor that is generated from
  the elimination of the current variable.
- Goal test: test whether the set of not yet eliminated hidden variables is empty.
- Start state: the set of conditional probability tables and the set of all hidden variables.
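A small sketch (not part of the exam) of what a state and a successor look like in this formulation. Here a factor is represented only by the set of variables participating in it, evidence variables are treated as already instantiated and so excluded from the scopes, and the names are illustrative.

def eliminate(factor_scopes, remaining, var):
    # One successor: join all factors whose scope mentions var, sum var out,
    # and pay the size of the newly generated factor's table. Only the scopes
    # are tracked; the tables themselves are never built.
    touching = [f for f in factor_scopes if var in f]
    others = [f for f in factor_scopes if var not in f]
    new_scope = set().union(*touching) - {var}
    cost = 2 ** len(new_scope)                  # entries in the new factor's table
    return (others + [new_scope], remaining - {var}), cost

# Start state for the query P(D | +e, +c) from part (i) below, with each factor
# reduced to its set of unobserved variables:
scopes = [{"A"}, {"A", "B"}, {"B"}, {"B", "D"}, {"D"}]
(new_scopes, still_hidden), cost = eliminate(scopes, {"A", "B"}, "A")
print(new_scopes, still_hidden, cost)           # new factor over {'B'}, cost 2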
(i) [4 pts] Complete Search Tree. Consider the query P(D | +e, +c). Draw the complete search tree for
this problem. Annotate nodes with states, and annotate costs and actions on the edges. Hint: the start
state is ({P(A), P(B|A), P(+c|B), P(D|B), P(+e|+c, D)}, {A, B}).

[Figure: Bayes net with arcs A → B, B → C, B → D, and C, D → E.]
(ii) [4 pts] Solving the Search for the Example Problem.
(a) Clearly mark all optimal plans in the search tree above.
(b) What is the cost of an optimal plan to a goal state?
(iii) [8 pts] Questions about this Search Formulation in General.
For each of the following heuristics state whether they are admissible or not. Justify your answer. (No
credit if there is no justification.) Notation: H is the set of hidden variables not yet eliminated. Q is the
set of query variables. #H is the number of hidden variables not yet eliminated. #Q is the number of
query variables. Again we assume that all variables are binary-valued.
(a) h_1 : max_{H_i ∈ H} { size of factor generated when eliminating H_i next }
Admissible    Not Admissible
(b) h_2 : min_{H_i ∈ H} { size of factor generated when eliminating H_i next }
Admissible    Not Admissible
(c) h_3 : 2^(#H - 1)
Admissible    Not Admissible
(d) h_4 : if the current largest factor is of size 2^k and k > #Q, then 2^(k-1) + 2^(k-2) + ... + 2^(#Q); otherwise, 0.
Admissible    Not Admissible
Q8. [14 pts] Bidirectional A* Search
If a search problem has only a single goal state, it is common to perform bidirectional search. In bidirectional search
you build two search trees at the same time: the forward search tree is the one we have always worked with
in CS188, and the backward search tree is one that starts from the goal state and calls a predecessor (rather than
successor) function to work its way back to the start state. Both searches use the same cost function for transitioning
between two states. There will now also be a backward heuristic, which for each state estimates the distance to the
start state. Bidirectional search can result in significant computational advantages: the size of a search tree grows
exponentially with its depth, so if the two trees grown from the start and from the goal meet in the middle, one ends
up with roughly the cost of searching two trees of half the depth, which is a very significant saving.
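To make the savings concrete with invented numbers (not from the exam): with branching factor b = 10 and solution depth d = 8, a single search tree contains on the order of b^d = 10^8 nodes, while two trees of depth d/2 = 4 meeting in the middle contain on the order of 2 * b^(d/2) = 2 * 10^4 nodes.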
Recall the pseudo-code for a standard A* graph search
function Graph-Search(problem)
forward-closed <-- empty set
forward-priority-queue <-- Insert(Make-Node(Start-State(problem)), forward-priority-queue)
LOOP DO
IF forward-priority-queue is empty THEN return failure
IF forward-priority-queue is not empty THEN
node <-- pop(forward-priority-queue)
IF (State(node) == Goal-State(problem) ) THEN return node
IF State(node) is not in forward-closed THEN
add State(node) to forward-closed
forward-priority-queue <-- Insert-All(ExpandForward(node, problem), forward-priority-queue)
END // LOOP
Now consider the following tentative pseudo-code for bidirectional A* search. We assume a consistent forward
heuristic, and a consistent backward heuristic. Concatenation is a function that builds a plan that goes from start
state to goal state by combining a forward partial plan and a backward partial plan that end in the same state.
function Bidirectional-Graph-Search(problem)
forward-closed <-- empty set
backward-closed <-- empty set
forward-priority-queue <-- Insert(Make-Node(Start-State(problem)), forward-priority-queue)
backward-priority-queue <-- Insert(Make-Node(Goal-State(problem)), backward-priority-queue)
LOOP DO
IF forward-priority-queue is empty AND backward-priority-queue is empty THEN return failure
1 IF there exist a node n1 in forward-priority-queue and a node n2 in backward priority queue ...
1 such that State(n1) == State(n2) THEN
1 return Concatenation of n1 and n2
IF forward-priority-queue is not empty THEN
node <-- pop(forward-priority-queue)
IF ( State(node) == Goal-State(problem) ) THEN return node
2 IF ( State(node) is in backward-priority-queue ) THEN
2 return Concatenation of node and matching node in backward-priority-queue
3 IF ( State(node) is in backward-closed ) THEN
3 return Concatenation of node and matching node in backward-closed
IF State(node) is not in forward-closed THEN
add State(node) to forward-closed
forward-priority-queue <-- Insert-All(ExpandForward(node, problem), forward-priority-queue)
IF backward-priority-queue is not empty THEN
node <-- pop(backward-priority-queue)
IF ( State(node) == Start-State(problem) ) THEN return node
4 IF ( State(node) is in forward-priority-queue ) THEN
4 return Concatenation of node and matching node in forward-priority-queue
5 IF ( State(node) is in forward-closed ) THEN
5 return Concatenation of node and matching node in forward-closed
IF State(node) is not in backward-closed THEN
add State(node) to backward-closed
backward-priority-queue <-- Insert-All(ExpandBackward(node, problem), backward-priority-queue)
END // LOOP
(a) The IF statements labeled 1, 2, 3, 4, 5 are modifications to try to connect both search trees.
(i) [2 pts] If cutting out all lines of code labeled 1, 2, 3, 4, or 5, will Bidirectional-Graph-Search return an
optimal solution? Briefly justify your answer.
(ii) [2 pts] If amongst the numbered lines of code we only retain 1, is Bidirectional-Graph-Search guaranteed
to be optimal? Briefly justify your answer.
(iii) [2 pts] If amongst the numbered lines of code we only retain 2, is Bidirectional-Graph-Search guaranteed
to be optimal? Briefly justify your answer.
(iv) [2 pts] If amongst the numbered lines of code we only retain 3, is Bidirectional-Graph-Search guaranteed
to be optimal? Briefly justify your answer.
(v) [2 pts] If amongst the numbered lines of code we only retain 4, is Bidirectional-Graph-Search guaranteed
to be optimal? Briefly justify your answer.
(vi) [2 pts] If amongst the numbered lines of code we only retain 5, is Bidirectional-Graph-Search guaranteed
to be optimal? Briefly justify your answer.
(vii) [2 pts] Which numbered code section(s) should be retained to maximally benefit from the bidirectional
search and at the same time retain optimality guarantees?
Q9. [11 pts] Classification and Separating Hyperplanes
For this first part, we will be deciding what makes a good feature-mapping for different datasets, as well as finding
feature weights that make the data separable.
[Figure: two scatter plots, (a) and (b), over axes x_1 and x_2 ranging from -1 to 1; the point patterns are not recoverable from the text extraction.]
Figure 1: Sets of points separated into positive examples (x's) and negative examples (o's). In plot (a), the dotted
line is given by f(x_1) = x_1^3 - x_1.
We begin with a series of true/false questions on what kernels can separate the datasets given. We always assume a
point x is represented without a bias term, so that x = [x_1, x_2]^T. We will consider the following four kernels:

A. The linear kernel K_lin(x, z) = x^T z = x · z.
B. The shifted linear kernel K_bias(x, z) = 1 + x^T z = 1 + x · z.
C. The quadratic kernel K_quad(x, z) = (1 + x^T z)^2 = (1 + x · z)^2.
D. The cubic kernel K_cub(x, z) = (1 + x^T z)^3 = (1 + x · z)^3.
(a) (i) [1 pt] [ true false ] The kernel K_lin can separate the dataset in Fig. 1(b).
(ii) [1 pt] [ true false ] The kernel K_bias can separate the dataset in Fig. 1(b).
(iii) [1 pt] [ true false ] The kernel K_cub can separate the dataset in Fig. 1(b).
(iv) [1 pt] [ true false ] The kernel K_lin can separate the dataset in Fig. 1(a).
(v) [1 pt] [ true false ] The kernel K_quad can separate the dataset in Fig. 1(a).
(vi) [1 pt] [ true false ] The kernel K_cub can separate the dataset in Fig. 1(a).
(b) [2 pts] Now imagine that instead of simply using x ∈ R^2 as input to our learning algorithm, we use a feature
mapping φ : x → φ(x) ∈ R^k, where k ≥ 2, so that we can learn more powerful classifiers. Specifically, suppose
that we use the feature mapping

    φ(x) = [1, x_1, x_2, x_1^2, x_2^2, x_1^3, x_2^3]^T    (1)

so that φ(x) ∈ R^7. Give a weight vector w that separates the x points from the o points in Fig. 1(a), that is,
w^T φ(x) = w · φ(x) should be > 0 for x points and < 0 for o points.
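A small sketch (for illustration only) of how a candidate w would be checked against the feature mapping of equation (1); the weight vector below is an arbitrary placeholder, not a claimed answer.

def phi(x1, x2):
    # Feature mapping from equation (1): (1, x1, x2, x1^2, x2^2, x1^3, x2^3).
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 ** 3, x2 ** 3]

def score(w, x1, x2):
    # The classifier predicts "x" when w . phi(x) > 0 and "o" when it is < 0.
    return sum(wi * fi for wi, fi in zip(w, phi(x1, x2)))

w = [0.5, 1.0, -2.0, 0.0, 3.0, 0.0, 0.0]   # placeholder weights, not a solution
print(score(w, 0.5, -0.25))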
(c) [1 pt] Using the feature mapping (1), give a weight vector w that separates the x points from the o points in
Fig. 1(b), assuming that the line given by f(x_1) = a x_1 + b lies completely between the two sets of points.
Now it's time to test your understanding of training error, test error, and the number of samples required to learn a
classifier. Imagine you are learning a linear classifier of the form sign(w^T φ(x)), as in the binary Perceptron or SVM,
and you are trying to decide how many features to use in your feature mapping φ(x).
[Figure: three plots, (a), (b), and (c), each showing "Number of samples required" on the vertical axis versus "Number of features" (0 to 40) on the horizontal axis; the curves themselves are not recoverable from the text extraction.]
Figure 2: Number of samples required to learn a linear classifier sign(w^T φ(x)) as a function of the number of features
used in the feature mapping φ(x).
(d) [1 pt] Which of the plots (a), (b), and (c) in Fig. 2 is most likely to reflect the number of samples necessary to
learn a classifier with good generalization properties as a function of the number of features used in φ(x)?
[Figure: four plots. The leftmost shows "Training error rate" (0 to 0.5) versus "Number of features" (5 to 40); the remaining three, labeled (a), (b), and (c), each show "Testing error rate" (0 to 0.5) versus "Number of features" (5 to 40). The curves themselves are not recoverable from the text extraction.]
Figure 3: Leftmost plot: training error of your classifier as a function of the number of features.
(e) [1 pt] You notice in training your classifier that the training error rate you achieve, as a function of the number
of features, looks like the left-most plot in Fig. 3. Which of the plots (a), (b), or (c) in Fig. 3 is most likely to
reflect the error rate of your classifier on a held-out validation set (as a function of the number of features)?