class-test-1
Total marks: 20
Note. Provide brief justifications and/or calculations along with each answer to illustrate how you
arrived at the answer.
Question 1. Describe two factors that have contributed to the dramatic growth of AI in the last
5–10 years. Keep your answer brief: ideally 2–3 lines. [2 marks]
Question 2. Answer these questions relating to the Perceptron Learning Algorithm, as discussed
in class. Assume that the input data set comprises two classes which are separable using an origin-
centred hyperplane.
2a. Recall that the Perceptron Learning Algorithm is free to pick an arbitrary point from the
currently misclassified set at each iteration, and update the weight vector based on that point.
Hence, for a given data set, we can end up with different output weight vectors depending
on how we resolve the choice at each iteration. Concretely, let w1 , w2 , . . . , wL be the output
weight vectors produced by L separate runs of the Perceptron Learning Algorithm. As we
have already shown, each of these weight vectors guarantees perfect separation of the training
data points. Now consider the “average” of the output vectors:
wavg = (w1 + w2 + · · · + wL) / L.
Is wavg also guaranteed to separate the training points perfectly? Prove that your answer is
correct. [2 marks]
2b. Consider a change to the Perceptron Learning Algorithm, wherein we use a “learning rate”
αk = 1/k for k ≥ 1. In other words, the update made by this variant at the k-th iteration is
wk+1 ← wk + αk y^j x^j,
with (x^j, y^j) being the arbitrarily-chosen misclassified point. By contrast, in class we had
used a constant learning rate of 1.
The use of a harmonically-annealed learning rate is common in algorithms such as gradient
descent. How does it affect the Perceptron Learning Algorithm? Does the algorithm still
converge, and if so, does it still yield a separating hyperplane? Prove that your answer is correct.
[5 marks]
Question 3. We consider a 1-dimensional example of gradient descent. For w ∈ R, let
Error(w) = 3w^4 − 4w^3 − 12w^2 + 50.
3a. What is G(w) = ∇w Error(w)? [1 mark]
3b. Consider a procedure that begins with some initial guess w0 ∈ R, and progressively obtains
iterates through gradient descent: for t ≥ 0,
wt+1 ← wt − (1/(t + 1)) · G(wt).
Draw a plot with w0 on the x axis and the limiting value of Error(wt) as t → ∞ on the y axis. [4 marks]
3c. What is G2 (w) = ∇w G(w)? Based on your answer to 3b, suggest why this function might be
useful to compute. [2 marks]
Question 4. Consider a data set containing every possible tuple of three binary variables x1 ,
x2 , and x3 . The label associated with each tuple is encoded by the decision tree T1 shown below.
Observe that the variables and the labels both take values in {0, 1}.
[Figure 1: Decision tree T1. Internal nodes are shown as circles, and leaves as squares. The root splits on x1; its 0-branch leads to a node that splits on x2 (leaf 1 for x2 = 0, leaf 0 for x2 = 1), and its 1-branch leads to a node that splits on x3 (leaf 0 for x3 = 0, leaf 1 for x3 = 1).]
Draw a decision tree T2 that assigns each possible (x1 , x2 , x3 )-tuple the same label as assigned
by T1 , but which splits on the variable x2 at its root node. Use the least number of internal nodes
possible to construct T2 , and argue why you cannot reduce this number further. [4 marks]
Solutions
1. The growth of the Internet has made it possible to collect large amounts of data, which can
now also be stored cheaply. Computing hardware and memory have likewise become orders of
magnitude faster and cheaper in the last few years, allowing algorithms to process the stored data.
Cameras and other sensors, themselves ubiquitous and cheap, have made many different types of
data available for processing. From a technical standpoint, the maturing of machine learning as a
field has led to many “off-the-shelf” solutions and libraries for AI. The resurgence of neural
networks as an effective model for tasks in domains such as vision and speech has also been behind
many recent success stories.
2a. We know from our proof in class that for each i ∈ {1, 2, . . . , n} and l ∈ {1, 2, . . . , L},
y^i (wl · x^i) > 0.
It follows that for each i ∈ {1, 2, . . . , n},
y^i (wavg · x^i) = (y^i (w1 · x^i) + y^i (w2 · x^i) + · · · + y^i (wL · x^i)) / L > 0,
implying that wavg also achieves perfect separation of the data.
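As an aside (not part of the required answer), the claim can also be checked numerically. The Python sketch below uses a made-up toy data set and a straightforward implementation of the Perceptron Learning Algorithm with random tie-breaking among the misclassified points; the data set, function names, and run count are our own choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy data set, separable by an origin-centred hyperplane; labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.5], [-1.0, -2.5]])
y = np.array([1, 1, -1, -1])

def perceptron(X, y, rng):
    # Perceptron Learning Algorithm: pick an arbitrary (here random)
    # misclassified point at each iteration; constant learning rate 1.
    w = np.zeros(X.shape[1])
    while True:
        mis = np.where(y * (X @ w) <= 0)[0]   # currently misclassified points
        if len(mis) == 0:
            return w
        j = rng.choice(mis)
        w = w + y[j] * X[j]

# L independent runs, followed by the average of the output weight vectors.
L = 5
ws = [perceptron(X, y, rng) for _ in range(L)]
w_avg = np.mean(ws, axis=0)
print("margins under w_avg:", y * (X @ w_avg))   # all strictly positive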
2b. We follow essentially the same steps as in our original proof, albeit with the different
learning rate. First, we observe
wk+1 · w⋆ = (wk + αk y^j x^j) · w⋆
= wk · w⋆ + αk y^j (x^j · w⋆)
≥ wk · w⋆ + αk γ.
It follows by induction that wk+1 · w⋆ ≥ (α1 + α2 + · · · + αk) γ. Since wk+1 · w⋆ ≤ ‖wk+1‖ ‖w⋆‖ =
‖wk+1‖, we get
‖wk+1‖ ≥ (α1 + α2 + · · · + αk) γ.    (1)
We can also upper-bound ‖wk+1‖ as follows.
‖wk+1‖² = ‖wk + αk y^j x^j‖²
= ‖wk‖² + ‖αk y^j x^j‖² + 2 αk y^j (wk · x^j)
= ‖wk‖² + (αk)² ‖x^j‖² + 2 αk y^j (wk · x^j)
≤ ‖wk‖² + (αk)² ‖x^j‖²    (since (x^j, y^j) is misclassified by wk, we have y^j (wk · x^j) ≤ 0)
≤ ‖wk‖² + (αk)² R²,
from which it follows by induction that
‖wk+1‖² ≤ ((α1)² + (α2)² + · · · + (αk)²) R².    (2)
For our particular choice of sequence αk = 1/k, we observe that
1. α1 + α2 + · · · + αk > ln(k), and
2. there is a constant C such that (α1)² + (α2)² + · · · + (αk)² < C.
It follows that (γ ln(k))² ≤ ‖wk+1‖² ≤ C R², which implies k ≤ exp(√C R / γ). Hence, the algo-
rithm can only make a finite number of iterations; by construction, termination implies correctness.
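Again as an aside, the variant with αk = 1/k can be simulated to confirm that it still terminates with a separating hyperplane. The sketch below reuses the toy data set from the previous aside; all names and choices are ours, purely for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Toy data set, separable by an origin-centred hyperplane; labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.5], [-1.0, -2.5]])
y = np.array([1, 1, -1, -1])

def perceptron_annealed(X, y, rng):
    # Variant of the Perceptron Learning Algorithm with learning rate 1/k
    # at the k-th update.
    w = np.zeros(X.shape[1])
    k = 1
    while True:
        mis = np.where(y * (X @ w) <= 0)[0]
        if len(mis) == 0:
            return w, k - 1                 # separating w and number of updates made
        j = rng.choice(mis)
        w = w + (1.0 / k) * y[j] * X[j]
        k += 1

w, updates = perceptron_annealed(X, y, rng)
print("updates:", updates, "final margins:", y * (X @ w))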
3a. G(w) = ∇w (3w^4 − 4w^3 − 12w^2 + 50) = 12w^3 − 12w^2 − 24w.
3b. It is easy to see that G(w) factorises as 12w(w − 2)(w + 1), implying that Error(w) has its
local optima (maxima or minima) at −1, 0, and 2. By plotting Error(w), we observe that indeed
−1 and 2 are local minima, while 0 is a local maximum.
It follows that if we performed gradient descent with a “small enough” learning rate, the iterates
would converge to −1 from any w0 < 0, to 2 from any w0 > 0, and stay at 0 if w0 = 0; the plot
would therefore show Error(−1) = 45 for w0 < 0, Error(0) = 50 at w0 = 0, and Error(2) = 18
for w0 > 0. Unfortunately, there is a bug in the question, wherein the learning rate used is not
small enough compared to the gradient. (We acknowledge Utkarsh Gupta for pointing out this
bug.) Hence, although starting points in the vicinity of the local minima will converge to these
minima (and starting at the local maximum will keep the process there forever), starting from
other points could take the process through hard-to-characterise sequences, and in fact even to
divergence.
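The erratic behaviour described above is easy to observe in simulation. The sketch below (again only an illustration, not part of the expected answer; the starting points, step count, and blow-up threshold are arbitrary choices of ours) runs the update wt+1 ← wt − G(wt)/(t + 1) and reports whether the iterates settle down or blow up.

def G(w):
    # Gradient of Error(w) = 3w^4 - 4w^3 - 12w^2 + 50.
    return 12 * w**3 - 12 * w**2 - 24 * w

def run_gd(w0, steps=10000, blow_up=1e8):
    # Gradient descent with learning rate 1/(t+1); returns None on blow-up.
    w = w0
    for t in range(steps):
        w = w - G(w) / (t + 1)
        if abs(w) > blow_up:
            return None
    return w

for w0 in (-1.5, -1.0, -0.1, 0.0, 0.1, 2.0, 3.0):
    w = run_gd(w0)
    print(f"w0 = {w0:5.2f} -> " + ("diverged" if w is None else f"w = {w:.4f}"))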
3c. G2(w) = ∇w (12w^3 − 12w^2 − 24w) = 36w^2 − 24w − 24. This second derivative lets us determine
whether a given local optimum is a local maximum or a local minimum. Observe that G2(−1) and
G2(2) are positive, while G2(0) is negative, implying that Error achieves local minima at −1 and
2, and a local maximum at 0. In the unusual but plausible event that we initialise the procedure
at a local maximum, note that the gradient will be 0, and thus we would have converged. Knowing
the second derivative would inform us whether we are indeed at a local minimum, or whether we
can do better by starting with a small perturbation of the initial point.
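For completeness, here is a short check of this classification (again just an illustration, not part of the expected answer).

def G(w):
    return 12 * w**3 - 12 * w**2 - 24 * w      # first derivative of Error

def G2(w):
    return 36 * w**2 - 24 * w - 24             # second derivative of Error

for w in (-1.0, 0.0, 2.0):                     # the critical points of Error
    kind = "local minimum" if G2(w) > 0 else "local maximum"
    print(f"w = {w:4.1f}: G(w) = {G(w):6.1f}, G2(w) = {G2(w):6.1f} -> {kind}")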
4. The labels assigned by T1 to the eight possible tuples are tabulated below.

x1  x2  x3  |  y
 0   0   0  |  1
 0   0   1  |  1
 0   1   0  |  0
 0   1   1  |  0
 1   0   0  |  0
 1   0   1  |  1
 1   1   0  |  0
 1   1   1  |  1
We are to replicate the same labelling with T2, which has x2 at its root. Consider the left subtree
of T2, containing all points with x2 = 0. Since these four points do not all carry the same label,
the subtree cannot be a single leaf; nor can it consist of just one internal node, since the labels of
the four points cannot all be predicted correctly based on either x1 or x3 alone. Hence the left
subtree needs at least two internal nodes. For the same reason, the right subtree also needs at
least two internal nodes. All four trees shown below use exactly two internal nodes in each of the
left and right subtrees, and they classify all the data points exactly as T1 does. Any one of them
can be provided as the answer.
Possibility 1 for T2 (all paths of the tree shown; on each path, x −b→ x′ means that the b-branch
out of node x leads to x′, and the final digit is the leaf label).
• x2 −0→ x1 −0→ 1.
• x2 −0→ x1 −1→ x3 −0→ 0.
• x2 −0→ x1 −1→ x3 −1→ 1.
• x2 −1→ x1 −0→ 0.
• x2 −1→ x1 −1→ x3 −0→ 0.
• x2 −1→ x1 −1→ x3 −1→ 1.
Possibility 2.
• x2 −0→ x1 −0→ 1.
• x2 −0→ x1 −1→ x3 −0→ 0.
• x2 −0→ x1 −1→ x3 −1→ 1.
• x2 −1→ x3 −0→ 0.
• x2 −1→ x3 −1→ x1 −0→ 0.
• x2 −1→ x3 −1→ x1 −1→ 1.
Possibility 3.
• x2 −0→ x3 −0→ x1 −0→ 1.
• x2 −0→ x3 −0→ x1 −1→ 0.
• x2 −0→ x3 −1→ 1.
• x2 −1→ x1 −0→ 0.
• x2 −1→ x1 −1→ x3 −0→ 0.
• x2 −1→ x1 −1→ x3 −1→ 1.
Possibility 4.
• x2 −0→ x3 −0→ x1 −0→ 1.
• x2 −0→ x3 −0→ x1 −1→ 0.
• x2 −0→ x3 −1→ 1.
• x2 −1→ x3 −0→ 0.
• x2 −1→ x3 −1→ x1 −0→ 0.
• x2 −1→ x3 −1→ x1 −1→ 1.
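Finally, any candidate T2 can be verified mechanically by enumerating all eight tuples and comparing labels with those of T1. The sketch below (our own, with Possibility 1 hard-coded as nested conditionals) performs this check.

from itertools import product

def t1(x1, x2, x3):
    # Labels encoded by T1: root x1, then x2 on the 0-branch and x3 on the 1-branch.
    if x1 == 0:
        return 1 if x2 == 0 else 0
    return 0 if x3 == 0 else 1

def t2(x1, x2, x3):
    # Possibility 1 for T2: root x2, with x1 below it and x3 where needed.
    if x2 == 0:
        if x1 == 0:
            return 1
        return 0 if x3 == 0 else 1
    if x1 == 0:
        return 0
    return 0 if x3 == 0 else 1

assert all(t1(*x) == t2(*x) for x in product((0, 1), repeat=3))
print("Possibility 1 for T2 agrees with T1 on all 8 tuples.")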