hw1
(a) In Table 1 you are given the data from all 800 planets surveyed so far. The features
observed by telescope are Size (“Big” or “Small”), and Orbit (“Near” or “Far”). Each
row indicates the values of the features and habitability, and how many times that set
of values was observed. So, for example, there were 20 “Big” planets “Near” their star
that were habitable. Derive and draw the decision tree learned by ID3 on this data
(use the maximum information gain criterion for splits, don’t do any pruning). Make
sure to clearly mark at each node what attribute you are splitting on, and which value
corresponds to which branch. Next to each leaf node of the tree, write the number of
habitable and uninhabitable planets in the training data (i.e. the data in Table 1) that
belong to that node.
(b) For just 9 of the planets, a third feature, Temperature (in Kelvin), has been measured, as
shown in Table 2. Redo all the steps from part (a) on this data using all three features.
For the Temperature feature, in each iteration you must maximize over all possible
binary threshold splits (such as T ≤ 250 vs. T > 250). According to
your decision tree, would a planet with the features (Big, Near, 280) be predicted to be
habitable or not habitable? (A computational check of these split scores is sketched after Table 2.)
Table 2: Planet size, orbit, and temperature vs. habitability.
Size    Orbit   Temperature (K)   Habitable
Big     Far     205               No
Big     Near    205               No
Big     Near    260               Yes
Big     Near    380               Yes
Small   Far     205               No
Small   Far     260               Yes
Small   Near    260               Yes
Small   Near    380               No
Small   Near    380               No
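If you want to sanity-check the by-hand entropy and information-gain calculations, here is a rough Python sketch that scores candidate splits on the Table 2 data (it is only a check, not part of the required answer). The entropy and gain helpers apply equally to the Table 1 counts in part (a); the midpoint thresholds used below are just one convention, since any threshold strictly between two consecutive temperatures produces the same split.

import math

# The nine planets from Table 2: (size, orbit, temperature in K, habitable)
PLANETS = [
    ("Big",   "Far",  205, False),
    ("Big",   "Near", 205, False),
    ("Big",   "Near", 260, True),
    ("Big",   "Near", 380, True),
    ("Small", "Far",  205, False),
    ("Small", "Far",  260, True),
    ("Small", "Near", 260, True),
    ("Small", "Near", 380, False),
    ("Small", "Near", 380, False),
]

def entropy(rows):
    """Binary entropy of the Habitable label over a list of rows."""
    if not rows:
        return 0.0
    p = sum(r[3] for r in rows) / len(rows)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def gain(rows, partition):
    """Information gain of splitting `rows` into the groups in `partition`."""
    weighted = sum(len(g) / len(rows) * entropy(g) for g in partition)
    return entropy(rows) - weighted

def split_on_value(rows, index, value):
    """Binary split on a categorical feature (column `index`) equal to `value`."""
    return [r for r in rows if r[index] == value], [r for r in rows if r[index] != value]

def split_on_threshold(rows, t):
    """Binary split on Temperature <= t versus Temperature > t."""
    return [r for r in rows if r[2] <= t], [r for r in rows if r[2] > t]

# Score the categorical splits and every candidate temperature threshold.
print("Size  :", gain(PLANETS, split_on_value(PLANETS, 0, "Big")))
print("Orbit :", gain(PLANETS, split_on_value(PLANETS, 1, "Near")))
temps = sorted({r[2] for r in PLANETS})
for t in ((a + b) / 2 for a, b in zip(temps, temps[1:])):
    print(f"T <= {t}:", gain(PLANETS, split_on_threshold(PLANETS, t)))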
2. [15 points] In this problem you’ll see why simple feature-wise (i.e. coordinate-wise) splitting
of the data isn’t always the best approach to classification. Throughout the problem, assume
that each feature can be used for splitting the data multiple times in a decision tree. Suppose
you are given n non-overlapping points in the unit square [0, 1] × [0, 1], each labeled either +
or −.
(a) Prove that there exists a decision tree of depth at most log2 n that correctly labels all
n points. At each node the decision tree should only perform a binary threshold split
on a single coordinate. (Note that a binary tree of depth log2 n can have as many as
2^(log2 n) = n leaves, and hence up to n − 1 internal nodes, i.e. splits.)
(b) Describe (either mathematically, or in a few concise sentences) a set of n points in
[0, 1] × [0, 1], along with corresponding + or − labels, so that the smallest decision tree
that correctly labels them all has at least n − 1 splits. (Hint: if you can do it with n = 3,
you can do it with arbitrary n.)
(c) Describe n points and corresponding labels that, as in part (b), can only be correctly
labeled by a tree with at least n − 1 splits, with the additional condition that the points
labeled + and the points labeled − must be separable by a straight line. In other words,
there must exist a line segment splitting the unit square in two (not necessarily parallel
to either axis), so that all points labeled + are in one part, and all points labeled − are
in the other. (You will soon see classifiers that would have had a much easier time with
this type of data.)
y = c1 x1 + c2 x2 + ε                                                    (1)
In other words, having n measurements in hand is equivalent to having n equations of the
following form: yj = c1 xj1 + c2 xj2 + εj , for j = 1 . . . n. The goal is to estimate c1 , c2 from those
measurements by maximizing conditional log-likelihood given the input, under different assumptions
for the noise. Specifically:
1. [10 points] Assume that the εi for i = 1 . . . n are iid Gaussian random variables with zero
mean and variance σ². (A sketch of the corresponding log-likelihood is given after these two cases.)
2. [10 points] Assume that the εi for i = 1 . . . n are independent Gaussian random variables
with zero mean and variance Var(εi ) = σi².
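To make the objective concrete, here is a sketch for the first case only, under the stated iid zero-mean, variance-σ² assumption (ℓ is simply notation for the conditional log-likelihood to be maximized over c1, c2):

    \ell(c_1, c_2) = \sum_{j=1}^{n} \log p(y_j \mid x_{j1}, x_{j2})
                   = -\frac{n}{2}\log(2\pi\sigma^2)
                     - \frac{1}{2\sigma^2}\sum_{j=1}^{n} (y_j - c_1 x_{j1} - c_2 x_{j2})^2 .

The second case differs only in that the common variance σ² is replaced by the per-measurement variance Var(εj) inside the sum and in the normalization term.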
1. [3 points] Provide descriptions of Naive Bayes and Logistic Regression algorithms for the
dataset above, deriving
(a) P(Y = A | X1 , . . . , X16 ) and P(Y = B | X1 , . . . , X16 )
(b) how to classify a new example (i.e. the classification rule)
(c) how to estimate the model parameters
Note: you only need to derive the equations; there is no need to plug in actual values. (The standard posterior forms are recalled below for reference.)
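For reference only (the derivation for this dataset is still the exercise), the two models posit the following forms for the posterior. The class-conditional factors P(Xi | Y) depend on how the features are modeled, and w0, . . . , w16 is notation introduced here for the logistic regression weights:

    P(Y = A \mid X_1, \ldots, X_{16}) \;\propto\; P(Y = A) \prod_{i=1}^{16} P(X_i \mid Y = A)
        \qquad \text{(Naive Bayes factorization)}

    P(Y = A \mid X_1, \ldots, X_{16}) \;=\; \frac{1}{1 + \exp\!\bigl(-w_0 - \sum_{i=1}^{16} w_i X_i\bigr)}
        \qquad \text{(logistic regression)}

In the logistic case P(Y = B | X1 , . . . , X16 ) = 1 − P(Y = A | X1 , . . . , X16 ); in the Naive Bayes case the posterior is obtained by normalizing over the two classes.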
(a) Asymptotically (as the number of training examples grows toward infinity), do you think
Logistic Regression and Gaussian Naive Bayes will converge toward identical classifiers?
Comment on why or why not.
(b) Naive Bayes relies on the assumption of conditional independence and may not work well
when the data violates this assumption. Do you think Logistic Regression also faces this
problem? If not, why not?
4. [10 points] Implement Logistic Regression and Naive Bayes for the dataset above. Use add-
one smoothing when estimating the parameters of your Naive Bayes classifier. For logistic
regression, use a step size of around 0.0001. To train and test, follow these steps (a sketch
of this evaluation loop is given after this problem):
(a) Randomly split dataset into 2/3 training set, 1/3 testing set.
(b) Choose a random subset of the training data of size m to train on, for training sizes m
from 2 to 200 (in increments of 1, or close to 1).
(c) After training on each subset, test against the held-out testing set. Calculate the classifi-
cation error as the ratio of incorrectly classified examples to the total testing set size.
(d) Repeat steps (a)–(c) 100 times from the beginning, averaging the classification error over the 100 runs.
(e) Plot the average error vs. the training sizes m, comparing Logistic Regression and Naive
Bayes.
Submit your code online. Submit your printed plot along with your homework.
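As a rough numpy/matplotlib sketch of the evaluation loop in steps (a)–(e): train_logreg, train_nb, and predict are placeholder names for your own training and prediction routines (they are not provided here), and X, y are assumed to be numpy arrays holding the features and labels of the full dataset. This is one reasonable reading of the protocol, not a required structure.

import numpy as np
import matplotlib.pyplot as plt

def average_error(X, y, train_fn, predict_fn, sizes, n_runs=100, seed=None):
    """Average held-out classification error for each training-set size in `sizes`."""
    rng = np.random.default_rng(seed)
    errors = np.zeros((n_runs, len(sizes)))
    n = len(y)
    for run in range(n_runs):
        # (a) random 2/3 train, 1/3 test split
        perm = rng.permutation(n)
        train_idx, test_idx = perm[: 2 * n // 3], perm[2 * n // 3 :]
        for k, m in enumerate(sizes):
            # (b) random subset of the training data of size m
            sub = rng.choice(train_idx, size=m, replace=False)
            model = train_fn(X[sub], y[sub])
            # (c) error = fraction of the held-out test set classified incorrectly
            preds = predict_fn(model, X[test_idx])
            errors[run, k] = np.mean(preds != y[test_idx])
    # (d) average the error over the runs
    return errors.mean(axis=0)

# (e) plot average error vs. training size m for both classifiers, for example:
# sizes = np.arange(2, 201)
# plt.plot(sizes, average_error(X, y, train_logreg, predict, sizes), label="Logistic Regression")
# plt.plot(sizes, average_error(X, y, train_nb, predict, sizes), label="Naive Bayes")
# plt.xlabel("training set size m"); plt.ylabel("average test error"); plt.legend(); plt.show()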