
Birla Institute of Technology & Science - Pilani, Hyderabad Campus

First Semester 2023-24


CS F320 – Foundations of Data Science
Comprehensive Examination
Type: Closed Time: 180 mins Max Marks: 80 Date: 19.12.2023

All parts of the same question should be answered together.


1.a. We say that two random variables are pairwise independent if p(X2/X1) = p(X2) and hence p(X2, X1) =
p(X1)p(X2/X1) = p(X1)p(X2).
We say that n random variables are mutually independent if p(Xi/XS) = p(Xi) for all S ⊆ {1, . . . , n}\{i}
and hence p(X1, X2, . . ., Xn) = p(X1) p(X2) … p(Xn).
Prove or disprove: “Pairwise independence between all pairs of variables necessarily implies mutual independence.”
Note: The proof should be complete and correct if you are proving the statement; provide a counterexample
if you are disproving it. [8 Marks]

Sol: Suppose you are tossing two fair coins.


A = {First toss is head} = {HH, HT}
B = {Second toss is head} = {HH, TH}, and
C = {The outcomes are same} = {HH, TT}
P(A) = 1/2, P(B) = 1/2 and P(C) = 1/2.
P(A, B) = 1/4 = P(A)P(B).
P(C, B) = 1/4 = P(C)P(B).
P(A, C) = 1/4 = P(A)P(C).
P(A, B, C) = 1/4 ≠ 1/8 = P(A)P(B)P(C).
Hence A, B and C are pairwise independent but not mutually independent, so the statement is disproved.
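The counterexample can also be checked numerically; the short sketch below (not part of the original solution) enumerates the four equally likely outcomes and verifies that pairwise independence holds while the three-way product condition fails.

```python
# Enumerate the two-coin sample space and check pairwise vs. mutual independence.
from itertools import product

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT, each with probability 1/4
p = 1 / len(outcomes)
prob = lambda S: len(S) * p

A = {o for o in outcomes if o[0] == "H"}   # first toss is head
B = {o for o in outcomes if o[1] == "H"}   # second toss is head
C = {o for o in outcomes if o[0] == o[1]}  # both tosses show the same face

# Pairwise independence holds: P(X ∩ Y) = P(X) P(Y) for every pair.
for X, Y in [(A, B), (B, C), (A, C)]:
    assert abs(prob(X & Y) - prob(X) * prob(Y)) < 1e-12

# Mutual independence fails: P(A ∩ B ∩ C) = 1/4 but P(A) P(B) P(C) = 1/8.
print(prob(A & B & C), prob(A) * prob(B) * prob(C))   # 0.25 0.125
```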

1.b. Find out the first principal component that emerges as part of PCA for the following data set. [6 Marks]
X1 X2
4 1
2 3
5 4
1 0
Sol:
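The printed solution is blank; the sketch below follows the usual PCA recipe (centre the data, then take the leading eigenvector of the sample covariance matrix) and is one way the answer could be obtained.

```python
# PCA for the four points of 1.b: centre, form the covariance, take the top eigenvector.
import numpy as np

X = np.array([[4, 1], [2, 3], [5, 4], [1, 0]], dtype=float)
Xc = X - X.mean(axis=0)              # the mean is (3, 2)

S = np.cov(Xc, rowvar=False)         # sample covariance: [[10/3, 2], [2, 10/3]]
eigvals, eigvecs = np.linalg.eigh(S) # eigenvalues returned in ascending order

pc1 = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
print(eigvals)                       # ≈ [1.33, 5.33]  (i.e. 4/3 and 16/3)
print(pc1)                           # ≈ [0.707, 0.707], i.e. (1, 1)/√2 up to sign
```

So the first principal component is the direction (1, 1)/√2 (up to sign), the axis along which the centred data has the largest variance, 16/3.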
1.c. You are given the following 2D dataset; draw the first and second principal components on the plot.

Note: You will have to reproduce the above figure approximately in your answer sheet and mark the first and
second principal components on it. No calculation is needed; approximately identifying the principal
components is the spirit of the question. [4 Marks]
Sol:

2. Consider the following joint distribution p(X, Y).

a. What is the joint entropy H(X,Y)?


Sol:

b. What are the marginal entropies H(X) and H(Y)?


Sol:
c. The entropy of X conditioned on a specific value of y is defined as
H(X/y) = −∑x p(x/y) log p(x/y).
Compute H(X/y) for each value of y.


Sol:
d. The conditional entropy is defined as
H(X/Y) = ∑y p(y) H(X/y).
Compute this.

Sol:
e. What is the mutual information between X and Y? [10 Marks]
Sol:
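The exam’s joint probability table is not reproduced above, so the sketch below uses a hypothetical 2 × 2 table purely to illustrate how parts (a)–(e) would be computed once the actual p(X, Y) values are substituted.

```python
# Entropies and mutual information from a joint probability table (hypothetical values).
import numpy as np

P = np.array([[1/8, 3/8],            # rows index x, columns index y; placeholder numbers
              [3/8, 1/8]])

def H(p):
    """Entropy in bits of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = H(P.ravel())                  # (a) joint entropy H(X, Y)
H_X = H(P.sum(axis=1))               # (b) marginal entropy H(X)
H_Y = H(P.sum(axis=0))               #     marginal entropy H(Y)

p_y = P.sum(axis=0)
H_X_given_y = [H(P[:, j] / p_y[j]) for j in range(P.shape[1])]       # (c) H(X/y) per y
H_X_given_Y = sum(p_y[j] * h for j, h in enumerate(H_X_given_y))     # (d) H(X/Y)

I_XY = H_X - H_X_given_Y             # (e) mutual information I(X; Y)
print(H_XY, H_X, H_Y, H_X_given_y, H_X_given_Y, I_XY)
```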

3.a. Suppose X is a discrete random variable taking ‘n’ values, say x1, x2, . . ., xn. What is the discrete
distribution that maximizes the entropy of the random variable? What is the discrete distribution that minimizes
the entropy of the random variable? [4 Marks]
Sol:
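A brief sketch of the standard answer (the printed solution is blank): the uniform distribution maximizes the entropy and a point mass minimizes it.
If p(xi) = 1/n for all i, then H(X) = −∑i (1/n) log(1/n) = log n, and by Jensen’s inequality (equivalently, the non-negativity of the KL divergence from the uniform distribution) no distribution on n values can exceed log n.
If p(xk) = 1 for some k and p(xi) = 0 otherwise (a degenerate, point-mass distribution), then H(X) = 0, which is the minimum possible value since H(X) ≥ 0 always.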

3.b. Prove that H(X,Y) = H(X) + H(Y/X). [4 Marks]


Sol:
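A sketch of the standard chain-rule derivation (the printed solution is blank):
H(X, Y) = −∑x,y p(x, y) log p(x, y)
= −∑x,y p(x, y) log [p(x) p(y/x)]
= −∑x,y p(x, y) log p(x) − ∑x,y p(x, y) log p(y/x)
= −∑x p(x) log p(x) − ∑x,y p(x, y) log p(y/x)
= H(X) + H(Y/X).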
3.c. Can you think of a situation in which identification numbers would be useful for prediction? [2 Marks]
Sol: Student IDs can be a good predictor of graduation date, since IDs are typically assigned sequentially by year of admission.

4.a. Formulate the k-nearest neighbour algorithm as a generative classifier, giving all the necessary mathematical
formulation. [6 Marks]
Sol: Refer to class notes
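Since the solution only points to the class notes, the following is a sketch of the standard generative (density-estimation) view of k-NN, as in Bishop’s treatment; the in-class formulation may differ in its details.
Given N training points with Nk points in class Ck, grow a region of volume V around a query point x until it contains exactly K training points, of which Kk belong to class Ck. Then
p(x/Ck) ≈ Kk / (Nk V),  p(Ck) = Nk / N,  p(x) ≈ K / (N V),
and by Bayes’ theorem the posterior is
p(Ck/x) = p(x/Ck) p(Ck) / p(x) = Kk / K.
Classification assigns x to the class with the largest Kk among its K nearest neighbours, i.e. a majority vote.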
4.b. Suppose there are 80 features in a data set with 25000 training examples and 10000 testing examples. With
Euclidean distance as the similarity metric, a 1-Nearest-Neighbour classifier is built and the misclassification rate
(on the 10000 test examples) is found to be 2.98%. You are asked to randomly permute the features
(columns of the training and test design matrices) and then apply the classifier. Do you think that the
misclassification rate (on the same set of 10000 test examples) of 2.98% changes with the new classifier? Justify
your answer with appropriate reasoning. An answer without the correct justification will be awarded no marks.
[6 Marks]
Note: A design matrix with 10000 examples having 80 features is of 10000 X 80 size. Every training example is
a row in the design matrix.
Sol:
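The key observation is that the Euclidean distance is a sum over coordinates, so permuting the columns identically in the training and test design matrices only reorders the terms: every pairwise distance, and therefore every nearest neighbour and the 2.98% misclassification rate, is unchanged. The toy sketch below (with made-up matrices, not the exam’s data) demonstrates this.

```python
# Permuting the feature columns consistently leaves all Euclidean distances unchanged.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 80))   # stand-ins for the 25000 x 80 and 10000 x 80 matrices
test = rng.normal(size=(40, 80))

perm = rng.permutation(80)           # one random permutation of the 80 features

def dists(A, B):
    """Pairwise Euclidean distances between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

print(np.allclose(dists(test, train),
                  dists(test[:, perm], train[:, perm])))   # True
```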
5.a. In the probabilistic approach to linear regression, we assume that the target variate t follows a normal
distribution with mean equal to the predicted target (say, wᵀx, where x is a D-dimensional feature vector) and
variance s², where w and s² are the parameters. Assuming you are given a good estimate of w, say w′, show that
the Maximum Likelihood Estimate (MLE) of the variance s² is given by
(1/N) ∑ᵢ₌₁ᴺ (yᵢ − xᵢᵀ w′)². [8 Marks]
Sol:
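A sketch of the standard derivation (the printed solution is blank), writing yᵢ for the observed target of example xᵢ as in the statement above:
The likelihood is L(s²) = ∏ᵢ₌₁ᴺ N(yᵢ | xᵢᵀw′, s²), so the log-likelihood is
ln L(s²) = −(N/2) ln(2π s²) − (1/(2s²)) ∑ᵢ₌₁ᴺ (yᵢ − xᵢᵀw′)².
Setting the derivative with respect to s² to zero:
−N/(2s²) + (1/(2s⁴)) ∑ᵢ₌₁ᴺ (yᵢ − xᵢᵀw′)² = 0
⇒ s²_ML = (1/N) ∑ᵢ₌₁ᴺ (yᵢ − xᵢᵀw′)².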
5.b. Derive the dual formulation of the least-squares linear regression problem as discussed in class, and thus
justify how solving the problem in its dual form helps when the number of features is far larger than the
number of training examples. [6 Marks]
Sol: Refer to class notes
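Since the solution points to the class notes, here is a sketch of the usual dual derivation for regularized least squares (e.g. Bishop §6.1); the in-class version may differ slightly.
Minimize J(w) = (1/2) ∑ₙ (wᵀφ(xₙ) − tₙ)² + (λ/2) wᵀw. Setting ∇J = 0 gives w = Φᵀa with aₙ = −(1/λ)(wᵀφ(xₙ) − tₙ), i.e. the solution is a linear combination of the training feature vectors. Substituting w = Φᵀa and defining the N × N Gram matrix K = ΦΦᵀ gives the dual solution
a = (K + λ I_N)⁻¹ t,  with predictions y(x) = k(x)ᵀ (K + λ I_N)⁻¹ t, where kₙ(x) = φ(xₙ)ᵀφ(x).
The primal solution requires inverting the D × D matrix ΦᵀΦ + λ I_D, whereas the dual requires inverting an N × N matrix and only ever uses inner products between feature vectors. When the number of features D far exceeds the number of training examples N, the dual is therefore much cheaper, and it also allows the inner products to be replaced by a kernel function.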

5.c. Let pemp(x) be the empirical distribution and let q(x/θ) be some model. Show that argmin_θ KL(pemp || q(x/θ))
is obtained by q(x) = q(x/θ′), where θ′ is the MLE. [6 Marks]
Sol: Refer to class notes
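A sketch of the standard argument (the solution points to the class notes): with pemp(x) = (1/N) ∑ₙ δ(x − xₙ),
KL(pemp || q(x/θ)) = ∑x pemp(x) log pemp(x) − ∑x pemp(x) log q(x/θ)
= −H(pemp) − (1/N) ∑ₙ log q(xₙ/θ).
The first term does not depend on θ, so minimizing the KL divergence over θ is the same as maximizing the average log-likelihood (1/N) ∑ₙ log q(xₙ/θ), whose maximizer is the MLE θ′. Hence the minimizing model is q(x) = q(x/θ′).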

6.a. Do you agree with the statement “Ordering of attributes is an important activity in constructing parallel
coordinate plots that needs to be considered with utmost care”? Justify your answer with appropriate reasoning.
[2 Marks]
Sol: Yes, the ordering of attributes is very important: relationships can only be read off between adjacent axes, so a
poor ordering produces cluttered plots that hide correlations and convey little useful information.
6.b. Suppose a hospital tested the age and body-fat data for 18 randomly selected adults, with the following
result. Draw the boxplots for age and %fat. [4 Marks]
Sol:
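The table of the 18 (age, %fat) measurements is not reproduced above, so the arrays in the sketch below are placeholders only; substituting the actual values gives the required boxplots.

```python
# Boxplots for age and %fat (placeholder data; replace with the exam's 18 measurements).
import matplotlib.pyplot as plt

age = [25, 30, 35, 40, 45, 50, 55, 60]                     # placeholder values
fat = [10.0, 15.5, 20.0, 25.5, 28.0, 31.0, 34.5, 40.0]     # placeholder values

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].boxplot(age)
axes[0].set_title("age")
axes[1].boxplot(fat)
axes[1].set_title("%fat")
plt.tight_layout()
plt.show()
```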

6.c. Suppose a group of 12 sales price records is as follows: 3, 3, 102, 58, 7, 28, 9, 75, 122, 17, 98,
72. Partition them into three bins by each of the following methods. [4 Marks]
(a) equal-frequency partitioning
(b) equal-width partitioning
Sol:
Equal Frequency:
bin 1: 3,3,7,9
bin 2: 17, 28, 58, 72
bin 3: 75, 98, 102, 122

Equal Width:
Width = (122 − 3)/3 ≈ 39.67
bin 1: 3,3,7,9,17, 28
bin 2: 58, 72, 75
bin 3: 98, 102, 122
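The same partitioning can be reproduced programmatically; the sketch below sorts the 12 records, then forms three equal-frequency bins (4 values each) and three equal-width bins of width (122 − 3)/3 ≈ 39.67.

```python
# Equal-frequency and equal-width binning of the 12 sales price records.
prices = sorted([3, 3, 102, 58, 7, 28, 9, 75, 122, 17, 98, 72])

# Equal-frequency: 12 values / 3 bins = 4 values per bin.
freq_bins = [prices[i:i + 4] for i in range(0, 12, 4)]

# Equal-width: width = (max - min) / 3 ≈ 39.67, boundaries at 3, 42.67, 82.33, 122.
width = (prices[-1] - prices[0]) / 3
edges = [prices[0] + k * width for k in range(4)]
width_bins = [[p for p in prices if edges[k] <= p < edges[k + 1]] for k in range(2)]
width_bins.append([p for p in prices if p >= edges[2]])

print(freq_bins)   # [[3, 3, 7, 9], [17, 28, 58, 72], [75, 98, 102, 122]]
print(width_bins)  # [[3, 3, 7, 9, 17, 28], [58, 72, 75], [98, 102, 122]]
```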
