
Alejandro Fonseca

Machine learning and optimization


Homework 1

Problem 1
1.

2.

3.

4. Since X is not invertible, the systems given in this exercise cannot have a unique solution;
they are either inconsistent (no solution) or underdetermined (infinitely many solutions). To
decide which case applies, we perform Gaussian elimination on the augmented matrices and
evaluate the results (a small rank-based check in R is sketched after this list):
5.

6.

7.
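A minimal R sketch of this check (the matrix A and vector b below are hypothetical placeholders, not the actual systems from the exercise): by the Rouché–Capelli criterion, the system is inconsistent when the rank of the augmented matrix exceeds the rank of the coefficient matrix, and underdetermined when the two ranks are equal but smaller than the number of unknowns.

# Hypothetical singular coefficient matrix A and right-hand side b
A = matrix(c(1, 2,
             2, 4), nrow = 2, byrow = TRUE)
b = c(1, 3)
rank_A = qr(A)$rank             # rank of the coefficient matrix
rank_Ab = qr(cbind(A, b))$rank  # rank of the augmented matrix
if (rank_Ab > rank_A) {
  print("inconsistent system (no solution)")
} else if (rank_A < ncol(A)) {
  print("underdetermined system (infinitely many solutions)")
}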
Problem 2
1.

2.

3. The value of p that maximizes the probability of the observed data of a Bernoulli random
variable is the observed sample mean; in this case, p̂ = x̄ = 0.5 (a quick numerical check is
sketched after this list).
4. If X and Y are jointly Gaussian random variables (for example, independent Gaussians), the
random variable X + aY also follows a Gaussian distribution.
5. If X is a standard normal Gaussian random variable, it follows a normal distribution with
mean 0 and variance 1, so it takes positive and negative values with equal probability and its
distribution is symmetric about 0. Multiplying it by a variable Y that can only take the values
{-1, 1} with equal probability therefore does not change the distribution of the resulting Z:
conditioned on either value of Y, Z is still standard normal. Since the conditional distribution
of Z given Y does not depend on Y, Z is independent of Y.
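A quick numerical check of this claim (not part of the original solution; the sample below is a hypothetical set of Bernoulli observations with mean 0.5): maximizing the Bernoulli log-likelihood over p recovers the sample mean.

# Hypothetical Bernoulli observations with sample mean 0.5
obs = c(1, 0, 1, 0, 1, 0)
# Bernoulli log-likelihood as a function of p
loglik = function(p) sum(dbinom(obs, size = 1, prob = p, log = TRUE))
# The numerical maximizer should be (close to) the sample mean, 0.5
optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)$maximum
mean(obs)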

Problem 3
1.
a) Both have complexity O(log n), so f(x) = O(g(x)) and g(x) = O(f(x)).
b) Since e > 2, f(x) grows faster than g(x), so g(x) = O(f(x)).
c) Both functions have complexity O(n), but since g(x) grows faster than f(x), f(x) = O(g(x)).
2.
Problem 4 (this problem is solved with R)
# We first generate the vectors x and e
x = runif(1000, min = 0, max = 1)
e = rnorm(1000, mean = 0, sd = sqrt(0.25))
# y is created as the sum of the previous 2 vectors
y = x + e

1.
plot(x,y, pch=16, cex=0.1)

2.
# The minimum of the given function corresponds to the least squares
# solution of the linear regression model; it can be computed with
# the normal-equation matrix formula:
a = solve(t(x) %*% x) %*% t(x) %*% y
# and adding the line gives us:
abline(0, a)
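# Optional sanity check (not part of the original solution): R's built-in lm(),
# fitted without an intercept, should return the same slope as a
fit = lm(y ~ 0 + x)
coef(fit)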
3.
# This problem can be transformed into a linear regression problem in
# which the different variables are the original variable raised to the
# corresponding power.
# The first step is to create the matrix X:

create_x = function(x, d) {
  # Create a matrix with d + 1 columns
  new_x = matrix(NA, nrow = length(x), ncol = d + 1)
  # Fill the matrix with powers of x
  for (i in 0:d) {
    new_x[, i + 1] <- x^i
  }
  return(new_x)
}

# Now, to find a, we just need to apply the least squares matrix formula:

find_a = function(x, d, y){
  final_x = create_x(x, d)
  a = solve(t(final_x) %*% final_x) %*% t(final_x) %*% y
  return(a)
}

# The solution a to this problem is unique when the matrix X has full column
# rank, i.e. when X'X is invertible. This requires at least d + 1 observations
# with distinct values of x.
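# A quick way to check this condition for a given degree, say d = 4
# (not part of the original solution; it uses the create_x helper above):
qr(create_x(x, 4))$rank == 4 + 1  # TRUE when the least squares solution is unique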
4.
x = runif(1000, min = 0, max = 1)
e = rnorm(1000, mean = 0, sd = sqrt(0.1))
y = 30*((x - 0.25)^2)*(x - 0.75)^2 + e
plot(x, y, pch = 16, cex = 0.1)

# In this case, the maximum exponent is 4, so:
d = 4
# we calculate the vector a:
a = find_a(x, d, y)
# and to plot the regression curve, we first define the function and add
# the curve to the scatter plot
f = function(x){
  result = a[1] + a[2]*x + a[3]*x^2 + a[4]*x^3 + a[5]*x^4
  return(result)
}
curve(f, from=0, to=1, add=TRUE)
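# Optional check (not part of the original solution): the fitted coefficients
# should be close to those of the expanded true polynomial,
# 30*(x-0.25)^2*(x-0.75)^2 = 1.0546875 - 11.25*x + 41.25*x^2 - 60*x^3 + 30*x^4
round(a, 2)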
Problem 5

1. The only way for the points of the hyperplane to satisfy the given relation is for b to equal 0.
If b were different from 0, it would not be a constant: its value would change depending on
which point of the hyperplane we choose. Only in the case where w is orthogonal to the
hyperplane is the given relationship satisfied with b constant (b = 0).
2. The distance between a given point y and the hyperplane is the absolute value of the dot
product between the vector orthogonal to the hyperplane (w) and y, divided by the norm of w:
d(y, H) = |w·y| / ||w||.
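A minimal R sketch of this computation (the vectors below are hypothetical, and the hyperplane is assumed to pass through the origin, i.e. b = 0 as in part 1):

# Hypothetical normal vector w and point y0
w = c(1, 2, 2)
y0 = c(3, 0, 1)
# Distance from y0 to the hyperplane {x : w . x = 0}
abs(sum(w * y0)) / sqrt(sum(w^2))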

Problem 6
1. Var[aX + b] = Var[aX] + Var[b] = Var[aX] = a²Var[X] = a²σ² (a simulation check of this
property is sketched after this list).
2.
a)

b)
3.
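A quick simulation check of property 1 (not part of the original solution; the constants and the variance below are arbitrary hypothetical values):

# Simulate X with variance sigma^2 = 4 and compare Var[aX + b] with a^2 * sigma^2
a_const = 3; b_const = 5; sigma = 2
X = rnorm(100000, mean = 0, sd = sigma)
var(a_const * X + b_const)  # should be close to a_const^2 * sigma^2 = 36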

Problem 7
1.

2.
3.
