Self-Practice Coding Questions

This document contains three coding problems on conditional expectation and linear regression: 1. A weekly lottery jackpot problem that asks for the expected value and variance of the jackpot just before the drawing, using conditional expectations, together with a simulation. 2. A simple linear regression on height and weight data to estimate coefficients, interpret results, and predict the height of a person weighing 65 kg. 3. A multiple linear regression on customer spending data to estimate and interpret coefficients for salary, number of children, gender, and number of catalogs sent.

Uploaded by

Nguyen Hoang
Self-practice Coding Questions

April 6, 2022

Problem 1 (1 point - Random Vectors, Conditional Expectations).

In a weekly lottery, each $1 ticket sold adds 50 cents to the jackpot, which
starts at $1 million before any tickets are sold. The jackpot is announced each
morning to encourage people to play. On the morning of the i-th day before
the drawing, the current value of the jackpot, J_i, is announced. On that day, the
number of tickets sold, N_i, is a Poisson random variable with expected value
J_i. Thus, six days before the drawing, the morning jackpot starts at $1 million
and N_6 tickets are sold that day. On the day of the drawing, the announced
jackpot is J_0 dollars and N_0 tickets are sold before the evening drawing. What
are the expected value and variance of J, the value of the jackpot the instant
before the drawing? Hint: Use conditional expectations.
Write a Python program that simulates m runs of the weekly lottery experiment
described above. For m = 1000 sample runs, form a histogram for the
jackpot J.
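
One possible approach to the simulation is sketched below. It assumes the jackpot evolves as J_{i-1} = J_i + N_i / 2 with N_i ~ Poisson(J_i), applied on the seven selling days i = 6, 5, ..., 0, so that J is the jackpot after the N_0 tickets of the drawing day. Under that reading, E[J_{i-1} | J_i] = 3 J_i / 2 and Var[J_{i-1} | J_i] = J_i / 4, and iterating these with the law of total variance gives E[J] = (3/2)^7 × 10^6 ≈ 1.709 × 10^7 and Var[J] = (10^6 / 3) × ((9/4)^7 − (3/2)^7) ≈ 9.16 × 10^7, which the code uses as a sanity check. The function name, the seed, and the choice of 20 bins are illustrative assumptions, not part of the problem.

```python
import numpy as np


def simulate_jackpots(m=1000, start=1_000_000.0, days=7, seed=0):
    """Simulate m independent weeks and return the final jackpot of each run.

    Assumption: J_{i-1} = J_i + 0.5 * N_i with N_i ~ Poisson(J_i), applied on
    days i = 6, 5, ..., 0 (seven selling days in total).
    """
    rng = np.random.default_rng(seed)
    jackpot = np.full(m, start)            # J_6 for every run
    for _ in range(days):
        tickets = rng.poisson(jackpot)     # N_i given the morning jackpot J_i
        jackpot = jackpot + 0.5 * tickets  # each $1 ticket adds 50 cents
    return jackpot


jackpots = simulate_jackpots(m=1000)

# Theoretical values obtained by iterating the conditional moments.
mean_theory = 1.5**7 * 1e6
var_theory = (1e6 / 3) * (2.25**7 - 1.5**7)

print(f"sample mean     = {jackpots.mean():.1f} (theory {mean_theory:.1f})")
print(f"sample variance = {jackpots.var(ddof=1):.3e} (theory {var_theory:.3e})")

# Histogram with 20 equal-width bins, printed as text (one '#' per 5 runs).
counts, edges = np.histogram(jackpots, bins=20)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:12.0f}, {right:12.0f}) {'#' * (int(count) // 5)}")
```

If matplotlib is available, the same array can be plotted with `matplotlib.pyplot.hist(jackpots, bins=20)` instead of the text output.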

Problem 2 (0.5 point - Simple linear regression model).

Consider the heights (in centimeters) and weights (in kilograms) dataset

heights (cm)   169.6  166.8  157.1  181.1  158.4  165.6  166.7  156.5  168.1  165.3
weights (kg)    71.2   58.2   56.0   64.5   53.0   52.4   56.8   49.2   55.6   77.8
We are about to fit a simple linear regression model to this dataset. Here we
treat heights as the response variable and weights as the explanatory variable.

height = β0 + β1 ∗ weight + ϵ,
ϵ ∼ N(0, σ²).

(a) Make a scatter plot to see if there is a linear pattern in the set of points.
Hint: Consider each pair of weight and height as a point on the plane.
Plot all points in the same coordinate system. You can use Python to
do that.

(b) Write a Python program that estimates the coefficients of the
simple linear model and the variance of the noise.
Hint: Use the least squares method or maximum likelihood estimation,
then check your result with any calculation tool. It is more convenient to
treat the model as a multiple linear regression model in matrix form.

(c) Interpret the estimated coefficients. What height would you expect for a
person whose weight is 65 kilograms? What is the error of your prediction?
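
Parts (a) to (c) could be approached along the following lines. This is a minimal sketch, assuming the matrix form y = Xβ + ϵ with a column of ones for the intercept and solving the normal equations XᵀXβ = Xᵀy. It reports both the maximum likelihood variance estimate (RSS / n) and the unbiased one (RSS / (n − 2)). The "error of the prediction" is read here as the standard error of predicting a new individual, sqrt(σ² (1 + x0ᵀ(XᵀX)⁻¹x0)), which is only one possible interpretation of the question. All variable and function names are illustrative.

```python
import numpy as np

heights = np.array([169.6, 166.8, 157.1, 181.1, 158.4,
                    165.6, 166.7, 156.5, 168.1, 165.3])  # cm, response
weights = np.array([71.2, 58.2, 56.0, 64.5, 53.0,
                    52.4, 56.8, 49.2, 55.6, 77.8])       # kg, explanatory

# Design matrix: a column of ones (intercept) and the weights.
X = np.column_stack([np.ones_like(weights), weights])
y = heights
n, p = X.shape

# Least squares estimate from the normal equations X^T X beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta
rss = residuals @ residuals

sigma2_mle = rss / n             # maximum likelihood estimate of the noise variance
sigma2_unbiased = rss / (n - p)  # unbiased estimate of the noise variance

# Predicted height at 65 kg and the standard error of that prediction.
x0 = np.array([1.0, 65.0])
height_at_65 = x0 @ beta
pred_se = np.sqrt(sigma2_unbiased * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0)))

print(f"beta0 = {beta[0]:.3f} cm, beta1 = {beta[1]:.3f} cm per kg")
print(f"sigma^2: MLE = {sigma2_mle:.3f}, unbiased = {sigma2_unbiased:.3f}")
print(f"predicted height at 65 kg = {height_at_65:.2f} cm (std. error {pred_se:.2f} cm)")


def scatter_plot():
    """Part (a): scatter plot; matplotlib is imported lazily and is optional."""
    import matplotlib.pyplot as plt
    plt.scatter(weights, heights)
    plt.xlabel("weight (kg)")
    plt.ylabel("height (cm)")
    plt.show()
```

In this model β1 is the expected change in height, in centimeters, per additional kilogram of weight, and β0 is the expected height at weight zero, which lies outside the range of the data and has no direct physical meaning.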

Problem 3 (0.5 point - Multiple linear regression model).

The data set “DirectMarketing.csv” comes from the book Business Analytics
for Managers by Wolfgang Jank. It contains information on 1000 customers in a
customer database for the company Direct Marketing (DM). DM would like to
build a model for the variable AmountSpent, being the amount spent by each
customer in one year on DM products. There are many other variables in the
dataset that can be employed as explanatory variables.
Consider a linear regression model with dependent variable AmountSpent
and independent variables Salary, Children, Gender (where 1 = female, 0 =
male) and Catalogs (the number of catalogs sent to each customer):

AmountSpent = β0 + β1 ∗ Salary + β2 ∗ Children + β3 ∗ Gender + β4 ∗ Catalogs + ϵ,
ϵ ∼ N(µ, σ²).

(a) Use Python to produce a matrix of scatter plots to see if there is a linear
correlation between each pair of the variables Salary, Children, Gender, Catalogs.
(b) Write a Python program to estimate the coefficients of the multiple linear
model and the variance of the noise.
Hint: Use the least squares method or maximum likelihood estimation.
You can solve the normal equations by hand, then use a calculation tool to check
your result.
(c) Interpret each estimated coefficient. What amount would you expect a
customer to spend in one year on DM products if he has one child, earns
85600 dollars per year, and has been sent 18 catalogs? What is the error
of your prediction?
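
A possible structure for parts (b) and (c) is sketched below. Several things here are assumptions rather than facts from the problem statement: that the CSV has columns named exactly Salary, Children, Gender, Catalogs and AmountSpent, that Gender is stored as the strings "Female" and "Male", and the helper names `load_design`, `fit_ols` and `predict_with_error`. The fit uses `numpy.linalg.lstsq`, which minimizes the same least squares criterion as the normal equations. Note that a nonzero noise mean µ cannot be separated from β0, so the fitted intercept estimates β0 + µ. As in any such sketch, the "error of the prediction" is read as the standard error of predicting a new customer.

```python
import numpy as np


def load_design(path="DirectMarketing.csv"):
    """Build (X, y) from the CSV. Column names and gender labels are assumptions."""
    import pandas as pd  # imported lazily; only needed to read the file
    df = pd.read_csv(path)
    gender = (df["Gender"] == "Female").to_numpy(dtype=float)  # 1 = female, 0 = male
    X = np.column_stack([
        np.ones(len(df)),                       # intercept
        df["Salary"].to_numpy(dtype=float),
        df["Children"].to_numpy(dtype=float),
        gender,
        df["Catalogs"].to_numpy(dtype=float),
    ])
    y = df["AmountSpent"].to_numpy(dtype=float)
    return X, y


def fit_ols(X, y):
    """Return the least squares coefficients and the unbiased noise variance."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - p)
    return beta, sigma2


def predict_with_error(X, beta, sigma2, x0):
    """Return the point prediction at x0 and its standard error for a new observation."""
    x0 = np.asarray(x0, dtype=float)
    leverage = x0 @ np.linalg.solve(X.T @ X, x0)
    return x0 @ beta, np.sqrt(sigma2 * (1.0 + leverage))


# Intended use, once DirectMarketing.csv is in the working directory:
#   X, y = load_design("DirectMarketing.csv")
#   beta, sigma2 = fit_ols(X, y)
#   x0 = [1, 85600, 1, 0, 18]  # intercept, Salary, Children, Gender (male), Catalogs
#   amount, std_error = predict_with_error(X, beta, sigma2, x0)
```

For part (a), `pandas.plotting.scatter_matrix` applied to the four explanatory columns, with Gender first converted to 0/1, is one way to obtain the matrix of scatter plots.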
