Welcome to your first practice lab! In this lab, you will implement linear regression with one
variable to predict profits for a restaurant franchise.
Outline
• 1 - Packages
• 2 - Linear regression with one variable
– 2.1 Problem Statement
– 2.2 Dataset
– 2.3 Refresher on linear regression
– 2.4 Compute Cost
• Exercise 1
– 2.5 Gradient descent
• Exercise 2
– 2.6 Learning parameters using batch gradient descent
NOTE: To prevent errors from the autograder, you are not allowed to edit or delete non-graded
cells in this notebook. Please also refrain from adding any new cells. Once you have passed this
assignment and want to experiment with any of the non-graded code, you may follow the
instructions at the bottom of this notebook.
1 - Packages
First, let's run the cell below to import all the packages that you will need during this
assignment.
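A minimal sketch of the import cell, assuming the standard scientific Python stack plus the lab's helper module providing load_data (the utils module name is an assumption):
import numpy as np                 # array operations
import matplotlib.pyplot as plt    # plotting
from utils import load_data        # lab-provided data loader (module name assumed)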
2 - Linear regression with one variable
2.1 Problem Statement
• You would like to expand your business to cities that may give your restaurant higher
profits.
• The chain already has restaurants in various cities and you have data for profits and
populations from the cities.
• You also have data on cities that are candidates for a new restaurant.
– For these cities, you have the city population.
Can you use the data to help you identify which cities may potentially give your business higher
profits?
2.2 Dataset
You will start by loading the dataset for this task.
• The load_data() function shown below loads the data into variables x_train and
y_train
– x_train is the population of a city
– y_train is the profit of a restaurant in that city. A negative value for profit
indicates a loss.
• A good place to start is to just print out each variable and see what it contains.
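First, load the dataset with the load_data() helper; a minimal sketch:
# load the dataset into the training variables
x_train, y_train = load_data()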
The code below prints the variable x_train and the type of the variable.
# print x_train
print("Type of x_train:",type(x_train))
print("First five elements of x_train are:\n", x_train[:5])
x_train is a numpy array that contains decimal values that are all greater than zero.
# print y_train
print("Type of y_train:",type(y_train))
print("First five elements of y_train are:\n", y_train[:5])
Similarly, y_train is a numpy array that has decimal values, some negative, some positive.
• These represent your restaurant's average monthly profits in each city, in units of
$10,000.
– For example, 17.592 represents $175,920 in average monthly profits for that city.
– -2.6807 represents an average monthly loss of $26,807 for that city.
Please print the shape of x_train and y_train and see how many training examples you have
in your dataset.
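A minimal sketch of that check:
print('The shape of x_train is:', x_train.shape)
print('The shape of y_train is:', y_train.shape)
print('Number of training examples (m):', len(x_train))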
The city population array has 97 data points, and the monthly average profits array also has 97
data points. These are NumPy 1D arrays.
• For this dataset, you can use a scatter plot to visualize the data, since it has only two
properties to plot (profit and population).
• Many other problems that you will encounter in real life have more than two properties
(for example, population, average household income, monthly profits, monthly
sales). When you have more than two properties, you can still use a scatter plot to see the
relationship between each pair of properties.
# Create a scatter plot of the data. To change the markers to red "x",
# we used the 'marker' and 'c' parameters
plt.scatter(x_train, y_train, marker='x', c='r')
plt.ylabel('Profit in $10,000')
plt.xlabel('Population of City in 10,000s')
plt.show()
2.3 Refresher on linear regression
In this practice lab, you will fit the linear regression parameters (w, b) to your dataset.
• With this model, you can then input a new city's population, and have the model estimate
your restaurant's potential monthly profits for that city.
• The model function for linear regression, which is a function that maps from x (city
population) to y (your restaurant's monthly profit for that city), is represented as
$$f_{w,b}(x) = wx + b$$
• To train a linear regression model, you want to find the best (w, b) parameters that
fit your dataset.
– To compare how one choice of (w, b) is better or worse than another choice, you
can evaluate it with a cost function J(w, b).
• J is a function of (w, b). That is, the value of the cost J(w, b) depends on
the value of (w, b).
– The choice of (w, b) that fits your data best is the one that has the smallest
cost J(w, b).
• To find the values (w, b) that give the smallest possible cost J(w, b), you can use a
method called gradient descent.
– With each step of gradient descent, your parameters (w, b) come closer to the
optimal values that will achieve the lowest cost J(w, b).
• The trained linear regression model can then take the input feature x (city
population) and output a prediction $f_{w,b}(x)$ (predicted monthly profit for a
restaurant in that city).
2.4 Compute Cost
Gradient descent involves repeated steps to adjust the values of your parameters (w, b) to
gradually get a smaller and smaller cost J(w, b).
• At each step of gradient descent, it will be helpful for you to monitor your progress by
computing the cost J(w, b) as (w, b) gets updated.
• In this section, you will implement a function to calculate J(w, b) so that you can check
the progress of your gradient descent implementation.
Cost function
As you may recall from the lecture, for one variable, the cost function for linear regression
J(w, b) is defined as
$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$
• You can think of $f_{w,b}(x^{(i)})$ as the model's prediction of your restaurant's profit, as opposed
to $y^{(i)}$, which is the actual profit that is recorded in the data.
• m is the number of training examples in the dataset
Model prediction
• For linear regression with one variable, the prediction of the model $f_{w,b}$ for an example
$x^{(i)}$ is represented as:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
Implementation
Please complete the compute_cost() function below to compute the cost J(w, b).
Exercise 1
Complete the compute_cost function below to:
• Iterate over the training examples, and for each example, compute:
– The prediction of the model for that example
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
– The cost for that example
$$cost^{(i)} = \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$
• Return the total cost over all examples
$$J(w,b) = \frac{1}{2m} \sum_{i=0}^{m-1} cost^{(i)}$$
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C1
# GRADED FUNCTION: compute_cost
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.
    Args:
        x (ndarray): Shape (m,) Input to the model (Population of cities)
        y (ndarray): Shape (m,) Label (Actual profits for the cities)
        w, b (scalar): Parameters of the model
    Returns
        total_cost (float): The cost of using w,b as the parameters
            for linear regression to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0]

    ### START CODE HERE ###
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b           # model prediction for example i
        cost += (f_wb - y[i]) ** 2    # squared error for example i
    total_cost = cost / (2 * m)
    ### END CODE HERE ###

    return total_cost
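Before running the public tests, you can compute the cost at some starting parameters; a minimal sketch, assuming initial values of w = 2 and b = 1 (these are assumptions chosen to match the printed cost below):
# Compute cost with some initial values for parameters w, b (values assumed)
initial_w = 2
initial_b = 1
cost = compute_cost(x_train, y_train, initial_w, initial_b)
print(type(cost))
print(f'Cost at initial w: {cost:.3f}')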
You can check if your implementation was correct by running the following test code:
# Public tests
from public_tests import *
compute_cost_test(compute_cost)
<class 'numpy.float64'>
Cost at initial w: 75.203
All tests passed!
2.5 Gradient descent
In this section, you will implement the gradient for the parameters w, b for linear regression.
$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$
You will implement a function called compute_gradient which calculates $\frac{\partial J(w,b)}{\partial w}$, $\frac{\partial J(w,b)}{\partial b}$.
Exercise 2
Please complete the compute_gradient function to:
• Iterate over the training examples, and for each example, compute:
– The prediction of the model for that example
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
– The gradient for the parameters w, b from that example
$$\frac{\partial J(w,b)}{\partial b}^{(i)} = f_{w,b}(x^{(i)}) - y^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w}^{(i)} = \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$
• Return the total gradient update from all the examples
$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \frac{\partial J(w,b)}{\partial b}^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=0}^{m-1} \frac{\partial J(w,b)}{\partial w}^{(i)}$$
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C2
# GRADED FUNCTION: compute_gradient
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
        x (ndarray): Shape (m,) Input to the model (Population of cities)
        y (ndarray): Shape (m,) Label (Actual profits for the cities)
        w, b (scalar): Parameters of the model
    Returns
        dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
        dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]

    ### START CODE HERE ###
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b               # prediction for example i
        dj_db += f_wb - y[i]              # gradient contribution w.r.t. b
        dj_dw += (f_wb - y[i]) * x[i]     # gradient contribution w.r.t. w
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    ### END CODE HERE ###

    return dj_dw, dj_db
Run the cells below to check your implementation of the compute_gradient function with
two different initializations of the parameters w ,b .
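A minimal sketch of the first check, with both parameters initialized to zero (the zero initialization matches the printed gradient below):
# Compute and display gradient with w and b initialized to zero
initial_w = 0
initial_b = 0
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, initial_w, initial_b)
print('Gradient at initial w, b (zeros):', tmp_dj_dw, tmp_dj_db)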
compute_gradient_test(compute_gradient)
Gradient at initial w, b (zeros): -65.32884974555672 -5.83913505154639
Using X with shape (4, 1)
All tests passed!
2.6 Learning parameters using batch gradient descent
You will now learn the parameters for our dataset using batch gradient descent, building on the cost and gradient functions you implemented above.
• You don't need to implement anything for this part. Simply run the cells below.
• A good way to verify that gradient descent is working correctly is to look at the
value of J ( w , b ) and check that it is decreasing with each step.
• Assuming you have implemented the gradient and computed the cost correctly and
you have an appropriate value for the learning rate alpha, J ( w , b ) should never
increase and should converge to a steady value by the end of the algorithm.
def gradient_descent(x, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b
    by taking num_iters gradient steps with learning rate alpha.
    Args:
        x : (ndarray): Shape (m,)
        y : (ndarray): Shape (m,)
        w_in, b_in : (scalar) Initial values of parameters of the model
        cost_function: function to compute cost
        gradient_function: function to compute the gradient
        alpha : (float) Learning rate
        num_iters : (int) number of iterations to run gradient descent
    Returns
        w : (ndarray): Shape (1,) Updated values of parameters of the
            model after running gradient descent
        b : (scalar) Updated value of parameter of the
            model after running gradient descent
    """
    w = w_in
    b = b_in

    for i in range(num_iters):
        # Compute the gradient and take a step with learning rate alpha
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Print the cost periodically to monitor convergence
        if i % max(1, num_iters // 10) == 0:
            print(f"Iteration {i:4}: Cost {cost_function(x, y, w, b):8.2f}")

    return w, b
Now let's run the gradient descent algorithm above to learn the parameters for our dataset.
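A minimal sketch of the training cell; the zero initialization, learning rate alpha = 0.01, and 1500 iterations are assumptions (typical settings for a dataset of this size):
# initialize fitting parameters and gradient descent settings (values assumed)
initial_w = 0.
initial_b = 0.
iterations = 1500
alpha = 0.01

w, b = gradient_descent(x_train, y_train, initial_w, initial_b,
                        compute_cost, compute_gradient, alpha, iterations)
print("w, b found by gradient descent:", w, b)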
We will now use the final parameters from gradient descent to plot the linear fit.
Recall that we can get the prediction for a single example $f(x^{(i)}) = wx^{(i)} + b$.
To calculate the predictions on the entire dataset, we can loop through all the training examples
and calculate the prediction for each example. This is shown in the code block below.
m = x_train.shape[0]
predicted = np.zeros(m)
for i in range(m):
predicted[i] = w * x_train[i] + b
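Since x_train is a NumPy array, the same predictions can also be computed in one vectorized line instead of a loop:
# vectorized equivalent of the loop above
predicted = w * x_train + b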
We will now plot the predicted values to see the linear fit.
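A minimal sketch of the plotting cell, overlaying the fit on the scatter plot from earlier:
# Plot the linear fit on top of the training data
plt.plot(x_train, predicted, c='b')
plt.scatter(x_train, y_train, marker='x', c='r')
plt.title("Profits vs. Population per city")
plt.ylabel('Profit in $10,000')
plt.xlabel('Population of City in 10,000s')
plt.show()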
Your final values of w, b can also be used to make predictions on profits. Let's predict what the
profit would be in areas of 35,000 and 70,000 people. Since the population is in units of 10,000,
these correspond to inputs of 3.5 and 7.0.
predict1 = 3.5 * w + b
print('For population = 35,000, we predict a profit of $%.2f' %
(predict1*10000))
predict2 = 7.0 * w + b
print('For population = 70,000, we predict a profit of $%.2f' %
(predict2*10000))
For population = 35,000, we predict a profit of $4519.77
For population = 70,000, we predict a profit of $45342.45
Congratulations on completing this practice lab on linear regression! Next week, you will
create models to solve a different type of problem: classification. See you there!