Sir Syed University of Engineering & Technology, Karachi
Implementation of Linear Regression
Week 3
Session 2
Batch 2017, Department of Computer Science
Libraries Required
Step 1: Import the following libraries using the form: import libraryName as alias
• NumPy is a library for the Python programming language,
adding support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
• pandas is a software library written for
the Python programming language for data manipulation
and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time
series.
• Matplotlib is a Python library for data visualization; its pyplot interface is used throughout this lab.
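Assuming the standard community aliases (np, pd, plt), Step 1 can be written as:

```python
import numpy as np               # arrays and matrix operations
import pandas as pd              # data loading and manipulation
import matplotlib.pyplot as plt  # plotting (pyplot interface)
```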
Gradient Descent Algorithm
Gradient descent is simply an algorithm that takes small steps along a function to find a local minimum. Consider a simple quadratic equation such as this one:
x_quad = [n/10 for n in range(0, 100)]
y_quad = [(n-4)**2+5 for n in x_quad]

plt.figure(figsize=(10, 7))
plt.plot(x_quad, y_quad, 'k--')
plt.axis([0, 10, 0, 30])
plt.plot([1, 2, 3], [14, 9, 6], 'ro')   # red dots on the descending branch
plt.plot([5, 7, 8], [6, 14, 21], 'bo')  # blue dots on the ascending branch
plt.plot(4, 5, 'ko')                    # the minimum at (4, 5)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Quadratic Equation')
We are trying to find the local minimum of this function. Suppose we start at the red dot at x = 2. We compute the gradient, which in this one-dimensional case is simply the slope, and move against it. Since the slope is negative there, moving against it takes us further to the right, bringing us closer to the minimum.
If we start at the right-most blue dot at x = 8, the gradient (slope) is positive, so we move in the opposite direction by multiplying it by -1. The updates become smaller and smaller as the gradient approaches 0 near the minimum, until the parameter converges at the minimum we are looking for.
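This behaviour can be sketched numerically for the quadratic above, whose derivative is f'(x) = 2(x - 4). Starting from the blue dot at x = 8, repeated steps against the gradient converge to the minimizer x = 4:

```python
x = 8.0      # start at the right-most blue dot
alpha = 0.1  # learning rate (step size)
for _ in range(100):
    gradient = 2 * (x - 4)    # slope of f(x) = (x - 4)**2 + 5
    x = x - alpha * gradient  # move against the gradient
print(round(x, 4))  # 4.0 -- the updates shrink as the slope approaches 0
```

Each step multiplies the distance to the minimum by (1 - 2*alpha), so the iterates approach x = 4 geometrically.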
Step 2: Search for and download the data file from the internet (a two-column text file of city population and profit)
Step 3: Load the data file using the pandas.read_csv() function as
data = pd.read_csv('FilePath/FileName', names = ['population', 'profit'])
Step 4: Split population and profit into X and y
X_df = pd.DataFrame(data.population)
y_df = pd.DataFrame(data.profit)
m = len(y_df)
Step 5: Plot the data using the plt.plot() function as
plt.figure(figsize=(10, 8))
plt.plot(X_df, y_df, 'kx')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
Step 6: Initialize the variables alpha (learning rate) and iter (iteration count) for gradient descent
iter = 1000
alpha = 0.01
Step 7: Add a column of 1s to X as the intercept term
X_df['intercept'] = 1
Step 8: Transform the dataframes into NumPy arrays for easier matrix operations
and initialize theta to 0
X = np.array(X_df)
y = np.array(y_df).flatten()
theta = np.array([0, 0])
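The reason for the column of 1s added in Step 7: with rows of the form [x, 1], a single matrix product X.dot(theta) evaluates the hypothesis theta[0]*x + theta[1] for every sample at once. A quick check with made-up numbers:

```python
import numpy as np

X = np.array([[6.1, 1.0],
              [5.5, 1.0],
              [8.5, 1.0]])     # rows of [population, intercept] (made-up values)
theta = np.array([2.0, -3.0])  # [slope, intercept] (made-up values)
print(X.dot(theta))            # 2*x - 3 for each row: 9.2, 8.0, 14.0
```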
Step 9: Define the cost function for Linear Regression
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
def cost_function(X, y, theta):
    m = len(y)
    # Calculate the cost with the given parameters
    J = np.sum((X.dot(theta) - y) ** 2) / (2 * m)
    return J
Step 10: Call the cost function as
cost_function(X, y, theta)
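The value returned in Step 10 can be verified by hand on a tiny made-up dataset: with theta = [0, 0] the hypothesis is identically 0, so J reduces to the sum of y squared over 2m, and with a perfectly fitting theta the cost is 0. A self-contained sketch (repeating the Step 9 definition):

```python
import numpy as np

def cost_function(X, y, theta):
    # J = sum((X.theta - y)^2) / (2m), as in Step 9
    m = len(y)
    return np.sum((X.dot(theta) - y) ** 2) / (2 * m)

# Tiny made-up dataset with rows [x, intercept]; y = x + 1 exactly
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([2.0, 3.0, 4.0])

print(cost_function(X, y, np.array([0, 0])))  # (4 + 9 + 16) / 6 = 4.8333...
print(cost_function(X, y, np.array([1, 1])))  # 0.0 -- the fit is exact
```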
Step 11: Define the gradient descent function as
def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    cost_history = [0] * iterations
    for iteration in range(iterations):
        hypothesis = X.dot(theta)         # predictions under the current theta
        loss = hypothesis - y             # residuals
        gradient = X.T.dot(loss) / m      # gradient of the cost w.r.t. theta
        theta = theta - alpha * gradient  # step against the gradient
        cost = cost_function(X, y, theta)
        cost_history[iteration] = cost
    return theta, cost_history
Step 12: Call the gradient descent function, unpacking the optimized parameters and the cost history, as
t, cost_history = gradient_descent(X, y, theta, alpha, iter)
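Before trusting the result on real data, the whole pipeline can be sanity-checked on noise-free synthetic data with a known slope and intercept (made-up values below); gradient descent should recover them:

```python
import numpy as np

def cost_function(X, y, theta):
    m = len(y)
    return np.sum((X.dot(theta) - y) ** 2) / (2 * m)

def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    cost_history = [0] * iterations
    for iteration in range(iterations):
        loss = X.dot(theta) - y           # residuals under the current theta
        gradient = X.T.dot(loss) / m      # gradient of the cost w.r.t. theta
        theta = theta - alpha * gradient  # step against the gradient
        cost_history[iteration] = cost_function(X, y, theta)
    return theta, cost_history

# Noise-free synthetic data: y = 2x + 1 (made-up slope and intercept)
x = np.linspace(0, 10, 50)
X = np.column_stack([x, np.ones_like(x)])  # columns [x, intercept]
y = 2 * x + 1

theta, cost_history = gradient_descent(X, y, np.array([0.0, 0.0]), 0.01, 5000)
print(np.round(theta, 3))       # approaches [2. 1.], the true [slope, intercept]
print(cost_history[-1] < 1e-4)  # True: the cost has nearly vanished
```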
Step 13: Plot the best fit line as
best_fit_x = np.linspace(0, 25, 20)
best_fit_y = [t[1] + t[0]*xx for xx in best_fit_x]  # intercept + slope * x

plt.figure(figsize=(10, 6))
plt.plot(X_df.population, y_df, '.')
plt.plot(best_fit_x, best_fit_y, '-')
plt.axis([0, 25, -5, 25])
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Profit vs. Population with Linear Regression Line')
plt.show()
Tasks to perform:
• Derive the gradient descent update equations from the Linear Regression cost function.
• Follow the steps above and run the code to obtain optimized parameters for Linear Regression using Gradient Descent. Print the optimized parameters and the visualizations and attach them in your file.
• Using the parameters obtained in the previous question, manually evaluate the linear regression hypothesis equation for x = 3.7 and x = 7.4, showing all necessary steps.
• Find a dataset suitable for Linear Regression and apply the same algorithm to it. Print the optimized parameters and the visualizations and attach them in your file, along with the code for this part.