A Practical Approach To Linear Regression in Machine Learning - by Ashwin Raj - Towards Data Science
Imagine having the ability to predict real-world outcomes based on just one feature.
Sounds a bit magical, right? Well, this magic is called Simple Linear Regression, and
it's a fundamental tool in the world of Machine Learning. But don't let the term
intimidate you - it's much simpler than you may think.
If you find this blog helpful, consider giving it some claps so that Medium knows
you're enjoying what you're reading. For more exciting Machine Learning Recipes,
make sure to follow me. Let's delve straight into our topic!
One of the most common examples of a linear regression model in action is
predicting the price of a house by analyzing sales data from that region.
Linear Regression can be classified into two main categories - Simple Linear
Regression and Multiple Linear Regression. The former centers around a single
independent variable, while the latter extends its reach to multiple independent
variables, creating a multidimensional landscape for predictive modeling. In this
article, we will be discussing Simple Linear Regression.
The hypothesis takes the form y = mx + c, wherein 'm' is the slope of the line and 'c'
is the intercept, and together they are referred to as the 'Model's Coefficients'. This
equation is the basis for any Linear Regression model and is often referred to as the
Hypothesis Function.
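To make the formula concrete, here's a tiny sketch of the hypothesis function in plain Python; the slope, intercept, and feature value used below are made-up illustrations, not numbers from this tutorial.

def hypothesis(x, m, c):
    # Predicted value of the dependent variable for a feature value x
    return m * x + c

# Made-up coefficients, purely for illustration
print(hypothesis(0.5, m=7000, c=-2000))  # 1500.0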
The goal of most machine learning algorithms is to construct a model, i.e. the
hypothesis (H), that estimates the dependent variable from our independent
variables such that it minimizes the Loss Function (Residual Sum of Squares).
Aptly termed the Least Squares Method, this approach seeks the values of the
model parameters that minimize the loss function/RSS.
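For simple linear regression, the least squares solution even has a closed form. The snippet below is a quick sketch added to this article; it derives the slope & intercept from toy data with NumPy and reports the resulting RSS. The numbers are illustrative only.

import numpy as np

# Toy data: feature x and target y (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Closed-form least squares: m = cov(x, y) / var(x), c = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

rss = np.sum((y - (m * x + c)) ** 2)  # Residual Sum of Squares at the optimum
print(m, c, rss)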
One of the primary metrics used to assess the model's performance is MSE, or Mean
Squared Error. This metric quantifies the average squared difference between the
actual observed values and the values predicted by the regression line. A lower MSE
indicates that the model's predictions are closer to the actual values.
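As a quick aside, here's what that metric looks like in code; this is a sketch added for illustration, with made-up actual & predicted values.

import numpy as np

y_actual = np.array([3.0, 5.0, 7.0])     # illustrative observed values
y_predicted = np.array([2.8, 5.4, 6.9])  # illustrative model predictions

mse = np.mean((y_actual - y_predicted) ** 2)  # Mean Squared Error
print(mse)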
Beyond these, you may also explore other metrics such as the Residual Sum of
Squares and Residual Analysis, which offer a more comprehensive evaluation of
model performance. Read more about regression evaluation metrics here.
3. Normality of Residuals: This assumption signifies that the residuals should follow
a normal distribution with a mean of zero. If the residuals deviate from normality,
confidence intervals may become too wide or too narrow, and some statistical tests
may no longer hold.
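One simple way to eyeball this assumption, once a model has been fitted, is to plot a histogram of the residuals. The sketch below is an addition to this article and assumes the y_test & y_pred variables produced later in this tutorial.

import matplotlib.pyplot as plt

residuals = y_test - y_pred  # residuals of the fitted model
plt.hist(residuals, bins=30)
plt.xlabel('Residual')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.show()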
In this tutorial, we're going to create a diamond price prediction model using Simple
Linear Regression. We'll train this model using the Diamonds dataset found on
Kaggle. You can grab the dataset for yourself from this GitHub repo.
Here, I will be using Google Colab to build this model. You can also use other IDEs
such as PyCharm or VS Code to follow along with this tutorial.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Once we've loaded these packages, the next task is to fetch the dataset & load the
data. For this, we'll employ the read_csv method from the pandas library. If the file
is located in a different directory, you will need to provide the relative or absolute
path to the file.
data = pd.read_csv("datasets/diamonds.csv")
Although CSV files are most commonly used for machine learning tasks, JSON files
& Excel spreadsheets can also be used as datasets. The only distinction is to use the
read_json() & read_excel() functions when working with such files.
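For instance, loading the same data from other formats might look like this; the file names below are hypothetical placeholders, not files that ship with this tutorial.

# Hypothetical file names, shown only to illustrate the alternative readers
data_from_json = pd.read_json("datasets/diamonds.json")
data_from_excel = pd.read_excel("datasets/diamonds.xlsx")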
print(data.head(3))
When dealing with unseen datasets, ensuring data quality is paramount. Real-world
datasets often come with a few missing values, duplicate records, & outliers.
Inaccurate or incomplete information can lead to skewed results & analysis. So
before we put our model to the test, we need to clean our dataset.
data.fillna(data.mean(numeric_only=True), inplace=True)
data.drop_duplicates(inplace=True)
One of the most common challenges when working with real-world datasets is
dealing with missing data. A good approach to handle such missing values is to
impute them with the mean of that column. This approach is particularly useful
when the missing values occur at random and are unlikely to introduce bias.
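Before imputing, it can also help to check how many values are actually missing in each column; this small check is an optional addition to the workflow described here.

# Count missing values per column before imputing
print(data.isnull().sum())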
Another challenge that we might come across when working with real-world data is
managing duplicate entries. A straightforward way to tackle this issue is to
eliminate these duplicates using the drop_duplicates() function.
Q1 = data.quantile(0.25, numeric_only=True)
Q3 = data.quantile(0.75, numeric_only=True)
IQR = Q3 - Q1
numeric_cols = IQR.index  # restrict the outlier check to numeric columns
outlier_mask = (
    (data[numeric_cols] < (Q1 - 1.5 * IQR)) | (data[numeric_cols] > (Q3 + 1.5 * IQR))
).any(axis=1)
data = data[~outlier_mask]
Outliers are data points that deviate significantly from the rest of the dataset. These
may indicate errors, anomalies, or unique occurrences. One approach to managing
outliers is to filter out data points that lie beyond the IQR bounds (i.e. below Q1 -
1.5 * IQR or above Q3 + 1.5 * IQR, wherein IQR = Q3 - Q1).
Matplotlib & Seaborn are excellent libraries that can be used for visualizing data.
Some of the commonly used visualization techniques are described below:
1. Pair Plot: Pair plots paint a matrix of scatter plots, revealing how variables
correlate. They are often used to spot trends, clusters, or even outliers, and to study
the relationship between the predictor and the target variable.
sns.pairplot(
data,x_vars=['carat'],y_vars=['price'],height=12,kind='scatter'
)
plt.xlabel('Carat')
plt.ylabel('Price')
plt.title('Diamond Price Prediction - Carat vs Price')
plt.show()
2. Heatmap: Heatmaps display the pairwise correlation between numeric variables as
a color-coded grid, making it easy to spot features that are strongly related to the
target.
plt.figure(figsize=(12, 8))
sns.heatmap(
    data.corr(numeric_only=True),  # correlation matrix of the numeric columns; the original arguments were truncated, so this is a typical reconstruction
    annot=True,
    cmap='coolwarm'
)
plt.show()
3. Violin Plot: Violin plots are used to study the distribution of data across multiple
levels, much like a violin's shape. They are often used to inspect the density of
values, especially when exploring the spread of the dependent variable.
plt.figure(figsize=(10, 6))
sns.violinplot(x=data['cut'], y=data['price'])  # columns assumed; the original call was truncated, and price across cut levels is a typical choice
plt.show()
Other important visualizations include the box plot for distributions and outliers,
the regression plot for checking model alignment, and the bar plot for categorical
insights.
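As an optional extra (not part of the original walkthrough), a regression plot of carat against price takes just one line with Seaborn.

# Scatter of carat vs price with a fitted regression line overlaid
sns.regplot(x='carat', y='price', data=data, line_kws={'color': 'red'})
plt.show()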
To start off, divide the data into two parts: one for training the model and the other
for testing its performance. A common practice is to allocate 70% to 80% of the
data for training & reserve the remaining portion for validation.
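The split relies on scikit-learn's train_test_split and on a feature/target definition that isn't shown explicitly here; the sketch below assumes, in line with the earlier carat vs price plot, that carat is the predictor and price is the target.

from sklearn.model_selection import train_test_split

# Assumed feature/target choice: carat predicts price
X = data[['carat']]
y = data['price']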
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)  # seed value assumed; the original line was cut off
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Note that only the feature variables need to be scaled. Also, since simple linear
regression involves only a single feature, scaling may be skipped.
from sklearn.linear_model import LinearRegression

linear_regression_model = LinearRegression()
linear_regression_model.fit(X_train_scaled, y_train)
y_pred = linear_regression_model.predict(X_test_scaled)
Next, we need to identify the best-fit line that represents the relationship between
the predictor and the target. Our aim is to minimize the difference between the
predicted & actual values by adjusting the slope and intercept of the line.
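Once the model is fitted, the learned slope & intercept can be read straight off the estimator; this short check is an addition to the walkthrough.

# Slope (m) and intercept (c) learned by the model
print("Slope:", linear_regression_model.coef_[0])
print("Intercept:", linear_regression_model.intercept_)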
Finally, we have successfully built the model & now it’s time to put it through its
paces & see how it holds up. For this, we calculate the MSE & the R2 Score of the
model. Our aim here is to minimize the MSE & maximize the R2 Score
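The evaluation code isn't shown above, so here is a minimal sketch using scikit-learn's metrics on the predictions produced earlier.

from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, y_pred)  # average squared prediction error
r2 = r2_score(y_test, y_pred)             # proportion of variance explained
print("MSE:", mse)
print("R2 Score:", r2)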
However, simple linear regression is not the only linear algorithm. There are
extensions that tackle the limitations of such models. Polynomial regression, for
instance, introduces curves & bends, accommodating nonlinear patterns.
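As a brief sketch of that idea (not covered further in this article), scikit-learn can fit a polynomial regression by expanding the feature before the linear fit; the degree below is an arbitrary choice.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Degree-2 polynomial features feeding an ordinary linear regression
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_train, y_train)
print(poly_model.score(X_test, y_test))  # R2 of the polynomial fit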
With that, we have reached the end of this article. You have chosen the right career
at exactly the right time. If you have any questions or if you believe I have made a
mistake, feel free to connect with me. Get in touch with me over LinkedIn. Read
more such Machine Learning Recipes here. Happy Learning!