

Shobhit Srivastava

Concept drift in Machine Learning


“The pessimist complains about the wind; the optimist expects it to change; the realist adjusts the sails.” - William Arthur Ward


Everything changes with time, and data is no exception. As the data changes, a machine learning model's performance on new data degrades over time, and the wrong predictions coming out of the model can ultimately hurt the business value it delivers.

The relationship between the input attributes and the output label does not stay static; it changes with time, and the model's performance suffers because it cannot capture the new underlying pattern present in the new data. This effect is called Concept Drift in machine learning.

In this article, I will give a brief overview of this concept, which comes up quite frequently in machine learning and which every practitioner should be aware of.

Here are the points I will go through in this article.

What is concept drift?

How is concept drift related to the data science lifecycle?

Why do we need to monitor this effect?

How to address the issue?

Conclusion.

What is concept drift?

Concept drift is an effect that degrades a machine learning model's performance over time. The degradation happens because the underlying pattern in the new data the model is tested on differs from the pattern in the data it was trained on. Such a change may arise, for example, from a shift in customers' product-buying behaviour or from some weather parameter changing over time.

We are all familiar with this basic functional relationship:

Y = f(X)

Here, f is a function that captures the pattern, or relationship, between the independent variable X and the dependent variable Y.

But when this pattern fades, the model starts producing wrong outputs and effectively becomes garbage.

This is where concept drift comes into effect.
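To make this concrete, here is a small sketch, entirely my own synthetic example rather than anything from the article, in which a model learns one mapping from X to Y and is then scored on data where that relationship has changed (it assumes NumPy and scikit-learn are installed):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# "Old" concept: the label depends on the first feature.
X_old = rng.normal(size=(1000, 2))
y_old = (X_old[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X_old, y_old)  # this is our learned f

# "New" concept: same kind of inputs, but the label now depends on the second feature.
X_new = rng.normal(size=(1000, 2))
y_new = (X_new[:, 1] > 0).astype(int)

print("accuracy on old concept:", accuracy_score(y_old, model.predict(X_old)))
print("accuracy on new concept:", accuracy_score(y_new, model.predict(X_new)))

The second number comes out close to 0.5, i.e., random guessing: the learned f still runs, but it no longer describes how Y depends on X.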


How is concept drift related to the data science lifecycle?

We are all aware that a data science project is executed in various phases,
right? Starting with:

Problem identification and its business context.

Data set collection.

Data exploration and feature engineering.

Data visualizations.

Model training and development.

Model testing and deployment.

Model retraining and update process.

I assume we are all quite familiar with the first six phases. Concept drift comes into play in the last one, i.e., model retraining and updating. This is the stage where the model has been deployed on the customer's side and is tested frequently, often daily. To keep the model from deviating, its predictions are monitored and checked to confirm they are still right, so that business productivity is maintained.
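As a rough illustration of that monitoring step, here is a minimal sketch (the function name, threshold and batch format are my own assumptions, not the author's code) that compares the deployed model's accuracy on each new labelled batch against its accuracy at deployment time and flags batches where it drops too far:

import numpy as np

def check_for_drift(model, labelled_batches, baseline_accuracy, tolerance=0.10):
    """Flag batches whose accuracy falls more than `tolerance` below the baseline."""
    flagged = []
    for i, (X_batch, y_batch) in enumerate(labelled_batches):
        acc = np.mean(model.predict(X_batch) == y_batch)
        if acc < baseline_accuracy - tolerance:
            flagged.append((i, acc))  # candidate drift: investigate, retrain or replace
    return flagged

Dedicated drift-detection libraries offer more statistically grounded tests, but the idea is the same: keep scoring the live model and react when its quality deviates from what it delivered at deployment time.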

Why do we need to monitor this effect?

We need to monitor this effect because it can cause a huge problem for the business it runs for. Wrong predictions can cost a company its reputation as well as its loyal customers, since the model may be making recommendations that no longer match the users' new buying patterns.

Take the Corona pandemic as an example, a period in which people's buying patterns shifted dramatically. They limited their purchases to strictly necessary items, something the model is unaware of, so it keeps recommending products that no longer match the customers' choices. Because of this, a business can lose a major chunk of revenue.

How to address the issue?

There are many methods to deal with this issue.

1. Do Nothing (maintain a single static model).

No, I am not joking! We can simply assume that the underlying pattern in the data doesn't change over time, which in many cases is actually true.

This lets us focus on building one single best model for making future predictions and then move on to other projects.

2. Periodically re-fit the model.

This may be a bit more effective than the first option. We retrain our outdated model on the new data set as it comes in, so that it captures the new underlying pattern in the data.

This saves the model from becoming ‘garbage’ and keeps it delivering business value.
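One way to sketch this re-fitting schedule (the class and parameter names are hypothetical, chosen just for illustration) is to keep a sliding window of the most recent labelled batches and re-fit the same kind of model on that window at a fixed cadence:

from collections import deque
import numpy as np
from sklearn.linear_model import LogisticRegression

class PeriodicRefitter:
    def __init__(self, make_model, window_batches=20, refit_every=5):
        self.make_model = make_model          # factory returning a fresh, unfitted model
        self.window_X = deque(maxlen=window_batches)
        self.window_y = deque(maxlen=window_batches)
        self.refit_every = refit_every
        self.model = None
        self.batches_seen = 0

    def update(self, X_batch, y_batch):
        """Add a new labelled batch and re-fit on the recent window when due."""
        self.window_X.append(X_batch)
        self.window_y.append(y_batch)
        self.batches_seen += 1
        if self.model is None or self.batches_seen % self.refit_every == 0:
            X = np.vstack(self.window_X)
            y = np.concatenate(self.window_y)
            self.model = self.make_model().fit(X, y)
        return self.model

# usage: refitter = PeriodicRefitter(LogisticRegression); model = refitter.update(X_batch, y_batch)

The window length and re-fit cadence are the knobs to tune: a short window adapts quickly to drift but forgets stable patterns sooner.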

3. Periodically update the model.

Instead of re-fitting the outdated model on the new data set, we can train and deploy a brand-new model from time to time, whenever our testing shows that the previous model is giving wrong predictions.

This method is a bit more effective, as replacing the model can lead to more accurate predictions. But model training, as well as deployment, takes significant time.
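A simple way to frame this replace-the-model idea (the helper names are mine, not from the article) is to train a fresh model on recent data and deploy it only if it beats the current one on a held-out slice of that same recent data:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def maybe_replace(current_model, X_recent, y_recent):
    """Return a newly trained model if it outperforms the current one on recent data."""
    X_tr, X_val, y_tr, y_val = train_test_split(X_recent, y_recent, test_size=0.25, random_state=0)
    challenger = LogisticRegression().fit(X_tr, y_tr)
    current_acc = accuracy_score(y_val, current_model.predict(X_val))
    challenger_acc = accuracy_score(y_val, challenger.predict(X_val))
    return challenger if challenger_acc > current_acc else current_model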

4. Ensemble a new model with the old one.

In this method, we ensemble one or more new models trained on the new data set with the outdated model. The new models work together with the old one, correcting the wrong predictions of the old model.

This method turns out to be a bit more complex, but also more effective than the ones mentioned above.
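One possible sketch of such an ensemble (the weighting scheme and names are my own assumptions, and it presumes both models expose predict_proba and were trained on the same set of classes) is to average the two models' predicted probabilities, weighting each by its accuracy on recent data:

import numpy as np
from sklearn.metrics import accuracy_score

def ensemble_predict(old_model, new_model, X, X_recent, y_recent):
    """Combine old and new models, trusting each in proportion to its recent accuracy."""
    w_old = accuracy_score(y_recent, old_model.predict(X_recent))
    w_new = accuracy_score(y_recent, new_model.predict(X_recent))
    proba = (w_old * old_model.predict_proba(X) +
             w_new * new_model.predict_proba(X)) / (w_old + w_new)
    return old_model.classes_[proba.argmax(axis=1)]  # assumes both models share classes_

The old model still contributes where it is right, while the new model pulls the combined prediction back toward the current pattern.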


[Edit] If you want to dive deep into the topic, please go through this article at
neptune.ai.

Conclusion...

All right, that's it for today. I hope we have all learned some new concepts. This effect is often ignored by junior data scientists who, after completing a project, think their work is over, but that's not the case; their responsibilities don't end there. To maintain business value, we must monitor and track how and what value we are adding to the customer's experience, and whether the recommendation provider is still giving them good recommendations.

For more articles like this, visit here.

If this article has benefited you in any way, please consider supporting me here:


https://www.buymeacoffee.com/shobhitsri

Please feel free to comment below if any point is unclear; I will reply as soon as possible. You can connect with me here on LinkedIn. Thank you for reading. Have a good day.

