Concept Drift in Machine Learning
Shobhit Srivastava
Everything changes with time, and data is no exception. As the data changes,
the predictive performance of a machine learning model degrades over time.
Ultimately, the wrong predictions coming out of the model can hurt the
business value it was built to deliver.
The relationship between the input attributes and the output label doesn't
remain static; it changes with time. This hurts model performance because the
model has not learned the new underlying pattern present in the new data.
This effect is termed concept drift in machine learning.
In this article, I will provide a brief overview of this concept, which comes
up quite frequently in machine learning and is important for every
practitioner to be aware of.
Here's a brief mention of the points I will be going through in this article.
Conclusion.
Y = f(X)
But when this pattern fades, the model gives wrong outputs and becomes as
good as garbage.
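To make the idea concrete, here is a minimal sketch of what happens when the pattern behind Y = f(X) changes. Everything in it is an assumption made for illustration: the data is synthetic, the two linear "concepts" are made up, and the tiny least-squares helper `fit_line` stands in for whatever model you actually use.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and true labels."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Old concept (assumed): y is roughly 2x + 1
xs_old = [rng.uniform(0, 10) for _ in range(200)]
ys_old = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs_old]

# New concept (assumed): the same inputs now map to roughly -x + 5
xs_new = [rng.uniform(0, 10) for _ in range(200)]
ys_new = [-1.0 * x + 5.0 + rng.gauss(0, 0.1) for x in xs_new]

model = fit_line(xs_old, ys_old)  # trained only on the old concept
error_before_drift = mean_abs_error(model, xs_old, ys_old)
error_after_drift = mean_abs_error(model, xs_new, ys_new)

print(f"error on old concept: {error_before_drift:.2f}")
print(f"error on new concept: {error_after_drift:.2f}")
```

The inputs are drawn from the same range in both cases; only the mapping from X to Y changes, and that alone is enough to make the old model's error jump.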
We are all aware that a data science project is executed in several phases,
right? Starting with:
Data visualizations.
I am assuming that we are all quite familiar with the first six phases.
Concept drift comes in the last phase, i.e., model retraining and updating.
This is where the model is deployed at the customer's end and tested daily.
To catch any deviation early, the model's predictions are monitored and
checked for correctness so that business productivity is maintained.
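The monitoring described above could be sketched as a rolling accuracy check. This is only an illustration: the class name `AccuracyMonitor`, the window size, and the 0.8 threshold are all assumptions, and in practice the true labels often arrive with a delay.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window_size=50, min_accuracy=0.8):
        # deque with maxlen drops the oldest outcome automatically
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        """Flag only once the window is full, to avoid noisy early alarms."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy

monitor = AccuracyMonitor(window_size=10, min_accuracy=0.8)

for _ in range(10):           # ten correct predictions
    monitor.record(1, 1)
healthy = monitor.drift_suspected()

for _ in range(5):            # then five wrong ones push out five correct ones
    monitor.record(1, 0)
degraded = monitor.drift_suspected()

print(healthy, degraded, monitor.accuracy())
```

A drop in windowed accuracy does not prove concept drift on its own, but it is a cheap signal that the model deserves a closer look.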
We need to monitor this effect because it can cause serious problems for the
business the model serves. Wrong predictions can cost a company its
reputation as well as its loyal customers, for example when a model keeps
making recommendations that no longer match the users' new buying
patterns.
No, I am not joking! We can simply assume that the underlying pattern in the
data doesn't change over time, which is true in many cases.
With that assumption, we can focus on building a single best model for making
future predictions and then move on to other projects.
This may be a bit more effective than the first one. We retrain our outdated
model on the new data coming in, so that it learns the new underlying
pattern in the data.
This saves the model from becoming 'garbage' and keeps it delivering business
value.
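One simple way to picture retraining on new data is a sliding window: keep only the most recent observations and refit on them. The sketch below assumes a one-feature linear model and a hypothetical `WindowedRegressor` class; the window size and the two synthetic concepts are made up for the example.

```python
from collections import deque

def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return slope, mean_y - slope * mean_x

class WindowedRegressor:
    """Keeps only the newest observations and refits on them (hypothetical)."""

    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)  # old points fall out
        self.coef = None

    def add(self, x, y):
        self.window.append((x, y))

    def retrain(self):
        self.coef = fit_line(list(self.window))

    def predict(self, x):
        slope, intercept = self.coef
        return slope * x + intercept

model = WindowedRegressor(window_size=50)

for x in range(50):                      # old concept: y = 2x + 1
    model.add(float(x), 2.0 * x + 1.0)
model.retrain()
prediction_old = model.predict(2.0)

for x in range(50):                      # new concept: y = -x + 5
    model.add(float(x), -1.0 * x + 5.0)
model.retrain()                          # window now holds only new data
prediction_new = model.predict(2.0)

print(prediction_old, prediction_new)
```

The window size is the main design choice here: a short window adapts quickly but is noisy, while a long window is stable but slow to forget the old pattern.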
Instead of updating the outdated model on the new data set, we can train and
deploy a new model from time to time, whenever our testing shows that the
previous model is giving wrong predictions.
This method is a bit more effective, as changing the model can lead to more
accurate predictions. But model training, as well as its deployment,
takes significant time.
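Because training and deploying a new model is costly, it helps to check that the new one really is better before swapping it in. Here is a rough sketch of that check, assuming both models are plain callables and that a small set of recent labelled data (`holdout`) is available; the function names and the `tolerance` parameter are hypothetical.

```python
def mean_abs_error(model, data):
    """Average absolute error of a callable model on (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

def choose_model(current, candidate, holdout, tolerance=0.0):
    """Return the candidate only if it beats the current model on recent data.

    `tolerance` is an assumed safety margin: the candidate must be better
    by at least this much before we pay the cost of redeploying.
    """
    current_error = mean_abs_error(current, holdout)
    candidate_error = mean_abs_error(candidate, holdout)
    if candidate_error + tolerance < current_error:
        return candidate
    return current

def current_model(x):      # fitted on the old concept (assumed y = 2x + 1)
    return 2.0 * x + 1.0

def candidate_model(x):    # fitted on recent data (assumed y = -x + 5)
    return -1.0 * x + 5.0

# Recent labelled data that follows the new concept
holdout = [(float(x), -1.0 * x + 5.0) for x in range(10)]

deployed = choose_model(current_model, candidate_model, holdout, tolerance=0.5)
print(deployed.__name__)
```

If the candidate does not clear the margin, the current model simply stays in place and no deployment effort is spent.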
In this method, we ensemble some new models trained on the new data set
with the outdated model. The new models work alongside the old one,
correcting its wrong predictions.
This method is a bit more complex but more effective than the ones mentioned
above.
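There are many ways to build such an ensemble. The sketch below shows just one of them, chosen for illustration: a weighted average in which each model's weight is inversely proportional to its error on recent data, so the outdated model fades out as it gets worse. The helper names and the toy models are assumptions.

```python
def recent_error(model, recent):
    """Average absolute error of a callable model on recent (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in recent) / len(recent)

def ensemble_weights(models, recent, eps=1e-9):
    """Weight each model by the inverse of its recent error, normalised to 1."""
    inverse = [1.0 / (recent_error(m, recent) + eps) for m in models]
    total = sum(inverse)
    return [w / total for w in inverse]

def ensemble_predict(models, weights, x):
    """Weighted average of the individual predictions."""
    return sum(w * m(x) for m, w in zip(models, weights))

def old_model(x):   # trained before the drift (assumed y = 2x + 1)
    return 2.0 * x + 1.0

def new_model(x):   # trained on new data, slightly off (assumed y = -x + 5.5)
    return -1.0 * x + 5.5

# Recent labelled data following the new concept y = -x + 5
recent = [(float(x), -1.0 * x + 5.0) for x in range(1, 11)]

models = [old_model, new_model]
weights = ensemble_weights(models, recent)
combined = ensemble_predict(models, weights, 4.0)   # true value is 1.0

print(weights, combined)
```

With these toy numbers, almost all of the weight goes to the new model, and the combined prediction lands much closer to the true value than the old model's prediction does.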
[Edit] If you want to dive deep into the topic, please go through this article at
neptune.ai.
Conclusion
All right guys, that's it for today. I hope we have learned some new
concepts. This effect is often ignored by junior data scientists who, after
completing one project, think their work is over, but that's not the
case. Their responsibilities don't end there. To maintain business value,
we must monitor and track what value we are adding to the customer's
experience and whether the recommendation system is still providing them
with good recommendations.
Please feel free to comment below if any point is unclear. I
will reply as soon as possible. You can connect with me here on LinkedIn.
Thank you for reading. Have a good day.