Member-only story
An introduction to data and concept drift in machine learning
You just finished deploying your brand new machine learning model, and everything is working fine.
Unfortunately, your work is not done yet. In fact, you are arguably only half the way through it!
The performance of machine learning models degrades over time. A system that’s working today might be completely broken in a few weeks. Sometimes, the quality of the results of the model lasts longer. Sometimes, it’s just a matter of days.
Understanding why this happens is a fundamental step to prepare for it.
A machine learning model, simplified
Let’s start by taking a look at a simple representation of a machine learning model:
X → y
Given an input X, the model produces a prediction y. The → symbol represents the relationship that the model learned between the input and the output it produces.
Now that we trained and deployed our model, what would happen if the distribution of the input X changes?
Data drift
Remember your cellphone camera from 10 years ago?
Imagine that a few smart people created a face recognition model back then. They had to use the data that…