An introduction to data and concept drift in machine learning

Santiago Valdarrama
3 min readMay 21, 2021
No way to recognize a face with a mask anymore.

You just finished deploying your brand new machine learning model, and everything is working fine.

Unfortunately, your work is not done yet. In fact, you are arguably only half the way through it!

The performance of machine learning models degrades over time. A system that’s working today might be completely broken in a few weeks. Sometimes, the quality of the results of the model lasts longer. Sometimes, it’s just a matter of days.

Understanding why this happens is a fundamental step to prepare for it.

A machine learning model, simplified

Let’s start by taking a look at a simple representation of a machine learning model:

X → y

Given an input X, the model produces a prediction y. The symbol represents the relationship that the model learned between the input and the output it produces.

Now that we trained and deployed our model, what would happen if the distribution of the input X changes?

Data drift

Remember your cellphone camera from 10 years ago?

Imagine that a few smart people created a face recognition model back then. They had to use the data that…

--

--

Santiago Valdarrama

I build machine learning systems until 5 pm. Then I come here and tell you stories about them.