An introduction to Active Learning

Santiago Valdarrama
3 min read · May 7, 2021
One of the hypothetical circuit boards that our factory produced.

Do you know what scares me? Having to go through a mountain of data to come up with labels.

Data labeling is hard, expensive, and sometimes outright prohibitive. Data labeling can kill your machine learning project even before it starts.

Let’s kick this off with a hypothetical problem: we’d like to build a model capable of visually inspecting photos of circuit boards and classifying them based on their specific configuration.

Imagine a factory producing thousands of these boards per minute. Going through each circuit board manually would be a nightmare and slow down production significantly. That’s where our model would come in!

A big hurdle

How hard would it be to use deep learning to build a classifier that categorizes a picture of a circuit board appropriately?

As with any good ol’ classification model, we need to build a training set containing samples for each specific category we want to classify.

And here lies the problem.

Somebody would have to label those samples manually. Depending on how many images we need, how many classes there are, how long it takes to inspect each picture visually, and how much we need to pay people to do this job, creating the training dataset may be too expensive to consider.
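To get a feel for how those factors multiply, here is a rough back-of-the-envelope sketch. Everything in it is an assumption made up for illustration: the function name `estimate_labeling_cost`, the 10 classes, the 5,000 images per class, the 30 seconds per image, and the $20 hourly rate. Swap in your own numbers.

```python
def estimate_labeling_cost(num_images, seconds_per_image, hourly_rate,
                           annotators_per_image=1):
    """Return (total_hours, total_cost) for labeling a dataset by hand.

    All inputs are rough guesses; this is an estimate, not a quote.
    """
    # Total human time: every image is inspected once per annotator.
    total_seconds = num_images * seconds_per_image * annotators_per_image
    total_hours = total_seconds / 3600
    return total_hours, total_hours * hourly_rate


# Hypothetical numbers, not taken from any real factory.
num_classes = 10
images_per_class = 5_000

hours, cost = estimate_labeling_cost(
    num_images=num_classes * images_per_class,
    seconds_per_image=30,
    hourly_rate=20.0,
)
print(f"{hours:.0f} hours, ${cost:,.0f}")
```

With these made-up inputs, that works out to roughly 417 hours of inspection and a bit over $8,000, and the bill doubles if you want two people to label each image so you can cross-check them.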

