Deep Dive on Active Learning

What is Active Learning?

Active Learning isn’t anything new, and it’s not very complicated either, once you get the idea.

It was originally developed to reduce labeling costs, but it offers many other benefits, which the Alectio Platform helps you take full advantage of.

Its workflow is straightforward and iterative: select some data, get it annotated, train the model, and repeat. Each cycle is called a loop.
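To make the loop concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than the Alectio implementation: the records are numbers between 0 and 1, the hypothetical `oracle` function stands in for a human annotator, and the "model" is just a threshold placed midway between the two labeled classes.

```python
import random

def oracle(x):
    # Stand-in for a human annotator; the rule x >= 0.5 is an assumption.
    return int(x >= 0.5)

def train(labeled):
    # Toy "model": a threshold midway between the two labeled classes.
    lo = max((x for x, y in labeled if y == 0), default=0.0)
    hi = min((x for x, y in labeled if y == 1), default=1.0)
    return (lo + hi) / 2

def run_active_learning(pool, n_loops=5, batch_size=4, seed=0):
    rng = random.Random(seed)
    unselected = list(pool)
    rng.shuffle(unselected)
    # Bootstrap: the first batch is random because no model exists yet.
    selected = [(x, oracle(x)) for x in unselected[:batch_size]]
    unselected = unselected[batch_size:]
    threshold = train(selected)
    for _ in range(n_loops):
        if not unselected:
            break
        # Query: rank remaining records by closeness to the current threshold.
        unselected.sort(key=lambda x: abs(x - threshold))
        batch, unselected = unselected[:batch_size], unselected[batch_size:]
        # Annotate the batch and move it to the selected pool.
        selected += [(x, oracle(x)) for x in batch]
        # Retrain on everything selected so far.
        threshold = train(selected)
    return threshold, selected, unselected

pool = [i / 100 for i in range(100)]
threshold, selected, unselected = run_active_learning(pool)
```

With these defaults, only 24 of the 100 records are ever annotated, and the learned threshold should still land close to the assumed boundary of 0.5.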



Active Learning can be broken down into two main categories:

Streaming Active Learning, where the algorithm examines records one by one and decides whether to keep each of them. This technique tends to be more compute-efficient because it does not need to assess the entire dataset before selecting a record, but it does not control how much data gets selected. An example of a streaming approach is applying a confidence threshold to filter data for the next loop.

Pooling Active Learning, where the algorithm scans through the entire remaining pool of unselected data to select the most interesting records. While more computationally expensive, this approach makes it easier to control the quantity of selected data (and hence the budget).
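The difference between the two categories can be sketched as follows. This is a hedged illustration only: the function names, the confidence values, the threshold of 0.6, and the budget of 3 are all made up for the example.

```python
def stream_select(confidences, threshold=0.6):
    # Streaming: look at one record at a time and keep it if the model's
    # confidence is below the threshold. The number kept is not known upfront.
    for index, confidence in enumerate(confidences):
        if confidence < threshold:
            yield index

def pool_select(confidences, budget=3):
    # Pooling: rank the whole remaining pool, then keep exactly `budget`
    # of the least confident records.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:budget]

# Hypothetical model confidences for eight unselected records.
confidences = [0.95, 0.55, 0.80, 0.51, 0.99, 0.58, 0.52, 0.70]

streamed = list(stream_select(confidences))  # size depends on the data
pooled = pool_select(confidences)            # size is fixed by the budget
```

On this data the streaming filter keeps four records (indices 1, 3, 5 and 6), while the pool-based selection returns exactly three (indices 3, 6 and 1), whatever the confidences look like.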

What is a Querying Strategy?

Active Learning consists of dynamically choosing some records from raw, unannotated data, and getting those records annotated so that they can be used for training. The workflow is essentially about moving records from what we often refer to on this website as the unselected data to a selected data pool. How this selection is made is what querying strategies are all about.

A querying strategy (sometimes also called a selective sampling method) is simply a function that selects a sample of data to be annotated and used for training in the next loop. How this function gets chosen is naturally the bigger question.

Most data scientists familiar with Active Learning assume that there are only a few querying strategies to pick from, just as there is only a finite number of ML algorithms. This is incorrect: a querying strategy can be any sampling function.
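To illustrate the point that any sampling function qualifies, here is a small sketch. The signature (per-record class probabilities plus a budget in, a list of record indices out) is an assumption chosen for the example, and the three functions are common textbook strategies, not a list of what Alectio uses.

```python
import math
import random
from typing import Callable, List, Sequence

Probs = Sequence[Sequence[float]]
# Assumed interface: (class probabilities per record, budget) -> indices.
QueryStrategy = Callable[[Probs, int], List[int]]

def least_confidence(probs: Probs, budget: int) -> List[int]:
    # Prefer records whose top predicted probability is lowest.
    return sorted(range(len(probs)), key=lambda i: max(probs[i]))[:budget]

def max_entropy(probs: Probs, budget: int) -> List[int]:
    # Prefer records whose predicted distribution is most spread out.
    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)
    return sorted(range(len(probs)), key=lambda i: -entropy(probs[i]))[:budget]

def random_sampling(probs: Probs, budget: int, seed: int = 0) -> List[int]:
    # Ignores the model entirely; still a valid querying strategy.
    return random.Random(seed).sample(range(len(probs)), budget)

# Hypothetical predictions for three unselected records, three classes.
probs = [[0.50, 0.50, 0.00], [0.52, 0.24, 0.24], [0.90, 0.05, 0.05]]
```

Here `least_confidence(probs, 1)` picks record 0 while `max_entropy(probs, 1)` picks record 1, so even two standard strategies disagree on which record is the most interesting.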

In general, people who use Active Learning pick a static, arbitrary querying strategy once and for all before the process begins. Because Active Learning isn’t mainstream in the industry, very few people have developed the expertise needed to choose those strategies wisely, which leads to poor performance and causes many to give up. Besides, there is no reason why the same querying strategy should be used throughout the entire process. For example, some querying strategies provide very high compression but are computationally greedy, which might make them suitable candidates only for the earliest loops, when the amount of selected data is still low.
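Switching strategies during the process could look like the sketch below. All names and numbers are hypothetical: records are plain numbers, the costly strategy compares every candidate with every selected record (so its cost grows with the selected pool), the cheap one ranks candidates by distance to an assumed decision boundary, and the switch point of 100 selected records is arbitrary.

```python
def farthest_from_selected(pool, selected, budget):
    # Costly: each candidate is compared with every selected record,
    # so the work grows as the selected pool grows.
    def min_distance(x):
        return min(abs(x - s) for s in selected)
    return sorted(pool, key=min_distance, reverse=True)[:budget]

def cheap_uncertainty(pool, selected, budget, boundary=0.5):
    # Cheap: one pass over the pool, independent of the selected pool size.
    return sorted(pool, key=lambda x: abs(x - boundary))[:budget]

def choose_strategy(n_selected, switch_at=100):
    # Use the costly strategy only while little data has been selected.
    if n_selected < switch_at:
        return farthest_from_selected
    return cheap_uncertainty

strategy = choose_strategy(n_selected=1)
batch = strategy([0.1, 0.5, 0.9], [0.0], 2)
```

Because both functions share the same arguments, the loop that calls them does not need to know which one is active.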

At Alectio, we wanted to address the misconceptions about Active Learning by completely rethinking what a querying strategy should be and taking a Machine Learning approach to building one.

Why is Active Learning not as easy as it seems?

Multi-Generation Active Learning
When selecting data for a given loop, vanilla querying strategies rely on the outcome of the previous loop only. Yet the state of the model and the metadata generated over the entire lifetime of the process contain precious information that ML researchers have disregarded so far. One of our breakthroughs was to treat all metadata as time series rather than single values.
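As a rough illustration of the difference, the sketch below scores records either from the last loop only or from their whole history. It is not Alectio's method: the confidence histories are invented, and using the standard deviation across loops as a score is just one simple assumption about what a time-series view could look like.

```python
from statistics import pstdev

# Hypothetical model confidence for three unselected records over three loops.
history = {
    "a": [0.55, 0.90, 0.95],
    "b": [0.60, 0.40, 0.70],
    "c": [0.80, 0.82, 0.81],
}

def last_loop_score(confidences):
    # Vanilla view: only the most recent loop matters.
    return 1 - confidences[-1]

def volatility_score(confidences):
    # Time-series view (assumed): records whose confidence swings a lot
    # across loops are treated as more interesting.
    return pstdev(confidences)

def rank(history, score):
    # Most interesting record first.
    return sorted(history, key=lambda r: score(history[r]), reverse=True)

vanilla_order = rank(history, last_loop_score)
history_order = rank(history, volatility_score)
```

On this data the two views disagree: the last-loop score ranks the records b, c, a, while the history-based score ranks them a, b, c.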
ML-Driven Active Learning
Meta Active Learning
Learning-to-Active-Learn
Compute-Saving Active Learning

Even with the perfect querying strategy, Active Learning presents a major disadvantage: it is extremely compute-greedy. That’s because the model is re-trained from scratch with the data selected so far at each loop.
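A back-of-the-envelope calculation shows why this adds up. The sketch below assumes, for illustration only, that each loop adds a fixed number of records and that training cost is proportional to the number of records trained on; the function name and the figures are hypothetical.

```python
def cumulative_training_cost(batch_size, n_loops):
    # Loop k retrains from scratch on all k * batch_size records selected
    # so far, so the per-loop costs add up across the whole process.
    return sum(batch_size * loop for loop in range(1, n_loops + 1))

def single_training_cost(batch_size, n_loops):
    # Cost of training once on the final selected pool.
    return batch_size * n_loops

total = cumulative_training_cost(batch_size=1000, n_loops=10)
once = single_training_cost(batch_size=1000, n_loops=10)
```

With 10 loops of 1,000 records each, the loops process 55,000 records in total, compared with 10,000 for a single training run on the final pool. Under these assumptions the overhead grows roughly with the square of the number of loops.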
