Our blog
No, the world doesn’t need another synthetic data company
Let’s begin with the obvious: every machine learning project starts with data. Whether that data needs to be labeled, collected, generated, cleaned, munged, or fussed with in any way, shape, or form, we all understand that machine learning requires not only data, but...
Just because the data is representative doesn’t mean it’s useful
The blueprints for the first machine that could vaguely be called a computer were created by Charles Babbage in the 1830s. It was called the Difference Engine. The plans called for a monstrous, steampunk contraption, a collection of gears that had to be physically...
Using Explore-Exploit to Build a Better Breed of Active Learning
Explore-exploit is a paradigm that goes way beyond Machine Learning; it is actually the conceptualization of an everyday dilemma that we face at almost every instant of the day when we make even the simplest decision. The human brain is wired to seek...
How we can understand what data your model needs–without looking at your model
At Alectio, we’ve pioneered a technique that lets us understand how a model is learning and what data the model needs without looking at either the model or the data. Simply put: we use machine learning to understand how a machine learning model works and importantly,...
How to tell if active learning will work for your problem
Active learning is one of the most misunderstood techniques in machine learning. Many of us had some experience with it in school, using those well-curated academic datasets, but few people use it in the business world to handle real-world data with all its messy...
Here’s why you need a data collection strategy
Let us introduce you to DailyDialog. DailyDialog is a manually labeled, multi-turn dialog dataset covering a whole host of emotions, topics, lengths, and types of statements. This dataset includes stuff like casual chats about the weather, couples negotiating about...
Why the end of Moore’s Law means the end of Big Data as we know it
The year is 1965. Lyndon B. Johnson is sworn in as president. The Rolling Stones release “Satisfaction,” their first number one single in the United States. Vietnam War protests grow in size and frequency. Canada adopts its familiar maple leaf flag. And the first...
All data is not created equal
Perhaps the most pervasive misconception in data science is “the more data, the better.” Just think about how many people you know who have been collecting and hoarding as much of it as possible. Think of how many colleagues and bosses have admonished you to do so....
Everything you wanted to know about Alectio
Here are the questions we most frequently get and the answers we most frequently give: What is active learning? Active learning is a semi-supervised machine learning strategy. Generally speaking, active learning aims to reduce the amount of labeled data required to...