Meet John. John is a machine learning scientist. He likes his cat and he loves his dogs. He has a wife and two kids and he cares about the environment. He rides his bike to work instead of driving, but when he has to drive, he takes his Tesla or his wife’s Prius. He installed solar panels on his roof, he donates to the Sierra Club, and he buys organic produce. John wants his grandkids to inherit a cleaner world, and he genuinely cares about making that happen.
But there’s a problem and that problem is in the second sentence of this post. John is a machine learning scientist. And machine learning has a sustainability problem.
See, the theoretical framework behind deep learning was established before we had the compute power to really try it. Its success depended on being able to store massive amounts of data and, more importantly, to analyze, train, and re-train models quickly. Our ability to train more accurate, more complex models goes hand-in-hand with our computational speed and capacity. And that compute takes energy.
What’s less broadly known is just how much energy we’re talking about here. A recent study found that training a single large deep learning model can produce as much as 626,155 pounds of CO2 emissions. To put that in perspective, that’s the carbon footprint of about 17 Americans for an entire year.
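The "17 Americans" comparison is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes an average American's annual carbon footprint of roughly 36,000 pounds of CO2 (an assumed figure; estimates vary by source):

```python
# Back-of-the-envelope check of the "17 Americans for a year" comparison.

MODEL_TRAINING_LBS_CO2 = 626_155          # emissions figure from the study cited above
AVG_AMERICAN_LBS_CO2_PER_YEAR = 36_000    # assumed average annual footprint (estimates vary)

person_years = MODEL_TRAINING_LBS_CO2 / AVG_AMERICAN_LBS_CO2_PER_YEAR
print(f"Equivalent to roughly {person_years:.1f} American person-years of emissions")
```

Running this gives a value a little over 17, which is where the comparison comes from.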
That’s just for one model. Experts estimate that a full two percent of all carbon emissions come from data centers (340 megatons) and that that number is expected to quadruple by 2025. Further, data centers require an immense amount of energy, with estimates of their share of all power consumption ranging between 1 and 2.5%. Add to this the growth of blockchain, a technology whose energy demands rise with its ever-increasing computational complexity. Add to that the proliferation of streaming video and video games and you can start to forecast a world where data centers are chewing through more of our available power and producing more than their share of greenhouse gases.
The point here is simple and it’s one John probably doesn’t want to hear: machine learning isn’t green. And if we keep practicing it the way we are today, the problem is going to get much worse.
What we need to do is examine some of the habits we’ve formed in the industry. We need to stop hoarding data and hoping that information will someday become useful when we know it really won’t. Some 60% of all data we collect is known as “dark data,” meaning that it’s never used for any decision making. It’s just sitting there in our data centers, collecting the digital equivalent of dust and cobwebs.
And that’s just the beginning. We need to rethink not just what we collect but what we store, how we train models, how often we retrain models, how complex models should be and so, so much more.
In the coming weeks, we’ll be looking at all of these questions and providing real answers to our sustainability problem. Because at Alectio, we really do believe machine learning has the promise to remake the world and improve the way we do everything from running a business to diagnosing disease to, ironically, understanding climate change and big global trends.
It will take many of us working together to reverse the trajectory we’re on. But it’s better business, it’s smarter, it’s cheaper, and it’s greener to do so. John’s grandkids wouldn’t mind either.