Back in 2013, a man by the name of Eric Loomis was arrested in Wisconsin. Loomis was driving a car that had been used in a shooting, and he pled guilty to eluding an officer. It should have been a fairly unremarkable case, but at sentencing, the judge sentenced Loomis to six years in prison based, in part, on the recommendation of a machine learning algorithm called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). When his lawyers asked to examine the algorithm that effectively sentenced their client to prison, they were rebuffed. In other words, COMPAS marked Loomis as a “high risk” offender, but his lawyers had no way of understanding why or challenging the model itself.
The Supreme Court had a chance to rule on this case in 2017 but demurred. Since then, researchers from Dartmouth have studied the algorithm and shown that COMPAS is “no better at predicting an individual’s risk of recidivism than random volunteers recruited from the internet.” Past that, COMPAS supposedly weighs 137 criteria when determining risk for a defendant, yet those same researchers found equal success using only two: age and prior convictions. And not only that, but COMPAS was found to display racial bias: it over-predicted recidivism risk for Black defendants while under-predicting it for white ones.
This is the point where Supreme Court Chief Justice Earl Warren would’ve asked “yes, but is that fair?” Should we sentence criminals with a model they can’t interrogate? Is a commercially sold algorithm with provable bias the right way to determine how long a defendant should be behind bars? Is COMPAS fair?
The answer is no. That’s easy. But the answer to “how do we fix this?” is a lot more complex.
As an industry, how we create, train, source, and build our machine learning models is incredibly important. We’ll be covering opportunity in AI and access to AI in our next few pieces. But ethically built machine learning models aren’t worth much if their impact on society is ultimately harmful.
Now, the reason we led this piece with the story of Eric Loomis is simple: courts and legislators don’t currently have the expertise needed to adjudicate these issues. Anyone who’s watched a CEO from a big tech company testify in front of Congress likely understands that. The responsibility is ours. We need to take ownership of the impact of the technologies we’re building. And like every other facet of responsible AI, that means thinking ethically and morally in addition to financially.
Fighting bias in AI
The story of COMPAS is not an outlier. We started our series talking about Google’s image recognition problem wherein pictures of Black engineers were mislabeled as gorillas. But there are many more. Google’s ad system also showed women far fewer ads for high-paying executive jobs. Online loan applications draw on data shaped by redlining practices dating back to the 1930s, resulting in fewer loans being approved for Black and Hispanic borrowers who should be eligible.
These aren’t toy problems. It’s not mislabeled sentiment or some intern’s AI project gone awry. These biases affect our neighbors’ ability to get a mortgage or get into the college of their dreams or land a job they’re qualified for. And all of these models were put into production by real companies who frankly should have known better.
There are, of course, ways to fix these fairness problems. Training your model on new information that counters bias, like including more images of people with darker complexions in an image recognition dataset, can help. Removing biased data, like historical information from the redlining era, can help too. And remembering that representativeness does not equal usefulness is also important: a model may need extra information about certain classes because they are difficult or nuanced examples. Lastly, and this one is vital, you should attack bias in the data collection process as well. It’s much easier to fix the cause of your bias problem (the data) than the consequence (the models).
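To make one of these techniques concrete, here’s a minimal, framework-free sketch of reweighting: giving examples from under-represented groups more weight during training so they aren’t drowned out by the majority. This is one common mitigation among several; the function name and the inverse-frequency scheme are our own illustration, not a specific library’s API.

```python
from collections import Counter

def balancing_weights(groups):
    """Weight each example inversely to its group's frequency, so
    under-represented groups count for as much as common ones.
    A perfectly balanced dataset gets weight 1.0 everywhere."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Hypothetical group labels for four training examples:
groups = ["A", "A", "A", "B"]
weights = balancing_weights(groups)
# The lone "B" example gets weight 2.0; each "A" gets ~0.67,
# so group B contributes as much total weight as group A.
```

Most training APIs accept weights like these (e.g. a `sample_weight` argument), so this kind of correction can be applied without touching the model architecture at all.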
Past that, all AI applications should have some explainability built in. We know some might balk at this and claim it’s an unnecessary burden, but if your model is producing biased outputs and you can’t say why, why would you release that model in the first place?
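One widely used explainability technique is permutation importance: shuffle one feature’s values and measure how much the model’s score drops. A big drop means the model leans heavily on that feature, which is exactly the kind of question Loomis’s lawyers couldn’t ask of COMPAS. Below is a toy, dependency-free sketch of the idea; real projects would typically reach for a library implementation instead, and the toy “model” here is purely illustrative.

```python
import random

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    """Estimate each feature's contribution to a model's score by
    shuffling that feature's column and measuring the score drop."""
    rng = random.Random(seed)
    baseline = metric(model(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            drops.append(baseline - metric(model(X_perm), y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "model" that predicts using only feature 0 and ignores feature 1:
model = lambda X: [row[0] for row in X]
accuracy = lambda preds, y: sum(p == t for p, t in zip(preds, y)) / len(y)
X = [[0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1]
imps = permutation_importance(model, X, y, accuracy)
# Shuffling the ignored feature never changes the score, so its
# importance is exactly zero; feature 0's importance is non-negative.
```

Even this crude audit answers the question a defendant would want answered: which inputs actually drove the score?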
Because again: this isn’t up to legislators. They’re behind on this and may not catch up in time. It’s about the industry taking ownership and responsibility for our technology. Do you want to release a model with innate biases into the world? Do you want what you create to make other people’s lives worse?
That’s hopefully another one of those easy answers.
The Need for Sustainable AI
You don’t need a hundred trend pieces to understand AI is more widespread now than it was just a decade ago. The reason? We emerged from the most recent AI Winter largely because of the massive increase in both compute power and available data. Put simply: there was more stuff to train models on and the machines that trained them got fast enough to make it worth the cost.
There’s a real cost to that explosion of data and compute power though. And we don’t just mean a monetary cost. We also mean an environmental one.
Last year, a study from Emma Strubell out of UMass found that training a single deep learning model can emit over 600,000 pounds of carbon dioxide (to put that in perspective, the average American generates 36,000 pounds of carbon dioxide in an entire year). Now consider that best-in-class models in 2018 required 300,000 times the compute resources they did in 2012. If that trend continues, our industry will be meaningfully contributing to our climate change crisis.
So how do we fix this? We can start by addressing some of our worst habits.
Right now, we’re using too much data to train our models. There’s this pervasive idea that more data is always better when, in fact, data has varying levels of utility. Some of it is helpful to your models, some is useless, and some is actively harmful. Understanding what data will truly make your model better and training on that instead of just training on everything? Not only does that reduce your carbon footprint but it can make your models better and more accurate.
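One cheap, concrete way to shrink a training set without losing information is to drop exact duplicates before training; redundant copies add compute cost while teaching the model nothing new. The sketch below is our own illustration of that single step, not a full data-curation pipeline, and it assumes examples are `(features, label)` pairs.

```python
def deduplicate(examples):
    """Drop exact duplicate (features, label) pairs while preserving
    order - one cheap way to cut training cost with zero information
    loss. Duplicates add compute without teaching the model anything."""
    seen, kept = set(), []
    for x, y in examples:
        key = (tuple(x), y)       # lists aren't hashable; tuples are
        if key not in seen:
            seen.add(key)
            kept.append((x, y))
    return kept

data = [([1, 2], 0), ([1, 2], 0), ([3, 4], 1)]
slim = deduplicate(data)  # 3 examples -> 2
```

Deduplication is only the most obvious form of data curation; the same mindset extends to pruning near-duplicates and low-value examples, each of which trims training time and carbon footprint together.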
Past that, there’s an issue with re-training. Some practitioners choose to retrain their models from scratch, both during training and when they’re in production, over and over again. This has real sustainability costs and, additionally, squanders some of what the model learned in prior iterations. It’s also worth asking how often your model requires retraining. For certain use cases like eCommerce recommendation algorithms or cybersecurity applications, that might be necessary. For others, it’s a massive use of resources with vanishing utility.
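The alternative to retraining from scratch is incremental updating: feed the model only the new data and let it adjust its existing state. As a toy sketch of the idea, here is a running-mean “model” whose incremental update matches a full retrain exactly; real systems would use warm-started or online learners (many libraries expose this as a `partial_fit`-style method), which is why we borrow that name here.

```python
class RunningMean:
    """Toy 'model' (a running average) that supports incremental
    updates, so new data never forces a retrain from scratch."""
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def partial_fit(self, batch):
        # Update the running mean in place, one example at a time.
        for x in batch:
            self.n += 1
            self.mean += (x - self.mean) / self.n
        return self

model = RunningMean()
model.partial_fit([1.0, 2.0, 3.0])  # initial training
model.partial_fit([4.0])            # cheap incremental update
# model.mean == 2.5, identical to retraining on all four points
```

For models where an exact incremental update exists, as here, retraining from scratch buys literally nothing; for models where it doesn’t, warm-starting from the previous weights still usually beats starting over.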
It’s more than a little ironic that AI promises efficiency but we’re training and building AIs inefficiently. That inefficiency has monetary costs for the organizations we work in but also environmental ones. Responsible AI requires us to look honestly at what our habits are and how we can invest in technologies and partnerships that reduce our compute costs, our training times, and our environmental impact.
Just Do More Good
The last point we want to make about impact is a simple one: we as machine learning practitioners need to take it upon ourselves to do more good.
There’s no shortage of AI projects that can help our fellow citizens. Just piggybacking on the last section, we already have researchers doing great work to help fight deforestation, to forecast cyclones, to monitor endangered species, and so, so much more. But AI can help with more than climate change. We know that it’s a great technology for all kinds of medical diagnoses, and partnerships like the one Facebook had with the Red Cross to help with mapping and natural disaster response are really promising. After all, nobody expects Facebook to understand the nuances of these efforts like the Red Cross, but nobody expects the Red Cross to have hundreds of machine learning scientists on staff.
And we understand: not everyone has the budget or the ability to donate their time to help NGOs and charitable organizations. That said: if you can, do so. At the very least, discounting your business’s prices for non-profits and others aiming to solve intractable, real-world problems is a simple, ethical step we can all pledge to take.
Lastly, one of the best parts of the machine learning community is how much code is open sourced, how much data is available to each of us, and how many practitioners do ML as both a job and a hobby. A few hours here and there can go a long way toward making the world a better place and making AI a force for good, not just a force for profit.