In Software Engineering, Tech Debt (sometimes referred to as “Code Debt”) typically arises when engineering teams opt for a quick hack instead of a more complex but more robust implementation, usually in order to meet a deadline. However, even though Tech Debt gets bad press, there is nothing fundamentally wrong with taking it on, as long as the debtor intends to address it at some point and doesn’t get complacent about paying it off. Taking on Tech Debt, just like financial debt, is a tactic to quickly achieve an outcome that would otherwise have had to wait, which can be the right thing to do when developing new technologies and fixing issues for customers.
Real issues start happening when engineers resort to Tech Debt to please a customer or management but eventually forget or fail to pay it off. In the long term, this oversight, compounded with dozens of others, leads to severe maintainability issues and makes it close to impossible for the engineering team (and in particular newcomers and juniors) to figure things out. From that point on, every new feature accrues additional compound interest. Eventually, it becomes infeasible to develop the product any further: that’s when a state of bankruptcy has been reached and the team has to rebuild everything from scratch. This is why it is so critical to manage Tech Debt continuously.
Unsurprisingly, Tech Debt exists in Machine Learning too, though sadly, data scientists are not properly informed about its risks. And even the few who are rarely realize that ML Tech Debt goes beyond vanilla Code Debt. They might have heard of Software Engineering best practices, be aware that good code requires good documentation, and promote code review sessions among their peers, but unfortunately, in the context of Machine Learning, none of this is sufficient to prevent or manage Tech Debt.
Special Treatment for a Special Type of Software
There are quite a few reasons why ML Tech Debt requires additional precautions compared to standard Software Engineering.
- Machine Learning algorithms might be implemented in the same programming languages, but evaluating whether they work “properly” is much more difficult because the predictions they output can look reasonable in spite of being totally inaccurate. In ML, errors and bugs tend to be much harder to diagnose.
- The outputs of ML systems are often irreproducible, which means the same model run on a different machine can yield different results. This, too, can make such systems tricky to validate.
- Finally, ML models are algorithms trained on data. Hence, evaluating the soundness of the code is only half of the story. Before deploying models to production, engineers need not only to validate the code, but also to check the validity of the training dataset. This gets harder once a model is in production, as incoming data needs to be monitored continuously. This means that even if everything looked right at deployment, Tech Debt might just be lurking around the corner.
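To make the last point concrete, here is a minimal sketch of what validating a training dataset (as opposed to validating code) might look like. The schema, column names and bounds below are illustrative assumptions, not a standard API:

```python
# Minimal training-data validation sketch: checks each row against a
# schema of expected types and value ranges, returning a list of problems.
# The schema and column names below are hypothetical examples.

def validate_rows(rows, schema):
    """Return a list of human-readable problems found in `rows`.

    `schema` maps column name -> (expected_type, (min, max) or None).
    """
    problems = []
    for i, row in enumerate(rows):
        for col, (expected_type, bounds) in schema.items():
            if col not in row:
                problems.append(f"row {i}: missing column '{col}'")
                continue
            value = row[col]
            if not isinstance(value, expected_type):
                problems.append(f"row {i}: '{col}' has type {type(value).__name__}")
            elif bounds is not None:
                lo, hi = bounds
                if not (lo <= value <= hi):
                    problems.append(f"row {i}: '{col}'={value} outside [{lo}, {hi}]")
    return problems

# Hypothetical schema: age must be an int in [0, 120], income a non-negative float.
schema = {"age": (int, (0, 120)), "income": (float, (0.0, float("inf")))}
rows = [
    {"age": 34, "income": 52_000.0},   # fine
    {"age": 240, "income": 41_000.0},  # out-of-range age
    {"income": 10_000.0},              # missing age
]
print(validate_rows(rows, schema))
```

Checks like these are cheap to run at deployment time, but as the bullet above notes, they also need to keep running on the live data feeding the model.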
ML models are simply no ordinary pieces of software, and as such, require non-ordinary Tech Debt management measures. It’s a well-known fact: ML systems are complicated, and deploying and managing them can give the best DevOps engineers the headache of a lifetime. And that’s precisely why the entire MLOps industry saw the light of day. Few people know it, but the concept of MLOps originated from a Google paper titled “Hidden Technical Debt in Machine Learning Systems”; in other words, MLOps was quite literally invented to address a growing concern about the mounting amount of Tech Debt in Machine Learning.
Can we then expect MLOps to help us deal with ML Tech Debt? Read further to find out!
What Constitutes Tech Debt in Machine Learning?
When they think about Tech Debt, most people think of poor-quality software, undocumented code, or anything that makes issues difficult to diagnose or fix and code hard to reuse. But as we have already seen, Tech Debt takes on another dimension in the context of Machine Learning.
Additional Code Debt
Data scientists shouldn’t overlook Code Debt either. However, because they don’t always come from a Computer Science background, they sometimes have little knowledge of coding best practices. Besides, copy-pasting other people’s code is common practice among them. It’s difficult to blame them for doing so, though, given that most of the code they have to write involves formatting or transforming datasets, a notoriously error-prone task that can hardly be described as exciting.
In short, data scientists rely on a lot of open source software and often build their models on top of someone else’s code, and that can lead to disaster. Anyone who has ever tried their luck using a paper with code as a starting point for a project knows that the code is often riddled with errors which need resolving before it can even be run. In addition, open-source libraries are often highly sensitive to local setup and rarely provide sufficient documentation regarding dependencies. It’s not difficult to understand why using open source libraries and repositories as a dependency in a production environment can lead to critical issues.
Luckily, some level of code reusability and transferability can be achieved through the growing adoption of microservice architectures and containerization, a fundamental practice in MLOps. Foundation models also reduce reliance on quickly hacked ML models and offer more robustness.
Let’s not forget that Machine Learning is fundamentally about experimentation. This has unfortunately led data scientists to adopt the dreadful habit of creating dozens of different versions of their code, which translates into repositories with many (usually poorly labeled) branches and countless versions of Jupyter notebooks that no one, not even the author, can tell apart just hours later. That’s where model version control solutions can help dramatically.
Data scientists failing to adopt coding best practices isn’t the only source of ML Tech Debt; if it were the case, retraining our ML workforce or having dedicated software engineers rewrite the code before it goes to production might solve a big part of the issue.
There are in fact many challenges that are more specific to Machine Learning. The sophisticated workflows involved in most ML products are one of them. In practice, multiple components need to be orchestrated. Should one single component change, the whole system could crumble. With no standardized way to track both the consumers and providers of these components, establishing which other components could impact or be impacted by a modification is no trivial task. Disaster can strike all the more easily when components feed off of each other via complex feedback loops, for example when the predicted values from one system constitute input data for another. Imagine team A being in charge of generating pricing data which teams B and C both consume, and not being able to warn teams B and C of a sudden change in definition. Keeping all users and consumers of that data in sync is the core benefit of a Feature Store, a major piece of the MLOps puzzle.
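The bookkeeping a Feature Store provides can be sketched as a shared registry that knows who produces and who consumes each feature, so a change in definition can be flagged to every affected team. The registry API below is a hypothetical toy, not any real product’s interface:

```python
# Toy sketch of Feature Store bookkeeping: a registry mapping each
# feature to its producing team and its set of consuming teams, so the
# teams impacted by a definition change can be looked up before it ships.

from collections import defaultdict

class FeatureRegistry:
    def __init__(self):
        self.producer = {}                 # feature -> owning team
        self.consumers = defaultdict(set)  # feature -> consuming teams

    def register(self, feature, producer):
        self.producer[feature] = producer

    def subscribe(self, feature, consumer):
        self.consumers[feature].add(consumer)

    def impacted_by_change(self, feature):
        """Who must be warned if `feature`'s definition changes."""
        return sorted(self.consumers[feature])

registry = FeatureRegistry()
registry.register("unit_price", producer="team_a")
registry.subscribe("unit_price", "team_b")
registry.subscribe("unit_price", "team_c")

# Team A can now warn the right people before redefining the feature.
print(registry.impacted_by_change("unit_price"))  # ['team_b', 'team_c']
```

Real Feature Stores do far more (serving, freshness guarantees, offline/online consistency), but this dependency-tracking role is what directly addresses the scenario above.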
Finally, the load on Machine Learning systems can be particularly variable and unpredictable compared to that on other systems, because they’re built on top of highly fluctuating real-time data streams. This is why Containerization and Autoscaling are so critical in production-level ML systems.
Live Data Dependencies
Not only do ML systems consume from volatile data sources: they often consume from many independent sources with different latencies, data modalities and formats. Thankfully, the fast-maturing DataOps field has been a key asset to MLOps engineers. That said, even new DataOps technology has long failed to ensure proper traceability of the data, and MLOps experts have had to address this major source of Tech Debt by developing Data Lineage solutions.
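What Data Lineage boils down to can be illustrated with a bare-bones example: every transformation step records a fingerprint of its input and output, so any value in production can be traced back through the pipeline to its sources. The record format here is an assumption made for illustration; real lineage tools are far richer:

```python
# Bare-bones data lineage sketch: each transformation appends a record
# linking a content hash of its input to a content hash of its output,
# forming a traceable chain from raw sources to final values.

import hashlib
import json

def fingerprint(data):
    """Stable content hash of a JSON-serializable dataset."""
    blob = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

lineage = []  # append-only log of transformation records

def traced(step_name, func, data):
    """Run `func` on `data` and record the step in the lineage log."""
    out = func(data)
    lineage.append({"step": step_name,
                    "input": fingerprint(data),
                    "output": fingerprint(out)})
    return out

raw = [{"price": "10.5"}, {"price": "7.0"}]
parsed = traced("parse_prices", lambda d: [float(r["price"]) for r in d], raw)
scaled = traced("to_cents", lambda d: [int(p * 100) for p in d], parsed)

# The log now links `scaled` back to `raw`, one hop at a time.
for record in lineage:
    print(record["step"], record["input"], "->", record["output"])
```

Because each record’s output hash matches the next record’s input hash, an engineer can walk the chain backwards from a suspicious production value to the exact upstream source that produced it.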
Another, less obvious problem is that of underutilized data dependencies. Having data pipelines connected to sources that are not actually used by a production system might not seem like a major issue, but it can still introduce unnecessary system latencies, create risks of data leakage, and weaken data security.
Fundamentally, ML systems are designed to process real-time data. This also means they’re meant to respond to changes in the real world. They are dynamic systems. Yet, a lot of their parameterization is set once and for all: for instance, it is not uncommon to find arbitrarily fixed thresholds in production-level ML products. When circumstances change (for example, due to seasonality patterns), those thresholds need re-adjusting. In the absence of tuning, they constitute another type of ML Tech Debt. Seasonality can also lead to other potential problems, such as existing correlations that suddenly vanish because the distribution of the data diverges from that of the data the model was trained on: this is commonly known as “data drift”. The practice of monitoring and addressing data drift is another subfield of MLOps called ML Observability. It plays a critical role in preventing models from going stale and constitutes yet another way to prevent ML Tech Debt.
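A simple drift check in the spirit of ML Observability tooling can be sketched with the Population Stability Index (PSI), which compares the live feature distribution against the training-time one. The bin edges and the 0.2 alert threshold below are common rules of thumb, not universal constants:

```python
# Minimal data-drift check: Population Stability Index (PSI) between a
# reference (training-time) sample and a live sample, over fixed bins.
# PSI near 0 means the distributions match; values above ~0.2 are a
# commonly used (but not universal) alert threshold.

import math

def psi(reference, live, edges):
    """PSI between two samples, binned by the fixed `edges`."""
    def proportions(sample):
        counts = [0] * (len(edges) + 1)
        for x in sample:
            i = sum(1 for e in edges if x >= e)  # index of bin containing x
            counts[i] += 1
        n = len(sample)
        return [max(c / n, 1e-6) for c in counts]  # floor to avoid log(0)

    p, q = proportions(reference), proportions(live)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

edges = [0.25, 0.5, 0.75]
train = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]            # training data
live_ok = [0.12, 0.22, 0.31, 0.36, 0.41, 0.44, 0.49, 0.6]      # similar traffic
live_shifted = [0.9, 0.95, 1.1, 1.2, 1.3]                      # seasonal shift

print(psi(train, live_ok, edges) < 0.2)        # stable: True
print(psi(train, live_shifted, edges) < 0.2)   # drifted: False
```

Run continuously over incoming feature values, a check like this is what turns a silently staling model into an actionable alert.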
Training Data Management
We discussed real-time data inputs; let’s now cover the systemic issue of training data. More than 90% of ML systems in use nowadays leverage supervised learning. This implies most models are trained offline before they are deployed to production. The data used during that training process is decisive for the performance of the model. Use another training dataset, and your model might make completely different predictions; use a faulty one, and those predictions will be plain wrong. This implies that those datasets need to be traceable, so that the data scientist knows what actions to take to get things fixed if an issue arises in production. Model version control alone does not help, as storing the model’s state does not by default involve tracking the data. Data Version Control solutions, however, do exactly that.
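The core idea behind such traceability can be illustrated in a few lines: store a content hash of the training data alongside each model version, so a production issue can be traced back to the exact dataset that produced the model. The registry dict below stands in for a real Data Version Control tool, and the dataset is a made-up example:

```python
# Minimal dataset-traceability sketch: an order-insensitive content hash
# of the training examples is stored next to each model version, so we
# can later verify exactly which data a deployed model was trained on.

import hashlib
import json

def dataset_hash(rows):
    """Order-insensitive content hash of a list of training examples."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()[:16]

model_registry = {}  # model version -> training-data hash (toy stand-in)

train_v1 = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
model_registry["model-1.0"] = dataset_hash(train_v1)

# Later, while debugging production behavior, we can check whether the
# data we *think* the model was trained on really is that data.
candidate = [{"x": 2, "y": 1}, {"x": 1, "y": 0}]  # same rows, reordered
tampered  = [{"x": 1, "y": 0}, {"x": 2, "y": 9}]  # a label was changed

print(dataset_hash(candidate) == model_registry["model-1.0"])  # True
print(dataset_hash(tampered) == model_registry["model-1.0"])   # False
```

Dedicated Data Version Control tools add storage, diffing and branching on top, but the hash-per-model-version link is the mechanism that makes a dataset traceable at all.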
But proper data version management is only the tip of the iceberg. Working with unstructured data (text, images, etc.) requires valid annotations, which should be easy to generate, store and manage. As surprising as it might seem though, in 2023, the majority of labeling jobs are still managed through email instead of APIs. This means that multiple team members cannot collaborate easily, and that labels are usually not reusable across projects. Even locating the latest version of the labels is a challenge. Providing such traceability and transparency is what LabelOps is all about.
And with an increasing number of people relying on more complex labeling workflows (such as autolabeling, which is the process of using a pre-trained ML model to generate ground truth, or human-in-the-loop data labeling), things get even more complicated. For example, without proper real-time validation, autolabeling can dramatically pollute the training dataset and cause biases that eventually lead to erroneous results in production. In order to avoid this severe form of “Data Debt”, old-fashioned data labeling processes should be replaced by end-to-end DataPrepOps pipelines designed to cleanse training data of bad labels and even flush out harmful data, in order to prevent issues from trickling down into the ML model.
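A guarded autolabeling step of this kind can be sketched as follows: a pre-trained model proposes labels, but only its confident predictions enter the training set, while the rest are routed to human review. The 0.9 threshold and the stand-in scoring function are illustrative assumptions:

```python
# Sketch of validated autolabeling: model predictions above a confidence
# threshold are accepted as labels; everything else goes to a human
# review queue instead of silently polluting the training set.

def autolabel(samples, predict, threshold=0.9):
    """Split samples into auto-accepted (sample, label) pairs and a
    human-review queue, based on model confidence."""
    accepted, review_queue = [], []
    for sample in samples:
        label, confidence = predict(sample)
        if confidence >= threshold:
            accepted.append((sample, label))
        else:
            review_queue.append(sample)
    return accepted, review_queue

# A stand-in "pre-trained model": confident only on short texts.
def toy_predict(text):
    if len(text) < 10:
        return "short", 0.95
    return "long", 0.6

samples = ["cat", "dog", "a rather long sentence"]
accepted, review_queue = autolabel(samples, toy_predict)
print(accepted)       # [('cat', 'short'), ('dog', 'short')]
print(review_queue)   # ['a rather long sentence']
```

The essential point is the second return value: low-confidence items are never discarded or blindly trusted, but explicitly handed to humans, which is what keeps autolabeling biases out of the training data.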
Tech Debt is often perceived as the sole responsibility of technology teams, yet in practice, one of the most daunting types of Tech Debt is associated with Data Privacy or Data Security. Whenever a data scientist trains her model with a dataset, she needs some guarantee that the dataset itself or the underlying information that can be inferred from it is not subject to restrictions. In fact, she and her company need to ensure that the rights associated with the data won’t change down the line, and that neither will the laws that protect the dataset. Her company could be accused of breaking privacy laws very easily and have to pay a hefty price for that mistake. This is another way that Data Lineage can help.
But guaranteeing that one has the rights to use a dataset for the purpose of training an ML model is nowhere near sufficient. In recent months, the number of attacks against ML models has been on the rise, and the nascent fields of Adversarial Machine Learning and MLSecOps have fortunately followed suit. With more ML products on the market every week, hackers are figuring out how to reverse-engineer them in order to extract sensitive information about customers and organizations. More research will be required to identify, or even prevent, such bad actors from operating on the market.
Last but not least, if the maker of an AI is to be held accountable for the decisions generated by the system, it is to be expected that companies will make more investments in Model Explainability. Explainability is already a popular topic in the FinTech and LegalTech spaces where, for example, rejected loan applications need to be justified to the applicant. Generally speaking, failing ML products are having such a large impact on society as a whole that ML Tech Debt is likely to become everyone’s business, and not only that of the engineers and product managers who launched the product in the first place.
We’ve seen throughout this article that ML Tech Debt is a complex topic, but also an integral part of Machine Learning. We even came to discuss how ML Tech Debt can have a real impact on users and should eventually involve decision makers and consumer advocates. The good news is that new and fast-maturing MLOps technology has naturally developed to take care of the different aspects of ML Tech Debt. That, of course, is yet another very strong reason why you should adopt MLOps in your organization as soon as possible.
Who would have thought that a seemingly boring topic was actually one of the main drivers for some of the most exciting technologies in the world?