Lucky for them, those tormented economic times coincided with much less gloomy news. After a long streak of AI Winters, the AI field was finally ready for some long-awaited progress, as hardware was catching up with the needs of ML research. Technology was suddenly not only capable of capturing and storing the critical mass of data necessary to train sophisticated ML models, it also provided enough compute to process it.
Those who had once left their labs for the cozy corner offices of Wall Street now seemed to have found a new home in Data Science and Machine Learning. It was a great time for anyone gifted with data wrangling skills and strong coding abilities to show what they could do, as companies were suddenly showering those ambitious intellectuals with lavish compensation packages and rewarding problems to work on. All it took was some open source magic and a few new Python libraries to put their modeling capabilities on steroids and see a whole new generation of algorithms for perception, search relevance, voice recognition, and countless other applications come into existence.
AI was ready to take over the world by force. Investors started pouring money into AI, and some thought leaders declared we were merely years away from Artificial General Intelligence. Everything was for the best, except for one tiny little issue: no matter how impressive, all those groundbreaking algorithms were stuck on the laptops of their genius inventors, with no way to make it to production.
How good, you will ask, is a model that cannot be deployed and used to power the best search engine of all time? The answer is obvious: from a business perspective, it is worth nothing. Crazy as it sounds, the executives who had poured millions of dollars and countless resources into those ML initiatives were about to pull the plug on AI research once more, sending us all straight into yet another AI Winter. That’s when a lightbulb went on for a select few entrepreneurs. Since the problem wasn’t the research but the process, why not sell the deployment process as a service?
The realization that building models was a very different problem from shipping them to production led to the birth of a new category in the ML space: MLOps. Organizations needed help reaping the benefits of their data scientists’ work, and those new MLOps companies intended to provide just that by making the operational part of Machine Learning a breeze. By 2018, some creative entrepreneurs had already invented tools and workflows capable of fully automating model deployment.
But history didn’t stop there, and soon dozens of tools and platforms (both open source and enterprise) popped up to make every aspect of model development effortless, from hyperparameter tuning to monitoring of the training process. Within years, it no longer took a PhD in Math or Computer Science to do Machine Learning. Today, with tools offering Bayesian hyperparameter tuning at one’s fingertips, all it takes a data scientist to optimally tune a deep neural network is the ability to learn a new library.
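To make that point concrete, here is a minimal, purely illustrative sketch of automated hyperparameter search. It uses plain random search over a learning-rate range against a made-up validation loss (the quadratic below is an assumption chosen for illustration, not a real model); the libraries alluded to above replace this blind sampler with Bayesian strategies that pick each new trial based on the results of previous ones.

```python
import math
import random

# A toy stand-in for the hyperparameter tuners modern libraries provide.
# Real Bayesian tuners choose each trial informed by earlier results;
# plain random search, shown here, just samples blindly.

def validation_loss(lr: float) -> float:
    """Hypothetical validation loss, minimized at lr = 1e-2.
    (An assumption made purely for illustration -- in practice you
    would train and evaluate a real model here.)"""
    return (math.log10(lr) + 2.0) ** 2

def random_search(n_trials: int = 50, seed: int = 0):
    rng = random.Random(seed)
    best_lr, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Sample the learning rate log-uniformly between 1e-5 and 1e-1.
        lr = 10 ** rng.uniform(-5.0, -1.0)
        loss = validation_loss(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

best_lr, best_loss = random_search()
print(f"best lr ~ {best_lr:.4f}, loss = {best_loss:.4f}")
```

The design point is the loop itself: once the search is wrapped behind a one-call API, tuning stops being expert work, which is exactly what the libraries above did for the field.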
Does this imply data scientists are being automated away? It is certainly too early to tell whether the data science profession is obsolete, but the space is definitely changing. No need to worry about the math wizards who once built ML models everywhere: they have moved on to writing libraries, helping others create new ML products. And with ever more powerful AutoML technology empowering non-experts to solve ML-worthy problems, that might be for the best.
With all this being said, what is to be expected of the MLOps space? For one, after a decade of frenzy and abundant VC funding, the space is indisputably overdue for a consolidation, as practitioners are demanding end-to-end solutions that meet all of their modeling needs. Today, enterprises will not buy a model monitoring solution that does not also allow them to tune their models or get them deployed. MLOps companies that were considered an industry standard just a few short years ago are now struggling to stay ahead of the competition as new ML brands keep popping up. They have had to extend their offerings, or make it ever simpler to integrate with others they once considered competitors.
And yet, even though building a Machine Learning model has never been easier, the days of true AutoML are still far in the future, for a simple yet overlooked reason: we still can’t manage our training data properly. Full model automation fails to resolve the #1 issue of all data scientists: what data should they use, and how should they prepare it? While the data management field has matured significantly over the past decade, it has not succeeded in making it easy to convert raw, flawed data into high-quality datasets. After years of trying, even the most renowned data labeling companies are still falling short of fulfilling the needs of the ML industry: they still require data experts to manually root out mistakes in their training data, and to manage their labels mostly by hand.
This is how, just a few weeks away from 2023 and in the midst of yet another crisis hitting the Tech field, companies are (once again) waiting for a revolution to stop them from hitting a wall. With labor shortages affecting all industries, no one is willing to give up on the fundamental promise of Machine Learning: automation. At the same time, the manual transformation of raw data into ML-ready data is preventing the adoption of Machine Learning at a global scale. The solution is as simple to fathom as it is hard to implement, and this is where the brainpower that once made it possible to deploy ML models to production now needs to redirect its attention. And as the ML field slowly but surely closes the chapter of MLOps, it opens a new chapter on DataPrepOps – and we can’t wait for you to join us in writing it.