MLOps vs DevOps: the Difference Data Makes

We recently wrote an essay for O’Reilly Radar about the machine learning deployment stack and why data makes software different. The questions we sought to answer were:

Why does ML need special treatment in the first place? Can’t we just fold it into existing DevOps best practices?
What does a modern technology stack for streamlined ML processes look like?
How can you start applying the stack in practice today?

Our real goal was to start a conversation about how to build the foundation for a canonical infrastructure stack, along with best practices, for developing and deploying data-intensive software applications. Because this needs to be an ongoing conversation, we decided to write more on each of the points above. In this post, we’ll answer the first question:

Why does ML need special treatment in the first place?

In doing so, it will become clear why we can’t just fold machine learning into existing DevOps best practices.

On the need for MLOps

The key feature of ML-powered software is that it is directly exposed to real-world data and the complexity and messiness inherent in it.

All machine-learning projects and data-intensive applications are software projects, often consisting of repositories of Python code under the hood and involving the operation of containers and operational dashboards in production, not dissimilar to most other software services. So why don’t we treat machine learning-powered software as business-as-usual software engineering projects, educating ML practitioners about standard best practices?

To answer this question, let’s tease apart the key differences by first considering the job of a non-ML engineer and then jumping into ML-powered applications:

First, note that traditional software engineering deals with well-defined and narrowly-scoped inputs, and the engineer can cleanly and comprehensively model these in the code. In practice, the software engineer defines the world and frame in which the software operates.
In stark contrast to this, perhaps the most important feature of ML-powered and data-intensive software applications is that they are, by definition, directly exposed to real-world data and the complexity and messiness inherent in it. Here, you simply cannot architect your code cleverly by hand; you are forced to work with less transparent, data-driven optimization processes. The trade-off is that you can no longer completely control and understand the system, but you can build software that would have been impossible even five years ago.

Inputs and outputs for ML applications and traditional applications.

From data-centric programming to production ML pipelines

This quality makes ML-powered software inherently different from traditional applications.

Our hypothesis is that it is this quality that makes ML-powered software inherently different from traditional applications. Although this is relatively straightforward to state, it has wide-ranging implications with respect to who is best positioned to develop such software and how they should be doing so:

ML applications are directly exposed to the constantly changing real world through data, in contrast to traditional software, which operates in a simplified, static, abstract world directly constructed by the engineer.
ML apps have to be developed through cycles of experimentation: as a result of the constant exposure to data, we cannot learn the behavior of ML apps through logical reasoning but through empirical observation. In contrast to traditional software, we can’t predict the behavior of even the simplest prototypes without testing the ideas in practice.
The background and the skillset of people building the applications are realigned: although it is still effective to express software in code, the emphasis shifts to data and experimentation, far closer to empirical science than traditional software engineering.
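To make the point about experimentation concrete, here is a minimal sketch in plain Python. Everything in it (the synthetic data, the two candidate models, and the function names) is a hypothetical illustration, not part of any particular stack. The only point is that we learn which candidate is better by measuring it on held-out data, not by reasoning about the code.

```python
import random


def make_data(n, seed):
    """Synthetic stand-in for real-world data: a noisy linear relationship."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline candidate: always predict the average target value."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: ordinary least-squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def experiment(candidates, train, holdout):
    """Fit every candidate on the training data and score it on held-out data."""
    return {name: mse(fit(*train), *holdout) for name, fit in candidates.items()}


if __name__ == "__main__":
    results = experiment(
        {"mean": fit_mean, "line": fit_line},
        train=make_data(200, seed=0),
        holdout=make_data(200, seed=1),
    )
    for name, error in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: held-out MSE = {error:.3f}")
```

With real data, the ranking of candidates can change as the data changes, which is why this loop tends to be rerun continuously rather than settled once.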

Although this is a paradigm shift from mainstream software engineering, note that it is by no means a new approach. This is science, my friend! More than this, we have a long tradition of what is becoming known as data-centric programming: researchers, scientists, and developers who have been using data-centric IDEs, such as RStudio, MATLAB, Jupyter Notebooks, or even Excel, to model complex real-world phenomena should find this paradigm familiar. In fact, this rich history and practice go back at least as far as Mathematica notebooks! As has become apparent, however, such tools are wonderful for exploration, experimentation, and prototyping but fall short in several dimensions when it comes to working with models in production.

Machine learning software needs to be production-ready

It must adhere to the same set of standards as traditional software.

As ML-powered applications become more commonplace and progress from R&D to front-line production, however, you also need to deploy them in conjunction with more traditional forms of software. Ideally, your data-centric software can be written from the bottom up so that it can be incorporated into production workflows both ergonomically and seamlessly. To be clear, the entire industry is currently rethinking the now-antiquated approach in which data scientists use isolated environments to prototype models that then need to be reimplemented for production, because (1) it is too slow and (2) it blocks iterative development, as production deployments are a one-way street.

This raises a key point: we need to make ML software production-ready from the get-go and, to achieve this, it must adhere to the same set of standards as traditional software. This leads to another set of requirements:

The scale of operations is often significantly larger than in the data-centric, exploration-driven notebook-style environments. Data is often larger, but so are models, particularly deep learning models.
ML applications must be carefully orchestrated: with the significant rise in the complexity of apps, which often require dozens of interconnected steps, developers need better paradigms for software development and deployment, such as first-class DAGs.
We need robust versioning for data, models, and code, along with the internal state of applications. Think Git on steroids, able to answer the inevitable questions: Why did something break? What changed? Who did what and when? How do two iterations compare?
The applications must be integrated into the surrounding business systems so that you can test and validate ideas in a controlled manner in the real world.
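As a rough illustration of the orchestration and versioning requirements above, here is a minimal sketch using only the Python standard library. The step names, the `DAG` layout, and the `fingerprint` helper are assumptions made up for this example; a production orchestrator does far more (scheduling, retries, persistent artifact storage). The sketch only shows the two ideas: steps declared as a DAG and run in dependency order, and every step's output tagged with a content hash so two runs can be compared.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def fingerprint(obj):
    """Short content hash of a JSON-serializable artifact (hypothetical versioning helper)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical steps: each receives a dict of upstream outputs and returns its own output.
def load_data(inputs):
    return [1.0, 2.0, 3.0, 4.0]


def featurize(inputs):
    return [x * 2 for x in inputs["load_data"]]


def train(inputs):
    features = inputs["featurize"]
    return {"mean": sum(features) / len(features)}


def evaluate(inputs):
    return {"abs_error": abs(inputs["train"]["mean"] - 5.0)}


# The DAG as data: step name -> (function, names of upstream steps).
DAG = {
    "load_data": (load_data, []),
    "featurize": (featurize, ["load_data"]),
    "train": (train, ["featurize"]),
    "evaluate": (evaluate, ["train"]),
}


def run(dag):
    """Run every step after its dependencies and record a version for each output."""
    graph = {name: deps for name, (_, deps) in dag.items()}
    outputs, versions = {}, {}
    for name in TopologicalSorter(graph).static_order():
        step, deps = dag[name]
        outputs[name] = step({dep: outputs[dep] for dep in deps})
        versions[name] = fingerprint(outputs[name])
    return outputs, versions


if __name__ == "__main__":
    outputs, versions = run(DAG)
    for name, version in versions.items():
        print(f"{name}: version {version} -> {outputs[name]}")
```

Because each version is derived from the content of the artifact, rerunning the same steps on the same data yields the same versions, and a changed version points at the first step whose output differs. That is one simple way to start answering "what changed?".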

Looking at these two lists, notice that they describe the convergence of two macro trends: (1) the practice of data-centric programming and the value it delivers, and (2) the requirements of large-scale production software for businesses. Each of these alone is inadequate for the challenge at hand: you wouldn’t want to build modern ML-powered software in Excel, nor would it be much fun to act as though ML applications can be built with standard software toolchains.

Wrapping up

These are the aspects of software engineering and the stack that need to shift when moving to a world populated with ML-powered software. In future posts, we’ll begin to answer the remaining foundational questions:

What does a modern technology stack for streamlined ML processes look like?
How can you start applying the stack in practice today?

Thanks for reading! If you’d like to get in touch or discuss such issues, come say hi on our community Slack! 👋
