Kubernetes (K8s) presents something of a paradox to data scientists and data science teams: they shouldn’t need to know lower-level infrastructural tools, and yet machine learning applications need to play nicely with production infrastructure, which includes Kubernetes in a growing number of organizations. This is why we’re so excited to integrate Kubernetes with Metaflow: with Metaflow, data scientists can leverage K8s clusters for their work without having to know K8s. Metaflow presents a user-friendly UX to data scientists while working nicely with production infrastructure, which engineers can appreciate. Read on to discover more about “Metaflow as the data scientist’s interface to Kubernetes” and to give the beta version a spin.
A Healthy Obsession with Productivity in Data Science
At Outerbounds, we are obsessed with data scientist productivity, and making sure that first-class machine learning infrastructure is accessible to everyone and delightful to work with. Our mission is to remove all possible infrastructure-related friction in data scientists’ day-to-day work and make sure they can experiment with ideas quickly and deploy to production confidently. For the past four years, we have been developing an open-source framework, Metaflow, to make this a reality for most organizations.
Oleg Avdeev, co-founder of Outerbounds, talking about Metaflow’s journey to K8s.
The Infrastructure Paradox: Why Kubernetes? And why not?
Data scientists should not need to know lower-level infrastructural tools such as Kubernetes, but Kubernetes is a key part of the “modern production infrastructure” 🤔
Today, in 2022, there is no question that Kubernetes (often abbreviated K8s) is eating the world of container orchestration. You could even argue that it is approaching the “plateau of productivity” of the hype cycle, where the tooling is mature enough and K8s skills are (almost) widespread enough among DevOps engineers. Betting on K8s is no longer a choice reserved for brave, technically forward-thinking organizations that are ready to deal with the rough edges. These days, thanks to managed offerings like EKS on AWS, any skilled platform team can make it work without paying much of an early-adopter penalty.
What’s so good about it, and what’s its unique promise for infrastructure teams? There are many answers to this question, but here’s one take. It is not so much about the software implementation itself as about having:
- A set of standard APIs to manage containerized infrastructure
- A (mostly) standardized set of tooling and processes to deploy and evolve this infrastructure using best practices of infrastructure-as-code
In other words, using Kubernetes, your friendly ops team can assemble a tailored, cohesive “platform” from the building blocks and operate it using industry-standard practices. The combination of the blocks is unique to your business needs, but the blocks themselves and tooling around them aren’t.
But here’s a paradox: data scientists should not need to know lower-level infrastructural tools such as Kubernetes, as Chip Huyen has written. However, to make it easy to move machine learning projects from prototype, iteration, and experiment to production, machine learning applications need to play nicely with production infrastructure, which very much includes Kubernetes in a growing number of organizations. In short, machine learning and MLOps are not islands. To state the paradox another way: if you’re the leader of a data science team, you can ignore K8s… but you also can’t.
How do we fold existing DevOps practices into modern data-centric ML and MLOps workflows? This is why we’re so excited to integrate Kubernetes with Metaflow: with Metaflow, data scientists can leverage K8s clusters for their work without having to know K8s. Metaflow presents a user-friendly UX to data scientists while working nicely with production infrastructure, which engineers can appreciate. You need both Luigi and Mario to complete this level, but you don’t need to play as both by yourself!
Metaflow as the data scientist’s interface to Kubernetes
The key to successful organizational adoption is making sure that the K8s-based platform is easy to use, and that everyone on your data science team can fully benefit from it without learning the nitty-gritty details about Kubernetes.
The first step is to enable data science and machine learning practitioners to use K8s clusters as pools of compute resources. Data science is an inherently compute-heavy activity, and we can use K8s as the engine that powers it. Metaflow makes it easy to start by running everything locally, but when you need to scale, it lets you smoothly move parts of your workflows to K8s by adding a single line of code.
By running a compute layer in K8s, the platform engineer gets K8s-native infrastructure with all the industry-standard management and observability tools. And as a data scientist, you can keep using all the cool features that Metaflow provides out of the box, such as:
- Human-friendly, idiomatic Python API
- Vertical and horizontal scalability
- Automated tracking of experiments, models, and other artifacts
- Built-in dependency management for machine learning libraries
Try it out and give us feedback
The beta version of the Kubernetes integration is available in Metaflow today. The feature is not yet documented, as we want to ensure that the user experience it provides meets our high bar. If you would like to give it a try, join our Slack and drop a note in the #dev-metaflow channel. We want to make sure the integration plays nicely with different K8s-native observability tools, access control tooling, autoscalers, job queues, and GPUs, and we don’t want to underestimate the effort it takes to sort out all these paper cuts. If you have war stories, feedback, or requirements related to the K8s ecosystem that you want to share, please reach out to us! Also, if working at the intersection of K8s, modern cloud tooling, and machine learning excites you, we are hiring!