Why ML/AI Developers and Platform Teams Choose Metaflow

December 10, 2025

As 2026 approaches, Metaflow is more widely adopted than ever. This article outlines Metaflow’s design philosophy and walks through its core features that you can use to assemble your own tailored ML/AI platform. If you are in a hurry, scan a list of new features marked with ✨ below, as well as the Outerbounds-exclusive capabilities marked with ⭕.

The latest Technology Radar report from the Cloud Native Computing Foundation — the home of Kubernetes and other core cloud infrastructure — evaluated leading open-source frameworks for ML and AI orchestration. Metaflow ranked at the top in every category surveyed:

The report, which you can read here, sums up its findings as follows: more than half of respondents familiar with Metaflow (51%) are highly likely to recommend it, with a further 35% likely to recommend it.

The positive sentiment echoes an old truth in product building: when you put practitioners’ real needs first, everything else aligns. From the very first open-source release of Metaflow in 2019, we’ve prioritized the real, everyday needs of ML and AI practitioners, instead of proposing elaborate abstractions or chasing trends.

Across releases, we’ve kept the Metaflow APIs intentionally compact, coherent, and human-friendly — pushing back against the accidental complexity and API sprawl that tend to creep into frameworks like this. So whenever an external review makes a note along the lines of “Metaflow is accessible to data science teams exploring MLOps for the first time,” it signals that we are on the right track.

Easy != Simple != Simplistic

Because of its apparent simplicity, one might mistakenly conclude that Metaflow is an entry-level framework, unsuitable for the needs of demanding projects in large organizations. Obviously this is not true, as witnessed by the numerous sophisticated real-world ML/AI systems powered by Metaflow.

Our philosophy is to go to great lengths to avoid accidental complexity, often introduced by unnecessary abstractions, while respecting the fact that real-world systems come with some amount of inherent complexity. In other words, Metaflow wants to be simple but not simplistic.

When it comes to software infrastructure, this form of simplicity is the ultimate form of sophistication. This becomes especially clear when you reflect on what the words actually mean (inspired by this clear-sighted essay):

Simple is the opposite of complex - a simple system can be understood by an individual without onerous effort and with a high likelihood of that understanding being correct.
Easy is the opposite of difficult - an easy task can be accomplished by an individual without onerous effort and with a high likelihood of the outcome being correct.
Simplistic is the opposite of advanced and practical - a simplistic approach treats complex issues and problems as if they were much simpler than they really are, leading to impractical solutions.

If you are a platform engineer, Metaflow provides simple, composable constructs for expressing the abstractions and policies your business domain requires. The resulting platform won’t be simplistic - it will contain exactly the level of complexity the problem demands, yet for end users, working with it remains easy and straightforward.

Or, if you don’t need anything special, you can simply use Metaflow out of the box — minimal assembly required.

Building Blocks for Your ML/AI Platform

What are the building blocks offered by Metaflow in practice?

First, it is helpful to clarify Metaflow’s role in ML/AI projects. The prefix meta- denotes something that sits behind, after, or beyond (the project predates the renaming of a famous company). Metaflow supports the massive Python ecosystem of ML/AI libraries, including PyTorch, JAX, Ray, vLLM, XGBoost, and newer frameworks like LangChain, focusing on the layers that fall outside their scope. Its purpose is to make it straightforward to combine these libraries with your models, data, and code into production-grade systems.

So what falls outside the scope of substantial frameworks like PyTorch? Critically, the questions of how you develop, deploy, and operate systems that rely on compositions of these libraries. Addressing these concerns effectively requires components that lie beyond the frameworks themselves — the realm of Metaflow and Outerbounds — as illustrated in the diagram below:

In this diagram, the code defining your ML/AI project resides in the green box on the left, with libraries like PyTorch handled by the dependency management component, surrounded by your project-specific patterns and policies.

Feature Walkthrough

Let’s walk through the boxes and the corresponding features in Metaflow — and, by extension, Outerbounds — to illustrate how they can be combined to tailor a platform that fits your needs. If you are already familiar with Metaflow, feel free to focus on the new features marked with ✨, released over the past twelve months. If you are curious about the features that Outerbounds offers on top of Metaflow, take a look at the features marked with ⭕.

While the feature list is extensive, you only need to focus on the elements relevant to your projects. Each feature is intentionally designed for simplicity and perfect composability, allowing you to easily mix and match them as needed. Crucially, all core Metaflow features are subject to the project’s backwards compatibility guarantee — a guarantee that has held steady since the inception of the project — ensuring that you are building your projects and platforms on a rock-solid, battle-hardened foundation.

Development Experience

A top-notch developer experience is where Metaflow shines - it’s the part that makes the platform easy to use. For a concise overview, take a look at an article Netflix published about the topic recently.

Metaflow provides a simple API for constructing workflows, with recently added support for recursive and conditional steps ✨.
You can develop and test code locally in rapid, incremental iterations with the new spin feature ✨, in either notebooks or IDEs, both of which are supported natively. On Outerbounds, you can develop code using cloud workstations based on VSCode ⭕, securely running on your cloud account.
One of the killer features of Metaflow is its ability to resume execution from any past results, now with explicit support for checkpointing long-running tasks ✨ like model training or fine-tuning.
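
To make the resume idea concrete, here is a toy sketch in plain Python rather than Metaflow's own API. It assumes a hypothetical `run_pipeline` helper that snapshots the state to a JSON file after every step and, when asked to resume, reloads the snapshots of finished steps instead of recomputing them. Every name in it (`run_pipeline`, `make_flaky_train`, `demo`) is invented for illustration and says nothing about how Metaflow implements the feature.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, store_dir, resume=False):
    """Run steps in order and snapshot the state after each one.

    With resume=True, a step whose snapshot already exists is skipped and
    its stored state is reloaded instead of being recomputed.
    """
    store = Path(store_dir)
    state, executed = {}, []
    for name, func in steps:
        snapshot = store / f"{name}.json"
        if resume and snapshot.exists():
            state = json.loads(snapshot.read_text())
            continue
        state = func(dict(state))
        snapshot.write_text(json.dumps(state))
        executed.append(name)
    return state, executed


def load(state):
    state["rows"] = [1, 2, 3, 4]
    return state


def make_flaky_train():
    """Build a training step that crashes on its first call only."""
    calls = {"n": 0}

    def train(state):
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("simulated crash on the first attempt")
        state["model"] = sum(state["rows"]) / len(state["rows"])
        return state

    return train


def demo():
    steps = [("load", load), ("train", make_flaky_train())]
    with tempfile.TemporaryDirectory() as store_dir:
        try:
            run_pipeline(steps, store_dir)
        except RuntimeError:
            pass  # "train" crashed, but the "load" snapshot is already on disk
        # The second attempt reuses the "load" snapshot and only reruns "train".
        return run_pipeline(steps, store_dir, resume=True)


if __name__ == "__main__":
    print(demo())
```

In Metaflow itself, the corresponding action is the `resume` command of a flow (for example, `python myflow.py resume`), which reuses the results of steps that already succeeded.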

Project-Specific Patterns

Every project and organization has different needs. It would be overly simplistic to assume a perfect, one-size-fits-all experience. Metaflow addresses this by allowing platform and project teams to layer their own abstractions on top of (and around) Metaflow, customizing it for their specific use cases and business environment.

The key tools for implementing project-specific patterns are custom decorators and mutators ✨, which allow for creative customization and extension of your flows.
When you want to implement policies and functionality that are automatically applied to all flows in a project, say, domain-specific data access, the BaseFlow pattern ✨ comes in handy. Outerbounds comes with a paved path project structure ⭕ that helps you to leverage the approach easily.
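
The two ideas above, a reusable decorator and a shared base class that applies it everywhere, can be sketched in plain Python. This is only an analogy: Metaflow's custom decorators and mutators have their own API, which is not shown here. The names `audited`, `ProjectBase`, `load_table`, and `ChurnModel` are hypothetical.

```python
import functools

AUDIT_LOG = []


def audited(func):
    """Project-specific decorator: record every call before running it."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        AUDIT_LOG.append(f"{type(self).__name__}.{func.__name__}")
        return func(self, *args, **kwargs)

    return wrapper


class ProjectBase:
    """Hypothetical base class: subclasses get the policy without opting in."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap every public method defined directly on the subclass.
        for name, attr in list(vars(cls).items()):
            if callable(attr) and not name.startswith("_"):
                setattr(cls, name, audited(attr))

    def load_table(self, name):
        # Stand-in for domain-specific data access shared by all flows.
        return {"table": name, "rows": 3}


class ChurnModel(ProjectBase):
    def prepare(self):
        return self.load_table("customers")

    def train(self):
        return self.prepare()["rows"] * 2


if __name__ == "__main__":
    print(ChurnModel().train(), AUDIT_LOG)
```

The point of the design is that project authors write `ChurnModel` as usual, while the platform team owns the policy in one place and can change it without touching individual projects.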

Stay tuned for more features in 2026 that will make Metaflow even more customizable and versatile for addressing project- and domain-specific needs.

Configuration Management

Depending on the needs of your project, you can choose the right balance between code and configuration. Sometimes experiments focus on altering configuration (even automatically!) while keeping the code intact, while in other cases most changes are made in the code itself.

The configuration management subsystem ✨ allows you to load, parse, and even generate configurations easily.
Hyperparameter optimization ⭕ is a special case of configuration management, which is neatly supported through frameworks like Optuna.
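
As a rough illustration of what "load, parse, and generate" can mean in practice, the following plain-Python sketch parses a base configuration, validates overrides, and generates one configuration per point of a small search grid. It does not use Metaflow's configuration API or Optuna. The helper names (`load_config`, `generate_grid`), the keys, and the exhaustive grid strategy are assumptions chosen for the example; a framework like Optuna would typically search the space more cleverly.

```python
import itertools
import json

# Hypothetical base configuration, as it might be stored in a JSON file.
BASE_CONFIG_JSON = '{"model": "xgboost", "learning_rate": 0.1, "max_depth": 6}'


def load_config(text, overrides=None):
    """Parse a JSON config and apply overrides, rejecting unknown keys."""
    config = json.loads(text)
    overrides = overrides or {}
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    return config


def generate_grid(base, search_space):
    """Yield one full config per combination of values in search_space."""
    keys = sorted(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        yield {**base, **dict(zip(keys, values))}


def demo():
    base = load_config(BASE_CONFIG_JSON)
    space = {"learning_rate": [0.05, 0.1], "max_depth": [4, 6, 8]}
    return list(generate_grid(base, space))


if __name__ == "__main__":
    for config in demo():
        print(config)
```

Keeping the code fixed while a generator like this varies the configuration is one way to run an experiment that alters configuration only.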

Dependency Management

As noted above, Metaflow's purpose is to make it easy for you to leverage the vast ecosystem of Python libraries to build production-grade ML/AI systems. To make this possible, Metaflow comes with robust functionality for managing software dependencies, packaging code, and creating stable, reproducible, and secure runtime environments for your code. In fact, about 10% of the entire Metaflow codebase is dedicated to dependency management alone.

Metaflow creates and manages reproducible runtime environments from packages installed via PyPI and Conda, now with native support for uv ✨.
In seconds, Outerbounds converts these environments automatically into optimized Docker images, thanks to Fast Bakery ⭕.

Observability, Lineage, and Asset Tracking

Metaflow offers a level of depth in artifact tracking and observability unmatched by other frameworks. Instead of relying solely on explicitly registered artifacts, it automatically tracks the complete state of the entire workflow in a way that scales. As a case in point, Netflix has over 12PB of artifacts managed by Metaflow.

The backbone of Metaflow’s state management and observability is the automatic snapshotting of artifacts at the end of each step, which you can observe in real-time using the Client API and through the Metaflow UI.
Besides logs, you can output rich, customizable reports in real-time using Cards.
Should you need a model registry and additional visibility into data lineage, you can promote certain artifacts as Assets, visible right in the Outerbounds UI ⭕.

2026 will bring even more flexibility and capabilities in artifact serialization and management.

GitOps & CI/CD

We believe that software engineering best practices — think robust development environments, GitOps, and CI/CD — matter more in 2026 than ever, especially with the rise of AI co-pilots. The ability to iterate rapidly on a new experiment (with or without AI assistance) modifying any part of the system, and deploy the result as an isolated branch with a single click is extremely powerful.

While data scientists and other domain experts once lived mostly inside notebooks, today full-fledged Python software development is within reach for everyone. The catch is that your tooling must support this workflow natively. In particular, it needs to make isolated branches effortless, so anyone — AI agents, novice developers, or fast-moving experts — can experiment safely.

Metaflow provides a whole stack of built-in features to address the need:

All executions (experiments and production) are automatically organized in namespaces, so you can safely develop on your own swimlane.
You can deploy multiple versions (branches) of your project to run in parallel, including sophisticated projects spanning multiple flows and deployments.
Outerbounds elevates these features to be a first-class concern with a native CI/CD integration, providing a higher-level project concept covering all elements of the system: experiments, online and offline deployments, and assets ⭕.
When it comes to automated testing, the new spin feature enables quick unit testing ✨, and the new Dev Stack allows you to conduct end-to-end integration testing in an isolated environment ✨.

Production Deployments

It is hard to prove the value of AI and ML projects without contact with reality. Making it easy to deploy projects into serious production settings quickly was the main reason Metaflow was built at Netflix in the first place.

There isn’t a single way to deploy an AI/ML project. Sometimes it runs entirely offline as a batch process, sometimes fully online with real-time inference. In many real-world cases, it ends up being an elaborate hybrid of the two modalities. Metaflow comes in handy in all of these scenarios.

Offline

You can deploy a Metaflow project to be executed in a highly available manner on a production-grade workflow orchestrator, such as Argo Workflows or AWS Step Functions. Or, thanks to Metaflow Extensions, you can even bring your own, like Netflix did with Maestro.
Besides the happy path where everything works, Metaflow provides plenty of functionality for dealing with failures, so you can take appropriate action ✨ whether the workload succeeds or fails.
If you have existing pipelines running on Airflow or on Kubeflow (a brand new feature!) ✨, you can target those systems with Metaflow as well, so you can start developing Metaflow projects side by side with your existing pipelines.
The deployments are not islands - they can be connected to the outside world and to each other through real-time events.

Metaflow can provide a robust foundation for production-grade agents ✨, leveraging your favorite agent frameworks.
And, it can help orchestrate massive-scale, optimized batch-style autonomous inference use cases leveraging state-of-the-art GenAI models ⭕.
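
The failure-handling item in the list above boils down to two mechanics: retrying transient errors, and running a follow-up action whether the workload ultimately succeeds or fails. Here is a minimal plain-Python sketch of both, under the assumption of hypothetical helpers `run_with_retries` and `run_workload`. It is not Metaflow's API; Metaflow ships its own decorators for this purpose, such as `@retry` and `@catch`.

```python
import time


def run_with_retries(task, retries=2, base_delay=0.0):
    """Call task(attempt) until it succeeds or the retries are used up."""
    for attempt in range(retries + 1):
        try:
            return task(attempt)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


def run_workload(task, on_success, on_failure, retries=2):
    """Run a task with retries, then fire exactly one of the two hooks."""
    try:
        result = run_with_retries(task, retries=retries)
    except Exception as error:
        on_failure(error)
        return None
    on_success(result)
    return result


def flaky(attempt):
    # Fails twice with a transient error, then succeeds.
    if attempt < 2:
        raise ConnectionError(f"attempt {attempt} failed")
    return "ok"


def broken(attempt):
    # Fails on every attempt, so retries cannot help.
    raise ValueError("bad input")


def demo():
    events = []
    on_success = lambda result: events.append(("success", result))
    on_failure = lambda error: events.append(("failure", type(error).__name__))
    first = run_workload(flaky, on_success, on_failure, retries=2)
    second = run_workload(broken, on_success, on_failure, retries=1)
    return first, second, events


if __name__ == "__main__":
    print(demo())
```

In a real deployment the hooks would do something useful, such as sending an alert or triggering a cleanup job, instead of appending to a list.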

Online

Should you need online, real-time inference deployments for traditional ML or GenAI, Outerbounds provides a secure and scalable platform for them, seamlessly integrated with Metaflow ⭕.
Another common real-time use case is internal apps, dashboards, and specific tools such as TensorBoard, Optuna, and LiteLLM, which you can leverage securely within your Outerbounds deployment ⭕.

There are a number of new features in the works addressing real-time use cases in particular, so stay tuned for exciting releases in 2026.

Flexible Compute

If you asked what sets ML/AI use cases apart from other software or data engineering systems, the answer is simple: their demand for compute.

Since 2023, the generational boom in AI investment has focused mostly on expanding the availability of compute - GPUs in particular. This development is a huge boon for builders. The supply of compute is scaling by orders of magnitude, while the cost is decreasing rapidly.

Metaflow lets you harness all this capacity, both existing and forthcoming. As Metaflow is an ML/AI-native framework by design, it is no surprise that roughly 25% of its codebase is devoted to orchestrating cloud compute. This foundation enables all the features outlined above, from experimentation to production deployments.

You can deploy Metaflow on your AWS, GCP, or Azure accounts with provided deployment templates, or run it on-prem. Or, you can get the whole platform as a managed service with Outerbounds, securely deployed in your account in 15 minutes ⭕.
Metaflow makes it easy to scale workloads vertically, providing access to large compute instances, which is especially relevant today with the booming availability of massively powerful GPUs.
You can parallelize workloads easily to span even tens of thousands of servers.
Large compute goes to waste unless you have a fast IO path, that is, a way to load and store data quickly. To address this need, Metaflow comes with a built-in, throughput-optimized S3 client that lets you access data at maximum speed.
You can manage large-scale data and models through distributed computing, natively supporting frameworks like PyTorch Distributed and Ray. For serious workloads, Outerbounds provides further battle-hardening and support for distributed computing ⭕.
You can leverage compute capacity across GPU providers, including new GPU clouds such as Nebius and CoreWeave, which Outerbounds connects into a unified compute layer ⭕. Or, you can leverage your existing in-house HPC infrastructure using Slurm ✨.
The most enticing part of this is the ability to access compute resources at the lowest cost directly in your own accounts, eliminating middlemen and giving you full visibility into incurred costs ⭕.
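
The parallelization item above follows a fan-out and join shape: split the work into shards, process the shards independently, and combine the partial results. The sketch below shows that shape in plain Python, with threads on a single machine standing in for separate cloud tasks. The names `fan_out`, `score_shard`, and `join` are hypothetical; in Metaflow the same shape is expressed with foreach branches followed by a join step.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(items, worker, max_workers=4):
    """Run worker on every item concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def score_shard(shard):
    # Stand-in for an expensive per-shard computation.
    return sum(x * x for x in shard)


def join(partials):
    # Combine the per-shard results into a single answer.
    return sum(partials)


def demo():
    # Three shards: [0, 1, 2], [3, 4, 5], [6, 7, 8]
    shards = [list(range(i, i + 3)) for i in range(0, 9, 3)]
    return join(fan_out(shards, score_shard))


if __name__ == "__main__":
    print(demo())
```

Because each shard is processed independently, the same structure scales from a few threads to many machines once an orchestrator schedules the workers as separate tasks.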

Security, Governance, and Integrations

Finally, none of the features above matter if you cannot use them within your real business environment. As exciting as AI is, it doesn’t erase existing security, compliance, or governance requirements — if anything, it makes them even more important.

Both Outerbounds and Metaflow deploy in your own cloud environment, so all data and processing reside within your governance boundary.
Metaflow is battle-tested and audited by organizations serious about security, including Netflix, Dell, DoorDash, GE Healthcare, and Goldman Sachs, establishing a strong foundation of trust. This commitment to security and compliance is further emphasized by the SOC2 and HIPAA compliance offered by Outerbounds ⭕.
You can implement your own policies on top of Metaflow, leveraging your existing security policies, secret managers, and security tooling.
When it comes to secure data access, Metaflow runs seamlessly side by side with data warehouses and data lakes such as Databricks and Snowflake - or even inside your data warehouse for extra governance guarantees ⭕.
You can authenticate and authorize all access with SSO/SAML integration and Role-Based Access Control (RBAC), and isolate environments with perimeters ⭕.
If you are a sophisticated organization with existing infrastructure and special needs, you can use Metaflow Extensions to integrate your existing data, compute, and orchestration systems into Metaflow seamlessly, following in the footsteps of Netflix.

Building AI/ML systems in 2026 and beyond

While the list of features may feel overwhelming, the good news is that you can focus only on the parts that are relevant to you today, and add new functionality and policies only when necessary. Metaflow makes simple things simple, and complex things possible - and often surprisingly easy too!

Regardless of whether you are developing a fraud detection system with supervised ML, computer vision, document understanding with LLMs, or autonomous AI agents, the basic needs are common:

As a developer, you need to focus on the trifecta of code, data (or context), and models — the fundamental building blocks of any AI/ML system. These systems are built iteratively, so strong development and experimentation hygiene through GitOps and DevOps is a must. None of this is possible without a solid infrastructure foundation that provides secure access to data, ample cost-efficient compute, and adherence to all business requirements.

While the stack is common, the differentiation lies in the details. You can tailor the environment to your specific needs to maximize your development velocity. This is the promise of Metaflow: you get a set of battle-tested, simple building blocks that cover the entire stack, which you can assemble and customize to fit your needs.

We've developed these building blocks over the past six years through constant collaboration with developers and platform teams building real-world ML and AI systems. The work continues, now just with a larger community. Whether you're just getting started or already operating a billion-dollar AI platform, we'd love to hear from you!

Join the Metaflow Slack community, ask anything, and start building with stable infra.