
Metaflow Highlights of 2021

Last year was an exciting one for Metaflow, and we want to highlight some of the growth in both adoption and functionality.

Developed internally at Netflix and open-sourced in 2019, Metaflow is now used to power machine learning in production by hundreds of companies across industries from bioinformatics to real estate. We’ll highlight use cases at 23andMe, Realtor.com, and CNN. We’ll also check out how Latana uses Metaflow for brand tracking with Bayesian statistics, and the Metaflow GUI we launched at the end of 2021.

We were also thrilled to start Outerbounds in 2021, where we are laser-focused on building the modern, human-centric ML infrastructure stack. You can find out more about Outerbounds here.

Our team at Outerbounds also presented our work on DAG cards for machine learning workflows at NeurIPS in late 2021 with Jacopo Tagliabue and his team at Coveo. Find out how you can use DAG cards to integrate Pythonic visual reports into your ML pipelines here.

Let’s now jump into the Metaflow highlights from last year!

Developing safe and reliable ML products at 23andMe

23andMe has been using Metaflow for much of their machine learning, which includes polygenic modeling for reports on conditions such as Type 2 Diabetes, HDL Cholesterol, Severe Acne, and Gallstones.

They identified challenges in ML projects that are familiar to many businesses: the difficulty of testing ML systems; organizational and technical fragmentation and siloing; and the need for specialization and team independence (e.g. ML research versus ML engineering). They also work in a regulated domain that needs to take the privacy concerns of its stakeholders seriously.

For many of their projects, model, data, and pipeline lineage are key, which makes Metaflow a good fit for them. On top of this, they also want to be able to 

  • develop models locally,
  • move seamlessly between local and remote environments, and
  • perform continuous improvements at a low marginal cost.

For them, Metaflow facilitates all of this, and helps them with debuggability as “all run parameters and model metrics are conveniently located alongside the run to aid in such debugging.”

They also identified the need for specialized domain knowledge and made clear how difficult it is to find a set of individuals with the right mix of expertise across engineering and genomics research to build these systems.

In their words, “In our experience shipping ML products at 23andMe, it’s increasingly clear that both engineers and scientists developing models should be aware of the infrastructure and product context (failure modes, performance SLAs, compliance concerns, etc.) in which the models are trained and deployed.”

Improving Data Science Processes to Speed Innovation at Realtor.com

Realtor.com, one of the nation’s most popular home search destinations with more than 100 million users visiting the site each month, uses machine learning for

  • personalized searches,
  • recommendations,
  • forecasting housing trends, and
  • helping consumers find the best agent to help them along their home shopping journey.

They began experimenting with Metaflow with the goal of accelerating their machine learning function. The idea was to accelerate the ML lifecycle:

Rapid prototype → deploy → feedback → iterate cycle

We enjoyed reading their description of the Metaflow project:

  • “The team created a standardized access framework for data from multiple cloud platforms that manifests itself as a Python Library that any data scientist can easily access.”
  • “The initiative created a standardized process, infrastructure, and tooling for deploying machine learning algorithms in a directed acyclic graph (DAG) such that they become automated, scalable, and simple.”

Most importantly, the team cut months off the time it takes to build a productionized machine learning model, which had a clear impact on the speed of the business.

They were also able to develop a repeatable, automated deployment methodology and tooling that coordinates well with the rest of the organization.

Accelerating ML within CNN

CNN, which averaged more than 200 million unique global visitors every month of 2020, was “able to test twice as many models in Q1 2021 as they did in all of 2020” due to their adoption of Metaflow!

Their huge audience requires both trusted and personalized news, and they want their data scientists to be focused on improving this experience, instead of constantly having to coordinate compute, clusters, and YAML. As there are always too many interesting ideas to explore, it’s key that the research to test and iterate time is as short as possible.

Their previous process was “not designed for lightweight experimentation, and because of that, the research iteration process was frustratingly slow.” What they like about Metaflow:

The straightforward Python interface meant that it instantly felt familiar to our ML engineers. Additionally, the seamless integration with AWS Batch gave us straightforward and simple scalability options.


[Their] ML Ops engineers were able to implement a rich compute substrate that met the security/scalability requirements of CNN, and Metaflow was able to leverage that substrate using AWS Batch to scale up (and spin down) resources dynamically. This allowed [their] researchers to rapidly iterate with a small set of data locally.

On top of this, they were able to scale without rewriting their code!

They also loved that all runs are reproducible and that all code, data, and model artifacts are snapshotted to S3. An interesting side effect was that they could transition production training to this workflow as well. It is wonderful to have them as part of the community; they have even submitted PRs back to Metaflow. In fact, CNN was instrumental in driving Terraform support for Metaflow: their engineers delivered the entire feature end-to-end.

“At this point, we’ve been able to entirely switch over our research experimentation process to rely on this Metaflow-powered workflow and with great success. Our researchers are able to spend more of their time improving their models for our audience, and our engineers are able to provide a rich compute substrate without having to manually manage clusters. As an informal estimate, our data science team believes they were able to test twice as many models in Q1 2021 as they did in all of 2020, with simple experiments that would have taken a week now taking half a day.”

Brand Tracking with Bayesian Statistics and AWS Batch

Bayesian inference is a wonderful part of statistics, but one limitation is that its computational methods, such as MCMC samplers like NUTS, can require a lot of compute. That is why it was exciting to see Metaflow and AWS Batch used to parallelize hundreds of models.

The use case of generative Bayesian models here is for consumer perception of brands, for example, answering questions such as “how many people from a specific demographic know a particular brand?”

The Bayesian framework here helps with 

  • small target groups (through hierarchical modeling),
  • distinguishing between signal and noise, and
  • poststratification to deal with the problem of biased samples.
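To make the poststratification idea concrete, here is a toy sketch (all cell names and numbers are invented for illustration): per-cell survey estimates are reweighted by known population shares rather than the biased sample shares, so an over-sampled demographic does not dominate the overall estimate.

```python
# Toy poststratification sketch (invented numbers): reweight per-cell
# survey estimates by known population shares instead of sample shares.

# Estimated brand awareness per demographic cell (from the survey).
cell_estimate = {"18-34": 0.60, "35-54": 0.45, "55+": 0.30}

# Share of each cell in the (biased) sample vs. the true population.
sample_share = {"18-34": 0.50, "35-54": 0.30, "55+": 0.20}
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

# Naive estimate inherits the sampling bias; the poststratified one
# corrects it by weighting each cell by its population share.
naive = sum(cell_estimate[c] * sample_share[c] for c in cell_estimate)
poststratified = sum(cell_estimate[c] * population_share[c] for c in cell_estimate)

print(f"naive estimate:          {naive:.3f}")
print(f"poststratified estimate: {poststratified:.3f}")
```

Because the sample over-represents the youngest (most brand-aware) group, the naive estimate comes out higher than the population-weighted one.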

To quote the authors of the blog post,

In each model, the same predictor variables are used (the demographic variables), and only the dependent variable changes (the question we predict). It thus makes sense to do all the data transformation once and only parallelize the models itself. Using AWS Batch, this means we would have one batch job to load and transform the data, then one job per model in parallelism, and a single job afterward to collect and combine the results. However, this also means we need to take care of data sharing between the different batch jobs. We quickly noticed that orchestrating the different batch jobs ourselves was difficult to maintain and was also taking away our time and resources that we’d rather spend on improving the statistical models.

They also found the following features useful:

  • Metaflow took care of saving and sharing data across the steps as well as handling the parallelization.
  • One of the advantages of working with Metaflow was that there was barely any difference between running the code locally and running it on AWS Batch.
  • Using Metaflow ensured that the same piece of code was run, locally and on AWS Batch, and it also created the same conda environment in both places. This helped tremendously with reproducibility and generally made development easier, since they could be sure that what “works on their machine” would also work in the cloud and give the same results.
  • Switching to Metaflow with AWS Batch really allowed them to speed up their development process and focus on improving the statistical models.
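The structure they describe, one shared transform job, one job per model, and a single join job to combine results, is a classic fan-out/fan-in. As a minimal stdlib sketch of that shape (stand-in functions and data instead of real Bayesian models; in Metaflow itself the fan-out would be a foreach split whose branches AWS Batch runs in parallel):

```python
# Fan-out/fan-in sketch of the workflow described above: transform the
# data once, fit one stand-in "model" per survey question in parallel,
# then join the results. Names and data are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

def transform(raw):
    """Shared preprocessing, done once (one Batch job in their setup)."""
    return [x / max(raw) for x in raw]

def fit_model(question, data):
    """Stand-in for fitting one Bayesian model per dependent variable."""
    return question, sum(data) / len(data)

def run_pipeline(raw, questions):
    data = transform(raw)                       # step 1: single transform job
    with ThreadPoolExecutor() as pool:          # step 2: one "job" per model
        fitted = list(pool.map(lambda q: fit_model(q, data), questions))
    return dict(fitted)                         # step 3: join and combine

results = run_pipeline([1, 2, 3, 4], ["aware", "considering", "preferred"])
print(results)
```

Metaflow takes over exactly the tedious parts of this pattern: persisting `data` between steps, scheduling each branch as its own Batch job, and handing all branch results to the join step.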

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform

2021 was a big year of growth for Metaflow inside Netflix as well. Internally, Netflix’s Metaflow deployment covers millions of runs and billions of artifacts. In a large-scale environment like this, it can be hard for an individual data scientist to know how their runs, and other runs they are interested in, are behaving. To address this pain point, Netflix had already started developing a monitoring GUI for Metaflow in 2020, and it was finally released as open source in October 2021.

The monitoring GUI for Metaflow allows data scientists to 

  • monitor their workflows in real-time,
  • track experiments, and
  • see detailed logs and results for every executed task.

On top of this, the GUI can be extended with plugins, allowing the community to build integrations to other systems and embed custom visualizations via Metaflow Cards directly into its views.

When you open the GUI, you see an overview of all flows and runs, both current and historical, which you can group and filter in various ways.

Note that you can also use this view for experiment tracking!

There is also a timeline view, which is extremely useful for understanding performance bottlenecks, seeing the distribution of task runtimes, and finding failed tasks.

You can also jump into individual tasks; the task view includes “logs produced by a task, its results, and optionally links to other systems that are relevant to the task.”

If, for example, the task had deployed a model to a model serving platform, the view could include a link to a UI used for monitoring microservices.

Check out the blog post for more, or, if you like, go ahead and deploy the GUI in your own environment.

Thanks for reading! We’d love to hear your thoughts on the ML stack and Metaflow, as well as learn about what you’re working on. The easiest way to chat is to join over 900 data scientists and engineers on the Metaflow Community Slack 👋
