Case Study: How uses Metaflow

We recently sat down with Russell Brooks, Principal Machine Learning Engineer at, to discuss how his team uses Metaflow, among many other great tools, and how it has impacted machine learning more generally at

What type of ML questions do you work on at and what type of business questions do they solve?

is one of the nation’s most popular home search destinations, with more than 100 million users visiting the site each month and hundreds of thousands of property listings. There are tons of ML use cases and they cover the gamut of modeling techniques, including:

  • A consumer-facing website that people might be familiar with, which has many common website optimizations such as recommendation systems, search ranking, consumer segmentation, and models for personalization to help people find relevant homes for their specific needs.

  • A variety of image and NLP models to enrich property content.

  • Forecasting for housing trends and strategic planning, sales/pricing optimization models, and surrogate experimentation metrics for AB testing and dashboards.

  • Models to help consumers find the best matching real estate professionals based on their needs.

The techniques span everything from deep learning models, Kalman filters, and Bayesian probabilistic models to every flavor of boosted trees you could hope to encounter.

Each day these ML models output billions of predictions that get used across the business – all in service of empowering consumers to feel confident in their home buying or rental journey, alongside providing real estate professionals with tools to operate more effectively.

What did your ML stack look like before you adopted Metaflow and what pain points did you encounter?

There was a mixed bag of AWS services, largely stitched together manually. Back when Metaflow was first open-sourced in late 2019, we were using many of the same AWS services that Metaflow supports like Batch, Step Functions, and S3 as core components of our ML pipelines.

It was quite tedious to maintain, as there weren’t many good infra-as-code options out there, especially ones that weren’t a pain from a typical ML workflow perspective. If I have to write more YAML/JSON than Python, something has gone wrong!

Creating the state machines was a pain, and back then there wasn’t even baked-in support for Batch jobs as standalone steps, so you had to create Lambda functions to submit the Batch jobs, poll for job status changes, and orchestrate those cyclical job dependencies in the state machine yourself.
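To make the submit-and-poll pattern described above concrete, here is a minimal sketch of the loop that had to be hand-written around each job. It is an illustration only, not the team's code: `FakeBatchClient`, `submit_job`, `job_status`, and `run_job` are hypothetical names and do not mirror the boto3 API. Only the status names (SUBMITTED, RUNNABLE, RUNNING, SUCCEEDED, FAILED) follow AWS Batch job states.

```python
import time

# Terminal AWS Batch job states; everything else means "keep polling".
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


class FakeBatchClient:
    """Hypothetical stand-in for a job service (not the boto3 API)."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self._last = None

    def submit_job(self, name):
        # A real client would return a service-generated job id.
        return f"{name}-0001"

    def job_status(self, job_id):
        # Replay the scripted statuses, then keep repeating the last one.
        self._last = next(self._statuses, self._last)
        return self._last


def run_job(client, name, poll_seconds=0.0, max_polls=100):
    """Submit a job, then poll until it reaches a terminal state."""
    job_id = client.submit_job(name)
    for _ in range(max_polls):
        status = client.job_status(job_id)
        if status in TERMINAL_STATES:
            return job_id, status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")
```

In the setup described, this logic was spread across Lambda functions and state machine transitions rather than living in one Python function, which is part of why it was tedious to maintain.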

At that time, we were also creating the Batch job definitions separately for each component of a workflow, each using its own Docker image with the source files and Python environment built into it. Having to rebuild/sync Docker images to ECR during dev to propagate changes onto Batch jobs added friction.

However, that was seen as easier from a dev workflow perspective than relying on git to ship code / dev branches onto the remote jobs (Metaflow’s code artifacts are an even better dev experience!).

Another common pain point was state transfer and sharing of data between steps. We had some tools for dataset abstractions that would also handle reading/writing to a versioned S3 bucket to allow for run-over-run comparisons, but even after a couple of iterations it still felt a bit lacking from a developer experience perspective.

Choices that have aged well

All that said, some early tech stack decisions have aged well over time, like focusing on cloud-first development, creating a separate AWS account for our ML/DS org, and creating clear handoff rules of engagement with our engineering teams for batch/real-time model deployments. We have also always emphasized simplicity, e.g. vertically scaling before refactoring for distributed approaches.

This was back in 2017-2018 as much of the Hadoop/Spark hype was starting to taper off in favor of managed offerings like Redshift/Snowflake for a data platform. Between offloading large dataset preprocessing onto those tools and utilizing very large EC2 instances, we were often able to accommodate our most demanding workloads on a single node.

For cloud-first development at that time, we provisioned each ML/DS individual with their own EC2 instance as their “laptop in the cloud” and we had a shared Docker image for a consistent toolchain with a preconfigured Python environment and tools to query various databases. VSCode/PyCharm support for remote containers was still in its infancy back then, so we had tools to SSH and do port forwarding to those development boxes, and most people used JupyterLab as their remote IDE.

Model deployments

For model deployments at that time (and largely still the case now), we try to utilize batch deployments as often as possible, which usually involves the ML workflow outputting data which then gets ingested into a DB to be accessed later as needed by applications, dashboards, etc. These batch deployments cover ~95% of our ML use cases.

We’ve always worked to empower our ML/DS teams to own batch deployments entirely end-to-end. In addition to often being a simpler deployment, with fewer teams involved and fewer potential points of failure, they also often give the best performance at runtime, since the model outputs are effectively cached in a DB and ready to be queried by the apps/services.
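The batch deployment pattern described above can be sketched roughly as follows: a workflow writes its predictions into a table, and applications later read them with a plain lookup. This is a sketch under stated assumptions, not the actual pipeline: SQLite stands in for whatever database is used, and the table name, column names, `publish_predictions`, and `lookup_score` are all hypothetical.

```python
import sqlite3


def publish_predictions(conn, run_id, predictions):
    """Write one batch of model outputs, replacing older scores per listing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "listing_id TEXT PRIMARY KEY, score REAL, run_id TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)",
        [(listing_id, score, run_id) for listing_id, score in predictions.items()],
    )
    conn.commit()


def lookup_score(conn, listing_id):
    """What an app or dashboard does at request time: a simple keyed read."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE listing_id = ?", (listing_id,)
    ).fetchone()
    return None if row is None else row[0]


# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
publish_predictions(conn, "run-1", {"listing-a": 0.91, "listing-b": 0.42})
```

Because the model never runs on the request path, a read costs one keyed query, which is the "effectively cached" behavior mentioned above.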

For real-time model deployments, it can vary a bit case by case, but generally that’s a handoff to our engineering teams who own that particular service, e.g. using pickle to serialize a trained model artifact, which can then be loaded onto a Flask web server for inference as needed.
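A rough sketch of that handoff, using only the standard library: the "model" here is a hypothetical dict of linear coefficients, and `train`, `predict`, and `handle_request` are illustrative names. In a Flask service, the body of `handle_request` would sit inside a route handler; that wiring is assumed and left out so the example stays self-contained.

```python
import pickle


def train():
    # Hypothetical "trained model": a plain dict of linear coefficients.
    return {"intercept": 1.0, "coef": [2.0, -0.5]}


def predict(model, features):
    # Linear score: intercept + sum(coef_i * x_i).
    return model["intercept"] + sum(
        c * x for c, x in zip(model["coef"], features)
    )


# ML side: serialize the trained artifact for handoff.
artifact = pickle.dumps(train())

# Service side: load the artifact once at startup.
# Only unpickle data from a trusted source; pickle can execute arbitrary code.
model = pickle.loads(artifact)


def handle_request(payload):
    """What a web route would do per request: parse features, score, respond."""
    return {"prediction": predict(model, payload["features"])}
```

Loading the artifact once at startup, rather than per request, keeps inference latency down to the cost of the scoring function itself.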

What questions do you think about answering when adopting new ML technologies and how do you make decisions about what to adopt?

There’s a myriad of tools and services that exist, and for any given task, there are often many totally viable options to choose from, so it’s almost less about being functionally correct and more about having good taste. Good taste can also depend on many circumstances of your team, business maturity, and existing tech debt and business investments.

Often, though, I think it’s about making decisions that let you focus your energy on solving your actual business problems, rather than debugging infrastructure (e.g. leaning more heavily on managed services rather than self-hosting) or recreating some functionality that could have been solved using open source solutions.

Beyond the core questions around what capabilities are being provided and how broad the scope of any potential benefits might be, some additional considerations are:

  • Developer experience
    • What’s the learning curve like? How good is the documentation and is it accessible in less than 3 clicks? Does the tool have a janky DSL or a custom flavor of SQL?
    • Is there a community (e.g. slack, Stack Overflow) around the tool? A big chunk of software dev is googling and finding posts from other people who have solved similar challenges. There can be feedback effects that spur more growth from community momentum that continuously makes the dev experience even better.
    • Is the tool self-service? Can I grab an API key and spin up a prototype in an afternoon, without having to schedule a meeting with a salesperson or having to sit through a demo with someone who may not be able to answer technical questions?
  • Open source / hosted offering / cost
    • Open source can be considered a type of freemium model for many tools, e.g. feel free to run it yourself and once you’re hooked you can pay for our super powerful cloud-hosted solution.
    • If it’s a closed-source or a paid offering, is there a free tier to play around with, without having to rope in procurement teams and deal with MNDAs/supporting contracts?
    • If it’s self-hosted, how much of a pain is it to set up and maintain?

Why did you choose Metaflow?

Given all the pain points previously mentioned, the AWS re:Invent demo for Metaflow in 2019 piqued my interest. I took it for a spin to get hands-on experience with it and provisioned the stack alongside our AWS infra pretty much right after.

Initially, I was excited just for the Pythonic abstractions to wire up those AWS services (the AWS CDK was pretty terrible at that time), not to mention all the other benefits from Metaflow like versioning and tracking metadata, providing clean S3 interfaces along with state transfer within workflows, and reducing the gap between dev and production.

There were some other tools coming out around the same time, like Prefect, that looked promising, and some other teams across the company were using Airflow, Redis Queues, and other flavors of job frameworks. All in all, though, Metaflow was a great fit with our existing ML infrastructure, is easy to use and maintain, and has great ergonomics with clean interfaces to track/share workflows and their artifacts.

What does your stack look like now, including Metaflow?

Right out of the gate, Metaflow provided much cleaner abstractions for many of the core AWS services that our ML pipelines relied on. Within a few weeks of that initial proof of concept, we had already refactored significant portions of our ML codebase to utilize Metaflow instead of our legacy utilities. As the saying goes, “Write code that is easy to delete, not easy to extend”, and it’s always satisfying to clean up and remove your old code in favor of better packages.

Beyond the myriad of AWS services, we’ve consolidated onto Snowflake as a data platform; dbt is used for many business-layer abstractions within our data models; and Fivetran is used to replicate and pipe data between Snowflake and other application databases (e.g. Postgres, DynamoDB) and various 3rd party integrations.

Cloud-native development has really taken off in the past few years, with tools like VSCode remote containers making significant advancements. We still provision each DS/ML individual with their own EC2 instance as their “laptop in the cloud” and that’s continued to improve over time.

We have a small local Python environment for basic connectivity tools and helper utilities that everyone uses on their laptop, for things like Okta SSO authentication that can be paired with AWS SSM to establish secure connections to those instances while keeping them in private subnets. On those instances, we use a Docker container as a full-featured development environment so that we have a consistent, easily reproducible toolchain across our team.

Utilizing VSCode remote containers, we can have what feels like a local IDE, but it’s actually running in AWS with all the compute flexibility that entails, plus Metaflow preconfigured, environment/DB permissions ready to go, and some hygiene tools for coding standards like black, ruff, pre-commit, etc. The ambition for developer experience has always been that a new hire should be empowered to go from nothing to a fully provisioned setup, capable of submitting a GitHub PR, within an afternoon.

As for our Python stack, I’d say it spans a significant portion of the PyData stack and modeling frameworks. Metaflow provides significant flexibility on that front, so there aren’t many limitations, and people are empowered to use the best tool/package for the job.

  • PyTorch is our deep learning framework of choice, including PyTorch Lightning and PyTorch Forecasting.
  • CatBoost, XGBoost, LightGBM, sklearn, Huggingface, PyMC, Pyro, PyStan, Prophet, UMAP, HDBSCAN are all quite common across our ML projects.
  • In addition to Snowflake/DB connectors for querying data, AWSWrangler (now the AWS SDK for pandas) is frequently used for ETL and working with Parquet datasets.
  • For reporting and monitoring there’s a mix of notebooks, Metaflow cards, Slack integrations, Snowsight Dashboards, and Streamlit apps depending on the needs and flexibility required (e.g. interactive Altair plots, SHAP analysis, viz).

What did you discover about Metaflow after using it? What Metaflow features do you use and what do you find useful about them?

The AWS abstractions and workflow orchestration components of Metaflow were the most significant quality-of-life improvements during the initial adoption, and over time the metadata tracking became just as important if not even more so.

Being able to quickly debug workflows by retroactively grabbing the exact data and model artifacts, reproducing the issue in a notebook, and interactively solving the problem is a huge productivity boost. Being able to have namespace isolation to see who ran what and when along with the separation of production from dev is huge. Utilizing project branching has been a great way to quickly experiment on existing processes.
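As a sketch of the retroactive debugging described above, the Metaflow Client API can pull an artifact from a past run into a notebook. This assumes Metaflow is installed and configured against your metadata service. `load_artifact`, the flow name `TrainingFlow`, the artifact name `model`, and the user `alice` are hypothetical; `Flow`, `namespace`, `latest_successful_run`, and `run.data` are part of the Metaflow client.

```python
def load_artifact(flow_name, artifact_name, user=None):
    """Fetch one artifact from the latest successful run of a flow."""
    # Imported lazily so this module still loads where Metaflow is absent.
    from metaflow import Flow, namespace

    # Optionally narrow the lookup to a single user's namespace;
    # otherwise Metaflow's default namespace applies.
    if user is not None:
        namespace(f"user:{user}")

    run = Flow(flow_name).latest_successful_run
    if run is None:
        raise LookupError(f"no successful run found for {flow_name}")

    # run.data exposes the artifacts stored by the run's end step.
    return getattr(run.data, artifact_name)


# Example (hypothetical names), e.g. in a notebook:
# model = load_artifact("TrainingFlow", "model", user="alice")
```

From there, the artifact can be inspected interactively against the exact data the run saw, rather than trying to recreate the failing state by hand.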

The Metaflow UI has been amazing. It wasn’t around when we first started using Metaflow, nor were cards, and they’ve both been super helpful for monitoring, quickly navigating through metadata, and even replacing a fair bit of team-specific basic reporting needs (e.g. cards with AB testing results that get refreshed on a schedule, rather than a full-fledged dashboard).

How has using Metaflow changed ML at

Metaflow has helped our teams have a much smoother development experience, streamlined our toolchain, and shaved months off the time it took to go from ideation to production-deployed pipelines.

It’s significantly easier to build and scale workflows, experimentation friction has been greatly reduced, and we’ve also had improved collaboration with other organizations across the company that have benefited from more consistent coordination of models and rapid prototyping.

Metaflow has become the backbone of ML projects at, and enables our teams to quickly iterate and deliver value to the business and our consumers.

Join our community

If these topics are of interest, you can find more interviews in our Fireside Chats and come chat with us on our community Slack.
