Deliveroo’s Recipe for Streamlined ML: From Argo to Metaflow
- Smoother, more reliable deployment workflows
- Faster model iteration and reduced time to production
- Reduced complexity and support requests
Deliveroo, a household name in food delivery in Europe and Asia, connects hungry customers with restaurants and, more recently, offers additional delivery options beyond food. To stay ahead in the competitive food delivery sector, Deliveroo relies on sophisticated machine learning (ML) infrastructure to support timely deliveries, optimize routes, and refine customer recommendations.
Thomas Furmston, an ML Platform Engineer at Deliveroo, joined the company to address critical ML workflow issues. He holds a PhD in machine learning and has worked across ML and engineering roles, with particular expertise in the tooling and infrastructure behind model development, training, and deployment pipelines. As he explained, his role at Deliveroo is to make ML training and deployment efficient, standardized, and smooth for ML engineers and data scientists.
When Thomas joined, Deliveroo's ML infrastructure was a patchwork of legacy in-house tools that had grown organically without a cohesive strategy. Teams struggled to deploy ML models, and the lack of a structured platform led to frequent support requests and widespread frustration. These inefficiencies prompted a search for an alternative platform that could simplify ML deployment while supporting Deliveroo's expanding ML needs.
Limitations of Argo Workflows and Legacy Tools
On arrival, Thomas found a disjointed environment with scattered tooling and inconsistent workflows. The company ran on Kubernetes alongside a set of custom-built in-house tools that did not fit together well, leading to frequent deployment issues and a high rate of configuration errors.
The ML platform team had previously attempted to move to Argo Workflows, a popular Kubernetes-native workflow orchestration tool, hoping it would offer the flexibility they needed. However, they soon found that Argo Workflows, while powerful, was not user-friendly for ML engineers, who preferred not to write pipeline configurations in YAML. This added complexity made it hard for Deliveroo's ML engineers to move their work from local development to scalable deployment. Thomas shared, “Developers don't want to create pipelines directly in YAML; there's no efficient model development experience with that approach.”
Recognizing this gap in functionality and developer satisfaction, Thomas and his team began a rigorous search for an alternative that would better serve Deliveroo's needs by reducing support requests and improving model development workflows. As he explained, “Teams were running around with their hair on fire, struggling with our legacy tools, so it was clear we needed a tool that would make things smoother and easier.”
The Switch to Metaflow
To address these challenges, Thomas's team ran an extensive evaluation. They compared Metaflow against several other platforms, including Argo Workflows, Airflow, Prefect, Weights & Biases, Comet, and even AWS SageMaker. Ultimately, they selected Metaflow as the top choice for model development and pipeline management.
Deliveroo ran a pilot with multiple teams, giving ML engineers hands-on experience with both Metaflow and Comet. The feedback was clear: teams overwhelmingly preferred Metaflow. The decision came down to two core factors. First, Metaflow made it simple to develop, test, and deploy models. Second, ML engineers strongly preferred its Python-centric workflow.
Thomas shared, “Metaflow’s approach was a natural fit for the way we wanted to do ML at Deliveroo. ML engineers could write pipelines directly in Python, test them locally, and then deploy them easily, which created a seamless, cohesive experience that none of the alternatives offered.”
Metaflow offered the additional advantage of a flexible extensions framework, which let Deliveroo tailor the platform to its existing infrastructure. The team preferred Git-based CI/CD deployments over deploying from the command line, so they built GitOps-style deployment workflows around Metaflow. “We used Metaflow's extensions framework to customize our deployments, allowing us to retain our GitOps setup. This was a crucial factor for us,” Thomas explained.
Smoother ML Workflows, Reduced Support Burden, and Enhanced Deployment
The transition to Metaflow marked a major improvement for Deliveroo's ML teams. Metaflow streamlined the development and deployment of ML models, allowing engineers to focus on building models rather than fighting infrastructure. Metaflow integrated with Deliveroo's existing Kubernetes setup and GitOps deployment process. As a result, migrating a pipeline took only a few days to a week, a drastic reduction compared with the prior setup.
Qualitative and quantitative feedback indicated that the platform change was a success. Thomas explained, “It’s been a big win for both the ML engineers and the company as a whole. The support requests have noticeably decreased, and teams now report that developing and iterating on pipelines is far easier and quicker.” Key improvements included:
- Reduced Support Requests: By moving from Argo to Metaflow, the ML platform team saw a marked reduction in support requests related to pipeline configuration issues. Metaflow’s Python-centric interface eliminated the need for complex YAML configurations, simplifying troubleshooting and deployment.
- Accelerated Pipeline Iteration: ML engineers found it significantly easier to iterate on models and deploy new versions of their pipelines. By testing pipelines locally with Metaflow before deploying them, engineers could quickly prototype, validate, and ship models, reducing iteration time by as much as 50%.
- Smooth, Standardized Deployment: By adapting Metaflow's extensions to fit its GitOps pipeline, Deliveroo achieved a standardized, Git-based deployment model aligned with its broader engineering practices. This consistency across development and production environments reduced deployment errors.
Thomas emphasized the difference, stating, “With Metaflow, ML engineers can seamlessly move from local development to deployment without jumping through hoops, and it has significantly simplified the pipeline process.”
With Metaflow firmly integrated into Deliveroo's ML platform, the results have been clear: ML engineers now spend less time dealing with deployment issues and more time improving model performance. The success has been such that teams not yet fully migrated to Metaflow are eager to adopt it.
Reflecting on Metaflow, Thomas shared, “Metaflow has allowed us to bring a level of standardization and efficiency that was previously lacking. The platform’s ability to streamline development while maintaining high flexibility has been invaluable.”
Looking ahead, Deliveroo expects to further enhance its ML capabilities with more advanced Metaflow extensions tailored to the company's unique workflows. The team is also discussing automating additional parts of the ML lifecycle, such as model performance monitoring, to support Deliveroo's growing reliance on data-driven insights. By building on the robust foundation Metaflow provides, Deliveroo is well positioned to keep scaling its ML initiatives efficiently and, ultimately, to improve its delivery experience.