Mozilla’s Journey to Scalable Machine Learning with Outerbounds

This is a summary of the original Mozilla blog. You may access their original article here.

Deployments: Increased model deployment frequency
Speed: Improved model integration speed and efficiency
Diagnostics: Significant transparency in job performance to diagnose issues

Name: Mozilla
Deployment Type: Outerbounds Platform
Founded: 2005
Location: San Francisco, CA
Industry: Software and Technology


Mozilla, a pioneer in privacy-focused technology, has always prioritized data privacy and user protection. That focus has shaped the company’s approach to data collection, machine learning (ML), and automation. Chelsea Troy, a lead engineer at Mozilla, has been central to the effort to streamline Mozilla’s ML operations and scale them efficiently. This case study outlines how Mozilla moved from a fragmented ML infrastructure to a unified, scalable system using Outerbounds.

Fragmented Machine Learning Operations

Before adopting Outerbounds, Mozilla’s ML teams operated independently, each using its own stack and deployment tools. Chelsea described the situation: “Each team was using wildly different stacks that largely amounted to whatever it is they knew at the time they deployed their model.” This created several inefficiencies across the organization.

For one, different teams often faced challenges in deploying and managing models. Some teams even struggled to run their jobs on a regular schedule. This fragmentation limited Mozilla's ability to scale ML efforts efficiently across teams.

Another challenge was related to transparency and monitoring. While teams could deploy models, troubleshooting issues became time-consuming. “There was a refactor that made a fix universal, and it broke that job, and no amount of debugging seemed to be able to figure it out,” Chelsea shared. Logs and monitoring were decentralized, making it difficult to track issues or provide visibility across teams.

As Mozilla moved towards expanding its ML operations, it became clear that a more unified approach was necessary. The company wanted to bring machine learning models into consumer-facing products like Firefox’s translations and Fakespot (for detecting fake product reviews), but this required both scaling deployment and ensuring data privacy. “We have a small team at Mozilla, so duplication of effort is particularly costly for us,” Chelsea explained. The organization needed a solution that could increase efficiency, reduce duplication, and centralize operations.

Why Outerbounds?

When Chelsea and her team began looking for a unified platform to streamline ML operations, they evaluated several options, including open-source Metaflow, MLflow, and other orchestration tools. However, Mozilla had strict requirements, particularly around data privacy. As a privacy-first company, Mozilla couldn’t use solutions that required storing data outside its own infrastructure. Chelsea noted, “It was extremely important for those machine learning flows that ran on sensitive data to remain inside our GCP project.”

Outerbounds, built on top of Metaflow, stood out as the ideal choice. It allowed Mozilla to run ML jobs within Google Cloud Platform (GCP), ensuring that sensitive data remained within their environment. This was crucial for Mozilla’s privacy commitments. Additionally, Outerbounds provided a unified platform that could streamline deployment, monitoring, and scaling across multiple teams, reducing the overhead of managing separate stacks.

The API also proved a natural fit for Mozilla’s data scientists. “The Metaflow API integrates so seamlessly with the Python code that teams are already writing. It’s a light lift,” Chelsea explained. Data scientists could quickly adapt to Outerbounds without overhauling their existing workflows. The simplicity of the integration was a key factor in the decision to move forward.

Increased Model Deployment Frequency

One of the most immediate benefits Mozilla saw after adopting Outerbounds was the ability to run ML jobs more frequently. Many teams that had previously struggled to deploy models consistently were now able to do so with ease. “Many teams that didn’t run their jobs regularly are now able to do so through the platform,” Chelsea noted.

This shift had a significant impact on the overall productivity of Mozilla’s ML teams. Instead of worrying about deployment infrastructure, teams could focus on improving the quality of their models and running experiments more frequently. By removing the manual barriers to running jobs, Outerbounds allowed Mozilla to make fuller use of its ML capabilities across the organization.

Improved Model Integration Speed and Efficiency

With Outerbounds, Mozilla’s teams also experienced improved speed and efficiency when it came to integrating models. Chelsea noted that integrating machine learning models had often been a time-consuming process, largely because teams had to manage their own infrastructure and deployment tools. “Each team was managing its own deployment stack, which often meant that deploying models could be slow and cumbersome,” she explained.

After transitioning to Outerbounds, the model integration process became much more streamlined. “Once they saw how easy it was to integrate their models, they were like, ‘I can do this with my model by next week,’” Chelsea recalled. This points to a marked improvement in integration speed, from potentially weeks to a matter of days. Teams could now deploy models faster, spending less time on infrastructure and more time on the models themselves.

This improvement in integration speed not only increased team productivity but also allowed Mozilla to experiment and iterate more frequently, leading to better overall model performance.

Increased Transparency in Logs and Error Tracking

Beyond faster, more frequent deployments, Outerbounds also addressed a long-standing problem with monitoring and transparency. Before Outerbounds, troubleshooting production issues required manual intervention, and because logs were not easily accessible, engineers spent significant time debugging.

After adopting Outerbounds, this changed significantly. “The ability to access logs easily and see what’s happening saved us countless hours of troubleshooting,” Chelsea explained. By centralizing logs and error tracking in a single platform, engineers could quickly diagnose and fix issues, drastically reducing the time spent on manual debugging. This increased transparency in job performance and troubleshooting helped teams at Mozilla operate more efficiently and respond to issues in production faster.

Outerbounds’ Impact on Mozilla’s Machine Learning Operations

Outerbounds has not only improved Mozilla’s machine learning workflows but also enhanced the overall flexibility and scalability of their ML efforts. Chelsea highlighted several reasons why Outerbounds became the ideal choice for Mozilla:

Support for GCP: Outerbounds worked within Mozilla’s GCP environment, ensuring full compliance with their privacy requirements.
Centralized platform: Outerbounds allowed Mozilla to bring their machine learning models onto a single platform, simplifying operations across teams.
Seamless integration: The Metaflow-based API was easy for data scientists to adopt, significantly reducing onboarding time.
Responsive support: Chelsea noted that Outerbounds’ team was highly responsive to feature requests and provided solutions that fit Mozilla’s unique needs.

The combination of these features allowed Mozilla to focus on building better machine learning models while eliminating the operational overhead of managing separate stacks and manual processes.

Mozilla is continuing to expand its machine learning capabilities across more projects, including further development of Firefox’s translation models and other consumer-facing applications. With Outerbounds as their core ML platform, they have the tools necessary to scale these efforts while maintaining their commitment to data privacy and transparency.

“We’ve freed up engineering time and resources. Instead of worrying about infrastructure, we can now focus on building better machine learning models,” Chelsea concluded.

By adopting Outerbounds, Mozilla has not only improved the scalability and efficiency of its machine learning operations but has also set a strong foundation for future innovation, all while staying true to its core principles of privacy and user protection.
