How to Evaluate MLOps Tools (Stanford Talk Highlights)

By Hamel Husain

I’m a Machine Learning Engineer who loves building data-science infrastructure and developer tools. I’m currently working at Outerbounds, building ML infrastructure. I’ve previously held related roles at Airbnb, DataRobot, and GitHub, as well as a long stint in management consulting, where I worked on many other types of ML in retail, finance, and fashion.

I recently had fun giving this talk in Chip Huyen’s Machine Learning Systems class at Stanford, on a subject we don’t talk about enough:

  1. How to evaluate ML Tooling
  2. How to spot & deal with 🔥Tool Zealots 🔥

We have recorded the full talk:

I wrote a Twitter thread summarizing the key insights, and it resonated with a lot of people, so I decided to put it all in one place here. For more context about each topic, take a look at the original video above. Let’s go!

We have all encountered tool zealots

Things that have been said to me:

  • “Only real ML engineers use Tensorflow, nobody uses Pytorch in production”
  • “TFX completely solves issues with data drift”
  • “fastai is just a toy”
  • and much more!

Zealots have a big impact on our profession. 

This is a pattern in MLOps

It can be hard to resist zealots, especially when you are new. They often appeal to authority and point to cherry-picked features to make you feel inferior if you do not use their tools.

How to evaluate machine learning tools (such as TensorFlow Extended)

In the talk, I discuss some criteria that can be helpful for evaluating tools, using TensorFlow Extended (TFX) as an example:

The Data-Centric worldview

I discuss important problems, such as tools that add friction to critical aspects of the workflow:

The importance of being able to iterate fast:

The danger of having too much complexity up-front:

New DSLs and confusing config files that add cognitive load for simple things like data visualization:

I also discuss how tools with a myopic focus can lead to large blind spots in how you solve certain ML problems:

The importance of documentation, API and otherwise

The most important consideration when selecting tools:

✨Quality of the documentation ✨

API design is important: tools should not force you to incur cognitive load with tons of boilerplate or unintuitive syntax:
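To make the boilerplate point concrete, here is a minimal, hypothetical sketch in Python. It is not TFX or any real library: the names `StatsConfig`, `StatsComponent`, `Pipeline`, and `column_mean` are invented for illustration. Both versions compute the mean of one column, but the first forces you to learn and wire together three objects before you see a single number.

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical "heavy" API: a config object, a component, and a pipeline
# must all be created and connected to compute one statistic.
@dataclass
class StatsConfig:
    column: str
    metrics: list = field(default_factory=lambda: ["mean"])


class StatsComponent:
    def __init__(self, config):
        self.config = config

    def run(self, rows):
        values = [row[self.config.column] for row in rows]
        return {"mean": mean(values)} if "mean" in self.config.metrics else {}


class Pipeline:
    def __init__(self, components):
        self.components = components

    def execute(self, rows):
        # Returns one result dict per component, in order.
        return [component.run(rows) for component in self.components]


# Hypothetical "light" API: one function with plain arguments.
def column_mean(rows, column):
    return mean(row[column] for row in rows)


rows = [{"price": 10.0}, {"price": 14.0}, {"price": 18.0}]

heavy_result = Pipeline([StatsComponent(StatsConfig(column="price"))]).execute(rows)[0]["mean"]
light_result = column_mean(rows, "price")
```

Both calls return the same value. The difference is how many concepts a newcomer has to hold in their head to get there, which is the kind of cognitive load worth checking for when you evaluate a tool.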

What tooling zealots will tell you

BUT WAIT! There are counter-arguments zealots will often use. I outline the most common ones.

The first one:

  • “We need to scale”

Unnecessary focus on scaling is destructive, especially in the presence of reasonable alternatives that work well for many people:

The second one:

  • “You can prototype with your own tools, but only refactor when you are ready for production”

The slide speaks for itself:

The third one:

  • “If you use {favorite cloud provider}, it is not that bad!”

I think this is a dangerous career strategy

The most egregious argument zealots provide is an appeal to authority.

Zealots often do this because they do not have principled arguments and are not informed. It is important to shut this argument down if all you hear is an appeal to authority.


One thing I hope people will take away from this talk:

Don’t Become A Zealot!!! Why?

Just to be clear, I do not hate TFX. In the talk, I discuss some parts of TFX that I really love. It’s not a binary love vs. hate thing.

If you want to know what those are, watch the talk!

Here are some links that might be helpful

1. @chipro’s class syllabus:

2. Link to the slides:…

3. The talk is here!

You might be wondering, which tools do I like? The answer is: I am not satisfied with most ML tools, for the reasons I outlined above. That being said, I think some tools have tremendous potential to become best of breed in their category based on their design choices, philosophy, and culture. That is why I joined Outerbounds: their views are aligned with mine, and I know I can build something great with them that breaks through the noise of tools that do not meet data scientists where they are. While there is still work left to do, I believe Metaflow is a solid foundation upon which to build something truly unique in this space. Stay tuned for future posts where I will expand more on this!

We’d love to hear your thoughts on the ML stack and Metaflow, as well as learn about what you’re working on. If these ideas resonate with you, we are hiring! Also, the easiest way to chat is to join us and over 900 data scientists and engineers on the Metaflow Community Slack 👋