Natural Language Processing with Metaflow Tutorial

In this series of episodes, you will learn how to build, train, test, and deploy a machine learning model that performs text classification. You will use TensorFlow, scikit-learn, and Metaflow to operationalize a machine learning product, following best practices for evaluation and testing. Lastly, you will learn how to use this model in downstream processes.


We assume that you have taken the introductory tutorials or know the basics of Metaflow.

Tutorial Structure

The tutorial consists of seven episodes, all centering around a text classification task you will be introduced to in the first episode.

Each episode contains either a Metaflow script to run or a Jupyter notebook. You do not need access to cloud computing or a Metaflow deployment to complete the episodes. The estimated time to complete all episodes is 1-2 hours.

Why Metaflow?

The main benefit of using a data science workflow solution like Metaflow when prototyping is that your code will be built on a strong foundation for deploying to a production environment. Metaflow is most useful when projects have scaling requirements, are mission-critical, and/or have many interacting parts.

After completing the lessons, you will be able to transfer insights and code from the tutorial to your real-world data science projects. Keep in mind that this is a beginner tutorial, so it does not cover many of the challenges that arise in production ML environments. For example, in production you might use Metaflow features such as the @conda decorator for dependency management, @batch or @kubernetes for remote execution, and @schedule to trigger jobs automatically.