Natural Language Processing with Metaflow Tutorial
In this series of episodes, you will learn how to build, train, test, and deploy a machine learning model that performs text classification. You will use TensorFlow, scikit-learn, and Metaflow to operationalize a machine learning product using best practices for evaluation and testing. Lastly, you will learn how to use this model in downstream processes.
Prerequisites
We assume that you have taken the introductory tutorials or know the basics of Metaflow.
Tutorial Structure
The tutorial consists of seven episodes, all centered on a text classification task that is introduced in the first episode.
- Episode 1: Understand the Data
- Episode 2: Construct a Model
- Episode 3: Set Up a Baseline Flow
- Episode 4: Train your Model
- Episode 5: Evaluate your Model
- Episode 6: Use your Model in Python
- Episode 7: Use your Model in a Flow
Each episode contains either a Metaflow script to run or a Jupyter notebook. You do not need access to cloud computing or a Metaflow deployment to complete the episodes. The estimated time to complete all episodes is 1-2 hours.
Why Metaflow?
The main benefit of using a data science workflow solution like Metaflow when prototyping is that your code is built on a strong foundation for deployment to a production environment. Metaflow is most useful when projects have scaling requirements, are mission-critical, and/or have many interacting parts. You can read more in the Metaflow documentation.
After completing the lessons, you will be able to transfer insights and code from the tutorial to your real-world data science projects.
Keep in mind that this is a beginner tutorial, so it does not reflect many important challenges you would face in production ML environments. For example, in production you might use Metaflow features such as the `@conda` decorator for dependency management, `@batch` or `@kubernetes` for remote execution, and `@schedule` to automatically trigger jobs.