Natural Language Processing with Metaflow Tutorial

In this series of episodes, you will learn how to build, train, test, and deploy a machine learning model that performs text classification. You will use TensorFlow, scikit-learn, and Metaflow to operationalize a machine learning product using best practices for evaluation and testing. Lastly, you will learn how to use this model in downstream processes.

Natural Language Processing with Metaflow

from metaflow import FlowSpec, step, Flow, current

class NLPPredictionFlow(FlowSpec):

    def get_latest_successful_run(self, flow_nm, tag):
        """Gets the latest successful run
        for a flow with a specific tag."""
        for r in Flow(flow_nm).runs(tag):
            if r.successful:
                return r

    @step
    def start(self):
        """Get the latest deployment candidate
        that is from a successful run."""
        self.deploy_run = self.get_latest_successful_run(
            'NLPFlow', 'deployment_candidate')
        self.next(self.end)

    @step
    def end(self):
        """Make predictions."""
        from model import NbowModel
        import pandas as pd
        import pyarrow as pa
        import pyarrow.parquet as pq
        new_reviews = pd.read_parquet('predict.parquet')['review']

        # Make predictions with the model artifact saved by the training flow
        model = NbowModel.from_dict(self.deploy_run.data.model_dict)
        predictions = model.predict(new_reviews)
        msg = 'Writing predictions to parquet: {} rows'
        print(msg.format(predictions.shape[0]))
        pa_tbl = pa.table({"data": predictions.squeeze()})
        pq.write_table(pa_tbl, "sentiment_predictions.parquet")

if __name__ == '__main__':
    NLPPredictionFlow()
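Assuming the flow above is saved in a file such as predict.py (the filename here is an assumption; substitute whatever name you use), you would validate and run it from the command line like any Metaflow script:

```shell
# Validate the flow's structure without executing it
python predict.py check

# Execute the flow: runs the start step, then end
python predict.py run
```

Metaflow records every run, so after executing this you can inspect the run and its artifacts with the Client API from a notebook or script.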

If you want to code along, you can open up your sandbox, skip the rest of this setup page, and follow Hugo and Hamel Husain in the Natural Language Processing meets MLOps Live Code Along on YouTube.


We assume that you have taken the introductory tutorials or know the basics of Metaflow.

Tutorial Structure

The tutorial consists of seven episodes, all centering around a text classification task you will be introduced to in the first episode.

Each episode contains either a Metaflow script to run or a Jupyter notebook. You do not need access to cloud computing or a Metaflow deployment to complete the episodes. The estimated time to complete all episodes is 1-2 hours.

Why Metaflow?

The main benefit of using a data science workflow solution like Metaflow when prototyping is that your code is built on a strong foundation for deployment to a production environment. Metaflow is most useful when projects have scaling requirements, are mission-critical, and/or have many interacting parts.

After completing the lessons, you will be able to transfer insights and code from the tutorial to your real-world data science projects. Keep in mind that this is a beginner tutorial, so it does not reflect many important challenges you will encounter in production ML environments. For example, in production, you may consider using Metaflow features such as the @conda decorator for dependency management, @batch or @kubernetes for remote execution, and @schedule to automatically trigger jobs.
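To illustrate how those production-oriented decorators attach to a flow, here is a minimal hypothetical sketch. The flow name, library version, resource sizes, and schedule below are all assumptions for illustration, not part of this tutorial, and running it requires Metaflow plus a configured AWS Batch deployment:

```python
from metaflow import FlowSpec, step, batch, conda, schedule

@schedule(daily=True)  # trigger a run automatically once a day
class ScheduledTrainFlow(FlowSpec):

    @conda(libraries={'scikit-learn': '1.3.2'})  # pin step dependencies
    @batch(cpu=4, memory=16000)  # run this step remotely on AWS Batch
    @step
    def start(self):
        # training logic would go here
        self.next(self.end)

    @step
    def end(self):
        print("done")

if __name__ == '__main__':
    ScheduledTrainFlow()
```

Decorators like @batch and @conda apply per step, while @schedule applies to the whole flow, so you can scale out only the steps that need it.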