Natural Language Processing with Metaflow Tutorial
In this series of episodes, you will learn how to build, train, test, and deploy a machine learning model that performs text classification. You will use TensorFlow, scikit-learn, and Metaflow to operationalize a machine learning product using best practices for evaluation and testing. Lastly, you will learn how to use this model in downstream processes.
Prerequisites
We assume that you have taken the introductory tutorials or know the basics of Metaflow.
Tutorial Structure
The tutorial consists of seven episodes, all centered on a text classification task that is introduced in the first episode.
- Episode 1: Understand the Data
- Episode 2: Construct a Model
- Episode 3: Set Up a Baseline Flow
- Episode 4: Train your Model
- Episode 5: Evaluate your Model
- Episode 6: Use your Model in Python
- Episode 7: Use your Model in a Flow
Each episode contains either a Metaflow script to run or a Jupyter notebook. You do not need access to cloud computing or a Metaflow deployment to complete the episodes. The estimated time to complete all episodes is 1-2 hours.
Why Metaflow?
The main benefit of using a data science workflow solution like Metaflow when prototyping is that your code is built on a strong foundation for deployment to a production environment. Metaflow is most useful when projects have scaling requirements, are mission-critical, and/or have many interacting parts. You can read more in the Metaflow documentation.
After completing the lessons, you will be able to transfer insights and code from the tutorial to your real-world data science projects.
Keep in mind that this is a beginner tutorial, so it does not reflect many important challenges you would face in production ML environments. For example, in production you might use Metaflow features such as the `@conda` decorator for dependency management, `@batch` or `@kubernetes` for remote execution, and `@schedule` to automatically trigger jobs.