Natural Language Processing - Episode 8

In the final episode, we will write a Python file that loads your trained model and puts it behind an API to serve predictions. Although this example is meant as a starter kit rather than something to run as-is in production, it is powered by industry-standard tools that can scale to serious production workflows. In particular, we will use FastAPI as the web framework and uvicorn as the server.

tip

If you are new to FastAPI, they have a comprehensive getting started tutorial and advanced guides.

1. Serve Your Model

In model_server.py, we do the following:

  • load the latest successfully trained model using Metaflow's client API,
  • create a FastAPI instance, and
  • define HTTP endpoint methods in Python using FastAPI.
model_server.py
from fastapi import FastAPI
from metaflow import Flow

from model import NbowModel


def get_latest_successful_run(flow_nm, tag):
    """Gets the latest successful run
    for a flow with a specific tag."""
    for r in Flow(flow_nm).runs(tag):
        if r.successful:
            return r


# load the latest model tagged as a deployment candidate
nbow_model = NbowModel.from_dict(
    get_latest_successful_run(
        'NLPFlow', 'deployment_candidate'
    ).data.model_dict
)

# create FastAPI instance
api = FastAPI()

# how to respond to HTTP GET at / route
@api.get("/")
def root():
    return {"message": "Hello there!"}

# how to respond to HTTP GET at /sentiment route
@api.get("/sentiment")
def analyze_sentiment(review: str, threshold: float = 0.5):
    prediction = nbow_model.predict([review])[0][0]
    sentiment = "positive" if prediction > threshold else "negative"
    return {"review": review, "prediction": sentiment}

2. Run the Server

Now open up your terminal, make sure the mf-tutorial-nlp environment is active, and run the following command to start your API server:

uvicorn model_server:api --reload

note

The --reload argument is a development feature that enables hot reloading of the API server, so you can edit model_server.py, hit save, and see your changes without restarting the server. Turn this off in production.

In production, you will want to deploy this API on another machine, but you are already most of the way to serving live predictions in your apps. This is really powerful! You can find much more information about how to scale your API in the FastAPI deployment documentation.
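
For example, a non-development invocation typically drops --reload and runs multiple worker processes. The host, port, and worker count below are placeholder values to adapt to your deployment:

uvicorn model_server:api --host 0.0.0.0 --port 8000 --workers 4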

3. Make an API Request

Let's make a request to the server! 🎉 ✨

import requests
import urllib.parse

# Write your new review here. This will become the payload of your API request.
# In practice, this query may come from user input in an app, or wherever you want!
review = "As a self-proclaimed fashion enthusiast, I have tried countless clothing items from various brands, but the ComfyCloud 9000 Sweater has left me in awe! Not only did it exceed my expectations, but it has also become my go-to sweater for any occasion."

# Configure the URL for the request.
endpoint_uri_base = "http://127.0.0.1:8000/"
threshold_value = 0.5
sentiment_api_slug = f"sentiment?review={urllib.parse.quote(review)}&threshold={threshold_value}"
url = endpoint_uri_base + sentiment_api_slug

# Make the request to your API.
response = requests.get(url)

# Report the predicted sentiment of the review.
print(f'Review: "{review}"')
print(f'\nPrediction: {response.json()["prediction"]}')
    Review: "As a self-proclaimed fashion enthusiast, I have tried countless clothing items from various brands, but the ComfyCloud 9000 Sweater has left me in awe! Not only did it exceed my expectations, but it has also become my go-to sweater for any occasion."

Prediction: positive
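
As an aside, requests can also build and percent-encode the query string for you, so the manual urllib.parse.quote step above is optional. The following sketch is equivalent to the request we just made:

import requests

response = requests.get(
    "http://127.0.0.1:8000/sentiment",
    params={"review": review, "threshold": 0.5},  # encoded automatically
)
print(response.json()["prediction"])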

Conclusion

Congratulations, you have completed Metaflow's introductory tutorial on operationalizing NLP workflows! You have learned how to:

  1. Create a flow that reads data and computes a baseline.
  2. Use branching to perform steps in parallel.
  3. Serialize and de-serialize data in Metaflow.
  4. Use tagging to evaluate and gate models for production.
  5. Retrieve your model both outside Metaflow and from another flow.
  6. Set up a basic batch prediction workflow.
  7. Set up a basic real-time serving endpoint.

Further Discussion

This tutorial is intended as a simple example to get you using Metaflow in realistic ways on your laptop. However, for production use cases you may want to use other built-in Metaflow features such as @conda for dependency management, @batch or @kubernetes for remote execution, and @schedule to automatically trigger jobs.
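
As a rough sketch of how these decorators compose, the flow below runs its work step remotely and on a daily schedule. The flow name, resource requests, and library pins are placeholders, not recommendations:

from metaflow import FlowSpec, batch, conda, schedule, step

@schedule(daily=True)  # trigger a run automatically once a day
class ScheduledNLPFlow(FlowSpec):

    @conda(libraries={"scikit-learn": "1.3.2"})  # pin dependencies for this step
    @batch(cpu=2, memory=8000)                   # execute this step on AWS Batch
    @step
    def start(self):
        # training or scoring logic would go here
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    ScheduledNLPFlow()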

You can find more tutorials like this with code you can run in the browser at https://outerbounds.com/sandbox.