
Natural Language Processing - Episode 6

This episode references two pieces of code:

  1. Notebook for this lesson.
  2. predflow.py

In the previous episode, you saw how we trained a model and, if it passed certain tests, tagged its run to indicate that the model was ready for downstream processes. In this lesson, we show you how to retrieve this model outside of a flow with the Metaflow client API. By the end of this lesson, you will know how to retrieve your flow results for analysis in a notebook or Python script.

1. Use the Client API to Fetch the Latest Run

In addition to manipulating tags, as seen in the previous lesson, the Metaflow client API lets you access data from past runs. For example, this is how you can retrieve the latest successful run tagged as a deployment candidate, and with it the trained model, outside of a flow:

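from metaflow import Flow

def get_latest_successful_run(flow_nm, tag):
    "Gets the latest successful run for a flow with a specific tag."
    # Iterate over the runs of flow_nm that carry the given tag
    for r in Flow(flow_nm).runs(tag):
        if r.successful:
            return r
    return None  # no successful run with this tag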

The code above retrieves the runs of the flow named flow_nm that carry the given tag, then uses the successful property to return the first run that completed successfully. If no tagged run succeeded, the function returns None.
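The selection logic can be tried without a Metaflow backend. The sketch below uses a hypothetical stand-in: `FakeRun` and the hand-built run list are illustrative only and are not part of the Metaflow API. With the real client, `Flow(...).runs(tag)` does the tag filtering for you. In this stand-in, the loop checks tags itself. The sketch also assumes, as the function name does, that runs arrive most recent first.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class FakeRun:
    """Hypothetical stand-in for a Metaflow Run, with only the fields used here."""
    run_id: str
    successful: bool
    tags: List[str] = field(default_factory=list)


def latest_successful(runs: Iterable[FakeRun], tag: str) -> Optional[FakeRun]:
    """Same loop as get_latest_successful_run, applied to an in-memory list.

    Assumes `runs` is ordered most recent first, so the first match is the latest.
    """
    for r in runs:
        if tag in r.tags and r.successful:
            return r
    return None  # no run both carries the tag and succeeded


# Most recent first: run 3 failed, run 2 succeeded but is untagged, run 1 qualifies.
history = [
    FakeRun("3", successful=False, tags=["deployment_candidate"]),
    FakeRun("2", successful=True),
    FakeRun("1", successful=True, tags=["deployment_candidate"]),
]

best = latest_successful(history, "deployment_candidate")
print(best.run_id if best else None)  # 1
missing = latest_successful(history, "no_such_tag")
print(missing)  # None
```

Callers should check for None before touching `run.data`, because a flow with no successful tagged run yields nothing to load.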

2. Load the Model

After retrieving the run with the client API, we can rebuild the model from the model_dict artifact stored in that run:

from model import NbowModel

run = get_latest_successful_run('NLPFlow', 'deployment_candidate')
model = NbowModel.from_dict(run.data.model_dict)
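The model is stored as a plain dictionary artifact (model_dict) and rebuilt with a from_dict class method. The toy sketch below illustrates that serialize-and-rebuild pattern using a hypothetical `ToyLinearModel` with NumPy weights. It is not the tutorial's `NbowModel`, whose dictionary contents are not shown in this episode.

```python
import numpy as np


class ToyLinearModel:
    """Hypothetical model used only to illustrate a to_dict/from_dict round trip."""

    def __init__(self, weights: np.ndarray, bias: float):
        self.weights = weights
        self.bias = bias

    def to_dict(self) -> dict:
        # Plain Python types are easy to store as an artifact and reload later.
        return {"weights": self.weights.tolist(), "bias": self.bias}

    @classmethod
    def from_dict(cls, d: dict) -> "ToyLinearModel":
        # Rebuild an equivalent model from the stored dictionary.
        return cls(np.asarray(d["weights"], dtype=np.float32), float(d["bias"]))

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Logistic scores in (0, 1), returned as one column per example.
        z = X @ self.weights + self.bias
        return (1.0 / (1.0 + np.exp(-z))).reshape(-1, 1)


original = ToyLinearModel(np.array([0.5, -1.0], dtype=np.float32), 0.1)
saved = original.to_dict()  # what a flow might store as an artifact
restored = ToyLinearModel.from_dict(saved)

X = np.array([[1.0, 2.0], [3.0, 0.0]], dtype=np.float32)
print(np.allclose(original.predict(X), restored.predict(X)))  # True
```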

3. Make Predictions with the Model

Now that we have retrieved the model using the tag, we can use it to make predictions on new data:

import pandas as pd

predict_df = pd.read_parquet('predict.parquet')
preds = model.predict(predict_df['review'])
preds
array([[0.9973424 ],
       [0.98123443],
       [0.99737483],
       ...,
       [0.9996966 ],
       [0.9987401 ],
       [0.4805344 ]], dtype=float32)
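The output is a column of scores between 0 and 1, one per review. You can turn these scores into labels under two assumptions: higher scores mean positive sentiment, and 0.5 is the cutoff. Both are assumptions, so choose the threshold that suits your use case. The three scores below are copied from the output above for illustration.

```python
import numpy as np

# Three scores copied from the output above; real predictions have shape (n, 1).
preds = np.array([[0.9973424], [0.98123443], [0.4805344]], dtype=np.float32)

threshold = 0.5  # assumed cutoff, not prescribed by the tutorial
scores = preds.squeeze(axis=1)             # (n, 1) -> (n,)
labels = (scores > threshold).astype(int)  # 1 = predicted positive (assumed meaning)

print(labels)                                    # [1 1 0]
print(f"{labels.mean():.0%} predicted positive")  # 67% predicted positive
```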

4. Save Predictions

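You can write these predictions to a Parquet file with PyArrow:

import pyarrow as pa
import pyarrow.parquet as pq  # the parquet submodule must be imported explicitly

# preds has shape (n, 1); squeeze it to a 1-D column before building the table
pa_tbl = pa.table({"data": preds.squeeze()})
pq.write_table(pa_tbl, "sentiment_predictions.parquet")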

In this episode, you saw how to use the results of a completed flow run, in this case accessing a trained model to make predictions on new data. In the next lesson, you will see how to access the model from a different flow.