Natural Language Processing - Episode 7

This episode references two pieces of code:

  1. Notebook for this lesson.
  2. predflow.py

In Episode 5, you saw how to train a model and tag the model if it passed certain tests to indicate that it was ready for downstream processes. In this episode, you will retrieve this model for use in other flows. More generally, you will learn how to retrieve any of your flow results for use in another flow.

1. Use your Trained Model in a Prediction Flow

With the Metaflow client API, you can retrieve your artifacts in any downstream application, or simply use the API for ad-hoc testing.

You can also use the client API to retrieve model artifacts from within another flow!
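For example, ad-hoc retrieval outside of any flow can be done with a small helper like the sketch below. The helper name is our own, and it assumes an NLPFlow run tagged `deployment_candidate` exists; the Metaflow import sits inside the function so the snippet drops cleanly into a notebook.

```python
def get_latest_successful_run(flow_nm, tag):
    """Return the most recent successful run of flow `flow_nm`
    that carries tag `tag`, or None if no such run exists."""
    # Imported here so the helper is easy to paste into a notebook.
    from metaflow import Flow
    for r in Flow(flow_nm).runs(tag):
        if r.successful:
            return r
    return None

# Ad-hoc usage in a notebook (assumes NLPFlow has run with this tag):
# run = get_latest_successful_run('NLPFlow', 'deployment_candidate')
# model_dict = run.data.model_dict
```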

This flow contains the following steps:

  1. Get the latest deployment candidate using the Metaflow API in the start step. Recall that the name of our previous flow is NLPFlow.
  2. Make predictions with our deployment candidate on a new dataset and write that to a parquet file in the end step.

predflow.py
from metaflow import FlowSpec, step, Flow

class NLPPredictionFlow(FlowSpec):

    def get_latest_successful_run(self, flow_nm, tag):
        """Gets the latest successful run
        for a flow with a specific tag."""
        for r in Flow(flow_nm).runs(tag):
            if r.successful:
                return r

    @step
    def start(self):
        """Get the latest deployment candidate
        that is from a successful run"""
        self.deploy_run = self.get_latest_successful_run(
            'NLPFlow', 'deployment_candidate')
        self.next(self.end)

    @step
    def end(self):
        "Make predictions"
        from model import NbowModel
        import pandas as pd
        import pyarrow as pa
        import pyarrow.parquet as pq
        new_reviews = pd.read_parquet(
            'predict.parquet')['review']

        # Make predictions with the retrieved model
        model = NbowModel.from_dict(
            self.deploy_run.data.model_dict)
        predictions = model.predict(new_reviews)
        msg = 'Writing predictions to parquet: {} rows'
        print(msg.format(predictions.shape[0]))
        pa_tbl = pa.table({"data": predictions.squeeze()})
        pq.write_table(
            pa_tbl, "sentiment_predictions.parquet")

if __name__ == '__main__':
    NLPPredictionFlow()

2. Run the Prediction Flow

python predflow.py run
     Workflow starting (run-id 1666721228321456):
[1666721228321456/start/1 (pid 53162)] Task is starting.
[1666721228321456/start/1 (pid 53162)] Task finished successfully.
[1666721228321456/end/2 (pid 53165)] Task is starting.
[1666721228321456/end/2 (pid 53165)] 312: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
[1666721228321456/end/2 (pid 53165)] 71/71 [==============================] - 0s 349us/step
[1666721228321456/end/2 (pid 53165)] Writing predictions to parquet: 2264 rows
[1666721228321456/end/2 (pid 53165)] Task finished successfully.
Done!

Conclusion

Congratulations, you have completed Metaflow's introductory tutorial on operationalizing NLP workflows! You have learned how to:

  1. Create a baseline flow that reads data and computes a baseline.
  2. Use branching to perform steps in parallel.
  3. Serialize and de-serialize data in Metaflow.
  4. Use tagging to evaluate and gate models for production.
  5. Retrieve your model both outside Metaflow and from another flow.

Further Discussion

This is a very simple example that will also run on your laptop. However, for production use cases you may want to use other built-in Metaflow features such as @conda for dependency management, @batch or @kubernetes for remote execution, and @schedule to automatically trigger jobs.