
Beginner Computer Vision: Episode 5

This tutorial follows the accompanying notebook, which shows how to analyze the results of your flow runs from the previous episodes. You will fetch data from flow runs and interpret it with tags. Along the way you will see how to move between scripts and notebooks, an important part of working with Metaflow. In this case, you will use the Metaflow Client API to tag promising runs as production candidates.

After following the setup instructions, start the notebook with this command:

jupyter lab cv-intro-5.ipynb

1 Load Flow Results

Tagging helps you organize flows by letting you attach interpretations to their results. Let's see how tags work by loading run data from the TuningFlow you built in episode 4, along with the ModelComparisonFlow from an earlier episode. You can access this data from any Python environment using Metaflow's Client API:

from metaflow import Flow
model_comparison_flow = Flow('ModelComparisonFlow')
tuning_flow = Flow('TuningFlow')

2 Define How to Aggregate and Compare Results

Next we define a function to parse the data in the runs. The customizable get_stats function progressively builds up a dictionary called stats. For each successful run, it appends one entry with the metrics and metadata of that run's best model, where "best" means the highest value of the first metric in the metrics list.

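def get_stats(stats, run, metrics):
    # Only consider successful runs that stored a `results` artifact
    # (artifacts are read through run.data).
    if run.successful and hasattr(run.data, 'results'):
        results = run.data.results
        # Skip runs whose results are missing any requested metric.
        if not all(_m in results.columns for _m in metrics):
            return stats
        # Row with the highest value of the first metric.
        best_run = results.loc[results[metrics[0]].idxmax()]
        stats['flow id'].append(run.id)
        stats['flow name'].append(run.parent.pathspec)
        stats['model name'].append(best_run['model'])
        for _m in metrics:
            stats[_m].append(best_run[_m])
        stats['test loss'].append(best_run['test loss'])
    return stats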
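
To make the selection step inside get_stats concrete, here is a minimal sketch on a hand-made table. The DataFrame, its values, and the "Baseline" model name are hypothetical stand-ins for a run's results artifact; only the column names 'model', 'test loss', and 'accuracy' follow the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for the `results` DataFrame stored by a run.
results = pd.DataFrame({
    "model": ["Baseline", "CNN", "CNN"],
    "test loss": [0.31, 0.030, 0.027],
    "accuracy": [0.91, 0.989, 0.991],
})
metrics = ["accuracy"]

# Guard: every requested metric must be a column of the table.
has_all_metrics = all(m in results.columns for m in metrics)

# idxmax returns the index label of the largest value; .loc fetches that row.
best_row = results.loc[results[metrics[0]].idxmax()]

print(has_all_metrics)
print(best_row["model"], best_row["accuracy"])
```

The guard in this sketch uses the built-in all. Passing a generator expression to np.all does not iterate it; the generator object itself is treated as truthy, so such a check would never reject a run.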

Next we loop through runs of TuningFlow and ModelComparisonFlow and aggregate stats:

metrics = ['accuracy', 'precision at recall']  # assumed from the metric columns of the results table

stats = {
    'flow id': [],
    'flow name': [],
    'model name': [],
    'test loss': [],
    **{metric: [] for metric in metrics}
}

for run in tuning_flow.runs():
    stats = get_stats(stats, run, metrics)

for run in model_comparison_flow.runs():
    stats = get_stats(stats, run, metrics)

import pandas as pd

best_models = pd.DataFrame(stats)
   flow id           flow name            model name  test loss  accuracy  precision at recall
0  1666721523161525  TuningFlow           CNN         0.026965   0.9910    0.999272
1  1665967558891569  TuningFlow           CNN         0.027228   0.9907    0.999168
2  1666721393687341  ModelComparisonFlow  CNN         0.026307   0.9910    0.999272
3  1665967344088184  ModelComparisonFlow  CNN         0.030421   0.9892    0.998545

3 Access the Best Model

With the best_models table, we can sort by the first metric (test accuracy) and find the run containing the best model.

from metaflow import Run
sorted_models = best_models.sort_values(by=metrics[0], ascending=False).iloc[0]
run = Run("{}/{}".format(sorted_models['flow name'], sorted_models['flow id']))
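
Two rows in the best_models table share the top accuracy (0.9910), so sorting on accuracy alone leaves the winner to the sort order. The sketch below breaks such ties by lower test loss. That tie-break is an assumption of this example, not something the original notebook does. The values are copied from the table, and the precision column is omitted for brevity.

```python
import pandas as pd

# Values copied from the best_models table shown earlier.
best_models = pd.DataFrame({
    "flow id": ["1666721523161525", "1665967558891569",
                "1666721393687341", "1665967344088184"],
    "flow name": ["TuningFlow", "TuningFlow",
                  "ModelComparisonFlow", "ModelComparisonFlow"],
    "model name": ["CNN"] * 4,
    "test loss": [0.026965, 0.027228, 0.026307, 0.030421],
    "accuracy": [0.9910, 0.9907, 0.9910, 0.9892],
})

# Highest accuracy first; among equal accuracies, lowest test loss first.
ranked = best_models.sort_values(
    by=["accuracy", "test loss"], ascending=[False, True]
)
best = ranked.iloc[0]

# A run pathspec has the form "FlowName/run_id"; Run(pathspec) would load it.
pathspec = "{}/{}".format(best["flow name"], best["flow id"])
print(pathspec)
```

With these values the tie-break selects ModelComparisonFlow/1666721393687341, the run with the lower test loss of the two tied on accuracy.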

Next, use the model to make predictions and check that they make sense when compared with the true targets:

from tensorflow import keras
import numpy as np

# get data samples
((x_train, y_train), (x_test, y_test)) = keras.datasets.mnist.load_data()
x_test = np.expand_dims(x_test.astype("float32") / 255, -1)

# use best_model from the Metaflow run
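logits = run.data.best_model.predict(x_test)  # assumes the run stored the Keras model as the best_model artifact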
softmax = keras.layers.Softmax(axis=1)
probs = softmax(logits).numpy()
pred = probs.argmax(axis=1)
313/313 [==============================] - 1s 3ms/step
print("Model predicts {}".format(pred))
print(" True targets {}".format(y_test))
Model predicts [7 2 1 ... 4 5 6]
 True targets [7 2 1 ... 4 5 6]
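
The last lines of the prediction cell turn logits into class labels. Here is a minimal sketch of that step in plain NumPy, using made-up logits and targets rather than real model output. Softmax rescales each row into probabilities, argmax picks the most likely class, and comparing with the targets gives an accuracy.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability, then normalize.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for 3 samples and 4 classes (not real model output).
logits = np.array([
    [2.0, 0.5, -1.0, 0.0],
    [0.1, 3.0, 0.2, -0.5],
    [-1.0, 0.0, 0.5, 4.0],
])
y_true = np.array([0, 1, 2])  # hypothetical targets

probs = softmax(logits)
pred = probs.argmax(axis=1)

# Fraction of predictions that match the targets.
accuracy = float((pred == y_true).mean())
print(pred, accuracy)
```

Because softmax preserves the ordering within each row, logits.argmax(axis=1) yields the same labels. The softmax is only needed when you want the probabilities themselves.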

4 Interpret Results with Tags

In the last section, you saw how to access and use the best model by filtering Metaflow runs. What if you want to add a property to runs so you can filter by that property later? Then it is time to leverage tagging. You can use .add_tag on runs that meet any condition.

In this case, we look for models whose test accuracy exceeds a threshold (0.99 by default). Runs containing a model that meets this threshold are tagged as production.

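def tag_runs(flow, metric='accuracy', threshold=0.99):
    for run in flow:
        # Only consider successful runs that stored a `results` artifact.
        if run.successful and hasattr(run.data, 'results'):
            if run.data.results[metric].max() > threshold:
                run.add_tag('production')

tag_runs(tuning_flow)  # apply the rule to the TuningFlow runs loaded earlier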

Now runs can be accessed by filtering on this tag:

from metaflow import Flow
production_runs = Flow('TuningFlow').runs('production')
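
Before tagging real runs, the threshold rule can be checked on stand-in objects. The sketch below does not call Metaflow. StubRun is a hypothetical class that mimics only the pieces of a run used here (successful, data.results, add_tag, and a tags set), and the accuracy values are invented.

```python
from types import SimpleNamespace
import pandas as pd

class StubRun:
    """Hypothetical stand-in for a Metaflow run; not part of Metaflow."""
    def __init__(self, run_id, successful, accuracies=None):
        self.id = run_id
        self.successful = successful
        self.tags = set()
        # Runs without results get an empty data object, like a run
        # that never stored the artifact.
        self.data = SimpleNamespace()
        if accuracies is not None:
            self.data.results = pd.DataFrame({"accuracy": accuracies})

    def add_tag(self, tag):
        self.tags.add(tag)

def tag_runs(runs, metric="accuracy", threshold=0.99):
    # Tag a run when its best score on `metric` exceeds the threshold.
    for run in runs:
        if run.successful and hasattr(run.data, "results"):
            if run.data.results[metric].max() > threshold:
                run.add_tag("production")

runs = [
    StubRun("a", True, [0.985, 0.991]),  # best score above the threshold
    StubRun("b", True, [0.970, 0.989]),  # best score below the threshold
    StubRun("c", False, [0.995]),        # failed run, ignored
    StubRun("d", True),                  # no results artifact, ignored
]
tag_runs(runs)

production_ids = [run.id for run in runs if "production" in run.tags]
print(production_ids)
```

Only run "a" ends up tagged. The comparison is strict, so a run whose best accuracy equals the threshold exactly would not be tagged.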

In this lesson, you saw how to load and analyze results of your flows. You added tags to runs that met your requirements for production quality. In the next lesson, you will see how to use models, filtered by the production tag, in a prediction flow.