
Add and Remove Tags

Question

How can I programmatically add and remove tags on flow runs?

Solution

You can do this from within a flow or by using the Client API.
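In a nutshell, any Run object fetched through the Client API can be tagged in place. Here is a minimal sketch (the tag name is a placeholder); the steps below work through both approaches in detail.

from metaflow import Flow

# Tag or un-tag any Run object obtained through the Client API.
run = Flow('ModelTaggingFlow').latest_run
run.add_tag('my tag')     # 'my tag' is a placeholder
run.remove_tag('my tag')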

1. Run Flow

This flow shows how to:

  • Load a dataset.
  • Train a scikit-learn model.
  • Evaluate the model on a test set.
  • Tag the run as a promising model if the model score is greater than accuracy_threshold.
add_remove_tags_programmatically.py
from metaflow import FlowSpec, step, Flow, current, Parameter

class ModelTaggingFlow(FlowSpec):

    max_depth = Parameter('max-depth', default=2)
    tag_msg = 'Tagging run {} as a promising model'
    accuracy_threshold = 0.85

    @step
    def start(self):
        from sklearn import datasets
        from sklearn.model_selection import train_test_split
        # Load the wine dataset and hold out a test split.
        data = datasets.load_wine()
        data = train_test_split(data['data'],
                                data['target'],
                                random_state=42)
        self.X_train = data[0]
        self.X_test = data[1]
        self.y_train = data[2]
        self.y_test = data[3]
        self.next(self.train)

    @step
    def train(self):
        from sklearn.tree import DecisionTreeClassifier
        # Fit a decision tree with the parameterized depth.
        self.params = {
            'max_leaf_nodes': None,
            'max_depth': self.max_depth,
            'max_features': 'sqrt',
            'random_state': 0
        }
        self.model = DecisionTreeClassifier(**self.params)
        self.model.fit(self.X_train, self.y_train)
        self.next(self.eval_and_tag)

    @step
    def eval_and_tag(self):
        from sklearn.metrics import accuracy_score
        # Score the model and tag the current run if it clears the threshold.
        self.pred = self.model.predict(self.X_test)
        self.accuracy = float(
            accuracy_score(self.y_test, self.pred))
        print(self.accuracy)
        if self.accuracy > self.accuracy_threshold:
            print(self.tag_msg.format(current.run_id))
            run = Flow(current.flow_name)[current.run_id]
            run.add_tag('promising model')
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    ModelTaggingFlow()
python add_remove_tags_programmatically.py run
     Workflow starting (run-id 1659643565956220):
[1659643565956220/start/1 (pid 20307)] Task is starting.
[1659643565956220/start/1 (pid 20307)] Task finished successfully.
[1659643565956220/train/2 (pid 20310)] Task is starting.
[1659643565956220/train/2 (pid 20310)] Task finished successfully.
[1659643565956220/eval_and_tag/3 (pid 20313)] Task is starting.
[1659643565956220/eval_and_tag/3 (pid 20313)] 0.8666666666666667
[1659643565956220/eval_and_tag/3 (pid 20313)] Tagging run 1659643565956220 as a promising model
[1659643565956220/eval_and_tag/3 (pid 20313)] Task finished successfully.
[1659643565956220/end/4 (pid 20316)] Task is starting.
[1659643565956220/end/4 (pid 20316)] Task finished successfully.
Done!
python add_remove_tags_programmatically.py run --max-depth 6
     Workflow starting (run-id 1659643570153989):
[1659643570153989/start/1 (pid 20322)] Task is starting.
[1659643570153989/start/1 (pid 20322)] Task finished successfully.
[1659643570153989/train/2 (pid 20325)] Task is starting.
[1659643570153989/train/2 (pid 20325)] Task finished successfully.
[1659643570153989/eval_and_tag/3 (pid 20328)] Task is starting.
[1659643570153989/eval_and_tag/3 (pid 20328)] 0.8888888888888888
[1659643570153989/eval_and_tag/3 (pid 20328)] Tagging run 1659643570153989 as a promising model
[1659643570153989/eval_and_tag/3 (pid 20328)] Task finished successfully.
[1659643570153989/end/4 (pid 20331)] Task is starting.
[1659643570153989/end/4 (pid 20331)] Task finished successfully.
Done!
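Before moving on, you can check that the tag was applied; here is a quick sanity check using the Client API, assuming the run above is the latest one for this flow.

from metaflow import Flow

# The latest run should carry the tag if it cleared the threshold.
run = Flow('ModelTaggingFlow').latest_run
print('promising model' in run.tags)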

2. Observe Model Scores

You can use the Client API to fetch a flow's runs, filtered by tag. Here is a way to list the accuracy of each run tagged promising model.

from metaflow import Flow
flow = Flow('ModelTaggingFlow')
tag = 'promising model'
runs = list(flow.runs(tag))
print("All models tagged with `{}`:".format(tag))
for run in runs:
    acc = round(100 * run.data.accuracy, 2)
    print("\tRun {}: {}% Accuracy".format(run.id, acc))
    All models tagged with `promising model`:
Run 1659643570153989: 88.89% Accuracy
Run 1659643565956220: 86.67% Accuracy
Run 1659643254562464: 86.67% Accuracy
Run 1659643511759883: 88.89% Accuracy
Run 1659643507482620: 86.67% Accuracy
Run 1659643258636203: 88.89% Accuracy

3. Update Tags Using the Client API

You can use the run.add_tag, run.remove_tag, or run.replace_tag methods to change a run's tags; a sketch of the latter two follows at the end of this step.

These lines add the production candidate tag to each promising model with an accuracy score above 87%.

flow = Flow('ModelTaggingFlow')
runs = list(flow.runs('promising model'))
for run in runs:
    if run.data.accuracy > .87:
        run.add_tag('production candidate')

Now you can list the accuracy of only these models. This can be a useful pattern when reviewing models, or when testing them and promoting them to production.

flow = Flow('ModelTaggingFlow')
tag = 'production candidate'
runs = list(flow.runs(tag))
print("All models tagged `{}`:".format(tag))
for run in runs:
    acc = round(100 * run.data.accuracy, 2)
    print("\tRun {}: {}% Accuracy".format(run.id, acc))
    All models tagged `production candidate`:
Run 1659643570153989: 88.89% Accuracy
Run 1659643511759883: 88.89% Accuracy
Run 1659643258636203: 88.89% Accuracy
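The remove_tag and replace_tag methods follow the same pattern. Here is a sketch for demoting or un-tagging runs; the demotion criterion is purely illustrative.

from metaflow import Flow

flow = Flow('ModelTaggingFlow')

# Swap one tag for another, e.g. to demote a run that fails a later review.
for run in flow.runs('production candidate'):
    if run.data.accuracy < .89:  # illustrative criterion
        run.replace_tag('production candidate', 'promising model')

# Or remove a tag outright (this un-tags every promising run).
for run in flow.runs('promising model'):
    run.remove_tag('promising model')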

Further Reading