Organize Results with Namespaces

Question

How can we use namespaces to keep results of our team's flow runs organized and accessible, no matter who ran the flow?

Solution

Metaflow persists all runs and the data they produce. This data can be accessed using the Client API. Namespaces are a mechanism to organize these results and the Client API data access patterns. By default, the active namespace will be user:<name> where name is the user name of the person who ran the flow.
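
As a mental model, a namespace acts like a filter over stored runs. The sketch below is a simplified illustration in plain Python, not Metaflow's actual implementation: the `RunRecord` class, the `visible_runs` helper, and the sample run IDs are all hypothetical. It assumes each run is labeled with the namespace of whoever launched it, and that a namespace of `None` disables filtering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a stored run (not a Metaflow class)."""
    run_id: str
    owner_namespace: str  # e.g. "user:eddie"


def visible_runs(runs: List[RunRecord], active_namespace: Optional[str]) -> List[RunRecord]:
    """Return the runs a client would see under the active namespace."""
    if active_namespace is None:
        # Global namespace: no filtering at all.
        return list(runs)
    return [r for r in runs if r.owner_namespace == active_namespace]


# Illustrative records only; these IDs do not come from a real datastore.
stored = [
    RunRecord("101", "user:eddie"),
    RunRecord("102", "user:my-teammate"),
    RunRecord("103", "user:eddie"),
]

print([r.run_id for r in visible_runs(stored, "user:eddie")])        # ['101', '103']
print([r.run_id for r in visible_runs(stored, "user:my-teammate")])  # ['102']
print([r.run_id for r in visible_runs(stored, None)])                # ['101', '102', '103']
```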

This flow will be used throughout the remainder of this post. The important part is the end step, where a random choice is stored as an artifact of the TeamCollabFlow run. This data could be any artifact you store.

1. Run Flow

team_collab_flow.py
from metaflow import FlowSpec, step, current

class TeamCollabFlow(FlowSpec):

    @step
    def start(self):
        print("current.username: {}".format(current.username))
        print("current.namespace: {}".format(current.namespace))
        self.next(self.end)

    @step
    def end(self):
        import random
        # store a random choice as an artifact of this run
        self.choice = random.choice([1, 2, 3, 4, 5])
        print("Random choice was {}".format(self.choice))

if __name__ == "__main__":
    TeamCollabFlow()
python team_collab_flow.py run
     Workflow starting (run-id 1658843440575774):
[1658843440575774/start/1 (pid 67239)] Task is starting.
[1658843440575774/start/1 (pid 67239)] current.username: eddie
[1658843440575774/start/1 (pid 67239)] current.namespace: user:eddie
[1658843440575774/start/1 (pid 67239)] Task finished successfully.
[1658843440575774/end/2 (pid 67242)] Task is starting.
[1658843440575774/end/2 (pid 67242)] Random choice was 1
[1658843440575774/end/2 (pid 67242)] Task finished successfully.
Done!

2a. Access Results from Your Namespace

By default, the Client API pulls data from the current user's namespace. This means you can use any Client API call without worrying about others who run the same flow and store results in the same S3 bucket (or other storage location) as you. You will only get results from your own namespace unless you explicitly set a different one.

from metaflow import Flow
run = Flow('TeamCollabFlow').latest_successful_run
run_id, choice = run.id, run.data.choice
print("Run with id={} has choice={}".format(run_id,choice))
    Run with id=1658843440575774 has choice=1
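
To look at more than the latest run, you can iterate over a flow's runs. The helper below is a sketch: `collect_choices` is a hypothetical name, and it only assumes that each item exposes `id`, `successful`, and `data.choice`, as Metaflow `Run` objects from this flow do. The stand-in objects and their values are made up so the snippet runs without a Metaflow datastore.

```python
from types import SimpleNamespace


def collect_choices(runs):
    """Map run id -> stored `choice` for every successful run in `runs`."""
    # The `if` filter is evaluated first, so `data` is only read on successful runs.
    return {run.id: run.data.choice for run in runs if run.successful}


# Stand-ins that mimic the attributes used above (illustrative values only).
fake_runs = [
    SimpleNamespace(id="2", successful=True, data=SimpleNamespace(choice=4)),
    SimpleNamespace(id="1", successful=False, data=SimpleNamespace(choice=None)),
]
print(collect_choices(fake_runs))  # {'2': 4}
```

With Metaflow installed, calling `collect_choices(Flow('TeamCollabFlow'))` should behave the same way, since iterating over a `Flow` yields the `Run` objects visible in the active namespace.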

2b. Share Results with Teammates

If you want a teammate to access a run from your namespace, they first need to switch to your namespace before making the corresponding Client API call. The following example shows the error that occurs after switching to a namespace that doesn't contain the run from the previous section. This is what your teammate sees when they try to access your result from their own default namespace, before switching to yours.

from metaflow import namespace, get_namespace, Flow
from metaflow.exception import MetaflowNamespaceMismatch

not_my_namespace = 'user:my-teammate'
namespace(not_my_namespace) # teammate's default namespace
flow_name = 'TeamCollabFlow'
try:
    run = Flow(flow_name).latest_successful_run
except MetaflowNamespaceMismatch as m:
    print(m)
    print("\tNo {} results in the {} namespace".format(flow_name, get_namespace()))
    Object not in namespace 'user:my-teammate'
No TeamCollabFlow results in the user:my-teammate namespace

Your teammate can use your namespace to access the result. The following snippet shows how to get your namespace using default_namespace. It returns a string that you or any of your colleagues can pass to namespace before fetching your flow results:

from metaflow import default_namespace

my_namespace = default_namespace()
namespace(my_namespace) # give the my_namespace string to your colleague
run = Flow(flow_name).latest_successful_run
run_id, choice = run.id, run.data.choice
print("Run with id={} has choice={}".format(run_id, choice))
    Run with id=1658843440575774 has choice=1

You can use this at any time to reactivate your default namespace:

from metaflow import namespace, default_namespace
_ = namespace(default_namespace())
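
Because `namespace` changes global state, it is easy to forget to switch back. One option is a small context manager that restores the previous namespace on exit. This is a sketch, not part of Metaflow: `temporary_namespace` is a hypothetical helper, and it takes the getter and setter as arguments so it can be exercised here with an in-memory stand-in.

```python
from contextlib import contextmanager


@contextmanager
def temporary_namespace(new_ns, get_ns, set_ns):
    """Switch to `new_ns` for the duration of the block, then restore."""
    previous = get_ns()
    set_ns(new_ns)
    try:
        yield new_ns
    finally:
        # Runs even if the body raises, so the old namespace always comes back.
        set_ns(previous)


# In-memory stand-in for the active namespace (illustrative only).
_state = {"ns": "user:eddie"}


def get_ns():
    return _state["ns"]


def set_ns(ns):
    _state["ns"] = ns
    return ns


with temporary_namespace("user:my-teammate", get_ns, set_ns):
    print(get_ns())  # user:my-teammate
print(get_ns())      # user:eddie
```

With Metaflow, the same helper could be called as `temporary_namespace('user:my-teammate', get_namespace, namespace)`, passing the `get_namespace` and `namespace` functions from the Client API.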

2c. Use the Run ID to Access Results in the Global Namespace

This example shows how to access results across all namespaces represented in your flow data storage location, regardless of the user. This is done by calling namespace(None) and looking up the run by its run.id.

from metaflow import Run, namespace

namespace(None) # disable namespace filtering
run = Run('TeamCollabFlow/{}'.format(run_id))
data = run.data.choice
print("Run with id={} has data={}".format(run_id, data))
    Run with id=1658843440575774 has data=1
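
The string passed to `Run` is a pathspec of the form `FlowName/RunID`. The log prefixes shown earlier, such as `1658843440575774/start/1`, extend the same pattern with a step name and a task ID. The helper below is a hypothetical convenience (`build_pathspec` is not a Metaflow function) for assembling such strings.

```python
def build_pathspec(flow_name, run_id, step_name=None, task_id=None):
    """Join the parts of a pathspec, e.g. 'TeamCollabFlow/1658843440575774'."""
    if task_id is not None and step_name is None:
        raise ValueError("task_id requires step_name")
    parts = [flow_name, str(run_id)]
    if step_name is not None:
        parts.append(step_name)
    if task_id is not None:
        parts.append(str(task_id))
    return "/".join(parts)


print(build_pathspec("TeamCollabFlow", 1658843440575774))
# TeamCollabFlow/1658843440575774
print(build_pathspec("TeamCollabFlow", 1658843440575774, "end", 2))
# TeamCollabFlow/1658843440575774/end/2
```

Under `namespace(None)`, the first string could be passed to `Run` and the second to the Client API's `Task`.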

3. The Production Namespace

Metaflow also maintains a production namespace that is separate from any user namespace. It is used when you schedule production flows to run automatically. When a run is triggered by a production scheduler, it may not make sense to associate it with a single user. You can read more about the production namespace in the Metaflow documentation.

4a. Accessing Results in a Second Flow

This flow shows how to:

  • Access data from another flow using the get_flow_data function
    • Use the namespace call to change active namespaces.
    • Access results from past runs of other_flow_name.
    • Use the default_namespace call to return to the original namespace.
  • Print the data from the other flow during the AccessOtherNamespace run.
access_namespace_in_flow.py
from metaflow import (Flow, FlowSpec, step, namespace,
                      default_namespace, Parameter)

def get_flow_data(flow, new_ns, original_ns=default_namespace()):
    # Switch to new_ns, fetch the latest successful run, and always
    # switch back to original_ns, even if the lookup fails.
    try:
        namespace(new_ns)
        return Flow(flow).latest_successful_run
    except Exception:
        # e.g. the flow has no runs in new_ns
        return None
    finally:
        namespace(original_ns)

class AccessOtherNamespace(FlowSpec):

    other_flow_name = Parameter('other-flow-name',
                                default='TeamCollabFlow')
    other_namespace = Parameter('other-namespace',
                                default=default_namespace())
    msg = "{}.latest_successful_run.data. has value {}."

    @step
    def start(self):

        # access other_flow_name in other_namespace
        run = get_flow_data(
            flow=self.other_flow_name,
            new_ns=self.other_namespace
        )
        if run is None:
            print("Flow {} not found in {} namespace.".format(
                self.other_flow_name,
                self.other_namespace
            ))
        else:
            print(self.msg.format(
                self.other_flow_name,
                run.data.choice,
            ))
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    AccessOtherNamespace()

4b. Run the Second Flow

python access_namespace_in_flow.py run
     Workflow starting (run-id 1658847270992887):
[1658847270992887/start/1 (pid 67909)] Task is starting.
[1658847270992887/start/1 (pid 67909)] TeamCollabFlow.latest_successful_run.data. has value 1.
[1658847270992887/start/1 (pid 67909)] Task finished successfully.
[1658847270992887/end/2 (pid 67912)] Task is starting.
[1658847270992887/end/2 (pid 67912)] Task finished successfully.
Done!

Further Reading