
Use Artifacts in Metaflow Join Step

Question

How can I pass data artifacts of a Metaflow flow through a join step? What are my options for merging artifacts?

Solution

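Call self.merge_artifacts(inputs) in the join step to carry upstream artifacts forward. Its exclude argument leaves the named artifacts out of the merge, and its include argument merges only the named ones. Watch for name collisions: if the branches assign different values to the same artifact name, the merge fails unless you set that artifact yourself in the join step before merging, or exclude it.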

This flow shows how to:

  • Access upstream values after branches are joined.
  • Select a value from a specific branch because there is a naming collision.
  • Exclude an upstream value from the merge.
join_step_artifacts.py
from metaflow import FlowSpec, step

class JoinArtifacts(FlowSpec):

    @step
    def start(self):
        self.pre_branch_data = 0
        self.next(self.branch_a, self.branch_b)

    @step
    def branch_a(self):
        self.x = 1 # define x
        self.a = "a"
        self.next(self.join)

    @step
    def branch_b(self):
        self.x = 2 # define another x!
        self.b = "b"
        self.next(self.join)

    @step
    def join(self, inputs):
        # pick which x to propagate
        self.x = inputs.branch_a.x
        # merge everything else, leaving out a
        self.merge_artifacts(inputs, exclude=["a"])
        self.next(self.end)

    @step
    def end(self):
        print("`pre_branch_data` " + \
              f"value is: {self.pre_branch_data}.")
        print(f"`x` value is: {self.x}.")
        print(f"`b` value is: {self.b}.")
        try:
            print(f"`a` value is: {self.a}.")
        except AttributeError:
            print("`a` was excluded! \U0001F632")


if __name__ == "__main__":
    JoinArtifacts()
python join_step_artifacts.py run
     Workflow starting (run-id 1654221288038724):
[1654221288038724/start/1 (pid 71304)] Task is starting.
[1654221288038724/start/1 (pid 71304)] Task finished successfully.
[1654221288038724/branch_a/2 (pid 71314)] Task is starting.
[1654221288038724/branch_b/3 (pid 71315)] Task is starting.
[1654221288038724/branch_a/2 (pid 71314)] Task finished successfully.
[1654221288038724/branch_b/3 (pid 71315)] Task finished successfully.
[1654221288038724/join/4 (pid 71337)] Task is starting.
[1654221288038724/join/4 (pid 71337)] Task finished successfully.
[1654221288038724/end/5 (pid 71375)] Task is starting.
[1654221288038724/end/5 (pid 71375)] `pre_branch_data` value is: 0.
[1654221288038724/end/5 (pid 71375)] `x` value is: 1.
[1654221288038724/end/5 (pid 71375)] `b` value is: b.
[1654221288038724/end/5 (pid 71375)] `a` was excluded! 😲
[1654221288038724/end/5 (pid 71375)] Task finished successfully.
Done!
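
To see why the run above prints x as 1 and skips a, it can help to model the merge rules in plain Python. The sketch below is a simplified illustration, not Metaflow's implementation: merge_sketch and its arguments are hypothetical names, branches are represented as plain dicts, and values are compared with != purely for illustration.

```python
# Simplified, hypothetical model of the merge rules used in the join step.
# This is NOT Metaflow code; it only mimics the behavior described above.

def merge_sketch(already_set, branches, exclude=None, include=None):
    """Return the artifacts a join step would hold after merging.

    already_set: artifacts assigned in the join step before the merge.
    branches: one dict of artifacts per incoming branch.
    """
    if exclude and include:
        raise ValueError("use either exclude or include, not both")
    merged = dict(already_set)
    names = {name for branch in branches for name in branch}
    conflicts = []
    for name in sorted(names):
        if name in merged:
            continue  # set by hand in the join step, so it is left alone
        if exclude and name in exclude:
            continue  # explicitly left out of the merge
        if include and name not in include:
            continue  # with include, only the listed names are merged
        values = [branch[name] for branch in branches if name in branch]
        if any(value != values[0] for value in values[1:]):
            conflicts.append(name)  # branches disagree: ambiguous
        else:
            merged[name] = values[0]
    if conflicts:
        raise ValueError(f"ambiguous artifacts: {conflicts}")
    return merged


# Artifacts visible at the end of each branch in the flow above.
branch_a = {"pre_branch_data": 0, "x": 1, "a": "a"}
branch_b = {"pre_branch_data": 0, "x": 2, "b": "b"}

# Mirrors the join step: x is picked by hand, a is excluded.
result = merge_sketch({"x": branch_a["x"]}, [branch_a, branch_b], exclude=["a"])
print(result)
```

In this model, calling merge_sketch({}, [branch_a, branch_b]) without first choosing x raises an error, because the two branches disagree on its value. That is the collision the join step avoids by assigning self.x before merging.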

Further Reading