
Store Artifacts across Metaflow Steps


How can I use Metaflow to save and version data artifacts such as numpy arrays, pandas dataframes, or other Python objects? How can I access and update artifacts throughout the steps of a flow?


In this example you will see how to save any Python object that can be pickled as an artifact (called some_data here) by assigning it to an attribute of self. Later steps can read and update the artifact through self, and the changes propagate downstream.
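
Since the only requirement stated above is that the object can be pickled, it can help to check an object before assigning it to self. The following is a minimal sketch using only the standard pickle module; the helper name is_picklable is hypothetical and not part of Metaflow.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives a round trip through the standard pickle module."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        # pickle raises different exception types depending on the object,
        # so any failure is treated as "not picklable" in this sketch
        return False
    return True


print(is_picklable([1, 2, 3]))        # a plain list, like some_data in this example
print(is_picklable(lambda x: x + 1))  # a lambda, which the standard pickle module rejects
```

A check like this is only a quick local sanity test; it says nothing about how large the stored artifact will be.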

1. Run Flow

This flow shows how to

  • Store a flow artifact.
  • Update the artifact in a downstream step.
  • Watch how the artifacts change during the flow.
from metaflow import FlowSpec, step

class ArtFlow(FlowSpec):

    @step
    def start(self):
        self.some_data = [1, 2, 3]  # define artifact state
        self.next(self.middle)

    @step
    def middle(self):
        print(f'the data artifact is: {self.some_data}')
        self.some_data = [1, 2, 4]  # update artifact state
        self.next(self.end)

    @step
    def end(self):
        print(f'the data artifact is: {self.some_data}')

if __name__ == '__main__':
    ArtFlow()
When you run the flow, each step sees the artifact as the previous step left it. This works regardless of whether you run your flows locally or remotely (for example with @batch).

python <flow file>.py run --run-id-file artifacts-run.txt
[1654221288112057/middle/2 (pid 71321)] Task is starting.
[1654221288112057/middle/2 (pid 71321)] the data artifact is: [1, 2, 3]
[1654221288112057/middle/2 (pid 71321)] Task finished successfully.
[1654221288112057/end/3 (pid 71343)] Task is starting.
[1654221288112057/end/3 (pid 71343)] the data artifact is: [1, 2, 4]
[1654221288112057/end/3 (pid 71343)] Task finished successfully.

2. Access Artifacts Outside of Flow

You can use the client API to access data artifacts after a run is complete. There are many ways to access this data; two examples are shown below.

You can reference Run(<FlowName>/<Run ID>) to access artifacts:

from metaflow import Run

# the ID of the previous run was saved in artifacts-run.txt
with open('artifacts-run.txt') as f:
    run_id = f.read().strip()
some_data = Run(f'ArtFlow/{run_id}').data.some_data
print(some_data)
    [1, 2, 4]

You can also get the artifact from the latest run as demonstrated below:

from metaflow import Flow
assert Flow('ArtFlow').latest_run.data.some_data == [1, 2, 4]
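
To see how the artifact changes from step to step rather than only its final value, the client API can also be walked one step at a time. The sketch below assumes that iterating a Run yields its steps, that each step exposes an id and a task, and that the task's data supports membership tests and attribute access; the helper name artifact_by_step is hypothetical. It takes the run object as an argument, so it has no import of its own.

```python
def artifact_by_step(run, name):
    """Map each step name to the value of artifact `name` at the end of that step.

    `run` is expected to behave like a metaflow.Run: iterating it yields
    steps, each with an `id` (the step name) and a `task` whose `data`
    exposes artifacts as attributes.
    """
    values = {}
    for step in run:
        data = step.task.data
        if name in data:  # skip steps where the artifact does not exist
            values[step.id] = getattr(data, name)
    return values


# Hypothetical usage (requires Metaflow and a finished run of ArtFlow):
#   from metaflow import Run
#   artifact_by_step(Run('ArtFlow/<Run ID>'), 'some_data')
```

Each value is the artifact as it stood when the step finished, so for the flow above the start step would report [1, 2, 3], while middle and end would both report [1, 2, 4].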

Further Reading