
Testing a Flow with PyTest


How can I use PyTest with a flow?


There are two related cases to consider:

  • Test the logic within steps.
  • Test the flow itself.

1 Testing Logic in Steps

A helpful design pattern is to move non-orchestration logic out of the flow itself and write unit tests for the component functions. In other words, if a step of your flow contains substantial logic, as in the flow sketched below, you can refactor it in the following way.

Here is a pseudo-code example of a flow you may want to refactor in this way.
class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C

    # rest of flow

To refactor, first create a separate file that contains the logic, so that it can be tested independently of the flow:
def do_logic():
    # logic A
    # logic B
    # logic C

This is the suggested design pattern because now you can unit test this logic in the way you normally would, and then import it in the flow.
class MyFlow(FlowSpec):

    @step
    def start(self):
        # import the extracted logic and call it from the step
        from my_module import do_logic
        do_logic()

    # rest of flow

In general, separating the implementation of the logic from the flow makes code bases leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
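As a minimal sketch of what such a unit test could look like, the block below fills do_logic with an invented body (drop negative values, then sum) purely for illustration; the real logic A, B, and C are whatever your step does. In practice the function would live in my_module.py and the tests in a separate test file, which PyTest discovers through the test_ prefix. The function signature and test names here are hypothetical.

```python
def do_logic(values):
    """Hypothetical stand-in for 'logic A, B, C': drop negatives, then sum."""
    cleaned = [v for v in values if v >= 0]  # illustrative "logic A"
    return sum(cleaned)                      # illustrative "logic B"


def test_do_logic_ignores_negative_values():
    # Negative inputs are filtered out before summing.
    assert do_logic([1, -2, 3]) == 4


def test_do_logic_on_empty_input():
    # An empty input sums to zero.
    assert do_logic([]) == 0
```

Because do_logic takes plain Python values and returns a plain value, these tests run without starting a flow or touching a datastore.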

2 Testing a Flow

In the second case, suppose you have a flow you would like to write a unit test for.

In this example there is a data artifact x which is stored in self.x.
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()

Suppose you want to test that after running the flow the artifact value is what you expect.

assert x == 1 # goal: check this is true using PyTest

To do this you can:

  • Switch your Metaflow profile to ensure tests use a separate (local) metadata and datastore.
  • Define a test file and use PyTest to test the flow.

2.a Switch Metaflow Profiles

By default, Metaflow creates a profile for you at ~/.metaflowconfig/config.json. You can create and activate a custom profile that tells Metaflow to use a different metadata service and datastore. For example, you can define a test profile in ~/.metaflowconfig/config_test.json that points both at local storage, which keeps data from test runs separate from your actual runs.
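One way to create such a profile is sketched below. It assumes the profile only needs Metaflow's METAFLOW_DEFAULT_DATASTORE and METAFLOW_DEFAULT_METADATA settings, both set to "local"; check these against the configuration your Metaflow version expects. The helper name write_test_profile is hypothetical.

```python
import json
from pathlib import Path


def write_test_profile(config_dir):
    """Write config_test.json into config_dir and return its path."""
    # Assumed settings: keep both the datastore and the metadata local.
    profile = {
        "METAFLOW_DEFAULT_DATASTORE": "local",
        "METAFLOW_DEFAULT_METADATA": "local",
    }
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "config_test.json"
    path.write_text(json.dumps(profile, indent=4))
    return path
```

Calling write_test_profile(Path.home() / ".metaflowconfig") would place the file where the test profile is expected. The profile is then activated by setting the METAFLOW_PROFILE environment variable to test.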

2.b Run PyTest Script

Now you can define a PyTest script that will:

  • Run the flow.
  • Use Metaflow's client API to access the artifact of interest.
  • Test the artifact value is as expected.
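import os
# Select the test profile before importing metaflow so the import picks it up.
os.environ['METAFLOW_PROFILE'] = 'test'
from metaflow import Flow
import subprocess

def test_flow():
    # Replace '' with the path to the file that defines FlowToTest.
    cmd = ['python', '', 'run', '--run-id-file', 'test_id']
    subprocess.check_call(cmd)  # run the flow; raises if the run fails
    with open('test_id') as f:
        # The run ID written by --run-id-file selects the run to inspect.
        run = Flow('FlowToTest')[f.read()]
    assert run.data.x == 1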

Running PyTest on this file produces output like the following:
============================= test session starts ==============================
platform darwin -- Python 3.9.7, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/eddie/Dev/outerbounds-docs/docs/docs/data-science/deployment
plugins: anyio-3.5.0
collected 1 item . [100%]

============================== 1 passed in 1.65s ===============================
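
The test above writes the test_id file into the current working directory. A minimal variation, assuming only the standard library, keeps the run-id file in a temporary directory instead. The helper name run_and_read_run_id and its runner parameter are hypothetical; runner exists only so the helper can be exercised without launching a real flow.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_read_run_id(flow_file, runner=subprocess.check_call):
    """Run a flow script and return the run ID written via --run-id-file.

    flow_file is the path to the script that defines the flow.
    runner executes the command; it defaults to subprocess.check_call.
    """
    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "run_id"
        # sys.executable reuses the interpreter that is running the tests.
        cmd = [sys.executable, str(flow_file), "run",
               "--run-id-file", str(id_file)]
        runner(cmd)
        # Read the ID before the temporary directory is cleaned up.
        return id_file.read_text().strip()
```

Inside a test, the returned ID can be used to index the flow with Metaflow's client API, for example Flow('FlowToTest')[run_id], before asserting on run.data.x.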