
Testing a Flow with PyTest

Question

How can I use PyTest with a flow?

Solution

There are two related cases to consider:

  • Test the logic within steps.
  • Test the flow itself.

1. Testing Logic in Steps

It is a helpful design pattern to move non-orchestration logic out of your flows and into plain functions that you can unit test on their own. For example, suppose a step of your flow contains several blocks of logic, as in the following schematic flow (logic A, B, and C stand in for your own code):

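from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C
        self.next(self.next_step)

    # rest of flow
    @step
    def next_step(self):
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    MyFlow()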

To refactor it, first move the logic into a separate module (here my_module.py) that can be tested independently of the flow:

my_module.py
def do_logic():
    # logic A
    # logic B
    # logic C
    pass

This is the suggested design pattern because do_logic is now an ordinary Python function: you can unit test it the way you normally would (for example, with PyTest) and then import it in the flow:

from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        from my_module import do_logic
        do_logic()
        self.next(self.next_step)

    # rest of flow (unchanged)
    ...

Separating the implementation of the logic from the flow makes code leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
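As an illustration, suppose (hypothetically) that logic A to C inside do_logic drop missing values from a list and average the rest. The sketch below shows how that function and its PyTest tests might look. The function body, its values parameter, and the test names are assumptions made for the example; the module and the test file are combined into one block for brevity, whereas in a real project the tests would live in a file such as test_my_module.py and start with from my_module import do_logic.

```python
# --- my_module.py (hypothetical contents) ---
def do_logic(values):
    """Drop missing entries and return the mean of the remaining values."""
    # logic A: filter out missing values
    cleaned = [v for v in values if v is not None]
    # logic B: treat an all-missing input as 0.0 instead of dividing by zero
    if not cleaned:
        return 0.0
    # logic C: average what is left
    return sum(cleaned) / len(cleaned)


# --- test_my_module.py (hypothetical) ---
# In a separate file you would first write: from my_module import do_logic
def test_do_logic_ignores_missing_values():
    assert do_logic([1, None, 3]) == 2.0


def test_do_logic_handles_all_missing():
    assert do_logic([None, None]) == 0.0
```

Running pytest collects both test_ functions and exercises the logic directly, without starting a Metaflow run, which keeps these tests fast.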

2. Testing a Flow

In the second case, suppose you have a flow you would like to write a unit test for.

In this example, the flow stores a data artifact x in self.x: the start step sets it to 0 and the end step increments it.

simple_flow.py
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()

Suppose you want to test that after running the flow the artifact value is what you expect.

assert x == 1 # goal: check this is true using PyTest

To do this you can:

  • Switch your Metaflow profile to ensure tests use a separate (local) metadata and datastore.
  • Define a test file and use PyTest to test the flow.

2.a (Optional) Switch Metaflow Profiles

Outerbounds user note

On the Outerbounds platform, we advise against changing your Metaflow config file, since Outerbounds manages it for you. If you'd like to separate testing or staging from production on Outerbounds, consider using the "Perimeters" feature.

By default, Metaflow reads its configuration from ~/.metaflowconfig/config.json. You can create a custom profile that tells Metaflow to use a different metadata provider and datastore, and activate it by setting the METAFLOW_PROFILE environment variable to the profile name. For example, you can create ~/.metaflowconfig/config_test.json with the following contents:

{
  "METAFLOW_DEFAULT_DATASTORE": "local",
  "METAFLOW_DEFAULT_METADATA": "local"
}

This keeps data from test runs separate from your actual runs. See this guide for more details.
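If you prefer to create the profile file from Python, for example in a test setup script, a minimal sketch follows. write_test_profile is a hypothetical helper, not part of Metaflow; it assumes the config directory is ~/.metaflowconfig unless the METAFLOW_HOME environment variable points elsewhere, and it writes both a local datastore and local metadata setting so that test runs stay entirely on the local filesystem.

```python
import json
import os
from pathlib import Path


def write_test_profile(name="test", config_dir=None):
    """Write config_<name>.json for Metaflow and return its path.

    Hypothetical helper: it only writes JSON. Metaflow uses the file when
    METAFLOW_PROFILE=<name> is set in the environment.
    """
    if config_dir is None:
        # Assumption: Metaflow's config directory, overridable via METAFLOW_HOME
        config_dir = os.environ.get("METAFLOW_HOME", "~/.metaflowconfig")
    path = Path(config_dir).expanduser() / f"config_{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    settings = {
        # keep artifacts and run metadata on the local filesystem
        "METAFLOW_DEFAULT_DATASTORE": "local",
        "METAFLOW_DEFAULT_METADATA": "local",
    }
    path.write_text(json.dumps(settings, indent=2))
    return path
```

You could then activate the profile for a single command with METAFLOW_PROFILE=test python simple_flow.py run, or set os.environ['METAFLOW_PROFILE'] = 'test' before importing metaflow, as the test script in section 2.b does.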

2.b Run a PyTest Script

Now you can define a PyTest script that will:

  • Run the flow.
  • Use Metaflow's client API to access the artifact of interest.
  • Check that the artifact value is as expected.

test_simple_flow.py
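import os
import subprocess
import sys

# Select the test profile (~/.metaflowconfig/config_test.json) before importing
# metaflow; the subprocess below inherits this environment variable.
os.environ['METAFLOW_PROFILE'] = 'test'

from metaflow import Flow

def test_flow():
    # Run the flow with the current interpreter and save its run ID to a file
    cmd = [sys.executable, 'simple_flow.py', 'run', '--run-id-file', 'test_id']
    subprocess.check_call(cmd)
    with open('test_id') as f:
        run_id = f.read().strip()
    # Look up the run with the client API and check the artifact
    run = Flow('FlowToTest')[run_id]
    assert run.data.x == 1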
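Run PyTest from the directory that contains both files, since the test refers to simple_flow.py by a relative path:

pytest
============================= test session starts ==============================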
platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
plugins: anyio-4.4.0
collected 1 item

test_simple_flow.py . [100%]

============================== 1 passed in 2.01s ===============================