
Testing a Flow with PyTest

Question

How can I use PyTest with a flow?

Solution

There are two related cases to consider:

  • Test the logic within steps.
  • Test the flow itself.

1. Testing Logic in Steps

It is a helpful design pattern to move non-orchestration logic out of the flow itself and write unit tests for the component functions. In other words, if a step of your flow contains logic, as in the flow below, you can refactor it in the following way.

Here is a pseudo-code example of a flow you may want to refactor in this way.

flow_with_logic_in_steps.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C
        self.next(self.next_step)

    # rest of flow

To refactor, first create a separate file that contains the logic, so that it can be tested independently of the flow:

my_module.py
def do_logic():
    # logic A
    # logic B
    # logic C

This is the suggested design pattern because now you can unit test this logic in the way you normally would, and then import it in the flow.

flow_with_logic_imported.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        # the logic now lives in my_module and is unit tested there
        from my_module import do_logic
        do_logic()
        self.next(self.next_step)

    # rest of flow

In general, separating the implementation of the logic from the flow makes code bases leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
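As a rough illustration of unit testing the extracted logic, the sketch below gives `do_logic` a hypothetical body (computing a mean) and a hypothetical `values` parameter, neither of which comes from the original example. In a real project the function would live in my_module.py and the tests in a separate file such as test_my_module.py, which PyTest discovers by its `test_` prefix.

```python
# Hypothetical contents of my_module.py, for illustration only.
def do_logic(values):
    """Return the mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)


# Hypothetical contents of test_my_module.py.
# In a separate file you would start with: from my_module import do_logic
def test_do_logic_returns_mean():
    assert do_logic([1, 2, 3]) == 2


def test_do_logic_rejects_empty_input():
    try:
        do_logic([])
    except ValueError:
        pass  # expected: empty input is rejected
    else:
        raise AssertionError("expected ValueError for empty input")
```

Because these tests never start a flow, they run quickly and do not touch any Metaflow datastore.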

2. Testing a Flow

In the second case, suppose you have a flow you would like to write a unit test for.

In this example there is a data artifact x which is stored in self.x.

simple_flow.py
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()

Suppose you want to test that after running the flow the artifact value is what you expect.

assert x == 1 # goal: check this is true using PyTest

To do this you can:

  • Switch your Metaflow profile to ensure tests use a separate (local) metadata service and datastore.
  • Define a test file and use PyTest to test the flow.

2.a Switch Metaflow Profiles

By default, Metaflow creates a profile for you at ~/.metaflowconfig/config.json. You can create and activate a custom profile that tells Metaflow to use a different metadata service and datastore. For example, you can define ~/.metaflowconfig/config_test.json like:

{
    "METAFLOW_DEFAULT_DATASTORE": "local"
}

to separate data from test runs from your actual runs.
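If you prefer to create the test profile from code, for example in a CI setup script, a minimal sketch could look like the following. The helper name `write_test_profile` is hypothetical, and the sketch assumes Metaflow resolves a profile named `test` to a file called `config_test.json` in its configuration directory, as the example above suggests.

```python
import json
from pathlib import Path


def write_test_profile(config_dir, profile="test"):
    """Write a Metaflow profile that uses the local datastore.

    `config_dir` is the directory holding Metaflow config files
    (assumed to be ~/.metaflowconfig in a default setup).
    Returns the path of the file that was written.
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"config_{profile}.json"
    config = {"METAFLOW_DEFAULT_DATASTORE": "local"}
    path.write_text(json.dumps(config, indent=4))
    return path
```

You might call it as `write_test_profile(Path.home() / ".metaflowconfig")` and then set the `METAFLOW_PROFILE` environment variable to `test` before Metaflow is imported, so that the test profile is the one in effect.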

2.b Run PyTest Script

Now you can define a PyTest script that will:

  • Run the flow.
  • Use Metaflow's client API to access the artifact of interest.
  • Test that the artifact value is as expected.
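
test_simple_flow.py
import os
# select the test profile before Metaflow is imported
os.environ['METAFLOW_PROFILE'] = 'test'
from metaflow import Flow
import subprocess

def test_flow():
    # run the flow and write its run id to the file test_id
    cmd = ['python', 'simple_flow.py', 'run', '--run-id-file', 'test_id']
    subprocess.check_call(cmd)
    # look up the run through the client API; strip guards against stray whitespace
    with open('test_id') as f:
        run = Flow('FlowToTest')[f.read().strip()]
    assert run.data.x == 1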

pytest

============================= test session starts ==============================
platform darwin -- Python 3.9.7, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/eddie/Dev/outerbounds-docs/docs/docs/data-science/deployment
plugins: anyio-3.5.0
collected 1 item

test_simple_flow.py . [100%]

============================== 1 passed in 1.65s ===============================
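If several tests need to run flows, the run-and-read-id steps can be factored into a small helper. The sketch below is one possible shape, not part of Metaflow: `run_flow` and its `runner` parameter are hypothetical names, and `sys.executable` is used in place of `python` so the flow runs under the same interpreter as the tests.

```python
import subprocess
import sys
from pathlib import Path


def run_flow(flow_file, run_id_file, runner=subprocess.check_call):
    """Run a flow script and return the run id it wrote to run_id_file.

    `runner` executes the command; it defaults to subprocess.check_call
    and can be swapped for a stub when testing this helper itself.
    """
    cmd = [sys.executable, str(flow_file), "run",
           "--run-id-file", str(run_id_file)]
    runner(cmd)  # check_call raises CalledProcessError if the flow fails
    return Path(run_id_file).read_text().strip()
```

Inside a test you could then write `run = Flow('FlowToTest')[run_flow('simple_flow.py', tmp_path / 'run_id')]`, where `tmp_path` is PyTest's built-in temporary directory fixture, so the run id file does not end up in your working directory.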