Testing a Flow with PyTest

Question

How can I use PyTest with a flow?

Solution

There are two related cases to consider:

Test the logic within steps.
Test the flow itself.

1. Testing Logic in Steps

It is a helpful design pattern to move non-orchestration logic out of the flow itself and write unit tests for the component functions. In other words, if a step of your flow contains logic that does more than orchestrate, you can refactor it in the following way.

Here is a pseudo-code example of a flow you may want to refactor in this way.

flow_with_logic_in_steps.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        # logic A
        # logic B
        # logic C
        self.next(self.next_step)

    # rest of flow

To refactor, first create a separate file for the logic so that it can be tested independently of the flow:

my_module.py
def do_logic():
    # logic A
    # logic B
    # logic C
    pass  # placeholder so the pseudo-code parses; replace with the real logic

This is the suggested design pattern because now you can unit test this logic in the way you normally would, and then import it in the flow.

flow_with_logic_imported.py
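from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        # the logic now lives in my_module and is only called from the step
        from my_module import do_logic
        do_logic()
        self.next(self.next_step)

    # rest of flow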

In general, separating the implementation of the logic from the flow makes code bases leveraging Metaflow easier to maintain and test. It is a particularly useful design pattern when you have multiple flows and/or steps that import the same logic.
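As an illustration of unit testing the extracted logic, here is a minimal sketch. The body of do_logic is an assumption made up for this example (the original leaves logic A, B and C unspecified), and so are its values argument and the test names. In a real project the function would live in my_module.py and the tests in a separate file that imports it.

```python
# my_module.py (hypothetical contents; the real logic A/B/C is not specified)
def do_logic(values):
    """Scale the values so they sum to 1.0 (illustrative logic only)."""
    total = sum(values)
    if total == 0:
        # Nothing to normalize by, so return zeros of the same length.
        return [0.0 for _ in values]
    return [v / total for v in values]


# test_my_module.py (in a real project: `from my_module import do_logic`)
def test_do_logic_normalizes():
    assert do_logic([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_do_logic_handles_zero_total():
    assert do_logic([0, 0]) == [0.0, 0.0]
```

Because these tests never start a flow, they run quickly and do not need a Metaflow datastore.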

2. Testing a Flow

In the second case, suppose you have a flow you would like to write a unit test for.

In this example there is a data artifact x which is stored in self.x.

simple_flow.py
from metaflow import FlowSpec, step

class FlowToTest(FlowSpec):

    @step
    def start(self):
        self.x = 0
        self.next(self.end)

    @step
    def end(self):
        self.x += 1

if __name__ == '__main__':
    FlowToTest()

Suppose you want to test that after running the flow the artifact value is what you expect.

assert x == 1 # goal: check this is true using PyTest

To do this you can:

Switch your Metaflow profile to ensure tests use a separate (local) metadata and datastore.
Define a test file and use PyTest to test the flow.

2.a Switch Metaflow Profiles

By default, Metaflow keeps its profile at ~/.metaflowconfig/config.json. You can create and activate a custom profile that tells Metaflow to use a different metadata service and datastore. For example, you can define ~/.metaflowconfig/config_test.json like:

{
    "METAFLOW_DEFAULT_DATASTORE": "local"
}

This keeps data from test runs separate from your actual runs.
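If you prefer to create the profile from code, for example in a CI setup script, the following is a minimal sketch. The helper name write_test_profile and its parameters are hypothetical; the only things taken from this page are the config_<profile>.json file naming and the datastore setting.

```python
import json
from pathlib import Path


def write_test_profile(config_dir=Path.home() / ".metaflowconfig", profile="test"):
    """Write config_<profile>.json with a local datastore (hypothetical helper)."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    profile_path = config_dir / f"config_{profile}.json"
    # Same setting as the profile shown above.
    settings = {"METAFLOW_DEFAULT_DATASTORE": "local"}
    profile_path.write_text(json.dumps(settings, indent=4))
    return profile_path
```

The profile takes effect once METAFLOW_PROFILE is set to test in the environment, which is what the test script in the next section does.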

2.b Run PyTest Script

Now you can define a PyTest script that will:

Run the flow.
Use Metaflow's client API to access the artifact of interest.
Test the artifact value is as expected.

test_simple_flow.py
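import os
# Select the "test" profile before importing Metaflow so that it is picked up.
os.environ['METAFLOW_PROFILE'] = 'test'
from metaflow import Flow
import subprocess

def test_flow():
    # Run the flow and have Metaflow write the run ID to the file 'test_id'.
    cmd = ['python', 'simple_flow.py', 'run', '--run-id-file', 'test_id']
    subprocess.check_call(cmd)
    with open('test_id') as f:
        run_id = f.read().strip()
    # Look up the run with the client API and check the artifact value.
    run = Flow('FlowToTest')[run_id]
    assert run.data.x == 1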

pytest

    ============================= test session starts ==============================
    platform darwin -- Python 3.9.7, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
    rootdir: /Users/eddie/Dev/outerbounds-docs/docs/docs/data-science/deployment
    plugins: anyio-3.5.0
    collected 1 item                                                               
    
    test_simple_flow.py .                                                    [100%]
    
    ============================== 1 passed in 1.65s ===============================
