
Access Parent Directories from a Flow

Question

How do I import common packages from a parent directory in a flow?

Solution

From your Metaflow code, you can access higher-level directories by adding a symlink. The symlinks in this example are represented by arrows pointing to the common package defined in the parent directory ../common_utils.

flow_type_1
    flow1.py
    common_utils -> ../common_utils
flow_type_2
    flow2.py
    common_utils -> ../common_utils
common_utils
    __init__.py
    some_module.py

Since Metaflow version 2.5.2, you can add symlinks in the directories containing your flow script. Metaflow dereferences each symlink and packages the contents it points to (../common_utils in this case) together with your code. This lets you import from common_utils inside flow steps, such as those in the flow1.py script, whether they run locally or remotely.
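To make "dereference" concrete, here is a minimal sketch that rebuilds the layout above in a temporary directory and follows the link using only the Python standard library. It is not Metaflow's packaging code. The helper name build_demo_tree is hypothetical, and the sketch assumes a POSIX-like system where creating symlinks needs no special privileges.

```python
import os
import tempfile
from pathlib import Path


def build_demo_tree(root: Path) -> Path:
    """Create the example layout under root and return the symlink path."""
    package = root / "common_utils"
    package.mkdir()
    (package / "__init__.py").write_text("from .some_module import general_function\n")
    (package / "some_module.py").write_text(
        "import os\n\ndef general_function():\n    return os.getcwd()\n"
    )
    flow_dir = root / "flow_type_1"
    flow_dir.mkdir()
    link = flow_dir / "common_utils"
    # The relative target is resolved against the link's own directory (flow_type_1).
    link.symlink_to(Path("..") / "common_utils", target_is_directory=True)
    return link


with tempfile.TemporaryDirectory() as tmp:
    link = build_demo_tree(Path(tmp))
    # realpath follows the link to the directory it points at.
    resolved = Path(os.path.realpath(link))
    # Listing through the link shows the files of the shared package.
    files_via_link = sorted(p.name for p in link.iterdir())
    print(link.is_symlink())   # True
    print(resolved.name)       # common_utils
    print(files_via_link)      # ['__init__.py', 'some_module.py']
```

Reading through the link yields the files of the real common_utils directory, which is the content that has to travel with the flow code for the import to work on a remote machine.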

The rest of this page goes through an example that uses symlinks in this way.

1. Define Common Functionality

You can define generic functionality in a custom Python package and reuse it across flows in the directory structure shown above. Here is a definition for a function that will be used in flow_type_1/flow1.py:

common_utils/some_module.py
import os

def general_function():
    # Return the working directory of the process that calls this function.
    return os.getcwd()

Then you can import the function in the initialization module for the common_utils package:

common_utils/__init__.py
from .some_module import general_function

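2. Create a Symlink

To import general_function from the common_utils package in a flow, add a symlink inside the directory that contains your flow script, pointing to the higher-level package you want to import.

The ln command is the standard Unix way to do this. Its -s option creates a symbolic link instead of a hard link. Run from the directory that contains both flow_type_1 and common_utils, the following command creates a link at flow_type_1/common_utils that points to ../common_utils. Because the target is relative, it is resolved from the link's own directory (flow_type_1), so it reaches the shared package: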

ln -s ../common_utils flow_type_1/common_utils
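If you would rather create the links from a setup script, the same operation can be sketched in Python with os.symlink. The helper link_common_utils below is hypothetical (it is not part of Metaflow), and it assumes a POSIX-like system and the directory layout shown at the top of this page.

```python
import os
from pathlib import Path


def link_common_utils(flow_dir, target="../common_utils"):
    """Create flow_dir/common_utils -> target, like `ln -s target flow_dir/common_utils`."""
    link = Path(flow_dir) / "common_utils"
    if link.is_symlink():
        existing = os.readlink(link)
        if existing == target:
            return link  # The link is already in place, nothing to do.
        raise FileExistsError(f"{link} already points to {existing!r}")
    # If a regular file or directory occupies the path, os.symlink raises FileExistsError.
    os.symlink(target, link, target_is_directory=True)
    return link


# Example usage, run from the directory that contains the flow folders:
#     for flow_dir in ("flow_type_1", "flow_type_2"):
#         link_common_utils(flow_dir)
```

Calling the helper a second time is harmless, which makes it convenient to run as part of project setup.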

3. Write a Flow that Imports the Function

This flow imports general_function from the common_utils package inside the start step. The import resolves because of the symlink to the common_utils package created above.

flow_type_1/flow1.py
from metaflow import FlowSpec, step

class Flow1(FlowSpec):

    @step
    def start(self):
        # Resolved through the flow_type_1/common_utils symlink.
        from common_utils import general_function
        self.result = general_function()
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == "__main__":
    Flow1()
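Before running the flow, it can help to confirm that every symlink in the flow directory still resolves, since a dangling link would leave common_utils out of reach. The following is a minimal sketch using only the standard library. The function name broken_symlinks is hypothetical, and it inspects only entries directly inside the given directory.

```python
from pathlib import Path


def broken_symlinks(flow_dir):
    """Return symlinks directly inside flow_dir whose targets do not exist."""
    broken = []
    for entry in Path(flow_dir).iterdir():
        # is_symlink() is true even for a dangling link, while exists()
        # follows the link and reports whether the target is present.
        if entry.is_symlink() and not entry.exists():
            broken.append(entry)
    return sorted(broken)
```

An empty list for flow_type_1 means its common_utils link points at a directory that exists.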

4. Run the Flow

python 'flow_type_1/flow1.py' run
     Workflow starting (run-id 1666836481231665):
[1666836481231665/start/1 (pid 81716)] Task is starting.
[1666836481231665/start/1 (pid 81716)] Task finished successfully.
[1666836481231665/end/2 (pid 81719)] Task is starting.
[1666836481231665/end/2 (pid 81719)] Task finished successfully.
Done!
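The log above does not show the value stored in self.result. One way to read it back after the run is Metaflow's Client API, sketched below. The wrapper latest_result is a hypothetical helper, and it assumes the flow has completed at least one successful run in your current namespace.

```python
def latest_result(flow_name="Flow1"):
    """Return the `result` artifact from the latest successful run of flow_name."""
    # Imported inside the function so the module loads even where
    # Metaflow is not installed.
    from metaflow import Flow

    run = Flow(flow_name).latest_successful_run
    if run is None:
        raise RuntimeError(f"No successful run found for {flow_name}")
    # run.data exposes the artifacts available at the end of the run;
    # `result` was assigned in the start step of Flow1.
    return run.data.result
```

Calling latest_result() from a Python session or notebook returns whatever general_function returned in the start step, that is, the working directory of the process that executed that step.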
