Load Local Data with IncludeFile

Question

How do I load data from a local directory structure on AWS Batch using Metaflow's IncludeFile?

Solution

When using Metaflow's @batch decorator as a compute environment for a step, there are several options for accessing data. This page will show how to use metaflow.IncludeFile to access a file on AWS Batch or Kubernetes.

1. Acquire Data

The example will access this CSV file from a step that runs on AWS Batch:

local_data.csv
1, 2, 3
4, 5, 6
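
If the file does not exist yet, it can be created by hand or with a few lines of Python. The snippet below is only a convenience sketch; the helper name write_sample_csv is made up for this illustration and is not part of Metaflow.

```python
from pathlib import Path

# Contents of local_data.csv exactly as shown above.
SAMPLE_CSV = "1, 2, 3\n4, 5, 6\n"

def write_sample_csv(path="local_data.csv"):
    """Write the sample CSV next to the flow file (hypothetical helper)."""
    target = Path(path)
    target.write_text(SAMPLE_CSV)
    return target

if __name__ == "__main__":
    print(f"Wrote {write_sample_csv()}")
```

The flow in the next step expects the file at ./local_data.csv, relative to the directory the flow is run from.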

2. Run Flow

This flow shows how to:

  • Include a local file as a flow artifact, self.data, with IncludeFile.
  • Use that artifact to access the contents of the local file from a step that runs on AWS Batch.
local_data_on_batch_include.py
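from metaflow import FlowSpec, step, IncludeFile, batch

class IncludeFileFlow(FlowSpec):

    # The file is read when the run starts and stored as an artifact,
    # so remote steps can read it without access to the local disk.
    data = IncludeFile('data',
                       default='./local_data.csv')

    @batch(cpu=1)
    @step
    def start(self):
        # Runs on AWS Batch; self.data holds the file contents.
        print(self.data)
        self.next(self.end)

    @step
    def end(self):
        print('Finished reading the data!')

if __name__ == '__main__':
    IncludeFileFlow()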
python local_data_on_batch_include.py run
    ...
[468/end/2406 (pid 46569)] Task is starting.
[468/end/2406 (pid 46569)] Finished reading the data!
[468/end/2406 (pid 46569)] Task finished successfully.
...
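
In the flow above, self.data holds the contents of the file rather than a path, so it can be parsed directly inside a step. The following is a minimal sketch of that parsing, assuming the file was included as text (the default for IncludeFile, as far as we know) and that every cell is an integer. The function name parse_included_csv is hypothetical, and the sample string stands in for self.data so the snippet runs without Metaflow.

```python
import csv
import io

def parse_included_csv(text):
    """Parse CSV text into a list of integer rows (hypothetical helper)."""
    # skipinitialspace drops the blank after each comma in "1, 2, 3".
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Skip empty lines; convert every remaining cell to int.
    return [[int(cell) for cell in row] for row in reader if row]

# Stand-in for self.data: the contents of local_data.csv.
sample = "1, 2, 3\n4, 5, 6\n"
rows = parse_included_csv(sample)
print(rows)
```

Inside the start step, the same call would be parse_included_csv(self.data), and the result could be assigned to another attribute on self to make it available to later steps.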

Further Reading