How can I build a custom Docker image to run a Metaflow step?
Metaflow has decorators to run steps on remote compute environments, like @batch and @kubernetes. The environments these jobs run in can both be created from a Docker image.
In some circumstances you may need to create your own image for running a step or a flow. In this case there are a few things to consider when building your image.
Specify an Image in a Flow
First it is important to mention how Metaflow knows which image to use. You can read more about using a custom image here.
If you do not specify the image argument, as in @batch(image="my_image:latest"), Metaflow checks whether you have configured a default container image for the compute plugin you are using in your Metaflow configuration. If no default is configured and you do not pass the image argument to the decorator, Metaflow uses the official Python image for the version of Python you are using in your local environment.
1. Write a Dockerfile
Docker images are built using a Dockerfile. When building one to use with Metaflow there are a few considerations to keep in mind.
A minimum requirement is that you will need Python in the image, so we suggest starting from an official Python image. For example, you can add the following at the start of your Dockerfile:
FROM python:3.10
The image should come with standard CLI tools like tar, so we suggest avoiding starting the Dockerfile from an empty base image such as FROM scratch.
Metaflow needs to be able to write in the working directory. In the Dockerfile this concerns the WORKDIR and USER commands. You should make sure that the user running commands can write in the working directory, especially when you explicitly set these in your Dockerfile. Note that many images use the root user by default and Metaflow does not, so you may have to explicitly specify a non-root USER in your Dockerfile. You can use the following to check the user for your image:
docker run --rm -it <YOUR IMAGE> bash -c id
For example, the default user in this Python image is root:
docker run --rm -it python:3.10 bash -c id
uid=0(root) gid=0(root) groups=0(root)
You can change the user in the Dockerfile with the USER instruction, like
USER <YOUR USER>
Using ENTRYPOINT and CMD
We suggest you do not set either of these in your Dockerfile. Metaflow constructs a command to run the container for you, so defining the ENTRYPOINT too can produce unexpected errors.
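One way to check whether an image already defines either instruction is to inspect its configuration. The sketch below wraps docker inspect in a small helper; the function name show_entrypoint_and_cmd and the image name in the usage line are illustrative assumptions, not part of Metaflow.

```shell
# Hypothetical helper: print the ENTRYPOINT and CMD stored in an image's config.
# Each value prints as "null" when the image does not define it.
show_entrypoint_and_cmd() {
  docker inspect \
    --format 'ENTRYPOINT={{json .Config.Entrypoint}} CMD={{json .Config.Cmd}}' \
    "$1"
}

# Usage (the image must be available locally):
# show_entrypoint_and_cmd my_image:latest
```

Keep in mind that a base image may define its own CMD, which your image inherits even if your Dockerfile does not set one.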
Here is an example of a standard Dockerfile.
- Notice the WORKDIR is changed and the USER has write permission.
- The COPY command copies a requirements.txt file into the image, and the RUN command that follows installs its contents. You could follow a similar copying process to install custom modules that are not on PyPI.
- Also notice there is no CMD or ENTRYPOINT since Metaflow will override this for you anyway.
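# Assumed from the description: python:3.10 base, /metaflow as WORKDIR, uid 1000 as USER.
FROM python:3.10
RUN mkdir /logs && chown 1000 /logs
RUN mkdir /metaflow && chown 1000 /metaflow
WORKDIR /metaflow
# Install dependencies as root, then switch to the non-root user.
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
USER 1000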
2. Build your Image
This process is not unique to Metaflow. Once you have written a Dockerfile like the one above, you can build it from the directory that contains it:
docker build .
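Tagging the image at build time gives you a name to reference later, when you push the image or pass it to a decorator. A minimal sketch, assuming the hypothetical image name my_image and tag latest:

```shell
# Hypothetical name and tag; choose your own.
IMAGE_NAME="my_image"
IMAGE_TAG="latest"
IMAGE_REF="${IMAGE_NAME}:${IMAGE_TAG}"

# Build the Dockerfile in the current directory and tag the result.
build_image() {
  docker build -t "$IMAGE_REF" .
}

# Usage: run from the directory that contains the Dockerfile.
# build_image
```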
If you are building or running your image on macOS and plan to later deploy to a Linux machine, you will need to specify --platform=linux/amd64 in your build and run commands. For example, when using the EC2 instances that power AWS Batch environments you will want to make sure the image is built for the right platform. You can set the platform automatically when building and running images by setting the DOCKER_DEFAULT_PLATFORM environment variable.
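As a sketch of the environment-variable approach, the Docker CLI reads DOCKER_DEFAULT_PLATFORM and applies it to commands that accept a --platform flag. The setting below lasts for the current shell session only.

```shell
# Target linux/amd64 for subsequent docker build and docker run commands
# started from this shell session.
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```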
Another alternative is to specify the platform in the beginning of your Dockerfile:
FROM --platform=linux/amd64 image:tag
3. Configure Metaflow to Use your Image
Once you have built your image you need to tell Metaflow to use it. This requires pushing the image to a registry that you have permission to access. For example, in AWS you might want your image to reside in ECR.
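For ECR, pushing typically means authenticating Docker against the registry, tagging the local image with the registry URI, and pushing it. The sketch below assumes the AWS CLI is installed and configured, and that a repository named my_image already exists in ECR. The account ID, region, and image name are placeholders.

```shell
# Placeholder values (assumptions for illustration); replace with your own.
AWS_ACCOUNT_ID="123456789012"
AWS_REGION="us-east-1"
IMAGE_REF="my_image:latest"

# ECR registry hosts follow the <account>.dkr.ecr.<region>.amazonaws.com pattern.
ECR_REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
REMOTE_REF="${ECR_REGISTRY}/${IMAGE_REF}"

push_to_ecr() {
  # Log Docker in to the registry, then tag and push the local image.
  aws ecr get-login-password --region "$AWS_REGION" |
    docker login --username AWS --password-stdin "$ECR_REGISTRY" &&
    docker tag "$IMAGE_REF" "$REMOTE_REF" &&
    docker push "$REMOTE_REF"
}

# Usage:
# push_to_ecr
```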
In a flow, the most direct way to tell Metaflow to use this image for a step is to pass it to a compute decorator, like @kubernetes(image="<my_image>:<my_tag>"). You can also set configuration variables so that Metaflow looks for a certain image in a specified container registry by default.
Some configuration variables to keep in mind for specifying a URI for a default image and container registry are:
- METAFLOW_DEFAULT_CONTAINER_IMAGE dictates the default container image that Metaflow should use.
- METAFLOW_DEFAULT_CONTAINER_REGISTRY controls which registry Metaflow uses to pick the image. This defaults to Docker Hub.
These will then be used as a default across compute plugins. Metaflow configuration variables can be set in the active METAFLOW_PROFILE stored in ~/.metaflowconfig/ or as environment variables.
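As a rough sketch of the profile-file route, the snippet below writes the two variables into a JSON file. It assumes the key/value JSON layout that metaflow configure produces, and it writes to a scratch directory so no existing configuration is touched. The image name and registry are placeholders.

```shell
# Scratch directory for the demonstration, so real configuration is left alone.
DEMO_DIR="$(mktemp -d)"
DEMO_CONFIG="${DEMO_DIR}/config.json"

# Placeholder image and registry values (assumptions for illustration).
cat > "$DEMO_CONFIG" <<'EOF'
{
  "METAFLOW_DEFAULT_CONTAINER_IMAGE": "my_image:latest",
  "METAFLOW_DEFAULT_CONTAINER_REGISTRY": "123456789012.dkr.ecr.us-east-1.amazonaws.com"
}
EOF

cat "$DEMO_CONFIG"
```

In practice you would add these keys to the existing config.json (or config_<profile>.json for a named profile) in your Metaflow configuration directory instead of creating a new file.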
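For example, if your container registry is in AWS ECR you can set an environment variable like:
export METAFLOW_DEFAULT_CONTAINER_REGISTRY=<aws_account_id>.dkr.ecr.<region>.amazonaws.com
and then decorate your flow steps like:
@batch(image="<my_image>:<my_tag>")
Alternatively, you can specify the registry, image, and tag all in the decorator:
@batch(image="<aws_account_id>.dkr.ecr.<region>.amazonaws.com/<my_image>:<my_tag>")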
Note that if you are manually configuring the underlying resources for remote compute plugins (as opposed to automating deployment through CloudFormation or Terraform), you will need to make sure that the appropriate roles are available for those resources.
- Use a custom image in your flow
- See configuration details in: metaflow_config.py
- See where in the Metaflow code image and container registry variables are used for @batch and @kubernetes
- Building a Dockerfile for a Python environment
- Understand how CMD and ENTRYPOINT interact
- Best practices for containerizing Python applications with Docker