Build a Custom Docker Image

Question

How can I build a custom docker image to run a Metaflow step?

Solution

Metaflow provides decorators such as @batch and @kubernetes to run steps on remote compute environments. In both cases, the environment a job runs in is created from a Docker image.

In some circumstances you may need to create your own image to run a step or a flow. In that case, there are a few things to consider when building your image.

Specify an Image in a Flow

First, it is important to explain how Metaflow knows which image to use. You can read more about using a custom image here. If you do not specify an image argument like @batch(image="my_image:latest"), Metaflow checks whether you have configured a default container image for the compute plugin you are using in the METAFLOW_DEFAULT_CONTAINER_IMAGE variable.

If this configuration variable is not set and you do not specify the image argument in the decorator, Metaflow falls back to the official Python image matching the Python version you are using in your local environment.
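
For example, you can set a default image for all remote steps through an environment variable; the image URI below is a placeholder:

export METAFLOW_DEFAULT_CONTAINER_IMAGE=my-registry.example.com/my-image:latest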

1. Write a Dockerfile

Docker images are built using a Dockerfile. When building one to use with Metaflow there are a few considerations to keep in mind.

Base Image

At a minimum, you will need Python in the image; we suggest starting from an official Python image. For example, you can add the following at the start of your Dockerfile:

FROM python:3.10

The image should also come with standard CLI tools like tar, so we suggest not starting the Dockerfile with FROM scratch.

User Permissions

Metaflow needs to be able to write to the working directory. In the Dockerfile, this concerns the WORKDIR and USER instructions: make sure the user running commands can write to the working directory, especially if you set these explicitly in your Dockerfile. Note that many images use the root user by default; Metaflow does not require root, so you may want to explicitly specify a non-root USER in your Dockerfile. You can use the following to check the user for your image:

docker run --rm -it <YOUR IMAGE> bash -c id

For example, the official Python image runs as root by default:

docker run --rm -it python:3.10 bash -c id
    uid=0(root) gid=0(root) groups=0(root)

You can change the user in the Dockerfile like this:

FROM my_base_image:latest
USER my_user
...
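
After adding a USER instruction, running the same check on an image built from this Dockerfile should report the non-root user. The image tag and user name below are placeholders:

docker run --rm -it my_image:latest bash -c id
    uid=1000(my_user) gid=1000(my_user) groups=1000(my_user)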

Using ENTRYPOINT and CMD

We suggest you do not set either of these in your Dockerfile. Metaflow constructs the command to run the container for you, so also defining an ENTRYPOINT can produce unexpected errors.

Example

Here is an example of a standard Dockerfile.

  • The WORKDIR is changed and the USER has write permission.
  • The COPY command copies a requirements.txt file into the image, and the following RUN installs its contents. You could follow a similar copying process to install custom modules that are not on PyPI.
  • Also notice there is no CMD or ENTRYPOINT, since Metaflow will override these for you anyway.
Dockerfile
FROM python:3.10

RUN mkdir /logs && chown 1000 /logs
RUN mkdir /metaflow && chown 1000 /metaflow
ENV HOME=/metaflow
WORKDIR /metaflow
USER 1000

COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
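
For reference, the requirements.txt copied above might look like the following; the packages listed are placeholders for whatever your flow needs:

requirements.txt
pandas
scikit-learn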

2. Build your Image

This process is not unique to Metaflow. Once you have written a Dockerfile like the one above, you can build it from the directory containing it:

docker build .
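
In practice, you will usually want to tag the image at build time so you can push it to a registry later. The image name and tag below are placeholders:

docker build -t my-image:latest .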

If you are building or running your image on macOS and plan to later deploy to a Linux machine, you will need to specify --platform=linux/amd64 in your build and run commands. For example, when using the EC2 instances that power AWS Batch environments, you will want to make sure the image is built for the right platform. You can set the platform automatically when building and running images by using an environment variable:

export DOCKER_DEFAULT_PLATFORM=linux/amd64  

Alternatively, you can specify the platform at the beginning of your Dockerfile:

FROM --platform=linux/amd64 image:tag

3. Configure Metaflow to Use your Image

Once you have built your image you need to tell Metaflow to use it. This requires pushing the image to a registry that you have permission to access. For example, in AWS you might want your image to reside in ECR.
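
As a sketch, pushing to ECR typically involves authenticating Docker with the registry, tagging the image, and pushing it. The account ID, region, and repository name below are placeholders, and the ECR repository is assumed to already exist:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest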

In a flow, the most direct way to tell Metaflow to use this image for a step is through the plugin decorators, like @batch(image="<my_image>:<my_tag>") and @kubernetes(image="<my_image>:<my_tag>"). You can also set environment variables so that Metaflow looks for a certain image in a specified container registry by default.

Some configuration variables to keep in mind for specifying a URI for a default image and container registry are METAFLOW_DEFAULT_CONTAINER_IMAGE and METAFLOW_DEFAULT_CONTAINER_REGISTRY.

  • METAFLOW_DEFAULT_CONTAINER_IMAGE dictates the default container image that Metaflow should use.
  • METAFLOW_DEFAULT_CONTAINER_REGISTRY controls which registry Metaflow pulls the image from; this defaults to Docker Hub.

These will then be used as defaults across compute plugins. Metaflow configuration variables can be set in the active METAFLOW_PROFILE stored in ~/.metaflowconfig/ or as environment variables.
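
For example, a profile such as ~/.metaflowconfig/config.json could contain the following; the registry and image values are placeholders:

{
    "METAFLOW_DEFAULT_CONTAINER_REGISTRY": "123456789012.dkr.ecr.us-east-1.amazonaws.com",
    "METAFLOW_DEFAULT_CONTAINER_IMAGE": "my-image:latest"
}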

For example, if your container registry is in AWS ECR you can set an environment variable like:

export METAFLOW_DEFAULT_CONTAINER_REGISTRY=<aws_account_id>.dkr.ecr.<region>.amazonaws.com

and then decorate your flow steps like:

@batch(image="image-in-my-registry:latest")
@step
def containerized_step(self):
    ...

Alternatively, you can specify the registry, image, and tag all in the decorator:

@batch(image="url-to-docker-repo/docker-image:version")
@step
def containerized_step(self):
    ...
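
Putting it together, a minimal flow using a custom image might look like the following sketch; the flow name, image URI, and step logic are illustrative:

from metaflow import FlowSpec, batch, step

class ContainerizedFlow(FlowSpec):

    @batch(image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest")
    @step
    def start(self):
        # Runs remotely inside the custom image; pandas is assumed to be
        # installed via the image's requirements.txt.
        import pandas as pd
        self.shape = pd.DataFrame({"x": [1, 2, 3]}).shape
        self.next(self.end)

    @step
    def end(self):
        print("Processed dataframe of shape", self.shape)

if __name__ == "__main__":
    ContainerizedFlow()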

Note that if you are manually configuring the underlying resources for remote compute plugins (as opposed to automating deployment through CloudFormation or Terraform), you will need to make sure that the appropriate roles are available for those resources.

Further Reading