Deploying to Google Cloud with Kubernetes

This page shows how to deploy a complete Metaflow stack powered by Kubernetes on Google Cloud. For more information about the deployment, see deployment details, advanced options and FAQ.

1. Preparation

Terraform Tooling

Terraform is a popular infrastructure-as-code tool for managing cloud resources. We have published a set of terraform templates here for setting up Metaflow on GCP. Terraform needs to be installed on your system in order to use these templates.

  1. Install terraform by following these instructions.
  2. Download Metaflow on GCP terraform templates: git clone git@github.com:outerbounds/metaflow-tools.git
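
After installation, you can quickly confirm that Terraform is available and locate the GCP templates (the path below assumes the clone command above):

terraform version
cd metaflow-tools/gcp/terraform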

GCloud Command Line Interface

This is the official CLI tool ("gcloud") published by Google for working with GCP. It will be used by Terraform when applying our templates (e.g. for authenticating with GCP). Please install it by following these instructions.
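
As an optional sanity check, you can verify the installation and see which account is currently active:

gcloud version
gcloud auth list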

kubectl Command Line Interface

kubectl is a standard CLI tool for working with Kubernetes clusters. It will be used by Terraform when applying our templates (e.g. for deploying some services to your Google Kubernetes Engine (GKE) cluster). Please install it by following these instructions.
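
A quick way to confirm that kubectl is installed (no cluster connection is required for this):

kubectl version --client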

2. Provision GCP Resources

See here for the exact set of resources to be provisioned. Also, note the permissions that are needed.

Enable Google Cloud APIs

You need to manually enable APIs used by the Metaflow stack on the Google Cloud console. Make sure that the following APIs are enabled:

  • Cloud Resource Manager
  • Compute Engine API
  • Service Networking
  • Cloud SQL Admin API
  • Kubernetes Engine API

If you have used the account/project for other deployments in the past, these APIs may already be enabled. Also note that enabling these APIs automatically enables several other required APIs. If you prefer the command line, you can enable them with gcloud as shown below.
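
The API identifiers below are the standard ones for the services listed above; substitute your own project ID:

gcloud services enable \
    cloudresourcemanager.googleapis.com \
    compute.googleapis.com \
    servicenetworking.googleapis.com \
    sqladmin.googleapis.com \
    container.googleapis.com \
    --project=<GCP_PROJECT_ID>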

Login to GCP

You must be logged in to GCP as an account with sufficient permissions to provision the required resources. Use the GCloud CLI (gcloud):

gcloud auth application-default login
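
Optionally, you can also point gcloud at the project you plan to deploy into, so that subsequent commands default to it:

gcloud config set project <GCP_PROJECT_ID>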

Initialize your Terraform Workspace

From your metaflow-tools/gcp/terraform directory, run:

terraform init

Set Terraform Variables

Create a FILE.tfvars file with the following content (updating relevant values):

org_prefix = "<ORG_PREFIX>"
project = "<GCP_PROJECT_ID>"

For org_prefix, choose a short and memorable alphanumeric string. It will be used for naming the Google Cloud Storage bucket, whose name must be globally unique across GCP.

For GCP_PROJECT_ID, set the GCP project ID you wish to use.

You may rename FILE.tfvars to a more friendly name appropriate for your project. E.g. metaflow.poc.tfvars.

The variable assignments defined in this file will be passed to terraform CLI.
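
For illustration, a filled-in metaflow.poc.tfvars might look like this (the values are hypothetical; substitute your own):

org_prefix = "acmepoc"
project    = "acme-ml-sandbox"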

Optional: Enable Argo Events

To enable event triggering for Metaflow, add the following line in FILE.tfvars:

enable_argo=true

For more technical context, see this page about event triggering.

Optional: Enable Airflow

Optionally, you can include Apache Airflow as the production orchestrator for Metaflow in your deployment by including the following lines in FILE.tfvars:

deploy_airflow=true

Setting deploy_airflow=true will deploy Airflow in the GKE cluster with a LocalExecutor.

Apply Terraform Template to Provision GCP Infrastructure

From your local metaflow-tools/gcp/terraform directory, run:

terraform apply -target="module.infra" -var-file=FILE.tfvars

A plan of action will be printed to the terminal. You should review it before accepting. See details for what to expect.
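
If you want to inspect the plan without being prompted to apply it, you can do a dry run first:

terraform plan -target="module.infra" -var-file=FILE.tfvars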

Common Resource Provisioning Hiccups

Cloud SQL instance name conflicts

Cloud SQL instance (the "PostgreSQL DB") names must be unique within your GCP project, including instances that have been deleted within the last 7 days. This means that if you want to reprovision the entire set of GCP resources within that time window, you must choose a fresh name. In this scenario, please update the DB generation variable here.

3. Deploy Metaflow Services to GKE cluster

Apply Terraform Template to Deploy Services

From your local metaflow-tools/gcp/terraform directory, run:

terraform apply -target="module.services" -var-file=FILE.tfvars
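
Once the apply completes, you can optionally verify that the Metaflow services are running on the cluster. This assumes your local kubectl context points at the newly created GKE cluster (the end-user instructions in the next step include the exact gcloud container clusters get-credentials command for your cluster):

kubectl get deployments
kubectl get pods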

4. End User Setup Instructions

When the command above completes, it will print a set of setup instructions for Metaflow end users (folks who will be writing and running flows). These instructions are meant to get end users started on running flows quickly.

You can access the terraform instruction output at any time by running (from metaflow-tools/gcp/terraform directory):

terraform output -raw END_USER_SETUP_INSTRUCTIONS

If the output is not available, run

terraform apply -var-file=FILE.tfvars

and try the output command again.
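
Since these instructions are typically shared with end users, it can be handy to save them to a file (the filename is arbitrary):

terraform output -raw END_USER_SETUP_INSTRUCTIONS > metaflow_end_user_setup.txt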

Sample Output

Setup instructions for END USERS (e.g. someone running Flows vs the new stack):
-------------------------------------------------------------------------------
There are four steps:
1. Ensure GCP access
2. Configure Metaflow
3. Run port forwards
4. Install necessary GCP Python SDK libraries

STEP 1: Ensure you have sufficient access to these GCP resources on your local workstation:

- Google Kubernetes Engine ("Kubernetes Engine Developer role")
- Google Cloud Storage ("Storage Object Admin" on bucket ob-metaflow-storage-bucket-ci)

Option 1: Login with gcloud CLI

Login as a sufficiently capable user: $ gcloud auth application-default login.

Option 2: Use service account key

Ask for the pregenerated service account key (./metaflow_gsa_key_ci.json) from the administrator (the person who stood up the Metaflow stack).
Save the key file locally in your home directory. Make sure it is accessible only by you (chmod 700 <FILE>).

Configure your local Kubernetes context to point to the right Kubernetes cluster:

$ gcloud container clusters get-credentials metaflow-kubernetes-ci --region=us-west2

STEP 2: Configure Metaflow:

Option 1: Create JSON config directly (recommended)

Create the file "~/.metaflowconfig/config.json" with this content. If this file already exists, keep a backup of it and
move it aside first.

{
"METAFLOW_DATASTORE_SYSROOT_GS": "gs://ob-metaflow-storage-bucket-ci/tf-full-stack-sysroot",
"METAFLOW_DEFAULT_DATASTORE": "gs",
"METAFLOW_DEFAULT_METADATA": "service",
"METAFLOW_KUBERNETES_NAMESPACE": "default",
"METAFLOW_KUBERNETES_SERVICE_ACCOUNT": "metaflow-service-account",
"METAFLOW_SERVICE_INTERNAL_URL": "http://metadata-service.default:8080/",
"METAFLOW_SERVICE_URL": "http://127.0.0.1:8080/"
}

Option 2: Interactive configuration

Run the following, one after another.

$ metaflow configure gs
$ metaflow configure kubernetes

Use these values when prompted:

METAFLOW_DATASTORE_SYSROOT_GS=gs://ob-metaflow-storage-bucket-ci/tf-full-stack-sysroot
METAFLOW_SERVICE_URL=http://127.0.0.1:8080/
METAFLOW_SERVICE_INTERNAL_URL=http://metadata-service.default:8080/
[For Argo only] METAFLOW_KUBERNETES_NAMESPACE=argo
[For Argo only] METAFLOW_KUBERNETES_SERVICE_ACCOUNT=argo

Note: you can skip these:

METAFLOW_SERVICE_AUTH_KEY
METAFLOW_KUBERNETES_CONTAINER_REGISTRY
METAFLOW_KUBERNETES_CONTAINER_IMAGE

STEP 3: Setup port-forwards to services running on Kubernetes:

option 1 - run kubectl commands manually:
$ kubectl port-forward deployment/metadata-service 8080:8080
$ kubectl port-forward deployment/metaflow-ui-backend-service 8083:8083
$ kubectl port-forward deployment/metaflow-ui-static-service 3000:3000
$ kubectl port-forward -n argo deployment/argo-server 2746:2746

option 2 - run this script, which manages the same port-forwards for you (and prevents timeouts):

$ python metaflow-tools/scripts/forward_metaflow_ports.py [--include-argo]

STEP 4: Install GCP Python SDK
$ pip install google-cloud-storage google-auth
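
Once the steps above are complete, a simple end-to-end check is to run any Metaflow flow against the new stack; hello_flow.py below is a placeholder for one of your own flows:

python hello_flow.py run --with kubernetes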