Kubernetes on Azure - Advanced Options

Here are some advanced options for deploying Metaflow on Azure.

Remote state backends for Terraform

By default, Terraform stores the state of the Azure resources it manages in local tfstate files.

If you plan to maintain the minimal stack for any significant period of time, we highly recommend storing these state files in cloud storage (e.g. Azure Blob Storage) instead.

Some reasons include:

  • More than one person needs to administer the stack (using Terraform). Everyone should work off a single copy of tfstate.
  • You wish to mitigate the risk of data loss on your local disk.

For more details, see the Terraform docs on remote state backends.
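
As a minimal sketch, a remote backend on Azure Blob Storage can be declared with Terraform's azurerm backend. The resource group, storage account, container, and blob names below are hypothetical placeholders, not values from the Metaflow templates, and the storage account and container are assumed to exist already.

```hcl
# Sketch only: all names are hypothetical placeholders.
# The storage account and container must exist before `terraform init` runs.
terraform {
  backend "azurerm" {
    resource_group_name  = "metaflow-tfstate-rg" # hypothetical resource group
    storage_account_name = "metaflowtfstate"     # hypothetical; must be globally unique
    container_name       = "tfstate"             # hypothetical blob container
    key                  = "metaflow.tfstate"    # blob that holds the state
  }
}
```

After adding a block like this, re-running `terraform init` should detect the backend change and offer to copy the existing local state to the new location.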

Deploying Multiple Metaflow Stacks

If you want to run more than one instance of this stack, you can use Terraform workspaces. Each workspace keeps its own state, so the same templates can be applied once per workspace.
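
Workspaces are created and switched with `terraform workspace new <name>` and `terraform workspace select <name>`. Because Azure resource names generally need to be distinct across stacks, one common pattern is to derive names from the active workspace. The fragment below is a hypothetical illustration of that pattern and is not taken from the Metaflow templates; the `metaflow` prefix and the output name are assumptions.

```hcl
# Sketch only: derive a per-workspace suffix for resource names.
locals {
  # terraform.workspace is "default" unless another workspace is selected.
  name_suffix = terraform.workspace == "default" ? "" : "-${terraform.workspace}"
}

# Hypothetical output showing the resulting name for the active workspace.
output "example_stack_name" {
  value = "metaflow${local.name_suffix}"
}
```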

Authenticated Public Endpoints for Metaflow Services

The deployment approach taken by the Terraform templates minimizes the publicly accessible surface area: only the AKS Kubernetes API is available publicly. Through this secure API, authorized users can:

  • Inspect the cluster's workloads
  • Create, read, update, and delete Kubernetes objects (e.g. submit job pods)

However, this deployment style does not include publicly accessible endpoints for the web services running within the AKS cluster. For the purpose of this sample deployment template, users must use the Kubernetes API to set up port-forwarding in order to access these services from their workstations.

For a friendlier user experience, these services can be exposed through publicly accessible endpoints that are authenticated and authorized using technologies like:

  • OIDC (OpenID Connect)
  • JSON Web Tokens (JWT)
  • Identity-as-a-service providers (e.g. Auth0)

Please talk to us for more information about this topic.

AKS Workload Identities

In the Metaflow stack generated by these Terraform templates, all Metaflow workloads running within AKS access Azure resources as a single, specific Service Principal identity. For finer-grained control, it is possible to map separate identities to each workload and Metaflow task. The strategy currently recommended by Azure is Azure AD Workload Identity.
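
As a rough sketch of what such a mapping might look like in Terraform, the fragment below creates a user-assigned managed identity and federates it with a Kubernetes service account. It is not part of the Metaflow templates. All names, the `default` namespace, and the `metaflow-task` service account are hypothetical, and the AKS cluster is assumed to have its OIDC issuer and workload identity support enabled.

```hcl
# Sketch only: every name below is a hypothetical placeholder.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "oidc_issuer_url" {
  type        = string
  description = "OIDC issuer URL of the AKS cluster (assumes the OIDC issuer is enabled)."
}

# Identity that a single workload or task would use to access Azure resources.
resource "azurerm_user_assigned_identity" "metaflow_task" {
  name                = "metaflow-task-identity"
  resource_group_name = var.resource_group_name
  location            = var.location
}

# Trust tokens issued by the cluster for one specific Kubernetes service account.
resource "azurerm_federated_identity_credential" "metaflow_task" {
  name                = "metaflow-task-federation"
  resource_group_name = var.resource_group_name
  parent_id           = azurerm_user_assigned_identity.metaflow_task.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.oidc_issuer_url
  subject             = "system:serviceaccount:default:metaflow-task"
}
```

The identity would still need role assignments on the Azure resources it should reach, and the pods would have to run under the matching service account.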