You can now access secrets securely in Metaflow flows using the new @secrets decorator. This video shows how to do it in less than a minute using AWS Secrets Manager (no sound):
Consider a Metaflow flow that needs to access an external resource, say, a database requiring authentication such as a username and password. Cases like this are common.
Thus far, there have been two main ways to handle this:
1. Delegating authentication to the execution environment, e.g. to an IAM user executing the code.
2. Accessing credentials from a file or an environment variable.
The first option can be secure and easy to manage centrally, and is hence preferable in many cases. Unfortunately, it applies mainly to a handful of services, like S3, that work with IAM natively. If you want to connect to third-party services like Snowflake, you need another approach.
The second option works with any service, but storing secrets in local files is considered bad practice for many good reasons. Locally stored secrets are hard to manage (what happens when the database password changes?) and they can leak easily, often at inconvenient times.
Secret managers, such as AWS Secrets Manager, provide a third option that combines the best of the two approaches. They allow arbitrary secrets to be stored and managed centrally. Accessing secrets is controlled through IAM roles that are available through the execution environment. Additionally, secrets are never stored in any environment outside the manager.
Earlier, there was a small speed bump if you wanted to use a secrets manager: accessing a secret, e.g. using the boto library, takes 15-20 lines of boilerplate infrastructure code which, as a data scientist, you would rather not worry about.
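To give a sense of that boilerplate, here is a sketch of the kind of code the decorator saves you from writing. It uses boto3's Secrets Manager client; the helper names, region, and JSON-payload assumption are illustrative, not part of Metaflow:

```python
import json
import os


def inject_secret_env(payload: str) -> None:
    # Parse a JSON secret payload and expose its key-value pairs
    # as environment variables for the rest of the process.
    for key, value in json.loads(payload).items():
        os.environ[key] = str(value)


def fetch_secret(secret_id: str, region: str = "us-west-2") -> str:
    # The kind of plumbing @secrets hides: create a client, call the
    # API, handle string vs. binary payloads, surface errors sensibly.
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("secretsmanager", region_name=region)
    try:
        response = client.get_secret_value(SecretId=secret_id)
    except ClientError as err:
        raise RuntimeError(f"Could not read secret {secret_id}") from err
    if "SecretString" in response:
        return response["SecretString"]
    return response["SecretBinary"].decode("utf-8")


# Typical use (requires AWS credentials to be configured):
#   inject_secret_env(fetch_secret("db-credentials"))
```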
To make it easier to write production-ready code without cognitive overhead, Metaflow now provides a @secrets decorator that handles this with one line. Besides being a convenient abstraction, @secrets provides a standardized way to access secrets in all projects across environments.
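Conceptually, the idea is simple: before a step runs, fetch the named secrets and inject their key-value pairs into the environment. The toy decorator below illustrates that mechanism only; it is not Metaflow's implementation, and the stubbed fetch function stands in for a real secrets-manager call:

```python
import functools
import json
import os


def secrets_sketch(sources, fetch):
    # Toy illustration of the idea behind @secrets: before the wrapped
    # function runs, fetch each named secret (a JSON string) and inject
    # its key-value pairs into the environment.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in sources:
                os.environ.update(json.loads(fetch(name)))
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage with a stubbed-out secret store:
fake_store = {"db-credentials": '{"DB_HOST": "db.example.com"}'}


@secrets_sketch(["db-credentials"], fetch=fake_store.__getitem__)
def start():
    return os.environ["DB_HOST"]
```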
Here is an example that uses the @secrets decorator to access a secret named db-credentials. The secret contains four key-value pairs that specify everything needed to connect to a Postgres database:
import os

from metaflow import FlowSpec, step, secrets
from psycopg import connect


class DBFlow(FlowSpec):

    @secrets(sources=['db-credentials'])
    @step
    def start(self):
        with connect(user=os.environ['DB_USER'],
                     password=os.environ['DB_PASSWORD'],
                     dbname=os.environ['DB_NAME'],
                     host=os.environ['DB_HOST']) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT * FROM data")
                print(cur.fetchall())
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == '__main__':
    DBFlow()
Assuming you have db-credentials stored in AWS Secrets Manager, you can execute the flow on your workstation:
python dbflow.py run
or run it at scale on @kubernetes as usual:
python dbflow.py run --with kubernetes
Often, data scientists develop and test their code and models using a non-production dataset. The @secrets decorator supports this scenario smoothly.
Consider the above code snippet featuring DBFlow but with the @secrets line removed. If you have a test database deployed locally, you can simply set the environment variables manually without using a secrets manager. This is OK, as the local database is not accessible to anyone outside your workstation (and it shouldn't contain sensitive data in any case):
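For example, you could export hypothetical local values first (the variable names match the DB_* keys the flow reads; the values are placeholders):

```shell
# Hypothetical local test credentials; the flow reads these
# environment variables instead of a managed secret.
export DB_HOST=localhost
export DB_USER=postgres
export DB_PASSWORD=postgres
export DB_NAME=testdb
```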
python dbflow.py run
Alternatively, your company may have a shared database containing test data. In this case, you can store its credentials in a secret, say, test-db-credentials, and run the flow with that secret attached, for instance like this (exact quoting may vary by shell):
python dbflow.py run --with 'secrets:sources=[test-db-credentials]'
As usual in Metaflow, the --with option attaches the decorator to all steps without having to hardcode it in the code.
To deploy the flow in production, you can have a CI/CD pipeline with a separate IAM role that has exclusive access to the production credentials. It can deploy the flow to production, for instance like this:
python dbflow.py argo-workflows create --with 'secrets:sources=[prod-db-credentials]'
In this scenario, the IAM roles assigned to data scientists may disallow access to prod-db-credentials altogether. The production credentials and the database are then accessible only to production tasks running on Argo Workflows.
Crucially, in all these cases you don't have to change anything in the code as you move from prototype to production.
@secrets to success
You can start using the @secrets decorator today! For additional features and setup instructions, read the @secrets documentation.
If you need help getting started or if you have any other questions, join us and thousands of other data scientists and engineers on the Metaflow community Slack! In particular, let us know if you would like to see support for other backends besides AWS Secrets Manager.