Earlier this week, we released features that help you develop and scale ML/AI more effectively. Today, we focus on three new features related to deploying ML/AI into production:
- The foundations 🏗️: Use Outerbounds Perimeters and perimeter-specific security policies to secure production deployments and isolate them from development environments.
- Automated workflows 🤖: Deploy and operate event-triggered, highly-available production workflows confidently with new UIs.
- Human workflows with Outerbounds Apps 👀: Create and deploy custom dashboards and other services to test and observe results using your favorite tools, like Streamlit, FastAPI, and others - securely authenticated with your SSO provider.
To motivate these features, remember that a large share of production concerns relates to engineering, i.e., the yellow-hat side of the diagram:
As we discussed in our post on Monday, just focusing on the development experience and scalability might make the purple hats happy, but to deliver continuous value with ML and AI, it is essential to address core concerns such as security, data governance, high availability, and compliance. Hence, let’s start with the foundations.
The foundations
Outerbounds always deploys within your cloud account(s) using the Bring-Your-Own-Cloud (BYOC) deployment model:
- All data, metadata, and processing remain securely within your cloud accounts, i.e., the top box in the diagram. As no data leaves your account, concerns related to compliance and privacy are greatly reduced.
- Crucially, this promise applies to the state-of-the-art GenAI models, such as the latest LLMs, as well, thanks to our native integration with NVIDIA NIM, which provides these models as highly optimized, cost-effective services inside your deployment.
- Optionally, you can manage consistent development environments powered by VSCode and notebooks securely inside your cloud accounts too, thanks to Outerbounds Workstations.
- All access within Outerbounds is authorized and authenticated with your SSO provider such as Google, GitHub, Azure AD, Okta, or others.
- Outerbounds is fully managed through a control plane (the bottom box), so you can free up engineering resources from maintaining the foundational infrastructure.
Let’s zoom in on the middle box, where most of the action (workstations and scalable compute) takes place. By default, Outerbounds allows you to run Metaflow flows both in the cluster and on workstations without any additional configuration - simple enough! A common follow-up question is: How can our workloads access data and services outside Outerbounds?
New: Perimeters and Perimeter Roles
Often, organizations have existing IAM roles and policies that define permissions for their users and workloads. Instead of asking your security team to redo and audit new policies, we make it easy for them: just Bring Your Own Policies.
Specifically, you can just take your existing roles, the yellow box below, and configure them to be used for workloads and workstations on Outerbounds:
You can use all your standard tools, such as Terraform, to manage the roles and policies that govern access to data, secrets, and other services in your environment. You just need to configure a trust policy that allows workloads on Outerbounds to use the role. If you don't know how, don’t worry! The new integrations UI contains step-by-step instructions for this.
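To make the trust-policy step more concrete, here is a minimal sketch of what a standard AWS IAM trust policy document looks like when built in Python. The principal ARN below is a hypothetical placeholder, and the function name `build_trust_policy` is ours; the actual principal and any extra conditions should be taken from the step-by-step instructions in the integrations UI, which may differ from this simplified form.

```python
import json

# Hypothetical placeholder: the real principal for Outerbounds workloads
# comes from the integrations UI, not from this example.
EXAMPLE_PRINCIPAL_ARN = "arn:aws:iam::111122223333:role/example-outerbounds-task-role"


def build_trust_policy(principal_arn: str) -> dict:
    """Return an IAM trust policy that lets `principal_arn` assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into Terraform or the AWS console.
    print(json.dumps(build_trust_policy(EXAMPLE_PRINCIPAL_ARN), indent=2))
```

The permissions attached to the role itself (access to data, secrets, and other services) stay exactly as your security team already defined them; only the trust relationship is new.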
Policies are rarely one-size-fits-all. Different teams may require access to different data. Also, it is advisable to separate production, staging, and testing environments, connected via CI/CD pipelines. Or, in regulated industries you may need to, for instance, segregate geographic entities to meet compliance requirements.
Outerbounds Perimeters enable you to meet these requirements by creating fully isolated environments within the platform. Importantly, each perimeter can have its own IAM roles attached to it, allowing you to use your existing policies to separate, say, data access in production and development environments.
Should you have a more advanced use case that needs more granular permissioning within a perimeter, even at the level of an individual Metaflow step, you can make it happen with a decorator:
@iam_role(role_arn="arn:aws:iam::123456789012:role/role-name")
As a practical example, customers have used the feature to set up consistent policies for all their data scientists using notebooks on workstations, so that they can access the data and services they need without having to pay any attention to access keys or other permissioning. From the point of view of developers, everything just works.
New: Improved UIs for managing event-triggered production workflows
One of the key features of Outerbounds is the ability to create event-triggered production deployments, which can be used to set up continuous training pipelines, automatically updating batch inference, or to compose sophisticated reactive systems out of modular components - such as those powering Netflix behind the scenes.
Given the importance of the feature, we improved all views that govern deployment and operation of event-triggered workflows. Take a look at this quick tour:
As shown in the video, with a few clicks you can:
- Establish a secure connection to a data source, such as Snowflake, Databricks, an open data lake, or a database. Naturally, these connections follow the perimeter boundaries described above.
- Receive real-time events whenever new data is available in the data source. There are flexible ways for configuring the policies for what counts as “new data” exactly.
- Deploy workflows that react to new data automatically, fetching the latest data securely through the established connection.
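As a purely conceptual illustration of what a "new data" policy could look like, the sketch below compares two snapshots of a source table and decides whether an event should fire. The `TableSnapshot` type, the `has_new_data` function, and the `min_new_rows` threshold are all hypothetical names invented for this example; they are not the Outerbounds configuration format, which is set up through the UI.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical summary of a source table at one point in time."""

    last_modified: datetime
    row_count: int


def has_new_data(
    previous: Optional[TableSnapshot],
    current: TableSnapshot,
    min_new_rows: int = 1,
) -> bool:
    """Decide whether `current` counts as new data relative to `previous`.

    The first observation always counts as new. Afterwards, the table must
    have been modified and must have grown by at least `min_new_rows` rows.
    """
    if previous is None:
        return True
    modified = current.last_modified > previous.last_modified
    grew = current.row_count - previous.row_count >= min_new_rows
    return modified and grew


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    before = TableSnapshot(last_modified=start, row_count=100)
    after = TableSnapshot(last_modified=start + timedelta(hours=1), row_count=150)
    print(has_new_data(before, after))
```

A stricter policy, for example one that requires at least a thousand new rows before retraining, is just a different `min_new_rows` value in this sketch.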
We have many more improvements coming on this front over the coming months, so stay tuned!
New: Host Streamlit, Plotly, and other apps on Outerbounds
Judging by the response from early users, this could be the most exciting new feature for many! You can now host Streamlit, Plotly Dash, FastHTML, and any other such apps on Outerbounds, securely behind SSO-based authentication. It couldn’t be easier:
Crucially, you can limit access to everyone in the deployment, users in the same perimeter, or just yourself.
There's a bigger story behind Outerbounds Apps. As of today, they support three main use cases:
- Internal dashboards with Streamlit etc.
- Internal UIs that can trigger runs on Outerbounds (e.g. using the recently released async runner API) or support human-in-the-loop workflows, such as LLM evaluations (see the first video in this blog post).
- Development of real-time inference endpoints with FastAPI, BentoML, NVIDIA Triton, and others.
Stay tuned for examples!
Did I hear real-time inference?
Regarding real-time inference, we are targeting a specific pattern with today’s release: You have an existing solution for real-time inference, e.g. SageMaker or Vertex AI endpoints, which already integrate with Outerbounds, but you want to develop and test inferencing rapidly on workstations without having to wait minutes for a production deployment to happen.
By deploying a test endpoint as an app, you can test the results easily, and let others hit your endpoint securely as well. Once everything works to your liking, you can ship your code and models into production.
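To give a feel for how small such a test endpoint can be, here is a minimal sketch that uses only the Python standard library as a stand-in for FastAPI or a similar framework. The `predict` function is a placeholder model (it returns the mean of the inputs), and the `/predict` path and JSON payload shape are assumptions made for this example, not an Outerbounds convention.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list) -> dict:
    """Placeholder model: returns the mean of the input features."""
    if not features:
        raise ValueError("features must be a non-empty list")
    return {"score": sum(features) / len(features)}


class PredictHandler(BaseHTTPRequestHandler):
    """Serve POST /predict with a JSON body like {"features": [1.0, 2.0]}."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404, "unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict(payload["features"])).encode()
        except (ValueError, KeyError, TypeError) as exc:
            self.send_error(400, str(exc))
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Port 0 asks the OS for any free port; a real app would use a fixed one.
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Send one test request to the endpoint, then shut the server down.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    request_body = json.dumps({"features": [1.0, 2.0, 3.0]})
    conn.request("POST", "/predict", body=request_body,
                 headers={"Content-Type": "application/json"})
    response = conn.getresponse()
    print(response.status, json.loads(response.read()))
    conn.close()
    server.shutdown()
    server.server_close()


if __name__ == "__main__":
    main()
```

Swapping `predict` for a call into your own model is the only change needed to try real inference logic; authentication and access control are handled by the platform rather than by the handler code.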
This pattern applies to real-time inference for custom ML models, which is a surprisingly diverse topic in itself. If you are interested in inferencing with GenAI / LLM models, we support NVIDIA NIM with fine-tuning (read more here and here), so you don’t have to get into the weeds of LLM inferencing, which is a very rapidly moving field.
Stay tuned for more updates!
Start deploying today
To recap our action-packed launch week, you are now able to
- Develop with
  - Guided journeys that help you build end-to-end ML/AI systems for the real world, and
  - blazing fast, automated Docker image builds - no infrastructure, no boilerplate required.
- Scale with
  - New decorators, @checkpoint, @huggingface_hub, and @model, which allow you to train and fine-tune even large models smoothly,
  - distributed training as an integral part of your overall system, not as a separate island, and
  - @slurm, allowing you to modernize your existing HPC environments without migrations.
- Deploy with
  - Secure perimeters that define the outer bounds for various teams, projects, and workloads using your existing security policies, so developers can avoid all security-related boilerplate,
  - event-triggered workflows, securely connected to your data sources, and
  - domain-specific transparency and control, enabled by custom apps and dashboards on Outerbounds, built using your favorite tools.
Instead of reading about swimming, you get a much better idea by dipping your toes in the water, in your own environment, with your own code, data, and models. It takes only 15 minutes to get started!
While this release week is nearing its end, there’s much more to come! To spark your creativity, join our webinar with NVIDIA tomorrow and an in-depth fireside chat about bleeding-edge AI research on Tuesday with Hailey Schoelkopf from EleutherAI.