New in Outerbounds: Access NVIDIA Cloud GPUs


We are thrilled to unveil the latest result of our collaboration with NVIDIA: Outerbounds customers can now directly access GPU resources from the large-scale NVIDIA DGX Cloud simply by adding @nvidia to their workflows. If the prospect of easily accessing cloud GPUs appeals to you, sign up here to join the waitlist!

GPU poor no more

Over the past six months, we have released a number of features to improve the handling of demanding compute workloads on Outerbounds. These enhancements include expanded support for popular ML frameworks and distributed training, new tools for observability, and cost reporting and optimization. We have also developed patterns for inference with NVIDIA Triton and, most recently, introduced support for multi-cloud compute. Additionally, we've provided numerous examples showcasing how to leverage the RAG pattern, open-source LLMs, and other foundation models.

All these features hinge on one key requirement: access to GPUs. We have helped a number of customers access GPUs cost-effectively, both through their cloud providers and on-premises. Yet, as anyone who tried to procure A100s and H100s last year knows, the hardware has not been broadly available at a reasonable price.

Our mission is to help you build ML and AI systems better and faster, which motivates us to remove the bottlenecks that limit your ability to innovate. Today, there is one less bottleneck in your way: we now offer direct access to the NVIDIA DGX Cloud, thanks to our ongoing collaboration with NVIDIA.

The magic of @nvidia

The feature itself, @nvidia, works in a delightfully straightforward manner:

@kubernetes(cpu=2)
@step
def start(self):
    self.instruction_set = load_data(...)
    self.next(self.tune_llama2)

@pypi(packages=LLAMA2_PACKAGES)
@gpu_profile(interval=1)
@nvidia(instance_type='A100')
@step
def tune_llama2(self):
    ...

If you are not familiar with Metaflow, take a look at how Metaflow handles cloud compute in general. The snippet illustrates Llama2 fine-tuning at a high level:

  • The start step downloads an instruction set for fine-tuning. Importantly, notice how this step runs on Outerbounds' built-in @kubernetes cluster as usual, as it doesn't require GPU resources.
  • The tune_llama2 step is powered by a handful of decorators (sketched as a complete flow below):
      • @pypi(packages=LLAMA2_PACKAGES) builds an isolated environment containing the packages the fine-tuning code needs.
      • @gpu_profile(interval=1) monitors GPU utilization while the step runs.
      • @nvidia(instance_type='A100') executes the step on GPUs allocated from the NVIDIA DGX Cloud.
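
For context, here is a minimal sketch of what the complete llama_tune.py could look like, assuming the @nvidia and gpu_profile extensions are installed. LLAMA2_PACKAGES, load_data, and the fine-tuning body are illustrative placeholders carried over from the snippet above, not a working implementation:

from metaflow import FlowSpec, step, kubernetes, pypi, nvidia
from metaflow.profilers import gpu_profile  # from the metaflow-gpu-profile extension

# Placeholder: pin whatever packages your fine-tuning code needs.
LLAMA2_PACKAGES = {'transformers': '4.38.0', 'peft': '0.9.0'}

class LlamaTuneFlow(FlowSpec):

    @kubernetes(cpu=2)
    @step
    def start(self):
        # load_data stands in for fetching the instruction set.
        self.instruction_set = load_data(...)
        self.next(self.tune_llama2)

    @pypi(packages=LLAMA2_PACKAGES)
    @gpu_profile(interval=1)
    @nvidia(instance_type='A100')
    @step
    def tune_llama2(self):
        # Fine-tune Llama2 on self.instruction_set here.
        ...
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    LlamaTuneFlow()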

Let's run the flow simply by executing

python llama_tune.py run

and observe how we grab eight A100 GPUs from the NVIDIA DGX Cloud in less than 15 seconds 🔥

@nvidia in action: Eight A100 GPUs allocated to a task
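
If you want to dig into the utilization numbers afterwards, the open-source gpu_profile extension publishes its report as a Metaflow card, which you can open with Metaflow's card CLI:

python llama_tune.py card view tune_llama2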

Compute is necessary but not sufficient

Together with data, we consider compute to be one of the foundational building blocks of ML/AI systems, as depicted in our full stack of ML/AI infrastructure:

Having a solution for compute at sufficient scale is essential, and @nvidia may be a good match when GPUs are required. However, merely possessing a data center filled with GPUs does not create a functional ML/AI system. For that, the additional layers of the infrastructure stack are also necessary.

Crucially, our @nvidia decorator integrates seamlessly with the rest of the infrastructure: it allows you to securely and efficiently feed your models data from your data warehouses, include GPUs in your production workflows, and track and observe everything along the way. For those looking to build production-grade systems powered by @nvidia, it's reassuring to know that everything operates within secure and compliant environments.
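
To sketch what such a production setup might look like, the hypothetical flow below pairs @nvidia with Metaflow's @schedule decorator to fine-tune nightly on fresh data. Note that query_warehouse is an illustrative stand-in for your warehouse client of choice, not a real API:

from metaflow import FlowSpec, step, schedule, kubernetes, nvidia

@schedule(daily=True)  # run the flow automatically once a day
class NightlyTuneFlow(FlowSpec):

    @kubernetes(cpu=2)
    @step
    def start(self):
        # query_warehouse is a hypothetical helper standing in for
        # your Snowflake/BigQuery/Redshift client of choice.
        self.examples = query_warehouse(
            "SELECT prompt, completion FROM latest_feedback"
        )
        self.next(self.tune)

    @nvidia(instance_type='A100')
    @step
    def tune(self):
        # Fine-tune on self.examples and persist the model as an artifact.
        ...
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    NightlyTuneFlow()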

Sign up today!

Today, we announced @nvidia at the NVIDIA GTC conference. This feature is brand new, and we are eager to discover how it integrates into your use cases!

If you're interested in exploring this new avenue for running your GPU experiments and production workloads at scale, sign up for the waitlist, and we will contact you shortly!