
7 Ways to Use Outerbounds with Snowflake
Learn how Outerbounds integrates with Snowflake to help you develop fully observable, scalable, and cost-efficient AI/ML systems and workflows.
For many teams, ensuring secure data access is a top priority. Since Snowflake is the primary data warehouse for many of our customers, we’ve made significant investments in creating a first-class integration between Snowflake and Outerbounds.
Unlike most data processing tools that merely provide a connection to access data in Snowflake - often leaving you to handle shared secrets - Outerbounds takes integration to the next level. It enables you to process data at any scale, train models, and build AI applications, all while seamlessly working with live data.
Seeing is believing
Our ethos is to strike a thoughtful balance between freedom and responsibility across every aspect of the platform. Developers should be empowered to experiment quickly and effortlessly - especially in the fast-evolving world of AI - while remaining within the outer bounds of the company's data governance and policy frameworks.
This is one of those cases where showing is more convincing than telling, so we’ve prepared four short videos to demonstrate common patterns when using Snowflake with Outerbounds. If you want to code along or dig deeper into the code, check out the example repository, weather-flow, featured in the videos.
Manage access to Snowflake without passwords
Let’s begin with the most fundamental question: How do I access a Snowflake database on Outerbounds? This video walks you through the simple setup, which uses Snowflake’s Security Integration under the hood:
Here are the key takeaways from the clip:
1. Use named integrations to simplify access management
You can define any number of named integrations, enabling you to manage multiple permission boundaries - for instance, separating different projects - without placing the burden of access management on developers. There are no passwords to manage - permissioning can be managed on the Snowflake side as usual, which is a delight for security teams.
Crucially, these integrations are perimeter-specific, allowing workflows to access the appropriate data based on their environment (e.g. production vs. testing) without cluttering the codebase with conditional logic.
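In practice, a workflow step opens a connection through a named integration and queries live data directly. Below is a minimal sketch: the integration name snowflake-prod is made up, and the exact import path of the connection helper may differ on your deployment, so treat both as assumptions and check the documentation.

```python
from metaflow import FlowSpec, step


class WeatherIngestFlow(FlowSpec):

    @step
    def start(self):
        # Assumed helper: the named integration supplies short-lived
        # credentials, so no passwords appear in code or config.
        from outerbounds import Snowflake  # import path is an assumption

        with Snowflake(integration="snowflake-prod") as conn:
            cur = conn.cursor()
            cur.execute("SELECT city, temp, ts FROM weather.observations LIMIT 100")
            self.rows = cur.fetchall()
        self.next(self.end)

    @step
    def end(self):
        print(f"Fetched {len(self.rows)} rows")


if __name__ == "__main__":
    WeatherIngestFlow()
```

The same flow runs unchanged on a laptop (python weather_ingest.py run) and on remote compute (python weather_ingest.py run --with kubernetes), since the integration travels with the task.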
2. Ensure consistent data access from local development to production
Few things are more frustrating than code that works perfectly in a local environment but breaks when run remotely at scale - often due to missing credentials or misapplied policies. Outerbounds eliminates this headache by ensuring consistent data access across all environments, from local laptops and workstations to scaled-out compute clusters and production deployments.
Build and iterate reactive data pipelines
Outerbounds is a platform for building real-world AI and ML systems, that is, orchestrating the interplay between data, code, and models with minimal human intervention. Systems like these are developed iteratively, so it is beneficial to adopt software development best practices, such as GitOps and CI/CD, right from the start. Take a look at how this works in the context of Snowflake:
Here are the takeaways:
3. Utilize event-triggering to build reactive systems
Event-triggered workflows are core building blocks of systems that react to updating data automatically. You can generate events from Snowflake either by periodically watching for changes in tables, as exemplified in the clip, or by initiating events in Snowflake through a stored procedure.
Events carry payloads that can indicate changes in data, allowing you to build systems that process (mini-)batches of data continuously. By using the @trigger_on_finish decorator, you can build arbitrarily advanced data pipelines by combining modular workflows that seamlessly pass data from one to another.
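Here is a minimal sketch of these pieces in Metaflow - the event name, payload, and flow names are illustrative, and each flow would live in its own file in practice:

```python
from metaflow import FlowSpec, Parameter, step, trigger, trigger_on_finish

# A poller or a Snowflake stored procedure publishes the event, e.g.:
#   from metaflow.integrations import ArgoEvent
#   ArgoEvent(name="weather_data_updated").publish(payload={"table": "observations"})


@trigger(event="weather_data_updated")
class IngestFlow(FlowSpec):
    # Event payload keys are bound to parameters with matching names.
    table = Parameter("table", default="observations")

    @step
    def start(self):
        print(f"Processing new rows in {self.table}")
        self.next(self.end)

    @step
    def end(self):
        pass


# Starts automatically whenever a deployed IngestFlow run finishes,
# chaining modular workflows into a larger pipeline.
@trigger_on_finish(flow="IngestFlow")
class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass
```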
4. Adopt CI/CD and continuous delivery from day one
The pace of software development changed drastically with the advent of continuous delivery which made it possible to deploy any number of branched versions to (semi-)production quickly and effortlessly. Just consider the simplicity of deploying a pull request as a live website using a tool like Vercel, without having to worry about branches interfering with one another.
Data pipelines should be no different. Just open a pull request and let a CI/CD system like GitHub Actions deploy a whole set of interconnected pipelines as an isolated deployment. This enables teams to test new ideas and improve existing systems continuously at unprecedented speed.
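Metaflow’s @project decorator enables exactly this pattern: every branch gets its own isolated namespace, so any number of parallel deployments can coexist. A minimal sketch - names are illustrative, and a CI/CD job would run the deployment command shown in the comment on each pull request:

```python
from metaflow import FlowSpec, project, step

# Deploy an isolated copy of this flow per branch, e.g. from GitHub Actions:
#   python forecast_flow.py --branch pr-123 argo-workflows create
# Branch deployments never interfere with the main production deployment.


@project(name="weather")
class ForecastFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    ForecastFlow()
```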
If you are worried about the cost of branched data processing, keep on reading.
Build smarter systems, fueled by inexpensive and abundant compute
Historically, much of data engineering has been focused on orchestrating data pipelines with systems like Airflow. While workflow orchestration remains a core component of data systems, many organizations are realizing that the question of compute - where and how data processing happens in practice - is becoming even more central. This is driven by three major trends:
- AI is compute-hungry, but it can produce tremendous amounts of value. In 2025 alone, capital commitments to expanding compute capacity nearly equal the cost of the Apollo Program. The compute landscape is undergoing a monumental shift which will affect all applications, not just AI hyperscalers.
- The rise of high-performance data processing - the Big Data of 2020 is not that big anymore. Thanks to highly optimized tools like Polars and DuckDB, you can shift many parts of data processing closer to the end applications, striking a healthy balance between the benefits of a centralized warehouse like Snowflake and highly efficient, easily distributable processing.
- Cost-consciousness - many organizations would like to innovate faster without being bottlenecked by compute resources, but there is little appetite to increase budgets. Teams are becoming increasingly conscious about costs and ways to optimize them.
Hence, enabling easy and cost-efficient access to compute is one of the key features of Outerbounds. Using Outerbounds with Snowflake lets you seamlessly select the most suitable and cost-efficient compute platform for each task, as demonstrated by this example that computes hourly weather forecasts:
Takeaways:
5. Unlock cost-efficient compute at any scale
Outerbounds deploys on your cloud account, or accounts across clouds, allowing you to leverage the lowest-cost cloud resources without additional margins. Should you need GPU resources, Outerbounds integrates with various GPU providers, including NVIDIA’s own GPU cloud.
Building on top of cost-efficient compute capacity, Outerbounds provides a software layer that allows you to scale out workloads easily, including automatic containerization, so you can process data, train models, fine-tune LLMs, run batch inference, or perform demanding distributed training of massive-scale models, as easily as writing any Python code.
This allows you to leverage Snowflake for its strength - executing SQL conveniently at scale - while handling additional processing and training on Outerbounds.
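To make this concrete, here is a hedged sketch of such a flow: the data and query are stand-ins, and in practice the start step would pull rows from Snowflake through the named integration shown earlier.

```python
from metaflow import FlowSpec, pypi, resources, step


class HourlyForecastFlow(FlowSpec):

    @step
    def start(self):
        # Stand-in for a Snowflake extract: in practice, run the SQL
        # through the named integration and store the result here.
        self.raw = [
            {"city": "Helsinki", "ts": "2025-01-01 10:15:00", "temp": -3.2},
            {"city": "Helsinki", "ts": "2025-01-01 10:45:00", "temp": -2.8},
        ]
        self.next(self.transform)

    # The heavy lifting runs on right-sized cloud compute, in a container
    # that Outerbounds builds automatically from the pinned packages below.
    @pypi(packages={"duckdb": "1.1.3", "pandas": "2.2.2"})
    @resources(cpu=8, memory=32000)
    @step
    def transform(self):
        import duckdb
        import pandas as pd

        raw = pd.DataFrame(self.raw)
        # In-process OLAP over the extracted batch - no warehouse credits spent.
        self.hourly = duckdb.sql(
            "SELECT city, date_trunc('hour', CAST(ts AS TIMESTAMP)) AS hour, "
            "avg(temp) AS avg_temp FROM raw GROUP BY ALL"
        ).df()
        self.next(self.end)

    @step
    def end(self):
        print(self.hourly)


if __name__ == "__main__":
    HourlyForecastFlow()
```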
6. Welcome humans in the loop with hosted apps
As seasoned systems engineers often say, a system is only as effective as its observability tools. You can get real-time visibility into every workflow running on Outerbounds through Metaflow Cards, which let you observe the status of processing and get an overview of results, conveniently in the same UI that shows all current and past executions.
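A card takes only a few lines of Python to publish; a minimal sketch, with the flow name and contents purely illustrative:

```python
from metaflow import FlowSpec, card, current, step
from metaflow.cards import Markdown, Table


class ForecastMonitorFlow(FlowSpec):

    @card(type="blank")
    @step
    def start(self):
        # The card renders alongside this run in the UI, in real time.
        current.card.append(Markdown("## Hourly forecast status"))
        current.card.append(
            Table([["Helsinki", 24], ["Stockholm", 24]], headers=["city", "rows"])
        )
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    ForecastMonitorFlow()
```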
Recently, we released support for hosted apps on Outerbounds, so you can host your Streamlit, Plotly, Gradio, and other similar dashboards behind an authentication wall. These apps can access data from Snowflake through the integration introduced above, as well as access versioned results of any workflows through the Metaflow Client API.
As a result, you can equip your systems with real-time visualizations as well as interactive dashboards with a few lines of Python.
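A hosted app can be as small as the following Streamlit sketch, which reads the latest results of the hypothetical HourlyForecastFlow above through the Metaflow Client API:

```python
import streamlit as st
from metaflow import Flow

# Fetch versioned artifacts from the flow's latest successful run.
run = Flow("HourlyForecastFlow").latest_successful_run

st.title("Hourly weather forecasts")
st.caption(f"Results from run {run.pathspec}")
st.dataframe(run.data.hourly)  # the 'hourly' artifact from the flow above
```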
Compliance, covered
A key benefit of a mature data warehouse like Snowflake is its focus on data governance and compliance, making it suitable for managing sensitive data. Understandably, organizations need to be careful in building applications around such data assets.
Often, there are specific aspects of data that require special attention, such as any Personally Identifiable Information (PII) which may need to be managed strictly within your existing governance boundaries in Snowflake. However, once sensitive columns are filtered out or anonymized, you may move beyond the locked-in environment - while respecting the permissioning and access control defined in Snowflake, as enabled by Outerbounds’ Snowflake integration. Any subsequent processing happening on Outerbounds is governed by our HIPAA and SOC2 certifications.
To support sensitive data processing, Outerbounds integrates with Snowflake’s Snowpark Container Services, which makes it possible to execute steps of a workflow inside Snowflake. Simply annotate your step with @snowpark, as shown here:
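A rough sketch follows: the decorator’s import path is an assumption (check the Outerbounds documentation for your deployment), and the data and anonymization logic are illustrative.

```python
from metaflow import FlowSpec, pypi, step

# NOTE: @snowpark ships with Outerbounds' Metaflow extensions; the import
# path below is an assumption.
from metaflow import snowpark


class AnonymizeFlow(FlowSpec):

    # This step executes inside Snowflake via Snowpark Container Services,
    # so raw PII never leaves Snowflake's governance boundary.
    @pypi(packages={"pandas": "2.2.2"})
    @snowpark
    @step
    def start(self):
        import pandas as pd

        # Stand-in for a query against a governed table containing PII.
        df = pd.DataFrame(
            [{"user_id": "u1", "email": "a@example.com", "temp": -3.2}]
        )
        # Anonymize before anything leaves Snowflake: hash identifiers and
        # drop sensitive columns. Only the scrubbed artifact moves on.
        df["user_id"] = df["user_id"].map(lambda v: hash(v))
        self.clean = df.drop(columns=["email"])
        self.next(self.end)

    # Subsequent steps run on Outerbounds compute with scrubbed data only.
    @step
    def end(self):
        print(self.clean)


if __name__ == "__main__":
    AnonymizeFlow()
```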
This highlights our last pattern:
7. Run sensitive tasks inside Snowflake with @snowpark
The @snowpark decorator allows you to run arbitrary Python code inside your Snowflake warehouse, subject to the governance boundaries managed and enforced by Snowflake. Thanks to this feature, you can perform data anonymization or other sensitive data extraction as a seamless part of your workflow.
In contrast to manually defined Snowpark jobs, a major benefit of @snowpark is that it is fully integrated with Metaflow and Outerbounds: you can use artifacts, cards, and all other Metaflow features in Snowpark jobs. In particular, you can benefit from Metaflow’s built-in dependency management and use @pypi and @conda to include any libraries needed, which provides a much smoother development experience than crafting Snowpark jobs by hand.
Conclusion
The combination of Snowflake and Outerbounds offers the best of both worlds: a mature data warehouse with robust data governance and SQL capabilities, seamlessly integrated with an AI/ML-native platform that enables the development of fully observable, scalable, and cost-efficient AI/ML systems with battle-hardened Metaflow.
If you are curious to test any of the features covered above:
- Use named integrations to simplify access management,
- Ensure consistent data access from local development to production,
- Utilize event-triggering to build reactive systems,
- Adopt CI/CD and continuous delivery from day one,
- Unlock cost-efficient compute at any scale,
- Welcome humans in the loop with hosted apps,
- Run sensitive tasks inside Snowflake with @snowpark.
Start building today
Join our office hours for a live demo! Whether you're curious about Outerbounds or have specific questions - nothing is off limits.