Large Language Models and the Future of the ML Infrastructure Stack

This article shares our vision on how large language models (LLMs) will affect machine learning projects and the infrastructure stack powering them. We believe that LLMs and other foundation models will get adopted as a powerful new model family in the ML stack – augmenting, not replacing it. For technical details on how you can start experimenting with LLMs using Metaflow today, take a look at our article about training the Dolly model with Metaflow.

Large language models and generative computer vision models have had their man-on-the-moon moment over the past year. There is a feeling that a whole new universe of possibilities has opened up, although no one knows exactly what will change, how, when, or who will drive it.

It is electrifying to witness the impossible becoming possible. This can happen quickly and viscerally – just imagine witnessing the first flight by the Wright brothers, Armstrong’s first step on the moon, or, now, ChatGPT. However, as so many times before, we may overestimate the impact of these technologies in the short term and underestimate their effect in the long run.

While it is easy to see how the initial excitement produces overly vivid dreams and nightmares, the long-term effects tend to get underestimated, as it is hard to visualize what will happen when a new technology becomes gradually accessible to an increasingly diverse set of people and companies.

Although the internet and the web were very much in the public consciousness in 2001, no one could have predicted the success of TikTok or the ubiquity of internet-enabled rentable scooters. Long-term success stories tend to feel oddly mundane compared to initial utopias, as the shiny new technology gets interwoven into our daily lives, blending the past and the future.

While we can’t foresee how exactly the new models will affect our lives in the future, we know what the journey there looks like. We need to make the new technology easy to use by offering it in a human-centric package, so all curious organizations can start innovating with it independently, exploring a diverse landscape of use cases driven by their own domain knowledge and visions for the future.

Illustration from the book, Effective Data Science Infrastructure

In 1943, the president of IBM is said to have estimated that “there is a world market for maybe five computers”. At that point, computers had been proven feasible to build (at least if cost wasn’t a concern) but were absolutely painful to operate, with questionable ROI in most use cases. Similarly, in 2023, one might estimate that the world needs only a handful of large language models used through APIs, as LLMs are eye-wateringly expensive to build and painful to operate today.

Large language models and the ML infrastructure stack

As we have seen happen in the past with computers, the internet, or mobile apps, the real revolution happens when the new technology matures from being an expensive party trick, controlled by a few, to something that is easily accessible by countless companies and individuals. Learning from these past adoption curves, we can anticipate what is ahead of us.

Firstly, there’s no reason to believe that LLMs and other foundation models will change the way companies adopt technology. As with any other technical component, adopting LLMs requires rounds of evaluation and experimentation. Workflows using LLMs need to be developed and integrated into the surrounding systems, and they need to be continuously improved, adjusted, monitored, and controlled by human beings. As before, some companies are ready to embark on this journey earlier than others.

Secondly, beyond proofs of concept, new technology doesn’t give companies a free pass to ship unpredictable or undesirable user experiences. LLMs will become a new tool in the toolbox, enabling companies to deliver delightful and differentiated experiences, which some companies will learn to leverage better than others. In particular, companies in regulated industries need to treat LLMs as carefully as any other part of their tech stack, as regulators will surely have their spotlight on AI.

From this point of view, LLMs are models amongst other models, a layer amongst other layers, in the ML infrastructure stack:

In our original illustration of the stack above, the data scientist on the left focuses mostly on concerns closely related to modeling and gradually less on the lower-level infrastructure, while infrastructure providers, such as we at Outerbounds or internal platform teams, have the inverse focus. Lately, LLMs have pushed everyone’s focus toward the extremes, as depicted below:

Most of the discussion has revolved around the models themselves and the exciting demos they have enabled. On the infrastructure side, the focus has been on the massive datasets and the immense compute it takes to train the models. The operational concerns in the middle have received far less attention. We believe that is about to change as companies start seriously considering how to use these models in practice.

ML projects in 2025 and beyond

We believe that LLMs and other foundation models will get adopted as a powerful new model family in the ML stack – augmenting, not replacing it. In some cases, especially when text and images are involved, the new techniques will enable whole new use cases. In other cases, they will be used to improve existing ML models and data science projects, as depicted below.

Depending on the use case, companies may choose to own some models, like the fine-tuned LLM or the embedding-producing LLM based on a common foundation model in the illustration. In other cases, it makes sense to use a vendor-owned model through an API. In all cases, companies want to carefully consider which parts of the system they want to own and control within their governance boundary, and which functionality can be safely outsourced to a third-party vendor.
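To make the “owned model” pattern concrete, here is a minimal sketch that produces embeddings with a permissively-licensed model using the open-source sentence-transformers library: the weights are downloaded once and run entirely inside your own environment, with no per-request calls to a vendor API. The model name and example documents are illustrative assumptions, not something the article prescribes.

```python
# A minimal sketch of the "owned embedding model" pattern: the model runs
# inside your own governance boundary rather than behind a vendor API.
# The model name below is an illustrative assumption - any permissively-
# licensed embedding model would work the same way.
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a small, permissively-licensed embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Refund policy for enterprise customers",
    "How to rotate API keys safely",
]

# Each document becomes a fixed-size vector you can store and search in-house
embeddings = model.encode(documents)
print(embeddings.shape)  # (2, 384) for this particular model
```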

All these systems will be engineered by humans who combine data, code, and models. Correspondingly, all the fundamental concerns addressed by the ML infrastructure stack remain intact: the systems need easy access to data and compute, they are structured as workflows, they are built iteratively through multiple versions, and the results are deployed in various ways, leveraging the models best suited for each job.
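As a concrete illustration, here is a minimal, hedged sketch of such a workflow in open-source Metaflow. The flow name, step contents, and the categorize_text helper are illustrative assumptions; the FlowSpec structure, automatic artifact versioning, and the @resources decorator are standard Metaflow features.

```python
# A minimal Metaflow workflow sketch: data access, compute, versioning, and
# deployment-ready structure are handled by the same stack regardless of
# whether the model inside is a classical one or an LLM.
from metaflow import FlowSpec, step, resources

def categorize_text(text):
    # Hypothetical stand-in for a model call; in practice this could be an
    # in-house fine-tuned LLM or a vendor API behind the same interface.
    return "electronics" if "headphones" in text else "kitchen"

class ProductCategorizerFlow(FlowSpec):

    @step
    def start(self):
        # Every artifact assigned to self is versioned automatically,
        # so each run of the flow is reproducible.
        self.texts = ["wireless noise-canceling headphones", "ceramic mug"]
        self.next(self.categorize)

    @resources(memory=16000)  # request more memory for the model step
    @step
    def categorize(self):
        self.categories = [categorize_text(t) for t in self.texts]
        self.next(self.end)

    @step
    def end(self):
        print(self.categories)

if __name__ == "__main__":
    ProductCategorizerFlow()
```

Running `python product_categorizer.py run` executes the steps locally; the same flow can later be scheduled on cloud compute without restructuring the code.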

Today, some believe that LLMs and other foundation models will require a fundamentally different approach compared to other ML models. Maybe the new models are too large, complex, and expensive to be handled by most companies. Or maybe a universal model can be provided as a service, alleviating the need for companies to maintain their own.

We believe there are two strong tailwinds pushing against this view:

  1. On the engineering side, the collective force of academia, vendors, and non-profits will produce permissively-licensed, optimized foundation models, datasets, and libraries, which will make training and fine-tuning domain-specific models radically cheaper and hence attainable for reasonable-scale companies. In addition, techniques like model distillation make it possible to radically reduce the size and cost of production models (see the sketch after this list).
  2. On the modeling side, many companies will realize that they can – and they must – treat AI as a part of their usual ML initiatives. They want to retain control over their data, product experience, and proprietary IP as before, keeping their custom models inside their governance boundary.
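To illustrate the distillation technique mentioned in point 1, here is a minimal sketch of the classic soft-label distillation loss in PyTorch: a small student model learns to match the softened output distribution of a large teacher, which is how production models can be made radically smaller and cheaper. The tensor shapes, temperature, and weighting below are illustrative assumptions.

```python
# A minimal sketch of knowledge distillation: the student is trained against
# a blend of the teacher's softened predictions and the ground-truth labels.
# All tensors and hyperparameters here are illustrative placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened probability distribution;
    # the temperature**2 factor keeps gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: the usual cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Random tensors standing in for real model outputs, for illustration only
student = torch.randn(8, 100)         # small model's logits
teacher = torch.randn(8, 100)         # large model's logits
labels = torch.randint(0, 100, (8,))  # ground-truth class labels
print(distillation_loss(student, teacher, labels))
```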

As a result, more companies are able and willing to innovate independently. It is likely that we will see smaller models built by companies with deep domain expertise outperform generic models built without domain expertise. The new AI techniques will become a competitive advantage to companies that learn to integrate them seamlessly into their existing products and workflows.

Next steps

An important implication of the new AI techniques becoming a part of the ML infrastructure stack, not a replacement for it, is that you can continue modernizing your data and ML stacks with future-proof tools like open-source Metaflow. Realistically, the vast majority of companies are still early in their ML journey, so it makes sense to start by laying a solid foundation.

All work towards this goal remains relevant and will pay dividends once you are ready to start adopting the latest AI techniques. If you want a solid foundation with minimal engineering effort, we can get you started quickly with the managed Outerbounds Platform, which provides an easy-to-use, optimized, and secure environment for all ML/AI experiments and production applications.

To learn more about Metaflow and foundation models, take a look at our newest article on the topic, Training a large language model with Metaflow, featuring Dolly, which demonstrates how to fine-tune a state-of-the-art LLM, Databricks’ Dolly, using Metaflow. You can also read our previous articles about image generation using Stable Diffusion and speech-to-text translation using Whisper.

P.S. To learn more about LLMs in the context of business use cases, join our next fireside chat on May 4th!
