Today, we are launching a knowledge base of free, practical data science materials at outerbounds.com. Take a look at the initial set of how-to articles and topic guides aimed at practicing data scientists, as well as materials that help engineers set up a modern data science infrastructure based on Metaflow, an open-source project actively developed by us.
Let’s say you are a data scientist who is asked to design and build a recommendation system. The following story applies to other real-life data science applications too, but recommendation systems are a fun, well-known application of machine learning.
Countless papers and articles have been written about recommendation systems, so even if you haven’t built one before, the challenge seems to be mainly about figuring out what goes inside this circle:
After reading a bunch of papers about modern modeling techniques, you realize that besides models, you need to solve a number of related problems before the company’s customers can actually benefit from personalized recommendations:
You need to find out how, where, and what data the models should use. Also, you need to figure out how the recommendations are integrated into the company’s product in practice. And, as your colleagues point out, everyone should use a common development environment that allows the whole team to experiment and develop the system together.
As you meet with various stakeholders – infrastructure engineers, product managers, data engineers, and fellow data scientists – you learn about a whole new set of concerns and requirements:
The picture is starting to look much more complex! It would be tempting to ignore some of these tasks, or at least outsource them to someone else, but all of them are fundamentally connected to the question of providing high-quality recommendations. Maybe the best option is to roll up our sleeves and start tackling the subtasks one by one.
As you start working on the subtasks and sub-subtasks, more details emerge, as in an infinite fractal:
It turns out that building a recommendation system is not a matter of solving the dark blue circle in the middle but of solving a myriad of diverse subtasks seemingly at the edges – at the outer bounds – of the original task. If you could peek under the hood of the best recommendation systems, say, those of TikTok or Netflix, you would find that the recommendation system is what exists here at the edges – a carefully orchestrated collection of hundreds of subsystems – instead of a neat machine with well-defined boundaries in the middle.
This story applies to most data science applications in the industry. Correspondingly, the job of most data scientists involves solving myriads of small problems, some of which are mathematically or technically novel and challenging but most of which are rather mundane, such as trying to get libraries installed correctly or getting SQL queries to execute fast enough.
Some hope that a grand unified solution for data science will emerge, replacing all the small tasks with a silver bullet. If the history of software engineering is any guide, this is unlikely to happen, which, in fact, is good news: it would be undesirable to have one solution to rule them all, since diversity and the avoidance of a monoculture are a great source of innovation and differentiated products. To be successful in this world, data scientists should be using their human creativity to the fullest, crafting elegant solutions to specific needs instead of being just cogs in a machine.
What we do at Outerbounds
At Outerbounds, we help data scientists in their day-to-day tasks, both big and small. We believe that the best way to do this is by investing in both technology and human empowerment.
First, we reduce unnecessary complexity in machine learning and data science applications by providing delightfully usable and reliable infrastructure for data scientists and engineers in the form of Metaflow, an open-source library we started developing at Netflix years ago. Metaflow doesn’t remove all the tasks at the edges but it allows you to focus on tasks that matter: Consider it a foundation for your innovative, custom solutions.
Today, Metaflow is used by hundreds of enterprises across industries. Outerbounds helps them set up modern, full-stack data science environments so they can build end-to-end data science solutions quickly using Metaflow and the infrastructure underneath it.
Second, we share real-world knowledge and experience with data scientists and engineers so they are able to build data science applications successfully. Starting today, we offer a free, growing knowledge base of how-to articles here at outerbounds.com covering a diverse set of topics, techniques, and libraries. To put these articles in context, we provide topic guides, such as this one about reproducibility, which help you connect the dots. If you prefer long-form content, you can find many of these topics also covered in a new book, Effective Data Science Infrastructure.
In addition, we will continue investing in the open-source documentation at docs.metaflow.org. Among the different types of documentation, the resources at outerbounds.com sit at the practical end of the spectrum, since helping companies and data scientists is our day job, whereas the open-source documentation gives a broad overview of the project. We hope that these complementary materials will help you as you solve problems, big and small, at the outer bounds of data science projects.
Help us help you
We have bootstrapped the initial batch of how-to articles based on hundreds of questions we have heard in the Metaflow community. A big thanks to all of you who contributed to the effort by patiently explaining your challenges! You can join our community Slack today to share your challenges and solutions, and get support from us and over a thousand other like-minded data scientists and engineers.
If there’s a topic or a question – no matter how simple, advanced, mundane, or sophisticated – that you would like to see covered by an article, or if you need help setting up modern data science infrastructure in your organization, don’t hesitate to contact us on Slack!