PyData NYC 2022

Human-Friendly, Production-Ready Data Science Stack with Metaflow & Kubernetes
11-10, 13:30–14:15 (America/New_York), Central Park East (6th floor)

There is a pressing need for tools and workflows that meet data scientists where they are. This is also a serious business need: how do you enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently? In this talk, we discuss the problem space and the approach we took to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers hundreds of business-critical ML projects at Netflix and at other companies in fields ranging from bioinformatics and drones to real estate. We wanted to provide the best possible user experience for data scientists, allowing them to focus on the parts they like (modeling with their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.


In this talk, you will learn about:

  • What to expect from a modern ML infrastructure stack.
  • Using tools such as Metaflow & Kubernetes to boost the productivity of your data science organization, based on lessons learned from Netflix and many other companies.
  • Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.

Prior Knowledge Expected

No previous knowledge expected.

Savin is the co-founder and CTO of Outerbounds, where his team is building the modern ML stack to accelerate the impact of data science. Previously, he was at Netflix, where he built and open-sourced Metaflow, a full-stack framework for data science.