PyData NYC 2022

Improving Your Data Modeling Work Through Open-Source Software
11-11, 09:00–10:30 (America/New_York), Central Park East (6th floor)

Most data practitioners use Jupyter notebooks for data modeling; however, iterating, testing, and collaborating on top of them can be quite complex. This talk will show how a few open-source tools allow you to run notebooks the right way, covering concepts such as debugging in production, profiling memory consumption, and easily parallelizing notebooks.

Data practitioners with experience analyzing data with Python and Jupyter are welcome to attend. Basic experience with at least one of the supported orchestration backends is helpful but not necessary.


In this tutorial we'll cover how data practitioners can leverage Jupyter notebooks to their full potential. Jupyter is the best place to start a proof of concept, but in production we need our code to be robust and scalable.

We'll cover a few concepts, using open-source frameworks, that improve our processes while keeping the tools we already know. These concepts include fast iterations (through caching), debugging and profiling notebooks, and parameterization/parallelization.
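As a rough illustration of two of these ideas (parameterization into an experiment grid, and caching so repeated runs are skipped), here is a minimal sketch using only the Python standard library. It is not Ploomber's API; the grid values, the in-memory cache, and the names `expand_grid`, `run_experiment`, and `run_grid` are all hypothetical, and the "score" is a stand-in for real model training.

```python
import hashlib
import itertools
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parameter grid (illustrative values only).
GRID = {"n_estimators": [10, 50], "max_depth": [2, 4]}

_cache = {}    # in-memory cache: parameter hash -> result
executed = []  # records which parameter sets were actually computed


def expand_grid(grid):
    """Expand a dict of lists into one dict per parameter combination."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def cache_key(params):
    """Build a stable key from the parameters so identical runs match."""
    payload = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def run_experiment(params):
    """Run one experiment, reusing the cached result when one exists."""
    key = cache_key(params)
    if key in _cache:
        return _cache[key]
    executed.append(params)
    # Stand-in for real work (e.g. training and scoring a model).
    score = params["n_estimators"] * params["max_depth"]
    _cache[key] = {"params": params, "score": score}
    return _cache[key]


def run_grid(grid, max_workers=4):
    """Run every combination in the grid in parallel, in grid order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_experiment, expand_grid(grid)))
```

Calling `run_grid(GRID)` twice computes each of the four combinations only once; the second call is served entirely from the cache. A real setup would typically persist results to disk and invalidate them when the code changes, which this sketch does not attempt.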

[0 - 5 minutes] Introduction

[5 - 20] Ploomber basics

[20 - 30] Generating an experiment grid (parallelization)

[30 - 40] Running experiments (caching)

[40 - 50] Break

[50 - 65] Notebook profiling

[65 - 75] Debugging

[75 - 90] Analyzing results


Prior Knowledge Expected

Previous knowledge expected

Ido Michael co-founded Ploomber to help data scientists build faster. He previously worked at AWS, leading data engineering and data science teams. During those customer engagements he built hundreds of data pipelines, both single-handedly and together with his team. He came to New York for his MS at Columbia University. He focused on building Ploomber after repeatedly finding that projects dedicated about 30% of their time just to refactoring the development work (the prototype) into a production pipeline.

Eduardo Blancas is the Co-Founder and CEO of Ploomber, a Y Combinator-backed company developing tools to bridge the gap between interactive data work and production. Before that, he was a Data Scientist at Fidelity Investments, where he deployed the first customer-facing Machine Learning model for asset management. Eduardo holds an M.S. in Data Science from Columbia University and a B.S. in Mechatronics Engineering from Tecnológico de Monterrey.