PyData NYC 2022

Expressive and fast dataframes in Python with polars
11-09, 11:00–11:45 (America/New_York), Central Park East (6th floor)

The pandas library is one of the key factors that enabled the growth of Python in the Data Science industry and continues to help data scientists thrive almost 15 years after its creation. Because of this success, several open-source projects now claim to improve on pandas in various ways, either by bringing it to a distributed computing setting (Dask), accelerating its performance with minimal changes (Modin), or offering a slightly different API that solves some of its shortcomings (Polars).

In this talk we will dive into Polars, a new dataframe library backed by Arrow and Rust that offers an expressive API for dataframe manipulation with excellent performance.

If you are a seasoned pandas user willing to explore alternatives, or a beginner wondering what all the fuss about these new dataframe libraries is, this talk is for you!


The outline of the talk goes as follows:

  1. We will give a very brief introduction to pandas, talk about its importance, and point out some of its shortcomings, as its own creator did half a decade ago (10 minutes).
  2. We will enumerate some of the current pandas alternatives and classify them (pandas-like vs bespoke, single-node vs distributed) (5 minutes).
  3. We will do a live demo of how to analyze and manipulate a relatively big dataset using Polars inside Orchest Cloud and showcase some of its unique capabilities (20 minutes).
  4. Recommendations and conclusions (5 minutes).

After the talk, you will have more information on how some of the modern alternatives to pandas fit into the ecosystem, and will understand why Polars is so exciting and promising. Prior exposure to data manipulation with Python (not necessarily with pandas) will help make the most of the presentation.

The talk will build upon this blog post about Polars.


Prior Knowledge Expected

Previous knowledge expected

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He works as Data Scientist Advocate at Orchest, where he empowers data scientists by building an open-source, scalable, easy-to-use workflow orchestrator. He has worked as Developer Advocate at Read the Docs, as a software engineer in the space, consulting, and banking industries, and as a Python trainer for several private and public entities.

Apart from being a long-time user of and contributor to many projects in the scientific Python stack (NumPy, SciPy, Astropy), he has published several open-source packages, the most important one being poliastro, an open-source Python library for Orbital Mechanics used in academia and industry.

Finally, Juan Luis is the founder and former chair of the Python España association, the point of contact for the Spanish Python community, former organizer of PyCon Spain, which attracted more than 800 attendees in its last in-person edition in 2019, and current organizer of the PyData Madrid monthly meetups.