PyData NYC 2022

Parallelism in Numerical Python Libraries
11-09, 16:15–17:00 (America/New_York), Winter Garden (5th floor)

Python libraries can compute on multiple CPU cores using a variety of parallel programming interfaces such as multiprocessing, pthreads, or OpenMP. Some libraries use an ahead-of-time compiler like Cython or a just-in-time compiler like Numba to parallelize their computational routines. When several levels of parallelism operate simultaneously, the result can be oversubscription and degraded performance. We will learn how parallelism is implemented and configured in various Python libraries such as NumPy, SciPy, and scikit-learn, and see how to control these parallelism mechanisms to avoid oversubscription.
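
As a concrete illustration, here is a minimal sketch, assuming NumPy is linked against a multithreaded BLAS such as OpenBLAS or MKL and that threadpoolctl is installed, of how to inspect the native thread pools a process is using:

    import numpy as np
    from threadpoolctl import threadpool_info

    # Trigger a BLAS call so its thread pool is loaded and visible.
    a = np.random.rand(1000, 1000)
    _ = a @ a

    # Each entry describes one native thread pool (e.g. OpenBLAS, MKL, OpenMP)
    # along with the number of threads it is currently configured to use.
    for pool in threadpool_info():
        print(pool["user_api"], pool["internal_api"], pool["num_threads"])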


Python libraries such as NumPy, SciPy, and scikit-learn can run computational routines on multiple CPU cores. These libraries implement parallelism with a wide range of programming interfaces. We will learn when and how to use these interfaces by examining how Python libraries implement parallelism. Specifically, we will discuss high-level interfaces such as Python's threading and multiprocessing modules, as well as lower-level parallel primitives such as pthreads and OpenMP. Some libraries use an ahead-of-time compiler like Cython or a just-in-time compiler like Numba to parallelize their computational routines. Throughout this talk, we will explore each interface's advantages, disadvantages, and potential issues for writing parallelized code.

When multiple forms of parallelism run simultaneously, controlling how many cores your program uses is essential to prevent oversubscription. We will learn to use context managers with threadpoolctl, environment variables, and library-specific APIs to control parallelism. This talk is for an intermediate audience that wants to understand parallelism in the PyData stack.
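
As a minimal sketch, assuming NumPy is linked against a multithreaded BLAS such as OpenBLAS and that scikit-learn and threadpoolctl are installed, these three control mechanisms can look like this:

    import os

    # 1. Environment variables: these must be set before the native libraries
    #    are loaded, so in practice before importing NumPy/SciPy/scikit-learn.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import numpy as np
    from threadpoolctl import threadpool_limits
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    a = np.random.rand(2000, 2000)

    # 2. threadpoolctl context manager: temporarily cap the BLAS thread pool
    #    for the enclosed block only.
    with threadpool_limits(limits=2, user_api="blas"):
        _ = a @ a

    # 3. Library-specific APIs: scikit-learn's n_jobs controls the
    #    joblib-managed parallelism used to train the forest.
    X, y = make_classification(n_samples=1000, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=2).fit(X, y)

Note that setting OMP_NUM_THREADS from inside the script only takes effect if it happens before the libraries that read it are imported; setting it in the shell before launching Python is the more common approach.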


Prior Knowledge Expected: Previous knowledge expected

Thomas J. Fan is a Staff Software Engineer at Quansight Labs and a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is also a maintainer of skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.