PyData NYC 2022

Kevin Kho

Kevin Kho is a maintainer of the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, a workflow orchestration system. Before working on data tooling, he was a data scientist for four years.


Sessions

11-11
13:30
90min
Fast and Scalable Timeseries Modelling with Fugue and Nixtla
Kevin Kho

Time series modeling has been one of the weak points of the Python ecosystem compared to R. Statistical modeling libraries such as pmdarima and statsmodels are orders of magnitude slower than their R counterparts, and state-of-the-art algorithms remain challenging to implement. In this tutorial, we introduce a set of open-source libraries that allow for fast and scalable time series modeling in Python. Using StatsForecast and NeuralForecast on different Python backends for distributed computing, such as Dask, Ray, and Spark, we will show participants how to do forecasting at scale and even how to outperform current benchmarks in the R ecosystem.

We’ll walk through general best practices for working with time series data and explore the main kinds of time series modeling techniques: statistical, hierarchical, and deep-learning-based approaches.

Using the Fugue abstraction layer, we’ll learn how to port Python and Pandas code to distributed computing clusters with a few lines of code and leverage the power of Dask, Spark, and Ray. Participants will learn how to train millions of time series models in a few minutes.
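
The pattern behind training many models at once is to write a plain function that fits one series, then let an execution layer apply it to every series in parallel. The sketch below illustrates only that pattern, using the Python standard library. It is not Fugue's or StatsForecast's API: the names `naive_forecast`, `group_series`, and `forecast_all` are hypothetical, the "model" is a last-value forecast, and a thread pool stands in for a Dask, Ray, or Spark backend.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def naive_forecast(values, horizon=3):
    """Hypothetical per-series 'model': repeat the last observed value."""
    return [values[-1]] * horizon


def group_series(rows):
    """Group (series_id, timestamp, value) rows into one ordered list per series."""
    grouped = defaultdict(list)
    for series_id, _timestamp, value in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[series_id].append(value)
    return dict(grouped)


def forecast_all(rows, horizon=3, max_workers=4):
    """Apply the per-series function to every series in parallel.

    The thread pool is an assumption standing in for a distributed backend;
    the per-series function itself does not change.
    """
    grouped = group_series(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda vals: naive_forecast(vals, horizon), grouped.values())
        return dict(zip(grouped.keys(), results))


# Two short series, with rows deliberately out of order.
rows = [("a", 1, 10.0), ("a", 2, 12.0), ("b", 1, 5.0), ("b", 2, 4.0), ("a", 3, 11.0)]
forecasts = forecast_all(rows, horizon=2)
print(forecasts)
```

Because the per-series logic lives in an ordinary function, only the code that dispatches the function would need to change when moving from a single machine to a cluster.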

Winter Garden (5th floor)