PyData NYC 2022

Scaling Python - Bank Edition
11-10, 14:15–15:00 (America/New_York), Central Park East (6th floor)

In this talk, we explore replacing a legacy C++ engine at a large multinational bank with a modern, scalable solution while taking into account technical, stakeholder and policy constraints. The existing technology fails to process data at the scale required to make informed decisions. Using the PyData stack, we demonstrated the ability to scale and met the bank’s requirements.


Our team was tasked with building a distributed compute environment to run valuation adjustment models for a large multinational bank. The bank used a single-threaded C++ codebase to create and run these models, and this legacy system could not process data at the scale the bank needed to make informed decisions. For a single set of models, the process involves loading and summing 14,000 files of 6 million rows each (24.5 TB). The results are then run through one or more valuation adjustment models before the final data sets are generated. In the future, the bank wants to be able to process 1,400+ model sets. We experimented with several scaling approaches and architectures (GPUs, SQL engines, etc.), testing OmniSciDB (HEAVY.AI), RAPIDS, BlazingSQL, Dask, Prefect and Argo Workflows, among many others, and settled on a Dask-based solution that scales sufficiently. This talk explores what we learned while building that solution under technical, stakeholder and policy constraints, and how it led to the successful implementation and deployment of a modernized HPC cluster.
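To make the "load and sum many files" step concrete, here is a minimal sketch of a parallel load followed by a pairwise (tree) reduction. It is not the bank's pipeline or the speakers' code: it uses only the Python standard library rather than Dask, and the file layout (one numeric value per row in small CSV files) and the names `load_values`, `add_rows`, `tree_sum` and `make_demo_files` are assumptions made for illustration. In a Dask-based setup, the same load-then-combine shape would typically be expressed as a task graph and spread across a cluster.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_values(path):
    """Read one file into a list of floats (assumed: one value per row)."""
    with open(path, newline="") as f:
        return [float(row[0]) for row in csv.reader(f)]


def add_rows(a, b):
    """Element-wise sum of two equally sized lists of row values."""
    return [x + y for x, y in zip(a, b)]


def tree_sum(paths, workers=4):
    """Load all files in parallel, then combine them pairwise in rounds."""
    if not paths:
        raise ValueError("tree_sum needs at least one file")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(load_values, paths))
        while len(parts) > 1:
            # Pair up neighbours; an odd element is carried to the next round.
            pairs = [(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
            leftover = [parts[-1]] if len(parts) % 2 else []
            parts = list(pool.map(lambda p: add_rows(*p), pairs)) + leftover
    return parts[0]


def make_demo_files(directory, n_files=8, n_rows=5):
    """Write tiny hypothetical input files: file i holds the values i, i+1, ..."""
    paths = []
    for i in range(n_files):
        path = Path(directory) / f"part_{i}.csv"
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows([[i + r] for r in range(n_rows)])
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    total = tree_sum(make_demo_files(tmp))
    print(total)
```

Combining in rounds means no single worker ever has to hold every input at once, which matters when 14,000 files of 6 million rows each (84 billion rows) are involved. This toy version still loads everything into one process, so it shows only the shape of the computation, not a way to handle data at that scale.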


Prior Knowledge Expected

No previous knowledge expected

Anirrudh is a senior software engineer at Quansight, working on problems ranging from computation to full-stack application development. In the past, he has worked as a data scientist and has contributed to open source projects. Always curious and excited to learn, he is currently interested in low-power IoT devices and systems programming. He holds degrees in Physics and Computer Science, and currently resides in New York.

Software Engineer focused on data science and data engineering.

Ph.D. Data Scientist with a passion for solving problems using big data and a little elbow grease.