PyData NYC 2022

Building a high-load ML-powered service
11-09, 16:15–17:00 (America/New_York), Radio City (6th floor)

At Ntropy, we build the core API for financial transaction enrichment, based on a pipeline of ML operations. It requires fast (<200 ms) responses for hundreds of requests per second. The talk summarizes our experience making it fast enough while staying within the Python ecosystem, which is often claimed to be too slow for high-load systems.


Many engineers assume Python-based backends are a poor fit for high-load services. We disagree, and would like to share our experience making ours swift and robust. Some of the practices include:
- labeling optimization based on fast clustering;
- inference service design;
- cascades of smart caching, rule-based heuristics, and heavy ML;
- using Rust to optimize the most latency-sensitive functions.
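The cascade idea above can be sketched in a few lines: answer from a cache when possible, fall back to cheap rules, and pay the full model latency only for the remainder. This is a minimal illustrative sketch, not Ntropy's actual implementation; the rule table and the `model_label` stub are hypothetical stand-ins.

```python
from functools import lru_cache
from typing import Optional

def heuristic_label(tx: str) -> Optional[str]:
    """Cheap rule-based pass; returns None when no rule matches."""
    # Hypothetical keyword rules for common, unambiguous merchants.
    rules = {"starbucks": "coffee", "uber": "transport"}
    for keyword, label in rules.items():
        if keyword in tx.lower():
            return label
    return None

def model_label(tx: str) -> str:
    """Stand-in for the expensive ML inference call."""
    return "other"

@lru_cache(maxsize=100_000)
def enrich(tx: str) -> str:
    # 1. lru_cache serves repeated transactions instantly.
    # 2. Rules handle the easy cases at microsecond cost.
    # 3. Only unmatched transactions reach the heavy model.
    return heuristic_label(tx) or model_label(tx)
```

Each stage filters out traffic before the next, so average latency stays low even though the final stage is slow.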

Overall, we'd like to share the principles we follow while designing the system: keeping it fast, maintainable, and easy to change.

The talk will be of interest to those who design ML systems oriented toward high load and availability.


Prior Knowledge Expected

No previous knowledge expected

I'm a machine learning engineer who has been delivering ML projects since 2015 in individual contributor and leadership roles, focusing mainly on deep learning and MLOps problems.

ML engineer at Ntropy | Co-founder at AccountingBox | MSc Machine Learning at UCL