PyData NYC 2022

ML Latency No More: Useful Patterns to Reduce ML Prediction Latency to Sub X ms
11-10, 10:15–11:00 (America/New_York), Music Box (5th floor)

Machine Learning (ML) systems don’t exist until they are deployed. Unfortunately, prediction latency is one of those rough edges that hurts badly, and the pain shows up late in the product cycle. Stop optimizing that offline TensorFlow/Scikit-learn/PyTorch model performance! Focus on ML serving latency first; that is what the client sees first! So, what are some common ways to reduce ML latency?
This presentation will introduce the audience to the most useful patterns for deploying low-latency ML serving systems.


ML latency is an all-too-common issue with ML systems. Don’t let it kill your product. So, how do we minimize the prediction serving latency of ML systems? Here are some critical questions to ask yourself before starting your next ML project:

1) Does the prediction need to come back in under 100 ms, or can it be computed offline?
2) Do you know your approximate “optimizing” (e.g., model accuracy) and “satisficing” (e.g., p99 latency) metric thresholds?
3) Have you verified that your input features can be looked up in a low-read-latency DB?
4) Have you identified everything that can be precomputed and cached? (A minimal sketch of points 3 and 4 follows this list.)
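To make points 3 and 4 concrete, here is a minimal sketch of a serving path that first checks a precomputed prediction cache and only then falls back to a low-read-latency feature lookup plus online inference. It assumes Redis as the low-latency store; the key layout (pred:<id>, features:<id>), the 300-second TTL, and the scikit-learn-style model.predict call are illustrative assumptions, not specifics from the talk.

    import json
    import redis  # assumes a Redis instance as the low-read-latency store

    r = redis.Redis(host="localhost", port=6379)

    def predict_with_cache(user_id, model):
        """Serve a prediction, preferring precomputed results over live inference."""
        # 1) Precomputed prediction cached at write time (question 4).
        cached = r.get(f"pred:{user_id}")
        if cached is not None:
            return float(cached)

        # 2) Low-read-latency feature lookup + online inference (question 3).
        raw = r.get(f"features:{user_id}")
        if raw is None:
            raise KeyError(f"no features stored for user {user_id}")
        features = json.loads(raw)  # stored as a JSON list of feature values

        score = float(model.predict([features])[0])

        # 3) Cache the fresh prediction with a TTL so repeat requests skip the model.
        r.set(f"pred:{user_id}", score, ex=300)
        return score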

In this talk, we will cover battle-tested strategies and tactics that will benefit your current ML project as well as your future ML career.

First, we will work through the differences between Online vs. Offline and Real-time vs. Batch predictions. Then, we will weigh the pros and cons of Async vs. Sync strategies for ML serving. Finally, we will look at feature precomputation tactics as well as prediction caching strategies.
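As a taste of the Async vs. Sync trade-off, here is a minimal sketch using Python’s asyncio. The 50 ms sleep standing in for a model call and the predict_sync/predict_async names are illustrative assumptions; the point is that overlapping asynchronous calls keeps total wall-clock time close to one model call instead of ten.

    import asyncio
    import time

    def predict_sync(features):
        """Synchronous path: the caller blocks for the full model call."""
        time.sleep(0.05)  # stand-in for a ~50 ms model invocation
        return 0.42

    async def predict_async(features):
        """Asynchronous path: the event loop stays free during the model call."""
        # Run the blocking call in a worker thread so other requests proceed.
        return await asyncio.to_thread(predict_sync, features)

    async def main():
        start = time.perf_counter()
        # Ten concurrent predictions overlap instead of queuing one by one.
        results = await asyncio.gather(*(predict_async({}) for _ in range(10)))
        print(f"{len(results)} predictions in {time.perf_counter() - start:.2f}s")

    asyncio.run(main())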

More than a quick-fix talk, this presentation will give you a map for navigating the multitude of patterns available to optimize your next ML serving pipeline.

Get a head start today and check out the companion blog post: https://towardsdatascience.com/ml-latency-no-more-9176c434067b

See you there, and don’t be late! (pun intended)


Prior Knowledge Expected

No previous knowledge expected

Moussa Taifi is currently a Senior Data Science Platform Engineer II at Xandr-Microsoft.

He holds a PhD in Computer and Information Science from Temple University. He is a machine learning and big data systems engineer focused on data science productivity, reliability, performance, and cost. He is interested in designing and implementing large-scale AI products through data collection, analysis, and warehousing.