PyData NYC 2022

Serving PyTorch Models in Production
11-11, 13:30–15:00 (America/New_York), Central Park West (6th floor)

This talk is for data scientists and ML engineers looking to serve their PyTorch models in production.

It will cover post-training steps that should be taken to optimize the model, such as quantization and JIT compilation.
It will also walk through packaging and serving the model with Facebook's TorchServe.


Intro (5 mins)
- Introduce the BERT deep learning model and the classification models at Walmart
- Walk through the notebooks on the custom JupyterHub
- Show the final served model

Review Some Deep Learning Concepts (10 mins)
- Review sample trained PyTorch model code
- Review the sample model's transformer architecture
- Tokenization and pre/post-processing (see the sketch after this list)
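For orientation, a minimal sketch of the pre/post-processing step, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint (the talk's actual model, vocabulary, and labels are Walmart-internal):

```python
# Tokenization sketch; model, inputs, and labels are illustrative only.
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Pre-processing: raw text -> padded/truncated tensor of token ids.
batch = tokenizer(
    ["wireless headphones", "kids winter jacket"],
    padding=True,
    truncation=True,
    max_length=64,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (batch_size, seq_len)

# Post-processing: logits -> human-readable label.
logits = torch.randn(2, 3)  # stand-in for the model's output
labels = ["electronics", "apparel", "toys"]  # hypothetical label set
print([labels[i] for i in logits.argmax(dim=-1)])
```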

Optimizing the model (30 mins)
- Two modes of PyTorch: eager mode vs. script mode
- Benefits of script mode and the PyTorch JIT
- Post-training optimization methods: pruning, mixed-precision training
- Hands on (sketched below):
-- Quantizing the model
-- Converting the BERT model to TorchScript
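A rough sketch of the two hands-on steps, substituting the public bert-base-uncased checkpoint for the internal model. quantize_dynamic and torch.jit.trace are the standard PyTorch entry points for these techniques; the exact settings in the session may differ:

```python
# Post-training dynamic quantization, then TorchScript conversion.
# Checkpoint and shapes are assumptions, not the talk's actual code.
import torch
from transformers import BertForSequenceClassification

# torchscript=True makes the model return plain tuples, which trace cleanly.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
)
model.eval()

# Dynamic quantization: nn.Linear weights are stored in int8 and
# dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript via tracing: record the graph with an example input so the
# model can run in script mode, outside the Python interpreter.
input_ids = torch.randint(0, 30522, (1, 64))
attention_mask = torch.ones(1, 64, dtype=torch.long)
traced = torch.jit.trace(quantized, (input_ids, attention_mask))
torch.jit.save(traced, "bert_quantized.pt")
```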

Deploying the model (30 mins)
- Overview of deployment options: a plain Flask app vs. model servers such as TorchServe, Triton, and TF Serving
- Benefits of TorchServe: high-performance serving, multi-model serving, model versioning for A/B testing, server-side batching, and support for pre/post-processing
- Exploring the built-in model handlers and how to write your own
- Managing the model through the management API
- Exploring the built-in and custom metrics provided by TorchServe
- Hands on (see the handler sketch after this list):
-- Package the given model with the Torch Model Archiver
-- Write a custom handler to support pre- and post-processing
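A skeletal version of what the hands-on exercise might look like: a custom TorchServe handler built on BaseHandler, with the archiver and server invocations as trailing comments. The tokenizer, label names, and file names are illustrative assumptions, not the session's actual code:

```python
# handler.py -- sketch of a TorchServe custom handler for a BERT classifier.
import torch
from ts.torch_handler.base_handler import BaseHandler
from transformers import BertTokenizer


class BertClassifierHandler(BaseHandler):
    def initialize(self, context):
        # BaseHandler loads the serialized TorchScript model and sets
        # self.model / self.device for us.
        super().initialize(context)
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.labels = ["electronics", "apparel", "toys"]  # hypothetical

    def preprocess(self, data):
        # Each request body carries raw text; tokenize into tensors.
        texts = [row.get("data") or row.get("body") for row in data]
        texts = [t.decode("utf-8") if isinstance(t, (bytes, bytearray)) else t
                 for t in texts]
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               max_length=64, return_tensors="pt")
        return (batch["input_ids"].to(self.device),
                batch["attention_mask"].to(self.device))

    def inference(self, inputs):
        with torch.no_grad():
            logits = self.model(*inputs)[0]
        return logits

    def postprocess(self, logits):
        # Map argmax indices back to label strings, one per request.
        return [self.labels[i] for i in logits.argmax(dim=-1).tolist()]

# Package with the Torch Model Archiver, then serve (shell commands):
#   torch-model-archiver --model-name bert_classifier --version 1.0 \
#       --serialized-file bert_quantized.pt --handler handler.py \
#       --export-path model_store
#   torchserve --start --model-store model_store \
#       --models bert=bert_classifier.mar
```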

Lessons Learned (5 mins)
- Share performance characteristics of the models served at Walmart
- Next steps

Q&A (5 mins)


Prior Knowledge Expected

No previous knowledge expected

Senior Machine Learning Engineer at Walmart E-commerce Search

Machine Learning Engineer at Walmart Search
