PyData NYC 2022

Contextual Multi-Arm Bandit and its applications to digital experiments
11-09, 11:45–12:30 (America/New_York), Radio City (6th floor)

Multi-Arm Bandit (MAB) is a reinforcement learning method that seeks to quickly converge to the best action among a set of candidate actions and is often used as an alternative to A/B testing. We have developed a custom implementation of Contextual MAB that uses context to optimize decisions at a more granular, personalized level. We will provide a high-level overview of this methodology and demonstrate its success through use cases relating to digital experiments.
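To make "converge to the best action" concrete, here is a minimal sketch of one standard non-contextual bandit algorithm, Beta-Bernoulli Thompson sampling, applied to click/no-click feedback. The abstract does not say which algorithm the authors' agent uses, so this is an illustration rather than their method. The function name `thompson_sampling` and the click-through rates are hypothetical values chosen only for the simulation.

```python
import random


def thompson_sampling(true_ctr, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over a fixed set of arms.

    true_ctr: hypothetical per-arm click-through rates, hidden from the agent.
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctr)
    # Beta(1, 1) (uniform) prior on each arm's click-through rate
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw a plausible CTR for each arm from its posterior and play the best draw
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated user feedback: a click occurs with the arm's true CTR
        reward = 1 if rng.random() < true_ctr[arm] else 0
        # Conjugate posterior update: clicks add to alpha, non-clicks add to beta
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Three hypothetical email subject lines; the second has the highest true CTR
pulls = thompson_sampling([0.05, 0.15, 0.08], n_rounds=5000)
print(pulls)
```

Unlike a fixed-split A/B test, traffic shifts toward the better-performing arm as evidence accumulates, so fewer users are exposed to the weaker options while the experiment is still running.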


Multi-Arm Bandit (MAB) is a reinforcement learning method commonly used in recommendation settings, where we seek to maximize positive feedback by selecting the best action among a set of candidate actions in relatively fast iterations. Example use cases for MAB include selecting the best digital content to display to users in an online advertising setting, or selecting the best subject line to send to individuals in an email campaign to maximize click-through rate. Vanilla MAB balances the exploration-exploitation trade-off at the population level; Contextual MAB (CMAB), an extension of vanilla MAB, can also make use of context to further optimize decisions at a more granular, personalized level. We have developed an optimization framework, which includes a novel, custom-built CMAB agent written in Python, that can be applied to a number of different use cases to optimize customer engagement, value, and satisfaction. Data scientists, machine learning engineers, and analytics leaders will come away from this talk with a high-level understanding of CMAB and the potential benefits it could bring to other use cases, as well as of some specific technical aspects of this optimization framework. Attendees will benefit from, but are not required to have, a basic understanding of the general reinforcement learning paradigm and of basic machine learning techniques, e.g., regression, classification, clustering, and sampling methods.
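The difference between population-level and contextual optimization can be sketched with disjoint LinUCB, a well-known contextual bandit algorithm that fits one ridge-regression reward model per arm and adds an uncertainty bonus to each prediction. This is not the authors' custom CMAB agent, whose details the abstract does not give; the class, the two-segment context encoding, and the click rates below are hypothetical. Because the two segments prefer different arms, any single population-level "best arm" would be wrong for half the audience, while the contextual agent learns a different choice per segment.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression reward model per arm."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha  # width of the exploration bonus
        self.A = [np.eye(n_features) for _ in range(n_arms)]    # I + sum of x x^T per arm
        self.b = [np.zeros(n_features) for _ in range(n_arms)]  # sum of reward * x per arm

    def expected_rewards(self, x):
        # Ridge-regression point estimate of each arm's reward in context x (no exploration)
        return np.array([np.linalg.solve(A, b) @ x for A, b in zip(self.A, self.b)])

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Optimistic score: predicted reward plus an uncertainty bonus for this context
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Add the observed (context, reward) pair to the chosen arm's regression
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Hypothetical simulation: two customer segments whose preferred subject line differs.
# Context x = [bias, is_segment_b]; click rates are made-up illustration values.
TRUE_CTR = {0: [0.15, 0.05], 1: [0.05, 0.15]}  # segment -> per-arm click rate

rng = np.random.default_rng(0)
agent = LinUCB(n_arms=2, n_features=2, alpha=1.0)
optimal = 0
n_rounds = 4000
for _ in range(n_rounds):
    segment = int(rng.integers(2))
    x = np.array([1.0, float(segment)])
    arm = agent.select(x)
    reward = float(rng.random() < TRUE_CTR[segment][arm])
    agent.update(arm, x, reward)
    # Count rounds where the agent picked the segment's truly best arm
    optimal += int(arm == int(np.argmax(TRUE_CTR[segment])))

print(optimal / n_rounds)
for segment in (0, 1):
    print(segment, agent.expected_rewards(np.array([1.0, float(segment)])).round(3))
```

`alpha` controls exploration: larger values widen the bonus and delay convergence, while `alpha=0` reduces the agent to greedy ridge regression with no deliberate exploration.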

Minutes 0-5: Introduction and motivation (non-technical)
Minutes 5-15: Overview of reinforcement learning and vanilla MAB, motivating the development of a new method
Minutes 15-30: Our contextual MAB optimization framework and tools for implementation
Minutes 30-40: Extensions and broader applications of our methodology; Q&A


Prior Knowledge Expected

Previous knowledge expected

Li is a Principal Data Scientist at CVS Health. Li holds a PhD in Biostatistics from UPenn and has 15+ years of experience creating advanced data science (DS) and machine learning (ML) solutions for challenging, complex real-world problems in academia and industry.

Reed Peterson is a Data Scientist at Aetna, a CVS Health Company, where he contributes to various health intervention campaigns and to machine learning optimization projects focused on reinforcement learning and natural language processing.