Jon Wiggins PyData NYC 2022

Jon Wiggins

Machine Learning Engineer

Sessions

11-09

14:45

45min

Understanding the News around the World with Web Scraping and NLP at Scale

Jon Wiggins

Every day, media companies around the world publish millions of articles spanning multiple languages, and at Chartbeat we process this data to understand what is driving reader engagement. In this talk we discuss real-world lessons learned in building a production pipeline that scrapes these news articles and extracts their metadata in real time. The pipeline uses a mix of pre-trained and custom-built machine learning models in Python for content extraction, natural language processing, categorization, translation, and entity linking, making an article's metadata available in just three seconds on average.

Music Box (5th floor)
