PyData NYC 2022

Adam Lewis

Ph.D. Data Scientist with a passion for solving problems using big data and a little elbow grease.

  • Scaling Python - Bank Edition
Alex Merose

Alex is a senior software engineer at Google Research focused on democratizing climate & weather data.

  • Apache Beam on Dask: Portable, Scalable, Scientific Python (AKA Data Engineering for the Climate)
Allen Downey

Allen Downey is a Staff Scientist at DrivenData and Professor Emeritus at Olin College.
He is the author of several textbooks -- including Think Python, Think Bayes, and Elements of Data Science -- and "Probably Overthinking It", a blog about data science and Bayesian statistics. He received a Ph.D. in computer science from U.C. Berkeley and Bachelor's and Master's degrees from MIT.

  • Chasing the Overton Window
Andrew Fulton

Software Engineer focused on data science and data engineering.

  • Scaling Python - Bank Edition
Anirrudh Krishnan

Anirrudh is a senior software engineer at Quansight, working on problems ranging from computation to full stack application development. In the past, he has worked as a data scientist and has contributed to open source projects. Always curious and excited to learn, he is currently interested in low-powered IoT devices and systems programming. He holds degrees in Physics and Computer Science, and currently resides in New York.

  • Scaling Python - Bank Edition
Arseny Kravchenko

I'm a machine learning engineer, delivering ML projects since 2015 in individual contributor and leadership roles, mainly focusing on deep learning and MLOps-related problems.

  • Building highload ML powered service
Benjamin Batorsky

Ben is a Senior Data Scientist at the Institute for Experiential AI. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has spoken at data science conferences, taught courses in Data Science, and helped organize the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.

  • Bagging to BERT: A tour of applied NLP
Brian Bush

Brian heads up the machine learning group at Shift5. He is a data scientist and entrepreneur with more than 25 years of experience across a variety of fields from NLP and speech processing to large-scale anomaly detection. Previously, he co-founded RuleSpace, which was acquired by Symantec Corp., and served as its chief architect. Brian received his Ph.D. in Computer Science from the Oregon Health & Science University of Portland, Oregon, a Masters in Manufacturing Engineering from Ohio University, Ohio, and a BA in Applied Mathematics and Computer Science from Hiram College, Ohio.

  • Practical MLOps: Do we need all the things?
Charles Cloud
  • Ibis: Expressive analytics in Python at any scale.
Chelle Gentemann

Chelle Gentemann studies the sea from space. The core of her research has been data production and analysis to understand and monitor changes in our weather and climate. Her approach has been to use cutting-edge technologies to bring sweeping changes to data and science, specifically targeting expanded data access and participation in science. At NASA headquarters she leads the Transform to Open Science (TOPS) mission. TOPS will create a scientific culture that is ready for 21st century challenges.

  • Keynote - I'm from the government and I'm here to help
Cheuk Ting Ho

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, a role that demands strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now the Developer Advocate for Anaconda.

Besides her work, Cheuk enjoys speaking at various conferences. Cheuk also organises events for developers. Cheuk has organised conferences including EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organises workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation Fellow.

  • Using Numba Effectively Today
  • I hate writing tests, that's why I use Hypothesis
Colin Carroll

Colin Carroll is a software engineer at Google Research. In this role he focuses on Bayesian computation and research, and contributes to a number of open source libraries, including TensorFlow Probability, PyMC[3], and ArviZ. He received his PhD in mathematics from Rice University, where he researched geometric measure theory.

  • JAX for Bayes
Dagshayani Kamalaharan

Senior Machine Learning Engineer at Walmart E-commerce Search

  • Serving Pytorch Models in Production
Daniel Chen

Daniel is a Postdoctoral Research and Teaching Fellow at the University of British Columbia, a Data Science Educator at RStudio, PBC (Posit, PBC), and the author of "Pandas for Everyone". He primarily focuses on teaching data science skills in R and Python.

  • Install Python. Quarto Render All the Things
David Chudzicki
  • Dask
Diego Torres Quintanilla

I am an engineering manager at Two Sigma, where my team is in charge of maintaining the base tools of the PyData stack internally. Together, we build an ecosystem for our internal researchers and contribute back to open-source. If you're excited about the PyData stack, my team is hiring! Shoot me an email at [email protected] if you're interested.

I was born and raised in Monterrey, México, and moved to the US in 2012 to start college. In my free time I love to ride my bicycle, read fiction, and volunteer with organizations that work with the Hispanic community of New York.

  • How we upstreamed our internal goals to JupyterLab 4
Eduardo Blancas

Eduardo Blancas is the Co-Founder and CEO of Ploomber, a Y Combinator-backed company developing tools to bridge the gap between interactive data work and production. Before that, he was a Data Scientist at Fidelity Investments, where he deployed the first customer-facing Machine Learning model for asset management. Eduardo holds an M.S. in Data Science from Columbia University and a B.S. in Mechatronics Engineering from Tecnológico de Monterrey.

  • Improving Your Data Modeling Work Through Open-Source Software
Elijah ben Izzy

Elijah has always enjoyed working at the intersection of math and engineering. More recently, he has focused his career on building tools to make data scientists more productive. At Two Sigma, he built infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix, he leads the Model Lifecycle team, which focuses on streamlining the experience for data scientists to create and ship machine learning models. In his spare time, he enjoys geeking out about fractals, poring over antique maps, and playing jazz piano.

  • Scalable Feature Engineering with Hamilton
Emmanuel Naziga

Emmanuel Naziga is a machine learning engineer at Munich RE. He currently works on developing ML models as well as the infrastructure to enable the deployment of production ML systems. Previously he obtained a doctorate in computational science and carried out postdoctoral research in computational biophysics and genomics.

  • Model Upgrade Schemes: Considerations for Updating Production Models
Eskild Eriksen

Software engineer at Quansight.

  • Nebari: Easily deploy and maintain an open source data science platform on the cloud of your choice
Estefania Barreto-Ojeda

Estefania Barreto-Ojeda is a computational scientist at Cyclica Inc., where she develops and maintains machine learning pipelines for drug discovery. A physicist by training, she has a PhD in Biophysical Chemistry from the University of Calgary where she developed open source tools to analyze MD simulations. Estefania is an occasional open-source contributor, full time data visualization fan, and seasonal bicycle lover.

  • Data and Model Version Control: Applications in ML Drug Discovery pipelines
Fabio Buso

Fabio Buso is VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data intensive applications.

  • How to build a serverless electricity price prediction service in just Python with Hopsworks and Streamlit
Fabio Pliger
  • PyScript & Data Science: PyData stack on the Browser
Gil Forsyth

Gil Forsyth is a software engineer at Voltron Data. He followed the common career path of Japanese language specialist -> administrative assistant -> mechanical engineer -> computational fluid dynamicist -> data scientist -> software engineer -> machine learning engineer -> software engineer.
Gil contributes to several projects in the PyData ecosystem and is a core maintainer of xonsh and helps maintain Ibis. He served as the program chair for the Scientific Computing with Python (SciPy) conference from 2016 to 2020.

  • Ibis: Expressive analytics in Python at any scale.
Han Wang

Han Wang is the lead of Lyft Machine Learning Platform, focusing on distributed computing and training. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the founder of the Fugue project, aiming at democratizing distributed computing and machine learning.

  • Testing Big Data Applications (Spark, Dask, and Ray)
Harini Srinivasan

Harini Srinivasan is a Senior Technical Staff Member in the IBM Sustainability Software organization. She currently leads a team of Data Scientists in building products incorporating advanced AI solutions using geospatial and remote sensing data such as weather, satellite imagery and Lidar. In her 28-year career at IBM, she has also contributed significantly in the areas of Programming Languages and Runtimes, Performance Analysis and Tools, Social Media Analytics and Software Patterns. She has worked in IBM Research and IBM Software product divisions, published in major conferences and journals, and has over 10 patents issued.

  • Predicting Weather-Caused Rare Events: A Utility Outage Prediction Use Case
Hemant Jain

I work on Machine Learning Inference at Cohere AI. Prior to this I spent 3 years at NVIDIA developing Triton Inference Server, an open source solution used to deploy machine learning models into production. I have a Masters in Data Science from the University of Washington.

  • Large Language Models for Real-World Applications - A Gentle Intro
Hosted by Quizmaster James Powell

James Powell has hosted PyData pub quizzes since the first conference. Come see what he has prepared this year.

  • Pub Quiz
Iain Campbell

TBC

  • A Graph-based Machine Learning early warning system to detect Ransomware
Ido Michael

Ido Michael co-founded Ploomber to help data scientists build faster. He previously worked at AWS, leading data engineering and data science teams. He built hundreds of data pipelines during those customer engagements, on his own and together with his team. He came to NY for his MS at Columbia University. He focused on building Ploomber after repeatedly finding that projects dedicated about 30% of their time just to refactoring the dev work (prototype) into a production pipeline.

  • Improving Your Data Modeling Work Through Open-Source Software
Ilinca Barsan

Ilinca Barsan is a director of data science at Wunderman Thompson. A founding member of Wunderman Thompson’s Global Creative Data Group and a social scientist by training, she has an overactive imagination and a passion for weird side projects, storytelling with data, and that sweet spot where code and creativity collide. Ilinca was born in Romania, grew up in Germany, and is currently based in New York after stints in Singapore and London. She has an MSc in Social Science of the Internet from the Oxford Internet Institute, with a specialization in network science.

  • A Guide to Data Science as a Creative Discipline
Isaac Godfried

Isaac Godfried is a data scientist at SimSpace. Isaac specializes in applying deep learning to real-world problems in cybersecurity, climate, healthcare, and agriculture. He is the author and principal maintainer of Flow Forecast, a deep learning framework for time series built on PyTorch.

  • Deep learning for time series forecasting and classification in practice
Isabel Zimmerman

Isabel Zimmerman is a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she's not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.

  • Holistic MLOps for better science
James Powell

James Powell is the founder and lead instructor at Don’t Use This Code. A professional Python programmer and enthusiast, James got his start with the language by building reporting and analysis systems for proprietary trading offices; now, he uses his experience as a consultant for those building data engineering and scientific computing platforms for a wide range of clients using cutting-edge open source tools like Python and React.

He also currently serves as a Board Director, Chair, and Vice President at NumFOCUS, the 501(c)(3) non-profit that supports all the major tools in the Python data analysis ecosystem (e.g., pandas, NumPy, Jupyter, Matplotlib). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for 18 conferences. James is also a prolific speaker: since 2013, he has given over seventy (70) conference talks at over fifty (50) Python events worldwide.

  • Why do I need to know Python? I'm a pandas user…
Jamie DeMaria

Jamie is a software engineer working on Dagster. She has also built data analysis tools (using Dagster!) for a robotics startup and developed software to train mission planners for the Mars Curiosity rover.

  • Troubleshooting your Data Workflows with Noteable + Dagster: A live debugging of failed jobs.
Jeff Hale

Jeff Hale is passionate about helping people and organizations learn data skills and use data more effectively. As a Developer Advocate at Prefect, Jeff helps people coordinate their dataflows. He has taught over 200 data science lessons and written widely on data-related topics. Jeff has been designated a top Medium writer in the areas of Artificial Intelligence and Technology and authored several books. He co-organizes the Data Science DC Meetup and is a board member of Data Community DC.

  • Supercharge your Python code with Blocks
Jeff Reback

As a former quant, Jeff Reback has extensive experience building financial trading systems, using Python, and working with very large data. He has been a core committer to the pandas project since 2011, and has managed the project since 2013. Jeff is a Managing Director at Two Sigma, overseeing the research environment. Jeff holds a B.S. in Computer Science from the Massachusetts Institute of Technology.

  • pandas at a Crossroads, the Past, Present, and Future
Joe Cheng

Joe Cheng is the Chief Technology Officer and first employee at Posit, PBC (formerly known as RStudio), where he helped create the RStudio IDE and Shiny web framework, along with countless complementary tools and packages.

  • Shiny for Python: Interactive apps and dashboards made easy-ish
Jon Wiggins

Machine Learning Engineer

  • Understanding the News around the World with Web Scraping and NLP at Scale
Joshua E. Jodesty

Joshua has been a data & software engineer who implemented scalable, parallelized, concurrent, and distributed stochastic simulation software for digital twin implementations and sociotechnical system design of the decentralized web. He also implemented machine learning enabled big data processing solutions for viewership forecasting in AdTech, a cross-disciplinary data product for supply chain management, and conducted machine learning research enabling the prediction of student performance in online courses.

  • CATs: Content-Addressable Transformers
Juan Luis

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He works as Data Scientist Advocate at Orchest, where he empowers data scientists by building an open-source, scalable, easy-to-use workflow orchestrator. He has worked as Developer Advocate at Read the Docs, as a software engineer in the space, consulting, and banking industries, and as a Python trainer for several private and public entities.

Apart from being a long-time user and contributor to many projects in the scientific Python stack (NumPy, SciPy, Astropy) he has published several open-source packages, the most important one being poliastro, an open-source Python library for Orbital Mechanics used in academia and industry.

Finally, Juan Luis is the founder and former chair of the Python España association, the point of contact for the Spanish Python community, former organizer of PyCon Spain, which attracted more than 800 attendees in its last in-person edition in 2019, and current organizer of the PyData Madrid monthly meetups.

  • Expressive and fast dataframes in Python with polars
Jules S. Damji

Richard Liaw is an engineering manager at Anyscale, where he leads a team in building open source libraries on top of Ray. He is on leave from the PhD program at UC Berkeley, where he worked at the RISELab advised by Ion Stoica, Joseph Gonzalez, and Ken Goldberg. In his time in the PhD program, he was part of the Ray team, building scalable ML libraries on top of Ray.

Jules S. Damji is a lead developer advocate at Anyscale and an MLflow contributor. He is a hands-on developer with over 20 years of experience and has worked at leading companies such as Sun Microsystems, Netscape, @Home, Opsware/Loudcloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a BSc and MSc in computer science (from Oregon State University and Cal State, Chico, respectively), and an MA in political advocacy and communication (from Johns Hopkins University).

  • Distributed Python with Ray: Hands on with the Ray 2.0 APIs for scaling Python Workloads
Kei Nemoto

Kei currently works as a data scientist in the healthcare field. He uses his expertise in data science/software engineering to automate machine learning workflows at scale. He has a Master of Science in Data Science degree from the Graduate Center, City University of New York, where he extensively focused on deep learning for information retrieval. He is passionate about learning new technologies to achieve what was impossible yesterday.

  • Gentle introduction to scaling up ML service with Kubernetes + Mlflow
Kevin Kho

Kevin Kho is a maintainer for the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, a workflow orchestration management system. Before working on data tooling, he was a data scientist for 4 years.

  • Fast and Scalable Timeseries Modelling with Fugue and Nixtla
Kjell Wooding

Kjell is a computer engineer and mathematician who splits his time between Big Data and Little Learners. By day he is the Supervisor of Data Science and Machine Learning research at the Tutte Institute for Mathematics and Computing. By night he's the co-founder of Learn Leap Fly, an educational software company using AI and machine learning to help teach the world to learn.

  • Zeno Does Data Science: The Paradoxical Quest for Reproducibility
Kshetrajna Raghavan

Kshetrajna is a Staff Data Scientist at Shopify working on the capital algorithms team. He has built and productionalized many models in various domains including retail, ad-tech and healthcare. His interests are mainly applied ML and ML systems. Outside of work, Kshetrajna loves to spend time with his dogs, play music on his guitar, and is an avid gamer.

  • Using Interconnected ML Models to Tackle Retail Challenges
Lara Kattan

Lara is a data scientist (but who isn't these days), curriculum developer and instructor. She's a data science manager at Ernst & Young (EY) and an adjunct at the University of Chicago's Booth School of Business. When she's not writing Python code, she's probably taking care of foster kittens or falling on the ice (thanks to a recent but so far not-very-successful attempt to learn to figure skate).

  • Simulations in Python: Discrete Event Simulation with SimPy
Lauren Oldja

Lauren is a Principal Data Scientist at the social good software company Bonterra, working primarily on political and non-profit fundraising business lines.

Lauren is also a frequent NumFOCUS volunteer. She is the Inaugural Chair of the NumFOCUS Champions Circle, a volunteer-led committee focused on generating leads and informing fundraising strategy for NumFOCUS. This is also Lauren's third time as Executive Conference Chair for PyData NYC, having chaired the previous two events in 2018 and 2019.

  • NumFOCUS Champions Circle presents: How to make friends and generate impact through open source communities
Lawrence Wilson Gray

Dr. Gray is the Head of Data Science at KPMG Spark, where he builds predictive products with machine learning and Python that are reinventing how bookkeepers work. He teaches Data Science and Data Analytics at Georgetown University. He is also a frequent volunteer and committee member for PyData, PyCon, and NumFOCUS. He is a core contributor and maintainer for the open-source software project Yellowbrick. He earned his Ph.D. in Cellular and Molecular Physiology from the Johns Hopkins University School of Medicine.

  • 20 ideas to build social capital in the Data Science ecosystem
Li Qin

Li is a Principal Data Scientist at CVS Health. Li has a PhD in Biostatistics from UPenn and 15+ years’ experience creating advanced DS & ML solutions for challenging, complex real-world problems in academia and industry.

  • Contextual Multi-Arm Bandit and its applications to digital experiments
Marcin Ziemiński

I graduated from Theoretical Computer Science at Jagiellonian University in Kraków. Since then, I have been designing and developing ML solutions at small to medium size startups. I am currently working at Ntropy where my main focus is entity search.

  • Herding Entities: Information Search and Synthesis in the Context of Transaction Data
Martin Hirzel

Martin Hirzel is a researcher and the manager of the AI Programming Models team at IBM Research AI. Martin received his PhD from the University of Colorado at Boulder in 2004; his thesis adviser was Amer Diwan. At IBM, Martin works on tools and languages for artificial intelligence and streaming systems. Martin's papers won awards at several conferences and he is an ACM Distinguished Scientist.

  • Fairness for Scikit-Learn Pipelines with Lale
Martin Shell

Martin Shell is VP Customer Success at Avaiga. Prior to Avaiga, he held numerous roles in development, consulting and technical sales for organizations including ILOG, Manhattan Associates and IBM. His focus has been working with organizations to fit appropriate analytical techniques to the solution of business problems and maximizing ROI. He holds an M.S. degree in Operations Research from M.I.T., where his focus was Prescriptive Analytics/Decision Optimization.

  • Turning Data/AI algorithms into production-ready applications in no time with Taipy, the next-gen Python application builder
Matthew Rocklin

Matthew is an open source software developer in the PyData ecosystem. He primarily works on Dask, a library for parallel computing in Python. Matthew worked for Anaconda and NVIDIA before starting a company, Coiled, with a mission to enable scalable computing for the Python community.

  • Deploying Dask
  • Dask
Max Mergenthaler

CEO and Co-Founder of Nixtla, a time-series forecasting startup. Previously he was CTO and Co-Founder of Levo (YC S21). Max has worked in the ML industry for the last decade, where he has built and led ML teams. He has co-authored several papers on forecasting algorithms and decision theory. He is a co-maintainer of several open source libraries in the Python ecosystem. His passion is the intersection between business and technology.

  • NixtlaVerse, bridging the gap between statistics and deep learning for time series.
Melissa McNeill

Melissa McNeill is a senior data scientist at the University of Chicago Crime Lab working to build and evaluate prediction models that are accurate, fair, and useful in the real world. She is a core contributor to Name Match, an open source probabilistic record linkage tool. Melissa holds a B.S. in Computer Science from Texas A&M and an M.S. in Analytics from Northwestern.

  • Customizable probabilistic record linkage with Name Match
Melissa Mendonça

Melissa is an applied mathematician and former university professor who fell in love with open source communities. She has been involved with the Python and PyData communities for some time, with a focus on outreach, education and DEI. She works at Quansight as a Senior Developer Experience Engineer, is a maintainer for NumPy and SciPy, and believes in the power of contributions beyond code.

  • Keynote - Can we optimize communities?
Michalis Xyntarakis

I am an Assistant Professor of Practice at Rutgers, where I teach data science related topics. I am also a principal at a private consulting company, Cambridge Systematics, where I build data products based on location-based services data using Apache Spark. I have given talks at conferences before but never at a Python conference. I have attended several PyCon conferences since 2009 (I believe) and a few PyData conferences in NYC.

  • High-Dimensional Data Visualizations with MDS, t-SNE, and UMAP
Moussa Taifi

Moussa Taifi is currently a Senior Data Science Platform Engineer II at Xandr-Microsoft.

He holds a PhD in Computer and Information Science from Temple University. He is a machine learning and big data systems engineer, focused on data science productivity, reliability, performance and cost. He is interested in designing and implementing large scale AI products, through data collection, analysis and warehousing.

  • ML Latency No More: Useful Patterns to Reduce ML Prediction Latency to Sub X ms
Munaf A Qazi

Munaf is a Machine Learning Engineer at Munich Re standardizing MLOps processes and the model retraining infrastructure for the North American Integrated Analytics team. Previously, Munaf was a Research Scientist at NYU passionate about data provenance and AutoML. He also has multiple years of experience as a data scientist analyzing financial, consumer and digital data.

In his free time he tinkers with Raspberry Pis, building fun gadgets in his mini workshop.

  • Model Upgrade Schemes: Considerations for Updating Production Models
Mustafa Zengin

Staff Data Scientist

  • Building a Semantic Search Engine
Natalia Clementi
  • Dask
Paul Romer

Paul Romer, a University Professor at NYU, was co-recipient of the 2018 Nobel Prize in Economic Sciences. His work lies in the intersection of economics, innovation, technology, and urbanization. The central conclusion is that there are many feasible ways to speed human progress. Before coming to NYU, Paul taught at Stanford, and while there, started Aplia, an education technology company he later sold to Thomson Learning. Prior to his current role at NYU, Paul taught at Stanford, UC Berkeley, the University of Chicago, and the University of Rochester.

  • Keynote - Making Jupyter Ubiquitous and Making Billions from Crypto
Peter Vidos

Peter is the CEO & Co-Founder of Vizzu.

His primary focus is finding and utilizing use cases matching Vizzu's innovative data visualization approach. Peter has been involved with digital product development for over 15 years. He worked on products covering mobile app testing, online analytics, data visualization, e-learning & educational administration. Still, building a selfie teleport for fun is what he likes to brag about when asked to share previous experiences.

  • ipyvizzu-story - a new, open-source tool to build, create and share animated data stories with Python in Jupyter
Piero Ferrante

Piero Ferrante is a Senior Principal Data Scientist at CVS Health, a Fortune 4 health solutions company, where he and his team are focused on building scalable machine learning systems and developing tools to enhance the productivity and efficacy of hundreds of fellow data scientists and engineers.

Piero has nearly 15 years of applied experience in healthcare, telecom, insurance, mobile advertising, and fintech at companies ranging in size from unicorn startups to Fortune 500s. He holds an M.S. in Predictive Analytics from Northwestern University, a B.S. in Finance and Management Information Systems from the University of Delaware, and has served as an adjunct at New York University, the University of Kansas, and Rockhurst University. Piero also advises Play-it Health, a digital health startup, on algorithms and data strategy.

  • Coldstart: A library for automatic data curation and feature engineering
Pierre Brunelle

Pierre Brunelle is the CEO and Co-Founder of Noteable, a collaborative data notebook that enables data-driven teams to use and visualize data, together. Prior to Noteable, Pierre led Amazon’s notebook initiatives both for internal use as well as for SageMaker. He also worked on many open source initiatives including a standard for Data Quality work and an open source collaboration between Amazon and UC Berkeley to advance AI and machine learning. Pierre helped launch the first Amazon online car leasing store in Europe. At Amazon Pierre also launched a Price Elasticity Service and pushed investments in Probabilistic Programming Frameworks. And Pierre represented Amazon on many occasions to teach Machine Learning or at conferences such as NeurIPS. Pierre also writes about Time in Organization Studies. Pierre holds an MS in Building Engineering from ESTP Paris and an MRes in Decision Sciences and Risk Management from Arts et Métiers ParisTech.

  • Troubleshooting your Data Workflows with Noteable + Dagster: A live debugging of failed jobs.
Popescu Daniel

I've been a developer for over a decade now. I have extensive working experience in different areas of IT, and for the last 4 years I've transitioned to Machine Learning, out of passion for the field. I've taken certifications in Machine Learning and competed on Kaggle. I've specialized in Natural Language Processing (NLP), working on multiple projects in which I've learned how to combine the different techniques and models available today, and how to tackle the different problems in the field. I've spent the last 3 years building Intelligent AI assistants by developing NLP and ML approaches for the interface between the human and the device, and using regression methods. My main motivation is making an impact and creating wonderful and innovative products that help others and deliver the best value.

  • Prompt Engineering ⚙️ - Addressing the sensitivity of Large language models
Ravi
  • Building a Semantic Search Engine
Reed Peterson

Reed Peterson is a Data Scientist at Aetna, a CVS Health Company, where he contributes to various health intervention campaigns and machine learning optimization projects focused on reinforcement learning and natural language processing.

  • Contextual Multi-Arm Bandit and its applications to digital experiments
Robert Alvarez

Robert loves to break deep technical concepts down to be as simple as possible.

Robert has data science experience in companies both large and small. He is currently VP of Data Science for Podium Education, where he builds models to improve student outcomes, and an Artificial Intelligence Lead at NASA's Frontier Development Lab. Prior to Podium Education, he was a Senior Data Scientist at Metis teaching Data Science and Machine Learning. At Intel, he tackled problems in data center optimization using cluster analysis, enriched market sizing models by implementing sentiment analysis from social media feeds, and improved data-driven decision making in one of the top 5 global supply chains. At Tamr, he built models to unify large amounts of messy data across multiple silos for some of the largest corporations in the world. He earned a PhD in Applied Mathematics from Arizona State University where his research spanned image reconstruction, dynamical systems, mathematical epidemiology and oncology.

  • Hands-On Computer Vision with PyTorch
Rohit Supekar

Rohit Supekar is a data scientist at The New York Times, and he currently works on developing and deploying causal machine learning models to power The Times’s paywall. He is broadly passionate about understanding the world around us using data, building mathematically rigorous models, and deploying them using modern production-quality engineering tools.

Prior to joining The Times, he obtained a Ph.D. in 2021 and a Master's degree in 2017 from M.I.T., and a Bachelor's degree in 2015 from I.I.T. Madras in India. His Ph.D. thesis work involved building mathematical models for active fluids, such as a dense suspension of bacteria, by using a combination of partial differential equations, machine learning, and principles from fluid mechanics.

Outside of work, Rohit enjoys reading, long-distance running, and alpine skiing.

  • Causal machine learning for a smart paywall at The New York Times
Roni Kobrosly

I am a former epidemiology researcher who has spent approximately a decade employing causal modeling and inference. The bulk of my academic career was spent conducting data analyses to estimate the population-level effects of harmful environmental exposures, when traditional randomized experiments were infeasible or unethical.

Since leaving the academic world, I've been loving my second life in the tech industry as a data scientist, ML engineer, and more recently as the Head of Data Science at a medium-sized health tech company based in Washington DC. I love mentoring junior data folks and explaining the magic of data analysis and modeling to non-technical audiences.

I am also a member of the open-source community, being the author and maintainer of the causal-curve Python package. This package provides a set of tools for estimating the causal impact of continuous/non-binary treatments (e.g. estimating the causal impact of a neighborhood's income inequality on local crime, or understanding the causal effect of increasing a product's price on conversion rates).

  • Introduction to Causal Inference
Sanjay Siddhanti

Sanjay Siddhanti joined AKASA as an early engineer in 2019 and currently serves as director of engineering. He has a passion for working on software to help people have a better experience with healthcare. His teams focus on AKASA’s AI-driven automation platform and data engineering problems. He has a B.S. in computer science and M.S. in biomedical informatics from Stanford University.

  • Implementing a Workflow Engine in Python
Savin Goyal

Savin is the co-founder and CTO of Outerbounds, where his team is building the modern ML stack to accelerate the impact of data science. Previously, he was at Netflix, where he built and open-sourced Metaflow, a full-stack framework for data science.

  • Human-Friendly, Production-Ready Data Science Stack with Metaflow & Kubernetes
Sophia Yang

Sophia Yang is a Senior Data Scientist at Anaconda, Inc., where she uses data science to facilitate decision-making for various departments across the company. She volunteers as a Project Incubator at NumFOCUS to help Open Source Scientific projects grow. She is also the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and a Ph.D. in Educational Psychology from The University of Texas at Austin.

  • Level up your viz skills: from Matplotlib to HoloViz
Thomas Caswell

Tom Caswell, a trained physicist and staff scientist at NSLS-II at Brookhaven National Lab (BNL), works to bring modern programming and computational tools to the daily life of research scientists. At BNL, Caswell is a founder and core architect of the Bluesky Project, which aims to simplify the scaling of particle physics research projects. Caswell is also a regular contributor to many core projects in the Scientific Python ecosystem, and was named a PSF Fellow in Q2 2022 in recognition of his contribution and leadership across the broader Python community.

  • Keynote - From Science to Open Source and Back Again
Thomas J. Fan

Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.

  • Parallelism in Numerical Python Libraries
Thymo ter Doest

ML engineer at Ntropy | Co-founder at AccountingBox | MSc Machine Learning at UCL

  • Building highload ML powered service
Tonya Sims

Tonya is a former professional basketball player turned Python enthusiast. She is currently a Python Developer Advocate for Deepgram, a speech-to-text company that has revolutionized the market. Her path to Python is unconventional. Her career started in athletics and then transitioned to pharmaceutical sales. She finally landed in her destination spot, the tech industry. Driven by her passion for teaching, she takes pride in helping others and loves connecting with her fellow Pythonistas! Outside of coding, Tonya enjoys all things sports. She is also an avid reader who loves writing and spending time with her nieces and nephews.

  • Discover Inspirational Insights in Motivational Sports Speeches Using Speech-to-Text
Vishal Rathi

Software Engineer at Walmart Search

  • Building a Semantic Search Engine
Zach Musgrave

Zach leads development for Dolt, the world's first SQL database that you can fork and clone, branch and merge, push and pull just like a git repository. Zach studied computer science at the University of Washington, and spent the first 13 years of his career split between Amazon and Google before joining DoltHub. He's a fierce advocate for the value of client-side software in a server-side world.

  • Git for Data: Data Versioning for Reproducible Data Science with Dolt
Zhangziman Song

Zhangziman Song is a Data Scientist in the Sustainability Software Division at IBM. She's worked on building ML models using weather data for six years. She’s passionate about using AI and ML to enable businesses to prepare better for adverse weather conditions. More recently, she is working on AI models using other geospatial datasets such as satellite imagery and Lidar.

  • Predicting Weather-Caused Rare Events: A Utility Outage Prediction Use Case
conda/conda-forge, PyMC, NumPy/SciPy, & Matplotlib

Sprint Leads:
- conda/conda-forge — Marius van Niekerk, conda-forge core developer
- PyMC — Dr. Christian Luhmann, PyMC core developer
- NumPy/SciPy — Ganesh Kathiresan, NumPy core developer & Juan Luis Cano Rodríguez, prolific open source contributor
- Matplotlib — Hannah Aizenman, matplotlib core developer

  • Open Source Project Sprints
Nidhin Pattaniyil

Machine Learning Engineer at Walmart Search

  • Serving Pytorch Models in Production
  • Building a Semantic Search Engine