Han Wang
Han Wang leads Lyft's Machine Learning Platform, where he focuses on distributed computing and training. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the founder of the Fugue project, which aims to democratize distributed computing and machine learning.
Sessions
Data practitioners use distributed computing frameworks such as Spark, Dask, and Ray to work with big data. One of the major pain points of these frameworks is testability. To test even simple code changes, users have to spin up local clusters, which carries high overhead. In some cases, code dependencies force testing against a cluster. Because testing big data applications is hard, practitioners are tempted to skip testing entirely. In this talk, we’ll show best practices for testing big data applications. By using Fugue to decouple logic from execution, we can run more tests locally and make it easier for data practitioners to test with low overhead.
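As a rough illustration of decoupling logic from execution, the sketch below keeps the business logic in a plain Python function that can be unit tested without any cluster. The function name `add_trip_speed`, the column names, and the sample data are hypothetical and not taken from the talk. The commented lines at the end show how such a function might be handed to Fugue's `transform` with a distributed engine; `df` and `spark_session` there are placeholders.

```python
from typing import Any, Dict, List


def add_trip_speed(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical logic: add the average speed (km/h) to each trip row."""
    result = []
    for row in rows:
        hours = row["minutes"] / 60
        # Guard against zero-duration trips instead of dividing by zero.
        speed = row["km"] / hours if hours > 0 else None
        result.append({**row, "kmh": speed})
    return result


def test_add_trip_speed() -> None:
    # A plain unit test: no Spark, Dask, or Ray session is needed.
    rows = [{"km": 30.0, "minutes": 60}, {"km": 5.0, "minutes": 0}]
    out = add_trip_speed(rows)
    assert out[0]["kmh"] == 30.0
    assert out[1]["kmh"] is None


test_add_trip_speed()

# Assumed production usage (not executed here): Fugue chooses the engine,
# while the logic above stays unchanged.
# from fugue import transform
# result = transform(df, add_trip_speed, schema="*,kmh:double", engine=spark_session)
```

Because the function depends only on built-in types, the test runs in milliseconds on a laptop, and the choice of execution engine is deferred to the call site.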