PyData NYC 2022

Predicting Weather-Caused Rare Events: A Utility Outage Prediction Use Case
11-09, 11:45–12:30 (America/New_York), Music Box (5th floor)

The rarity and diversity of weather events, and the wide range of their impacts, present unique challenges in every phase of model building: feature engineering, model training, model evaluation, and model selection. We discuss best-in-class approaches to optimizing all relevant parameters and continuously improving model performance, so that accurate, actionable results can be delivered through a highly scalable ML operational environment and help the users of those results mitigate the effects of climate change. We describe the challenges and our approach using the Outage Prediction use case for utility companies. These companies spend billions of dollars every year restoring power after outages, the majority of which are weather-related. Climate change is creating more frequent and longer-lasting power outages and making everyday weather events harder to predict. Our approach has been used successfully to predict weather-caused outages, and those predictions are then used to proactively mobilize the power restoration process.


This talk will first introduce the Outage Prediction problem for utility companies: their operational teams need to respond proactively to power outages that occur after weather events. Proactively mobilizing crews for power restoration after a weather event requires a good estimate of the number of outages to expect in each of their mobilization zones.
This problem can be translated into a machine learning problem in which the main inputs to the machine learning module are historic weather data and historic weather-caused outage data. Several additional data sets can be used to strengthen the solution.
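
As a rough illustration of that framing, the sketch below joins hourly weather features with outage counts per mobilization zone to form training rows. The zone names, feature names, and values are hypothetical and are not taken from the talk; the point is only that hours with no outages stay in the table with a count of zero.

```python
from collections import Counter
from datetime import datetime

# Hypothetical hourly weather features keyed by (zone, hour).
weather = {
    ("zone_a", datetime(2022, 7, 1, 14)): {"wind_gust_mph": 22.0, "precip_in": 0.0},
    ("zone_a", datetime(2022, 7, 1, 15)): {"wind_gust_mph": 48.0, "precip_in": 0.6},
    ("zone_b", datetime(2022, 7, 1, 15)): {"wind_gust_mph": 31.0, "precip_in": 0.2},
}

# Hypothetical outage records: (zone, outage start time).
outages = [
    ("zone_a", datetime(2022, 7, 1, 15, 5)),
    ("zone_a", datetime(2022, 7, 1, 15, 40)),
    ("zone_b", datetime(2022, 7, 1, 15, 59)),
]


def build_training_rows(weather, outages):
    """Join weather features with outage counts at (zone, hour) granularity."""
    # Truncate each outage start time to the hour so it matches the weather keys.
    counts = Counter(
        (zone, ts.replace(minute=0, second=0, microsecond=0)) for zone, ts in outages
    )
    rows = []
    for zone, hour in sorted(weather):
        features = weather[(zone, hour)]
        # Hours without any outage get a target of 0 rather than being dropped.
        target = counts.get((zone, hour), 0)
        rows.append({"zone": zone, "hour": hour, **features, "outage_count": target})
    return rows


rows = build_training_rows(weather, outages)
for row in rows:
    print(row["zone"], row["hour"].isoformat(), row["outage_count"])
```

At production scale the same join would more plausibly run in PySpark, which the talk lists as expected background; plain Python is used here only to keep the sketch self-contained.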
The goal of this talk is to give the audience a good understanding of the challenges in building such an outage prediction model, operationalizing it, and measuring its performance. Specifically, we will cover:

  1. Geo-spatial aspects of outage and weather data: calculating weather features from raw weather data at scale; matching the granularity of weather data and outage data; and scaling the feature generation process.
  2. Data Quality analysis of outage data – a necessary step prior to generating model training data.
  3. Model training should handle the following challenges:
    - Sparsity of weather-caused outages
    - Range of outages – weather events can result in anywhere from relatively few outages (e.g., a few hundred across the entire territory) to a large number of outages (e.g., thousands after a hurricane or blizzard). The weather events themselves also range from very short convective storms (e.g., 1–2-hour storms) to long-lasting storms (e.g., hurricanes, blizzards).
    - Measuring model performance – the wide range of outage counts across weather events makes it hard to apply standard error metrics.
    - Identifying the number of models to build for one utility territory, so that each model has enough data and a wide enough range of data.
  4. Operationalizing the models to predict outages every hour for the next several hours or days.
  5. Using Explainability to continuously understand and improve model performance as we get results from the operational environment and updated actual outages.
  6. Applying the above approach for other problems such as predicting weather-caused anomalies in sales and store traffic.
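
To illustrate the evaluation challenge in item 3, the sketch below reports mean absolute error separately for each outage-count range instead of as one aggregate number. The bin edges and sample values are hypothetical, and this is one possible way to stratify errors, not necessarily the method presented in the talk.

```python
from collections import defaultdict


def bin_label(actual, edges=(10, 100, 1000)):
    """Return the label of the first bin whose upper edge exceeds `actual`."""
    for edge in edges:
        if actual < edge:
            return f"<{edge}"
    return f">={edges[-1]}"


def error_by_bin(actuals, predictions, edges=(10, 100, 1000)):
    """Mean absolute error per outage-count bin (bin edges are assumptions)."""
    grouped = defaultdict(list)
    for actual, predicted in zip(actuals, predictions):
        grouped[bin_label(actual, edges)].append(abs(actual - predicted))
    return {label: sum(errs) / len(errs) for label, errs in grouped.items()}


# Hypothetical per-event outage counts: mostly small events plus one major storm.
actuals = [0, 4, 60, 80, 2500]
predictions = [1, 2, 50, 100, 2000]

overall_mae = sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)
report = error_by_bin(actuals, predictions)

print("overall MAE:", overall_mae)
for label, mae in report.items():
    print(label, mae)
```

In this made-up sample the single large event accounts for almost all of the aggregate error, which is why a per-range breakdown can be more informative than one overall score.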

We expect the audience to have a working knowledge of machine learning, Python, and PySpark, and a basic understanding of operationalizing ML models.


Prior Knowledge Expected

No previous knowledge expected

Zhangziman Song is a Data Scientist in the Sustainability Software Division at IBM. She has worked on building ML models using weather data for six years. She’s passionate about using AI and ML to enable businesses to prepare better for adverse weather conditions. More recently, she has been working on AI models using other geospatial datasets such as satellite imagery and Lidar.

Harini Srinivasan is a Senior Technical Staff Member in the IBM Sustainability Software organization. She currently leads a team of Data Scientists building products that incorporate advanced AI solutions using geospatial and remote sensing data such as weather, satellite imagery and Lidar. In her 28-year career at IBM, she has also contributed significantly to the areas of Programming Languages and Runtimes, Performance Analysis and Tools, Social Media Analytics and Software Patterns. She has worked in IBM Research and IBM Software product divisions, published in major conferences and journals, and has over 10 patents issued.