PyData NYC 2022

A Graph-based Machine Learning early warning system to detect Ransomware
11-09, 15:30–16:15 (America/New_York), Winter Garden (5th floor)

This talk investigates graph-based machine learning as an alternative method for detecting the early stages of a ransomware attack, which often involve the deployment of tools like Cobalt Strike. We introduce a methodology for graph-based machine learning, from developing a graph model and extracting features through to supervised and unsupervised machine learning techniques. The talk gives the audience a practical, hands-on approach that they can use to apply graph-based techniques to their own datasets.


Ransomware increasingly involves phishing attacks combined with tools like the Cobalt Strike penetration testing suite, which deploys Command and Control (C2) beacons to end-user computers. Using data from the open-source Security Datasets project, including AWS, Linux and Windows logs, we use graph-based machine learning as an alternative method to detect simulated Cobalt Strike beacon activity. We walk through the process of developing a graph model aligned with the MITRE ATT&CK framework to categorize post-compromise behavior such as defense evasion, privilege escalation, and lateral movement. Features extracted include the launching of PowerShell commands, DLL payloads, and scheduled tasks. We convert the data into a graph representation and review the effectiveness of supervised and unsupervised machine learning techniques. First, we develop a baseline of the graph data representing normal activity and train a model to detect anomalies such as a Cobalt Strike C2 beacon. Second, we train a graph machine learning model on specific attacks in the MITRE ATT&CK framework and measure how effectively it detects individual steps within an attack chain. The talk uses NetworkX and SurrealDB and is aimed at an intermediate level. The takeaway is a methodology for applying graph-based machine learning that can be used more generally on other datasets.
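As a rough illustration of the first step (a graph baseline of normal activity, then flagging deviations), here is a minimal sketch. It is not the talk's implementation. The events are hypothetical, hand-written parent/child process pairs rather than Security Datasets records, and the function names (`build_graph`, `node_features`, `unseen_edges`) are made up for this example. The talk uses NetworkX; to stay dependency-free, the sketch stores the graph in plain Python collections, and a simple "edge not seen in the baseline" check stands in for a trained anomaly model.

```python
from collections import Counter, defaultdict

# Hypothetical events: (parent process, child process).
# Real input would be parsed from Windows, Linux, or AWS logs.
BASELINE_EVENTS = [
    ("explorer.exe", "chrome.exe"),
    ("explorer.exe", "outlook.exe"),
    ("explorer.exe", "chrome.exe"),
    ("services.exe", "svchost.exe"),
    ("outlook.exe", "winword.exe"),
]

NEW_EVENTS = [
    ("explorer.exe", "chrome.exe"),
    ("outlook.exe", "winword.exe"),
    ("winword.exe", "powershell.exe"),
    ("powershell.exe", "rundll32.exe"),
    ("powershell.exe", "schtasks.exe"),
]


def build_graph(events):
    """Return a weighted directed graph as {parent: Counter({child: count})}."""
    graph = defaultdict(Counter)
    for parent, child in events:
        graph[parent][child] += 1
    return graph


def node_features(graph):
    """Per-node features: out-degree and in-degree over distinct neighbours."""
    features = defaultdict(lambda: {"out_degree": 0, "in_degree": 0})
    for parent, children in graph.items():
        features[parent]["out_degree"] += len(children)
        for child in children:
            features[child]["in_degree"] += 1
    return dict(features)


def unseen_edges(baseline, observed):
    """Edges present in the observed graph but absent from the baseline."""
    return sorted(
        (parent, child)
        for parent, children in observed.items()
        for child in children
        if child not in baseline.get(parent, {})
    )


baseline_graph = build_graph(BASELINE_EVENTS)
observed_graph = build_graph(NEW_EVENTS)

anomalies = unseen_edges(baseline_graph, observed_graph)
features = node_features(observed_graph)

for parent, child in anomalies:
    print(f"not in baseline: {parent} -> {child}")
```

In this toy data, the flagged edges are the Word-to-PowerShell launch and the two PowerShell child processes, which loosely mirror the PowerShell, DLL payload, and scheduled-task features mentioned above. The per-node degree features are the kind of input a supervised or unsupervised model could consume in place of the simple set-difference check.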


Prior Knowledge Expected

No previous knowledge expected