PyData NYC 2022

Herding Entities: Information Search and Synthesis in the Context of Transaction Data
11-10, 14:15–15:00 (America/New_York), Winter Garden (5th floor)

Entity search and synthesis underlie transaction enrichment at Ntropy. Access to clean and detailed company information is a prerequisite for a deep understanding of financial data. In this talk we outline how various data sources, including popular search engines and document and vector databases, are combined with clever heuristics and ranking models to fully identify the participants of a transaction and, consequently, enable more involved analysis.


At Ntropy we deal with noisy and frequently obfuscated transaction information. We strive to extract as much relevant information as possible from the available data sources in order to enrich each transaction.

Fully understanding the nature of a transaction requires correct and useful information about the entities recognized in its description. For this purpose we make use of popular search engines as well as our own company databases. We show how carefully constructed document indexes and vector embeddings, together with ranking models, help us find relevant pieces of information in a sea of noise and stitch them together into a final form.
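
As a rough illustration of the retrieve-then-rank idea in the paragraph above, here is a minimal sketch. It is not Ntropy's actual system: the company records, the function names (`retrieve`, `rerank`), the example transaction string and the 0.5 bonus are all hypothetical, and character-trigram counts stand in for real vector embeddings so the example runs with only the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical company records; a real system would query a document or vector database.
COMPANIES = [
    {"name": "Starbucks", "website": "starbucks.com"},
    {"name": "Starling Bank", "website": "starlingbank.com"},
    {"name": "Amazon", "website": "amazon.com"},
]


def trigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(description: str, k: int = 2) -> list:
    """Step 1: pull the k most similar candidates from the 'index'."""
    query = trigram_vector(description)
    scored = [(cosine(query, trigram_vector(c["name"])), c) for c in COMPANIES]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(description: str, candidates: list) -> dict:
    """Step 2: re-score candidates with a simple heuristic (assumed, not learned)."""
    tokens = set(re.findall(r"[a-z0-9]+", description.lower()))

    def score(pair):
        similarity, company = pair
        name_words = company["name"].lower().split()
        # Bonus when a full word of the company name appears in the description.
        bonus = 0.5 if any(word in tokens for word in name_words) else 0.0
        return similarity + bonus

    return max(candidates, key=score)[1]


description = "SQ *STARBUCKS 0423 NYC"  # made-up noisy transaction description
best = rerank(description, retrieve(description))
print(best["name"], best["website"])
```

In practice the heuristic in `rerank` would likely be replaced by a trained ranking model, and the retrieval step by a query against a proper index; the two-stage shape is the point of the sketch.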

We show how, in an environment of cluttered, frequently inaccurate data with very little ground truth, we are able to tweak our heuristics and train models that deliver valuable entity information in a performant and robust way.

We also discuss how our internal labeling services aid us in this process, and how all of the intricate steps are orchestrated as reusable pipelines.


Prior Knowledge Expected

No previous knowledge expected

I graduated in Theoretical Computer Science from Jagiellonian University in Kraków. Since then, I have been designing and developing ML solutions at small and medium-sized startups. I currently work at Ntropy, where my main focus is entity search.