Moving Machine Learning Into The Data Pipeline at Cherre

From Data Engineering Podcast


Length:
48 minutes
Released:
Apr 20, 2021
Format:
Podcast episode

Description

Most of the time, when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served as an API to the rest of their pipelines. He discusses the myriad ways that addresses are incomplete, poorly formed, and just plain wrong; why it was a big enough pain point to invest in building an industrial-strength solution for it; and how it actually works under the hood. After listening to this, you'll look at your data pipelines in a new light and start to wonder how you can bring more advanced strategies into the cleaning and transformation process.

Titles in the series (100)

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry