Moving Machine Learning Into The Data Pipeline at Cherre
Length:
48 minutes
Released:
Apr 20, 2021
Format:
Podcast episode
Description
Most of the time, when you think about a data pipeline or ETL job, what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformations is actually a full-fledged machine learning project in its own right. In this episode, Tal Galfsky explains how he and the team at Cherre tackled the problem of messy address data by building a natural language processing and entity resolution system that is served as an API to the rest of their pipelines. He discusses the myriad ways that addresses are incomplete, poorly formed, or just plain wrong, why this was a big enough pain point to justify building an industrial-strength solution, and how that solution works under the hood. After listening, you'll look at your data pipelines in a new light and start to wonder how you can bring more advanced strategies into the cleaning and transformation process.
Series:
Data Engineering Podcast