Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

From Data Engineering Podcast

Length: 70 minutes

Released: Jul 31, 2021

Format: Podcast episode

Description

Data lake architectures have largely been biased toward batch processing workflows because of the volume of data they are designed to handle. With more real-time requirements and the increasing use of streaming data, it has been difficult to combine fast, incremental updates with large-scale historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge. By adding support for small, incremental inserts into large table structures, and for arbitrary update and delete operations, the Hudi project combines low-latency ingestion with large-scale analytics. In this episode Vinoth shares the history of the project, how its architecture supports analytical queries over more frequently updated data, and the work underway to bring a more polished experience to the data lake paradigm.

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry