Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

From Data Engineering Podcast



Length: 70 minutes
Released: Jul 31, 2021
Format: Podcast episode

Description

Data lake architectures have largely been biased toward batch processing workflows because of the volume of data they are designed to handle. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge. By adding support for small, incremental inserts into large table structures, and by building support for arbitrary update and delete operations, the Hudi project brings the best of both worlds together. In this episode, Vinoth shares the history of the project, how its architecture allows for building more frequently updated analytical queries, and the work being done to add a more polished experience to the data lake paradigm.

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry