59 min listen
Charting A Path For Streaming Data To Fill Your Data Lake With Hudi
Length:
70 minutes
Released:
Jul 31, 2021
Format:
Podcast episode
Description
Data lake architectures have largely been biased toward batch processing workflows because of the volume of data they are designed to handle. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge. By adding support for small, incremental inserts into large table structures, and building support for arbitrary update and delete operations, the Hudi project brings the best of both worlds together. In this episode Vinoth shares the history of the project, how its architecture allows for building more frequently updated analytical queries, and the work being done to add a more polished experience to the data lake paradigm.
Titles in the series (100)
Wallaroo with Sean T. Allen - Episode 12: Fast and Scalable Real-Time Stream Computation with Wallaroo (Interview) by Data Engineering Podcast