42 min listen
Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52
Improving The Performance Of Cloud-Native Big Data At Netflix Using The Iceberg Table Format with Ryan Blue - Episode 52
ratings:
Length:
54 minutes
Released:
Oct 15, 2018
Format:
Podcast episode
Description
With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal specification, each project works slightly different which increases the difficulty of integration across systems. The Hive format is also built with the assumptions of a local filesystem which results in painful edge cases when leveraging cloud object storage for a data lake. In this episode Ryan Blue explains how his work on the Iceberg table format specification and reference implementation has allowed Netflix to improve the performance and simplify operations for their S3 data lake. This is a highly detailed and technical exploration of how a well-engineered metadata layer can improve the speed, accuracy, and utility of large scale, multi-tenant, cloud-native data platforms.
Released:
Oct 15, 2018
Format:
Podcast episode
Titles in the series (100)
Rebuilding Yelp's Data Pipeline with Justin Cunningham - Episode 5 by Data Engineering Podcast