Maintaining Your Data Lake At Scale With Spark

From Data Engineering Podcast

Length: 51 minutes

Released: Jun 17, 2019

Format: Podcast episode

Description

Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow you to generate significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms, and it incorporates the lessons learned at Databricks from countless customer use cases. In this episode Michael Armbrust, the lead architect of Delta Lake, explains how the project is designed, how you can use it to build a maintainable data lake, and some useful patterns for progressively refining the data in your lake. The conversation offers a clearer picture of the challenges that exist in large-scale data analytics and of the current tradeoffs between data lakes and data warehouses in the cloud.

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry