An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

From Data Engineering Podcast


Length:
72 minutes
Released:
Dec 25, 2022
Format:
Podcast episode

Description

Summary
Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. To condense that acquired knowledge into a format that is useful to everyone, Scott Hirleman turns the tables in this episode and asks Tobias about the tactical and strategic aspects of his experiences applying those lessons to the work of building a data platform from scratch.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode (https://www.dataengineeringpodcast.com/linode) today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan (https://www.dataengineeringpodcast.com/atlan) today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo (http://www.dataengineeringpodcast.com/montecarlo) to learn more.
Your host is Tobias Macey and today I'm being interviewed by Scott Hirleman about my work on the podcasts and my experience building a data platform
Interview
Introduction
How did you get involved in the area of data management?
Data platform building journey
Why are you building, who are the users/use cases
How to focus on doing what matters over cool tools
How to build a good UX
Anything surprising or did you discover anything you didn't expect at the start
How to build so it's modular and can be improved in the future
General build vs buy and vendor selection process
Obviously have a good BS detector - how can others build theirs
So many tools, where do you start - capability need, vendor suite offering, etc.
Anything surprising in doing much of this at once
How do you think about TCO in build versus buy
Any advice
Guest call out
Be brave, believe you are good enough to be on the show
Look at past episodes and don't pitch the same as what's been on recently
And vendors, be smart, work with your customers to come up with a good pitch for them as guests...
Tobias' a

Weekly deep dives on data management with the engineers and entrepreneurs who are shaping the industry