Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

From Data Engineering Podcast

Length: 73 minutes
Released: Jul 9, 2023
Format: Podcast episode

Description

Summary
For business analytics, the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago, when the software and hardware for warehouse databases were far more constrained. In this episode, Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack)
Your host is Tobias Macey, and today I'm interviewing Max Beauchemin about the concept of entity-centric data modeling for analytical use cases.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what entity-centric modeling (ECM) is and the story behind it?
How does it compare to dimensional modeling strategies?
What are some of the other competing methods?
Comparison to activity schema
What impact does this have on ML teams? (e.g. feature engineering)
What role does the tooling of a team have in the ways that they end up thinking about modeling? (e.g. dbt vs. informatica vs. ETL scripts, etc.)
What is the impact of the underlying compute engine on the modeling strategies used?
What are some examples of data sources or problem domains for which this approach is well suited?
What are some cases where entity-centric modeling techniques might be counterproductive?
What are the ways that the benefits of ECM manifest in use cases that are downstream from the warehouse?
What are some concrete tactical steps that teams should be thinking about to implement a workable domain model using entity-centric principles?
How does this work across business domains within a given organization (especially at "enterprise" scale)?
What are the most interesting, innovative, or unexpected ways that you have seen ECM used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ECM?
When is ECM the wrong choice?
What are your predictions for the future direction/adoption of ECM or other modeling techniques?
Contact Info
mistercrunch (https://github.com/mistercrunch) on GitHub
LinkedIn (https://www.linkedin.com/in/maximebeauchemin/)
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show, please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.
Links
Entity Centric Modeling Blog Post (https://preset.io/blog/introducing-entity-centric-data-modeling-for-analytics/?utm_source=pocket_saves)
Max's Previous Appearances
Defining Data Engineering with
