Introduction to Data Platforms: How to leverage data fabric concepts to engineer your organization's data for today's cloud-based digital world

Ebook, 382 pages, 4 hours

Author: Anthony David Giordano
ISBN: 9798885053877

About this ebook

Digital, cloud, and artificial intelligence (AI) have disrupted how we use data. This disruption has changed the way we need to provision, curate, and publish data for the multiple use cases in today's technology-driven environment. This text will cover how to design, develop, and evolve a data platform for all the uses of enterprise data needed in today's digital organization.

This book explains what a data platform is, what value it provides, how it is engineered, and how to deploy a data platform and support organization. In this context, Introduction to Data Platforms

reviews the current requirements for data in the digital age and quantifies the use cases;

discusses the evolution of data over the past twenty years, which is a core driver of the modern data platform;

defines what a data platform is, along with its architectural components and layers;

describes the architectural layers or capabilities of a data platform;

reviews cloud- and commercial-software vendors that populate the data-platform space;

provides a step-by-step approach to engineering, deploying, supporting, and evolving a data-platform environment;

provides a step-by-step approach to migrating legacy data warehouses, data marts, and data lakes/sandboxes to a data platform; and

reviews organizational structures for managing data platform environments.

Computers

Language: English

Publisher: Fulton Books, Inc.

Release date: Nov 3, 2022

Skip carousel

Introduction to Data Platforms - Anthony David Giordano

Title Page

First Edition

Fulton Books

Meadville, PA

Published by Fulton Books 2022

ISBN 979-8-88505-386-0 (paperback)

ISBN 979-8-88505-387-7 (digital)

Printed in the United States of America

I would like to dedicate this book to my daughters—Katie and Kelsie; they teach me something new and wonderful every day.

Preface

Acknowledgments

Introduction: The Rise of the Data Platform

The need for a data platform

The driver of a new approach: strategic inflexibility

The purpose of a book on data platforms

A blueprint for a data platform

Part 1: The Evolution of the Data Platform

Chapter 1: What Is a Data Platform?

The business drivers for a data platform

Definitions of data platform

Reasons why organizations have not built a data platform

Chapter 2: The Evolution of Use Cases for Data

The first evolution: the transactional data era

The second evolution: the data-warehousing era

The third evolution: the anti-data-warehouse era—the data lake era

Three-one evolution: digital data—the API-friendly data hub

Operational data: infusing AI into operational processes

A comprehensive, integrated approach: the data platform

Part 2: Capabilities of a Data Platform

Chapter 3: An Approach for a Data Platform

Overview of a reference architecture

Data platform architectures today

Data Fabric

Data Mesh

Detailed view of the data-platform reference architecture

Intelligent-Integration Capabilities

Data-Marketplace Capabilities

Insights Capabilities

Digital-Orchestration Capabilities

Experience Capabilities

Scaling the data platform horizontally and vertically

Horizontal Use Cases

Vertical Use Cases

Scaling a data platform across a multicloud environment

Chapter 4: The Intelligent-Integration Capability

An intelligent-integration processing framework

The intelligence in the integration process

Ingestion services

Batch Ingestion Services

Real-Time Ingestion Services

Profiling and Metadata-Capture Services

Data-quality services

Master-data management services

Curation services

Publish services

Management support services

Configuring an intelligent-integration environment

Chapter 5: The Data Marketplace

Raw layer

Conform layer

Consumption layer

Physical vs. Virtual Layer

Chapter 6: The Other Data-Platform Components

Insights Components

Data-Visualization Layer

Data-Science Predictive-Modeling Layer

Detailed view of the digital orchestration capabilities

Detailed view of the experience component

Part 3: Implementing a Data Platform

Chapter 7: How to Build a Data Platform

The need for a new approach for data platforms

Configure and evolve versus waterfall

Overview of a data-platform methodology

Evolution vs. Manage: Data Ops

Insights migration approaches

Chapter 8: Data-Platform Use Cases

Case study 1: a digital transformation of a retail bank

Case study 2: a data science and data governance transformation of a pharmaceutical company

Pharmaceutical Company

Chapter 9: Data-Platform Cloud Implementations

Detailed review of AWS data-platform technologies

Detailed review of Microsoft Azure’s data-platform technologies

Detailed review of Google Cloud data-platform technologies

Detailed review of IBM’s data-platform technologies

Commercial data platforms

C3.ai

Palantir

Other notable cloud-based data technologies

Snowflake

Databricks

Best practices on data-platform cloud implementations

Chapter 10: An Operating Model for a Modern Data Platform

The impact of the enterprise organizational structure

Primary Functions of a Data Organization

Data-Platform-Architecture Management Services

Information Governance Services

Data Development and Evolution Services

The Need for Change Management

Afterword

PREFACE

This text gives information technology executives, chief data officers, and data practitioners a detailed review of what a data platform is, along with its benefits and the reasons to seriously consider migrating their current data estate to one. Throughout the text, case studies illustrate each of the topics on designing, implementing, managing, and evolving a data platform.

The text starts with an explanation of how the use cases for data have evolved over the past twenty years, from transactional data design to simple business-intelligence (BI) reporting and eventually to today’s multipurpose, multi-use-case, real-time data environments instantiated as data platforms. These use cases include traditional reporting (it’s not going away), data visualization, data science, digital, and operational (integrated with ML/AI capabilities). It illustrates how data architectures have evolved over the same period into next-generation concepts that allow a more strategic use of data, one that is integral to the digital world in which we now do business.

The text covers how information architecture has evolved from its early days of simple transactional concepts to the current focus on data fabric and data mesh. It covers in detail the core layers or components of a data platform and how data is ingested, qualified, curated, and conformed into both enterprise and application layers, which create multiuse data environments that reduce redundancy and cost while ensuring flexibility. The text brokers a pragmatic conversation on when to use enterprise versus application data layers in a data-platform environment. In covering data-fabric concepts, it weighs the benefits and costs of physicalizing versus virtualizing data. The book also covers the essential nondata layers or capabilities of a data platform, which illustrate how to integrate a data platform into a broader digital ecosystem and how to engineer it to drive value for each of the multiple use cases.

It will review commercial data technologies, including the cloud vendors’ native technological approaches for a data platform, and discuss how best to migrate your current data estate to a data platform. Finally, it covers how to create a data organization to deploy, sustain, and evolve a modern data platform.

Intended audience

This text serves many different audiences. Experienced information management executives and chief data officers can use it to better understand the business case for a data platform or simply as a source of best practices for blueprinting, engineering, implementing, populating, and operating one. The intended audiences include the following:

chief information and technology officers

chief data officers

data and analytic consultants

data solution architects and data engineers

program/project managers

other information management practitioners

Scope of the text

This book focuses on explaining what a data platform is, what value it provides, how it is engineered, and how to deploy a data platform and a support organization.

With that goal in mind, Introduction to Data Platforms

reviews the current requirements for data in the digital age and quantifies the use cases;

discusses the evolution of data over the past twenty years, which is a core driver of the modern data platform;

defines what a data platform is and the architectural components and layers of a data platform;

provides the architectural layers or capabilities of a data platform;

reviews cloud and commercial software vendors that populate the data-platform space;

provides a step-by-step approach to engineering, deploying, supporting, and evolving a data-platform environment;

provides a step-by-step approach to migrating legacy data warehouses, data marts, and data lakes/sandboxes to a data platform; and

reviews organizational structures for managing data-platform environments.

ACKNOWLEDGMENTS

The art and science required for a data platform in the digital age demand a significant amount of experience in the field and countless hours of configuring data technologies on multiple clouds into easy-to-use capabilities. The architectural principles and data-management processes defined in this book are the result of actual project work: countless hours of implementing architectures and hardening the architectural concepts and processes that, today, run in all evolved data platforms in our organizations. These efforts can only be performed in collaboration with knowledgeable, dedicated, and experienced practitioners. In particular, I would like to acknowledge Mehdi Charafeddine, Glenn Finch, Jay Houghton, Ron Koch, and Ron Shelby, all of whom played an integral part in the development of this book.

INTRODUCTION:

The Rise of the Data Platform

The need for a data platform

They say that change is inevitable, and it is. Some changes are visceral and so revolutionary that everyone instantly sees, recognizes, and embraces them. Others are so subtle that, when they occur, only the savvy see them and exploit them for the competitive advantage they provide. Data platforms are that next quiet evolution in technology, one that will provide greater strategic flexibility with your data and better data governance and quality in a more cost-effective manner. A data platform is a common data environment that provisions multiple business use cases.

The driver of a new approach: strategic inflexibility

The era of the data platform has started in an already very mature data-management world. There are very few organizations today that do not have a host of data technologies in their environments. In fact, that is the problem: the era of the greenfield data environment passed twenty years ago, if not longer. Today, most organizations have multiple nonintegrated legacy data warehouses and marts, Hadoop clusters / data lakes, or NoSQL stores, all performing some function in their environment but at a significant cost in terms of data integration and duplication. While each of these technologies serves its specific purpose, in aggregate, they form an inflexible, expensive infrastructure that tends to be difficult to extend and is poorly understood. Inevitably, the specter of data-quality issues hangs over these environments, as it always does when there are multiple data stores with multiple data-integration environments. This proliferation of data technologies and approaches has created significant challenges beyond just data quality. These challenges include the following:

Long, costly data science modeling timelines. Finding the right training data and then crafting it into a usable data set takes up 75–80 percent of the time for a data science experiment.

Lack of trusted data and metrics. Organizations are often paralyzed by multiple reports that use the same data yet show different totals, resulting in the data-quality issues mentioned earlier.

Lack of consistent metadata and reusable components. The good news is that many organizations now have a rudimentary metadata catalog. The bad news is that it often covers perhaps one part of one data warehouse in their data portfolio. Very few organizations have metadata-cataloging capabilities for all the data technologies in their portfolio. Capturing all metadata on ingestion, maintaining it, and, most importantly, reusing it is a function that most organizations have not implemented and matured. Organizations need to understand what data they have cataloged, where it is, and how it is defined in an increasingly heterogeneous, hybrid, cloud-based data landscape. Metadata and model-management reusability are particularly important in the data science space. Data science has matured beyond the artisan phase, where every model needed to be developed from the ground up. Organizations that have built processes for an asset-based approach with reusable components are winning in the field. Prebuilt data science blueprints and in-house and commercial algorithm libraries are helping many organizations shorten their time-to-value and giving them a competitive advantage.

Inability to integrate with digital channels. As many organizations continue their digital transformation, they are finding that their legacy data environments are not agile and flexible enough to enable their organizational data for digital channels. Digital channels require real-time provisioning, decision-making, and action. Old batch data warehouses that produce daily reports and query environments are simply not engineered for the flexibility, speed, and throughput necessary for today’s digital environments. Most digital architectures of today portray data hubs with both batch and real-time ingestion and API (application programming interface) layers that can orchestrate data in the digital channels.
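The idea of capturing metadata on ingestion, raised in the list above, can be illustrated with a minimal sketch. It is not taken from the book: the `CatalogEntry` and `ColumnProfile` structures, the `profile_on_ingest` function, and the sample customer rows are all hypothetical, and a plain dictionary stands in for a real metadata catalog.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ColumnProfile:
    # Hypothetical per-column metadata captured at ingestion time.
    name: str
    inferred_type: str
    null_count: int
    distinct_count: int


@dataclass
class CatalogEntry:
    # Hypothetical catalog record for one ingested data set.
    dataset: str
    source_system: str
    row_count: int
    ingested_at: str
    columns: list = field(default_factory=list)


def infer_type(values):
    """Return a coarse type label for the non-null values of a column."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    kinds = {type(v).__name__ for v in non_null}
    return kinds.pop() if len(kinds) == 1 else "mixed"


def profile_on_ingest(dataset, source_system, rows, catalog):
    """Profile a batch of rows and register the result in the catalog."""
    column_names = sorted({key for row in rows for key in row})
    profiles = []
    for name in column_names:
        # A missing key is treated the same as an explicit null.
        values = [row.get(name) for row in rows]
        profiles.append(ColumnProfile(
            name=name,
            inferred_type=infer_type(values),
            null_count=sum(v is None for v in values),
            distinct_count=len({v for v in values if v is not None}),
        ))
    entry = CatalogEntry(dataset, source_system, len(rows),
                         datetime.now(timezone.utc).isoformat(), profiles)
    catalog[dataset] = entry
    return entry


# Assumed sample batch; the data set and source names are illustrative only.
catalog = {}
rows = [
    {"customer_id": 1, "region": "EU", "email": "a@example.com"},
    {"customer_id": 2, "region": "EU", "email": None},
    {"customer_id": 3, "region": "US"},
]
entry = profile_on_ingest("customer", "crm", rows, catalog)
```

Because the profile is written to one shared catalog at ingestion, every later consumer of the `customer` data set can look up its definition and quality indicators instead of rediscovering them.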

With all these challenges, many organizations are turning to the cloud and expecting it to be their silver bullet: just move all these environments to the public cloud, and their cost, quality, and management issues will all be solved. The fact is that moving all these different environments to the cloud will not reduce their cost and will very likely triple it. The reason is that migrating a Teradata data warehouse and a Hadoop data lake means moving all the data structures, data, and data-integration processes to the target cloud environment. Moving data to the cloud is not cheap. Unlike most organizations, which do not truly manage network traffic from a cost perspective, cloud vendors do. An organization whose on-premises environment sources data from the same customer source system into three data warehouses, a Hadoop data lake for data science, and a Cassandra-based digital environment has to pay for all of these data movements to the cloud.
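The arithmetic behind this point can be sketched as follows. The daily extract size and the per-gigabyte charge are invented for illustration and are not vendor prices; real cloud pricing varies by provider and by the direction of the transfer. The only figure taken from the text is the number of targets: three data warehouses, one Hadoop data lake, and one Cassandra-based environment, all fed from the same source.

```python
def monthly_transfer_gb(daily_extract_gb, target_count, days=30):
    """Volume moved per month when one extract is copied to each target."""
    return daily_extract_gb * target_count * days


def monthly_transfer_cost(daily_extract_gb, target_count, price_per_gb, days=30):
    """Monthly movement cost under a flat, assumed per-GB charge."""
    return monthly_transfer_gb(daily_extract_gb, target_count, days) * price_per_gb


# Hypothetical inputs, chosen only to make the comparison concrete.
DAILY_EXTRACT_GB = 50
PRICE_PER_GB = 0.05

# Lift and shift: three data warehouses + Hadoop data lake + Cassandra store.
legacy_targets = 3 + 1 + 1
# Data platform: the source is ingested once into a common environment.
platform_targets = 1

legacy_cost = monthly_transfer_cost(DAILY_EXTRACT_GB, legacy_targets, PRICE_PER_GB)
platform_cost = monthly_transfer_cost(DAILY_EXTRACT_GB, platform_targets, PRICE_PER_GB)
cost_ratio = legacy_cost / platform_cost

print(f"legacy: {legacy_cost:.2f}, platform: {platform_cost:.2f}, ratio: {cost_ratio:.1f}x")
```

Under these assumptions the movement cost scales linearly with the number of redundant targets, so ingesting the source once removes the duplicated transfers regardless of the actual price.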

The purpose of a book on data platforms

The purpose of this book is to define what a data platform is, what the components or layers in a data platform are, what the technologies and processes are for each layer, and what supporting organizational structure is needed to sustain and evolve a data platform. It will cover the value a data platform provides in comparison to a collection of data warehouses, data marts, and data lakes. It will start with a section on the evolution of the data platform and on how the different use cases for data have reshaped transactional and analytics architectures over time, with disruptive changes leading to the modern data platform. This includes a review of the influences of early transactional and analytic processing, which are still critical use cases and design patterns in the data platform. It reviews the anti-data-warehouse era, where organizations used Hadoop clusters to build data lakes along with data science sandboxes. The rise of digital processing created a whole new use case of sending events (both transactional and nontransactional) bidirectionally on digital channels using AI-embedded models to predict or recommend next-step activities. These use cases required data technologies engineered for stateless interactions, which are easily enabled with stateless APIs such as REST.

A blueprint for a data platform

The growing set of use cases for data and its increased importance in digital channels have generated the need for an architectural approach that provides commonality and consistency at an enterprise level but with the flexibility to easily enable components of data and analytics into digital channels via APIs. This need has generated multilayered blueprints, or, as they are called in the information technology community, reference architectures. Many reference architectures for data are being discussed in the industry, often under the names data fabric or data mesh. In this book, the blueprint is referred to as the architecture for a data platform. This reference architecture is designed to address the multiple use cases for data in the digital age, including digital, operational, and analytic data use cases, where each use case can stand independently or be integrated into a broader data framework. It will cover the following component layers:

Intelligent-integration capabilities. This covers the types of data ingested in a modern data platform, including batch and real-time technologies with automated AI-infused profiling capabilities. This includes a review of the expanded need for curated data beyond the transformations in the traditional process of ETL (extract, transform, and load). Integration is now intelligent, with AI (artificial intelligence) capabilities assisting in the curation processes to conform, calculate, and aggregate data based on use cases. It also covers AI-infused data quality, master-data management, data science sandbox engineering, and bidirectional digital interactions.

The data marketplace. This section will address the different data designs and technological approaches needed to meet the multiple use cases for data in the digital environment. It will also address the recent trend toward data virtualization, covering both its opportunity and its reality.

Insights. The insights capability derives business value from the data marketplace. It develops different types of insights based on need to guide business decisions, using data visualization, standard reporting, and data science modeling. This is the interface where data is transformed into usable information. It takes a pragmatic look at the shift from thousands of BI (business intelligence) reports to modern data visualization tools for the digital age, and at how embedding predictive models into digital channels, which creates intelligent workflows, is the next evolution of insights.

Digital orchestration component. This digital integration capability includes topics such as APIs that connect the data platform to digital channels and applications. It includes a review of integrating AI and ML applications using open-source capabilities such as Kubeflow as well as event-based interactions with nontraditional data sources such as IoT (Internet of Things) edge-based devices.

Experience component. This component combines insights, data, and orchestration capabilities into an organization’s digital channels. Examples of the experience layer are programmatic marketing (inbound and outbound) and e-commerce interactions.
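The curation steps named under intelligent integration (conform, calculate, and aggregate) can be sketched in a few lines. This is a plain rule-based illustration with no AI assistance, not the book's implementation; the field names, the `REGION_CODES` mapping, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific region labels to one shared code.
REGION_CODES = {"emea": "EU", "europe": "EU", "eu": "EU", "usa": "US", "us": "US"}


def conform(record):
    """Standardize field names, types, and code values to one shared form."""
    raw_region = str(record.get("region") or record.get("Region") or "").strip().lower()
    return {
        "customer_id": int(record["customer_id"]),
        "region": REGION_CODES.get(raw_region, "UNKNOWN"),
        "quantity": int(record["quantity"]),
        "unit_price": float(record["unit_price"]),
    }


def calculate(record):
    """Add a derived measure to a conformed record."""
    return {**record, "revenue": record["quantity"] * record["unit_price"]}


def aggregate(records):
    """Summarize the derived measure by region for a reporting use case."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["revenue"]
    return dict(totals)


def curate(raw_records):
    """Run the three curation steps in order."""
    return aggregate(calculate(conform(r)) for r in raw_records)


# Assumed raw input with inconsistent field names, types, and region labels.
raw = [
    {"customer_id": "1", "region": "EMEA", "quantity": "2", "unit_price": "10.0"},
    {"customer_id": "2", "Region": "europe", "quantity": 1, "unit_price": 5.0},
    {"customer_id": "3", "region": "USA", "quantity": 4, "unit_price": 2.5},
]
revenue_by_region = curate(raw)
```

Keeping the three steps as separate functions means the conformed records can be reused by other use cases, while the calculation and aggregation stay specific to one consumer.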

This book also covers commercial data-platform technologies and the data-platform offerings of cloud vendors such as AWS, Microsoft Azure, Google Cloud, and IBM.

It provides approaches and techniques for building out a data-platform environment in both greenfield and legacy data environments. Since most organizations today have an existing analytics data environment, it provides a point of view on how to migrate a legacy data environment into a modern data platform.

Finally, the book covers the types of data-management operating models and organizational roles that are needed to build, sustain, and evolve a modern data-platform environment that addresses the many use cases for data needed in a digital organization.

PART 1

The Evolution of the Data Platform

What Is a Data Platform?

The Evolution of the Use Cases for Data

CHAPTER 1 What Is a Data Platform?

The first section of this text, The Evolution of the Data Platform, sets the stage by reviewing the evolution of data usage, which is driving the need for a new way to provision and store data, such as the data platform, for today’s digital environment. It covers how the industry has progressed in its use of data from static reports to real-time decision-making. It analyzes why organizations that choose not to take advantage of this new capability will be at a competitive disadvantage in terms of strategic flexibility, digital enablement, and cost management. Next, it builds the technical case for a data platform by delving into earlier versions of data environments such as the data warehouse, data mart, data lake, and data science sandboxes. The book then reviews the evolution of data architectures. It covers the business and technical problems they solved and those they created, which drove the need for the next evolution. This evolution of capabilities and constraints has led to the concept of the data platform.

Chapter 1, What Is a Data Platform?, provides the technical and business case for a data platform, based on the evolving needs and use cases for data driven by disruptive forces such as digital and artificial intelligence (AI). It will define what a data platform is and the risks of not having one. It will also cover reasons why organizations have not built a data platform.

The business drivers for a data platform

The business need for a data platform is based on the new uses for data, centered on three main factors: digital transformation, the advent of artificial intelligence, and the mass migration to the cloud. The discussion on data today starts with a conversation on digital transformation. Digital transformation is not new. In fact, it can be accurately stated that it is at least twenty-five years old. Early-adopter organizations that started as or moved to digital have gained a significant competitive advantage in their industries. Meanwhile, the rest of the world has recognized the imperative of going digital in the past ten years and has started transformation programs of some sort, trying to catch up. Social media, digital marketing, and e-commerce are all visceral aspects of the world’s pivot to digital. The COVID-19 pandemic accelerated the world’s economy into those digital channels of working and purchasing as stay-at-home orders descended from national to local governments. The fuel for this digital revolution is data. Every event on a digital channel is an opportunity to quantify and analyze behaviors that will drive usage, cost savings, or additional revenue.

Digital is not the only driver for a data platform; artificial intelligence (AI) / machine learning is every bit as disruptive as digital in its use of data and is a key enabler for real-time decision-making in digital channels. To develop these AI processes, the data scientist requires data science sandboxes for training and test data.

The final driver for a data platform is the cloud. The promise of lower cost and less management is driving organizations to plan and move their data estates to the cloud with legacy- and digitally driven use cases.

Figure 1.1. The multiple use cases for data.

A data platform provides a multipurpose environment to provision and provide data for all these use cases in a common, cost-effective manner that does not require massive duplication, ensures higher quality data, and reduces operational costs. The need for a common data environment to meet these use cases becomes readily apparent when one considers the technical and business drivers. One of the many (and maybe not the best) reasons
