The Modern Data Warehouse in Azure: Building with Speed and Agility on Microsoft’s Cloud Platform
Ebook · 437 pages

About this ebook

Build a modern data warehouse on Microsoft's Azure Platform that is flexible, adaptable, and fast—fast to snap together, reconfigure, and fast at delivering results to drive good decision making in your business.

Gone are the days when data warehousing projects were lumbering dinosaur-style projects that took forever, drained budgets, and produced business intelligence (BI) just in time to tell you what to do 10 years ago. This book will show you how to assemble a data warehouse solution like a jigsaw puzzle by connecting specific Azure technologies that address your own needs and bring value to your business. You will see how to implement a range of architectural patterns using batches, events, and streams for both data lake technology and SQL databases. You will discover how to manage metadata and automation to accelerate the development of your warehouse while establishing resilience at every level. And you will know how to feed downstream analytic solutions such as Power BI and Azure Analysis Services to empower data-driven decision making that drives your business forward toward a pattern of success.
This book teaches you how to employ the Azure platform in a strategy to dramatically improve implementation speed and flexibility of data warehousing systems. You will know how to make correct decisions in design, architecture, and infrastructure such as choosing which type of SQL engine (from at least three options) best meets the needs of your organization. You also will learn about ETL/ELT structure and the vast number of accelerators and patterns that can be used to aid implementation and ensure resilience. Data warehouse developers and architects will find this book a tremendous resource for moving their skills into the future through cloud-based implementations.

What You Will Learn
  • Choose the appropriate Azure SQL engine for implementing a given data warehouse
  • Develop smart, reusable ETL/ELT processes that are resilient and easily maintained
  • Automate mundane development tasks through tools such as PowerShell
  • Ensure consistency of data by creating and enforcing data contracts
  • Explore streaming and event-driven architectures for data ingestion
  • Create advanced staging layers using Azure Data Lake Gen 2 to feed your data warehouse

Who This Book Is For
Data warehouse and ETL/ELT developers who wish to implement a data warehouse project in the Azure cloud; developers currently working in on-premises environments who want to move to the cloud; and developers with Azure experience looking to tighten up their implementations and consolidate their knowledge

Language: English
Publisher: Apress
Release date: June 15, 2020
ISBN: 9781484258231

    Book preview

    The Modern Data Warehouse in Azure - Matt How

    © Matt How 2020

    M. How, The Modern Data Warehouse in Azure, https://doi.org/10.1007/978-1-4842-5823-1_1

    1. The Rise of the Modern Data Warehouse

    Matt How, Alton, UK

    A data warehouse is a common and well-understood technology asset that underpins many decision support systems. Whether the warehouse was initially designed to act as a hub for data integration or a base for analytical consistency, many organizations make use of the concepts and technologies that underpin data warehousing.

    At one point, the concept of a data warehouse was revolutionary, and the two key philosophies on data warehousing, those of Ralph Kimball and Bill Inmon, were new and exciting. However, many decades have passed since then, and while the philosophies have cross-pollinated, the core design and purpose have stayed very much the same, so much so that many data warehouse developers can move seamlessly from company to company because the data warehouse is such a prevalent design. The only thing that changes is the subject matter. This is very unlike more transactional databases, which may be designed very differently to support the specific needs of an application.

    As the cloud revolution began, more and more services found homes in the cloud, and the data warehouse is no exception. A cloud-based environment eliminates many common issues with data warehousing and also offers many new opportunities. The first of these is the serverless nature of cloud-based databases. Freed from managing the server environment, including patching, operating system (OS) maintenance, and upgrades, the development team can focus squarely on the data processing that needs to be undertaken. In addition, the architecture itself can be scaled so that businesses pay for what they actually use, not for a service sized with growth room for the next five years. Instead, the size of the system can be tailored and charged in per-hour increments so that aggressive cost optimizations can be achieved.

    In times gone by, the on-premises architecture of data warehouses meant that there were hard limits on the amount of data that could be stored and the frequency at which that data could be ingested. Further, the tools used to populate an on-premises data warehouse had limited ability to deal with complex data types or streaming datasets, concepts that are now prevalent in the application landscape that feeds data warehouses. Businesses now require these sources to be included in their reports, and so the data warehouse must modernize in order to keep up. At present, Azure provides many tools and services to help overcome these problems, many of which can be integrated directly into what is now known as a modern data warehouse.

    In addition to modernizing the database, the tools that operate, automate, and populate the data warehouse also need to keep up in order for the solution to feel cohesive. This is why Azure offers excellent integration and automation services that can be used in conjunction with the SQL database technologies. These tools mean that more can be achieved with less code and confusion, by creating standard patterns that can be applied generically to a variety of data processing problems. Common menial tasks such as database backups can be completely automated, making the issue of disaster recovery much less of a worry. With the latest features of Azure SQL Database, artificial intelligence is used to recommend and apply tuning alterations and index adjustments to ensure database performance is at its absolute best. This works alongside advanced threat detection which ensures databases hosted in Azure are safer than ever.

    Finally, businesses are increasingly interested in big data and data science, concepts that both require processing huge amounts of data at scale and maintaining a good degree of performance. For this reason, data lakes have become more popular and, rather than being seen as an isolated service, should be seen as an excellent companion to the modern data warehouse. Data lakes offer the flexibility to process varied data types at a variety of frequencies, distilling value at every stage, which can then be passed into the modern data warehouse and analyzed by the end users alongside the more traditional measures and stats.

    In recent years, many organizations have been struggling with the issues associated with on-premises data warehousing and are now looking to modernize. The rise of the modern data warehouse has already begun, and the goal of this book is to ensure every reader can reap the full benefit.

    Getting Started

    Microsoft Azure is a comprehensive cloud platform on which you can build Platform as a Service (PaaS), Software as a Service (SaaS), and Infrastructure as a Service (IaaS) components, using both Microsoft-specific services and third-party and open source technologies. Free trials of Microsoft Azure are available that provide 30 days of access and roughly £150/$200 of Azure credit. This should allow you to explore most, if not all, of the services in this book and gain a practical understanding of their implementation. There are also free tiers for many services that provide enough functionality for evaluation. Alternatively, you or your company may already have an Azure subscription that can be used to experiment with the technologies covered in this book.

    Multi-region Support

    A core element of Azure is its multi-region support. As you may know, the cloud is really just someone else’s computer; in this case, the computer belongs to Microsoft and sits in a massive data center. It is these data centers that comprise an Azure region. If you are based in America, you can pick from a range of regions, one of which will be your local region and will likely offer the lowest latency; you could, however, deploy resources to a European region if you knew you were supporting customers in that part of the world. Most regions have a paired region used for disaster recovery scenarios, but on the whole it is best to keep related resources in the same region. This avoids data egress fees, which are charged on data moved out of one region and into another. Note that Azure does not charge data ingress fees.

    Resource Groups and Tagging

    Once an Azure subscription has been set up, there are a few recommendations to help you organize it. First is the resource group. The resource group is the root container for all individual resources and provides a logical grouping for the different services that make up a single system. For example, a modern data warehouse may sit within a resource group that contains an Azure Data Factory, an Azure SQL Database, and an Azure Data Lake Gen 2 (ADL Gen2) account. The resource group means that admins can assign permissions at that single level and thereby control access for the entire system. As the subscription gets more use, you should begin creating resource groups per project or application, per environment; so for a single data warehouse, you may have a development, test, and production resource group, each with different permissions.

    Another useful technique is to use tags. Tags allow admins to label different resources so that they can be found easily and tracked against different departments, even if they are stored in the same resource group. Common tags include

    Cost center

    Owner

    Creator

    Application

    However, many others could be useful to your organization.
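A tag policy like the one above is often enforced in code. The following is a minimal sketch, assuming a dict-based representation of a resource's tags; the exact tag names and the helper itself are illustrative, not an Azure API:

```python
# Hypothetical tag-policy check. The required tag names mirror the common
# tags listed above; Azure stores tags as simple key/value pairs, which a
# plain dict models well enough for illustration.
REQUIRED_TAGS = {"CostCenter", "Owner", "Creator", "Application"}

def missing_tags(tags: dict) -> set:
    """Return the required tags absent from a resource's tag collection."""
    return REQUIRED_TAGS - tags.keys()

# A resource tagged by a developer who forgot the Creator tag:
resource_tags = {"CostCenter": "FIN-001", "Owner": "data-team", "Application": "mdw"}
print(missing_tags(resource_tags))  # {'Creator'}
```

In practice, Azure Policy can enforce required tags at deployment time, so a check like this would typically run as part of a governance or audit script rather than ad hoc.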

    Azure Security

    From a security standpoint, Azure is an incredibly well-trusted platform. With over 90 compliance certifications in place, including many that are industry or region specific, no cloud platform has a more comprehensive portfolio. Microsoft has invested over one billion US dollars in the security of the Azure platform and has an army of cybersecurity experts on hand to keep your data safe. These facts and figures offer assurance that the cloud platform itself is secure; within your own environment, however, it is important to properly secure data against malicious actors, whether internal employees or external services. This is where service principals are employed. These are service accounts that can be assigned access to many of the resources in a resource group without any human employees having access to the data, ensuring the most sensitive datasets remain protected.

    Modernizing a data platform is no easy task. There is a lot of new terminology and there are many new technologies to understand. To work with the demos and walk-throughs in this book, I have prepared some initial resources to review so that there is a common understanding.

    Tools of the Trade

    There are some tools that will make these technologies easier to use. They are easy to download and work with, and in most cases are cross-platform, meaning they run on both Apple Macs and Windows machines. The following list explains the key tools that will come in handy throughout this book and the technologies they assist with:

    Visual Studio: 2019 is the current version and is the primary integrated development environment (IDE) when working with Azure and other Microsoft-based technologies.

    Visual Studio SQL Server Data Tools: This add-in for Visual Studio gives developers the ability to create database projects and other BI-related projects such as Analysis Services.

    Microsoft Azure Storage Explorer: This lightweight tool allows developers to connect to cloud storage accounts and access them as if they were local to their PC. When working with data lakes, this can be very useful.

    SQL Server Management Studio: If you are based on a Windows environment, then this is a very powerful tool for monitoring and managing your SQL databases that has been trusted for years.

    Azure Data Studio: This is a cross-platform version of SQL Server Management Studio. Essentially, this is the go-to place for managing and monitoring any Microsoft SQL environment.

    Glossary of Terms

    With many new technologies being incorporated into the data platform, a glossary of terms is important to help introduce a conformed understanding. Additionally, many of these terms can be searched online which will allow development teams and architects to research the technologies more fully. The goal of this glossary, shown in Table 1-1, is to act as a point of reference for readers of this book, in case some terminology is new to them.

    Table 1-1

    Common Azure Terms

    Naming Conventions

    In my opinion, all development projects can benefit from a rigorous naming convention, and a modern data warehouse is no different. A good naming convention should give anyone reading the name enough detail to understand what the object is and roughly what it does. Additionally, a naming convention clears up any debate about what a particular thing should be called, as the formula to produce the name already exists. The naming convention included here is the standard recommended by Azure, which I have simply described in a shorter format.

    The name of a resource is broken down into several pieces, and so the following list describes each section of the name. In the following, I will offer some examples of resource names, assuming the project for the book is called Modern Data Warehouse in Azure:

    Department, business unit or project: This could be mrkt for marketing, fin for finance, or sls for sales.

    Application or service name: For example, a SQL database would be sqldb, a Synapse Analytics database would be syndb, an Azure Data Factory would be adf.

    Environment: This could be dev, test, sit, prod, to name a few.

    Deployment region: This is the region in which the resource is located and is usually abbreviated such that East US would become eus and North Europe would become neu.
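The four parts above can be composed mechanically. Here is a small sketch of such a helper; the hyphen separator, lowercase normalization, and the "mdwa" project abbreviation are assumptions for illustration (some resource types, such as storage accounts, do not allow hyphens at all):

```python
def resource_name(project: str, service: str, environment: str, region: str) -> str:
    """Compose an Azure resource name from the convention's four parts:
    project/department, service abbreviation, environment, and region.

    Hyphen-separated and lowercased by assumption; check each resource
    type's naming rules, since some (e.g., storage accounts) forbid hyphens.
    """
    return "-".join(part.lower() for part in (project, service, environment, region))

# A hypothetical Data Factory for this book's project, deployed to North Europe:
print(resource_name("mdwa", "adf", "dev", "neu"))  # mdwa-adf-dev-neu
```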

    In Table 1-2, I have given examples of some common data warehousing resources alongside their suggested names.

    Table 1-2

    Example Azure resource names

    © Matt How 2020

    M. How, The Modern Data Warehouse in Azure, https://doi.org/10.1007/978-1-4842-5823-1_2

    2. The SQL Engine

    Matt How, Alton, UK

    The focus of this chapter is to break open the mysteries of each SQL storage engine and understand why a particular flavor of Azure SQL technology suits one scenario over another. We will analyze the underlying architecture of each service so that development choices can be well informed and well reasoned. Once we understand how each implementation of the SQL engine in Azure processes and stores data, we can look at the direction Microsoft is taking that technology and forecast whether the same choice would be made in the future. The knowledge gained in this chapter should provide you with the capability to understand your source data and therefore to choose which SQL engine should be used to store and process that data.

    Later in this book, we will move out of the structured SQL world and discuss how we can utilize Azure data lake technology to more efficiently work with our data; however, those services are agnostic to the SQL engine that we decide best suits our use case and therefore can be decided upon later. As a primary focus, we must understand our SQL options, and from there, we can tailor our metadata, preparation routines, and development tools to suit that engine.

    The Four Vs

    The Microsoft Azure platform has a wealth of data storage options at the user’s disposal, each with different features and traits that make it well suited to a given type of data and scenario. Given the flexible and dynamic nature of cloud computing, Microsoft has built a comprehensive platform that ensures all varieties of data can be catered for. This need to cater to differing types of data gets neatly distilled into what is known in the data engineering world as the 3 Vs: volume, variety, and velocity.

    Any combination of volume, variety, and velocity can be solved with a storage solution in the Azure platform. People often refer to a fourth V, value, which I think is a worthy addition, as the value can often get lost in the volume.

    As the volume increases, the curation process to distil value from data becomes more complex, and therefore, specific tools and solutions can be used to help that process, validating the need for a fourth V. When attempting to tackle any one or combination of the four Vs, it is important to understand the full set of options available so that a well-informed decision can be made. Understanding the reasons why a certain technology should be chosen over another is essential to any development process, as this can then inform the code, structure, and integration of that technology.

    To use an example, if you needed to store a large amount of enterprise data that was a complete mix of file types and sizes, you would use an Azure Storage account. This would allow you to organize your data into a clear structure and efficiently increase your account size as and when you need. The aspects of that technology help to reduce the complexities of dealing with large-scale data and remove any barriers to entry. Volume, check. Variety, check.

    Alternatively, if the requirement was to store JavaScript Object Notation (JSON) documents so that they can be efficiently queried, then the best option would be to utilize Cosmos DB. While there is nothing stopping JSON data from being stored in Blob Storage, the ability to index and query JSON data using Cosmos DB makes this an obvious choice. The guaranteed latency and throughput options of Cosmos DB mean that high-velocity data is easily ingested. When the volume begins to increase, Cosmos DB will scale with it. Velocity, check. Volume, check.

    Moving to a data warehouse, we know we will have a large amount of well-structured, strongly typed data that needs to rapidly serve up analytical insight. We need a SQL engine. Crucially, this is where the fourth V, value, comes into play. Datasets being used to feed a data warehouse may contain many attributes that are not especially valuable, and good practice dictates that these attributes are trimmed off before arriving in the data warehouse. The golden rule is that data stored in a data warehouse should be well curated and of utmost value. A SQL engine makes surfacing that valuable data easy, and further to that, no other storage option can facilitate joining of datasets to produce previously uncovered value as effortlessly as a SQL engine can. Value, check.

    However, a wrinkle in the decision process is that Azure provides two types of SQL engine to choose from. Each can tackle any challenge across the four Vs, but it is wise to understand which engine solves which V best. Understanding the nuances of each flavor of Azure SQL will help developers make informed decisions about how to load, query, and manage the data warehouse.

    The first SQL engine we will examine in this chapter is Azure Synapse Analytics (formerly Azure SQL Data Warehouse). This massively parallel processing (MPP) service provides scalability, elasticity, and concurrency, all underpinned by the well-loved Microsoft SQL Server engine. The clue is certainly in the former title; this is a good option for data warehousing. However, there are other factors that mean it may not be the right choice in every scenario. While Azure Synapse Analytics has a wealth of optimizations targeted at data warehousing, there are some reasons why the second SQL option, Azure SQL Database, may be more suitable.

    Azure SQL Database is an OLTP-type system optimized for reads and writes; however, it has some interesting features that make it a great candidate for a data warehouse environment. The recent advent of Azure SQL Database Hyperscale means that Azure SQL Database can scale up to 100 TB and provide additional read-only compute nodes to serve up analytical data. A further advantage is that Azure SQL Database has intelligent query processing and can be highly reactive to changes in runtime conditions, allowing peak performance to be maintained at critical times. Finally, there are multiple deployment options for Azure SQL Database, including managed instances and elastic pools. In essence, a managed instance is a full-blown SQL Server instance deployed to the cloud and provides the closest match in Azure to an existing on-premises Microsoft SQL Server implementation. Elastic pool databases share a single pool of compute resource, allowing a lower total cost of ownership because databases can consume more or fewer resources from the pool rather than having to be scaled independently.

    Azure Synapse Analytics

    When implementing an on-premises data warehouse, there are many constraints placed upon the developer. Initially there is the hassle of setting up and configuring the server, and even if this is taken care of already, there is always a maintenance and management overhead that cannot be ignored. Once the server is set up, further thought needs to be applied to file management and growth. In addition, the data warehouse itself is limited to the confines of the physical box, and often large databases have to utilize complex storage solutions to mitigate this issue.

    However, if you are reading this book, then it is clear you are no longer interested in this archaic and cumbersome approach to data warehousing. By making the move up to the Azure cloud, you can put the days of server management behind you, safe in the knowledge that Microsoft will take care of all that. What’s more, Azure does not just provide a normal SQL instance that happens to be serverless; Microsoft has restructured the underlying architecture entirely so that it is tailored to the cloud environment. This goes further still: Azure Synapse Analytics is not only purpose-built for the cloud but purpose-built for large-scale data warehousing.

    Understanding Distributions

    A key factor that needs to be understood when working with Azure Synapse Analytics is that of distributions. In a standard SQL Server implementation, you are working in a symmetric multi-processing (SMP) environment, which means there is a single storage point coupled to a set of CPUs, and queries are parallelized across those CPUs using a service bus. The main problem here is that all the CPUs need to access the same storage, and this can become a bottleneck, especially when running large analytical queries.

    When you begin using Azure Synapse Analytics, you are now in a massively parallel processing (MPP) environment.

    There are a number of key differences between SMP and MPP environments, and they are illustrated in Figure 2-1. The most important is that storage is now widely distributed and coupled to a specific amount of compute. The benefit here is that each node of the engine is essentially a separate SQL database and can access its own storage separately from all the other nodes without causing contention.
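To make the idea of distributions concrete, here is a toy sketch of hash distribution. Synapse genuinely spreads each table across 60 distributions, but the MD5-based hash and the customer-key format below are illustrative assumptions, not the engine's internal (non-public) algorithm:

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # Azure Synapse Analytics spreads every table across 60 distributions

def assign_distribution(key_value: str) -> int:
    """Deterministically map a distribution-column value to one distribution.

    MD5 stands in for the engine's internal hash function here; what matters
    is that the same key always lands on the same distribution, so joins and
    aggregations on that key need no data movement between nodes.
    """
    digest = hashlib.md5(key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DISTRIBUTIONS

# Distribute 1,000 hypothetical customer keys and inspect the spread.
buckets: dict[int, int] = {}
for i in range(1000):
    d = assign_distribution(f"C{i:04d}")
    buckets[d] = buckets.get(d, 0) + 1

print(len(buckets))           # number of distributions used (at most 60)
print(sum(buckets.values()))  # 1000 -- every row landed somewhere
```

A reasonably uniform hash keeps the buckets evenly sized, which is exactly why choosing a high-cardinality, evenly spread distribution column matters: a skewed column would overload a few distributions while others sit idle.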

    Figure 2-1 Diagram of SMP vs. MPP

    Figure 2-1 shows how in an SMP environment, there can be contention for storage resources due to the single point of access; this problem is alleviated in the MPP environment, as each compute node can access its own dedicated storage.
