Hands-On Azure Data Platform: Building Scalable Enterprise-Grade Relational and Non-Relational database Systems with Azure Data Services
Ebook, 483 pages, 3 hours

About this ebook

'Hands-On Azure Data Platform' helps readers gain a fundamental understanding of databases, data warehouses, and data lakes and their management on the Azure Data Platform.

The book describes how to work efficiently with Relational and Non-Relational Databases, Azure Synapse Analytics, and Azure Data Lake. The readers will use Azure Databricks and Azure Data Factory to experience data processing and transformation. The book delves deeply into topics like continuous integration, continuous delivery, and the use of Azure DevOps. The book focuses on the integration of Azure DevOps with CI/CD pipelines for data ops solutions. The book teaches readers how to migrate data from an on-premises system or another cloud service provider to Azure.

After reading the book, readers will develop end-to-end data solutions using the Azure data platform. Additionally, data engineers and ETL developers can streamline their ETL operations using various efficient Azure services.
Language: English
Release date: Sep 2, 2022
ISBN: 9789355510389

    Book preview

    Hands-On Azure Data Platform - Sagar Lad

    CHAPTER 1

    Getting Started with Azure Data Platform

    Data is the new oil: the phrase is used often these days, and rightly so. Data now affects every industry vertical. With the abundance of data available today, people and organizations can make well-informed decisions that drive growth. Tools and platforms have emerged to extract insights from data and support data-backed decisions, and they continue to evolve to meet the growing demand to store, transform, and process huge amounts of data.

    In this chapter, we are going to briefly discuss the services offered by Azure Data Platform to work with various types of data.

    Structure

    In this chapter, we will learn the following aspects of Azure Data Platform:

    Evolving world of data and its types

    Defining the concept of database, data warehouse and data lake

    Taking a quick tour of Azure Data Platform

    Taking a quick tour of Azure Storage Account – blobs, tables, queues, and files

    Objectives

    After studying this chapter, you should be able to:

    Understand the various offerings of Azure to work with data

    Explore the core services of Azure Storage account

    Evolving world of data and its types

    The world of data has evolved like no other over the past few decades. With the internet boom and the growing number of software applications and hardware devices in use across the world, the amount of data generated has been growing exponentially. According to some sources, we produce more than 2.5 quintillion bytes of data every day, and that figure is expected to keep growing. This abundance of data lets organizations address business problems with data-backed decisions, so the need to store, process, and manage data to gather insights has become a focus for most organizations.

    But the abundance of data has also created a few obstacles for organizations. The data generated by most devices and applications comes in various forms, including text, image, audio, and video, and is often too large to store in an organization's on-premises setup. Some of the data is structured, some is semi-structured, and the rest is unstructured. This has led to the creation of specific software and services to deal with each kind of data.
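
    To make the three kinds of data concrete, here is a minimal Python sketch. The sample records are hypothetical and exist only for illustration: a CSV table stands in for structured data, a list of JSON documents for semi-structured data, and a run of raw bytes for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, and every row has the same shape.
csv_text = "order_id,customer,amount\n1,alice,25.50\n2,bob,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing records whose fields may differ per record.
json_text = '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
docs = json.loads(json_text)

# Unstructured: opaque bytes with no schema (a stand-in for an image or video).
blob = bytes([0x89, 0x50, 0x4E, 0x47]) + b"...binary payload..."


def describe(rows, docs, blob):
    """Summarize how much structure each kind of data exposes."""
    return {
        "structured_columns": sorted(rows[0].keys()),
        "semi_structured_keys": [sorted(d.keys()) for d in docs],
        "unstructured_size": len(blob),
    }


summary = describe(rows, docs, blob)
```

    The structured rows all share one set of columns, each semi-structured document carries its own set of keys, and the unstructured payload offers nothing but its size until a specialized tool interprets it.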

    With Azure Data Platform, we get various services managed by Azure to store, process, and transform data. We can work with the data as per business requirements without worrying about the maintenance of the underlying hardware, scalability, licensing, and availability, to name a few. These services are highly available, secure, and compliant with most regulations.

    To work with structured data, we can use services like Azure SQL or Azure SQL Data Warehouse (now part of Azure Synapse Analytics). To work with semi-structured and unstructured data, we can use services like Azure Blob Storage, Azure Files, and Azure Cosmos DB. We will discuss most of the services of Azure Data Platform extensively in the upcoming chapters of this book.

    Defining the concept of databases, data warehouses and data lakes

    Modern applications both create and consume data in various formats. To cater to their needs, we must plan our data strategy carefully so that data is stored securely and yields the maximum benefit. Across industries, data is used to understand customer behavior and to make data-backed decisions that generate more revenue and profit. Depending on the nature of the data and the business requirement, we need to decide which option to use to store our data.

    In this section, we are going to briefly talk about different data storage options.

    Database

    Databases are a good choice for workloads that store data in a single format. They are suited to storing a finite amount of data. We can store historical data in databases, but this has constraints of its own: we cannot keep inserting more and more data into a single database over time, because doing so can degrade the performance of the application that uses the database.

    Data warehouse

    Data warehouses let us bring together data that originates in multiple formats and store huge amounts of it. They are an ideal choice for historical data, which we process to gather insights and make crucial business decisions. Data warehouses work on the principle of Extract, Transform and Load (ETL). Because applications generate data in various formats, data is first extracted from all the sources, then transformed into a structured format, and finally loaded into the data warehouse, where we can run analytics on it. But there are challenges here too: a data warehouse typically scales only vertically, and it can store data only in a particular, predefined format.
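
    The extract, transform, and load steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular warehouse product works: the two in-memory sources, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3


def extract():
    """Extract: pull raw records from two hypothetical sources in different formats."""
    csv_source = "id,amount\n1,10.5\n2,4.0\n"
    json_source = '[{"id": 3, "amount": "7.25"}]'
    records = list(csv.DictReader(io.StringIO(csv_source)))
    records += json.loads(json_source)
    return records


def transform(records):
    """Transform: coerce every record to one structured shape (int id, float amount)."""
    return [(int(r["id"]), float(r["amount"])) for r in records]


def load(rows, conn):
    """Load: write the already-structured rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return (row count, total amount)."""
    load(transform(extract()), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

    Note that the table schema is fixed before any row arrives, so the transform step has to finish before loading can begin.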

    Data lakes

    Like a data warehouse, a data lake can store huge amounts of data, and it can store that data in any format. It also addresses the drawbacks of databases and data warehouses described above: we can scale a data lake horizontally and store data as our requirements dictate. Data lakes work on the principle of Extract, Load and Transform (ELT). In a data warehouse, the schema of the data is defined before the data is stored; in a data lake, the schema is defined after the data is loaded.
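
    Defining the schema after the data is loaded can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the raw records, the field names, and the `read_with_schema` helper are hypothetical, and a plain list of JSON strings stands in for files in a data lake.

```python
import json

# Load step: raw records land in the lake untouched, one JSON string each.
# The records are hypothetical and deliberately inconsistent in shape.
raw_lake = [
    '{"device": "a1", "temp_c": 21.5}',
    '{"device": "b2", "temp_c": "19", "site": "north"}',
    '{"device": "c3"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time. schema maps field -> (cast function, default)."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            # Missing values with no default stay None instead of being cast.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two different schemas applied to the same stored data, chosen by the reader.
temperature_schema = {"device": (str, None), "temp_c": (float, None)}
site_schema = {"device": (str, None), "site": (str, "unknown")}

temperatures = read_with_schema(raw_lake, temperature_schema)
sites = read_with_schema(raw_lake, site_schema)
```

    The same stored records serve two different questions because the transformation happens at read time rather than before loading.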

    Taking a quick tour of Azure Data Platform

    Microsoft Azure provides an array of services to store, process, and manage data solutions. Provisioning and configuring these services is easy and does not involve a steep learning curve. Microsoft Azure provides end-to-end solutions to monitor and manage Data Platform services. We can manage or provision these services by using the Azure CLI, the Azure Portal, or Azure PowerShell.

    The following services are part of Microsoft Azure's Data Platform and help us build and manage data solutions:

    Azure SQL Database

    Azure Cosmos DB

    Azure Database for MySQL

    Azure Database for PostgreSQL

    Azure Synapse Analytics

    Azure Databricks

    Azure Data Lake

    Azure Data Factory

    Storage account

    Let us have a quick tour of each of these services.

    Azure SQL Database

    Azure SQL Database is a managed, secure, and scalable relational database as a service (DBaaS) offering of Microsoft Azure, which falls in the category of Platform as a Service (PaaS). It offers up to 99.995% availability and has built-in backup, recovery, and patching. With Azure SQL Database, we can focus on the business requirement without worrying about the underlying hardware or OS, or even managing the database engine. Azure SQL runs on top of the latest enterprise version of SQL Server. Azure SQL Database offers two deployment options: single database and elastic pool. With a single database, we get a set of resources dedicated to that database, managed via a logical SQL server. Hyperscale and serverless options are available for the single database deployment option. With the elastic pool option, we can manage and scale multiple databases together; all the databases in an elastic pool share one set of resources.

    Azure Cosmos DB

    Azure Cosmos DB is a highly scalable, multi-model, and globally distributed database service offered by Microsoft Azure. It is a fully managed NoSQL database service with 99.999% availability. It supports multiple database APIs, such as the SQL API, API for MongoDB, Cassandra API, Table API, and Gremlin API. We can build applications with the Azure Cosmos DB client libraries in the language of our choice. It indexes all the data stored in it out of the box so that queries return results quickly. We can define our own user-defined functions and stored procedures in Azure Cosmos DB. It also provides encryption at rest along with role-based access control to keep our data secure.

    Azure Database for MySQL

    Azure Database for MySQL is a fully managed relational database as a service offered by Microsoft Azure. It is based on the database engine of MySQL Community edition. It is a highly available service that meets industry-leading compliance standards and protects data both in transit and at rest. It comes with automatic backups and monitoring capabilities. It lets us focus on the business requirement and accelerate our time to market without worrying about managing the underlying infrastructure.

    Azure Database for MySQL comes with two deployment models: Single server and Flexible server. With the Single server deployment model, the service handles most database management functions, such as patching, backups, and high availability, with minimal user configuration. It provides 99.99% availability in a single availability zone. It is an ideal option for cloud-native applications that don't require custom MySQL configurations or more granular control over the database server. With the Flexible server deployment model, we get more granular control over the database server and the flexibility to define configuration settings as per our requirements. It is an ideal option for applications that require more control and customization.

    Azure Database for PostgreSQL

    Azure Database for PostgreSQL is a managed relational database as a service (DBaaS) offered by Microsoft Azure. It is based on the runtime engine of PostgreSQL community edition. It provides built-in high availability and protects data through automatic backups. It can scale up or down as per our requirements in a matter of seconds. It provides enterprise-grade security for our data at rest and in transit.

    Azure Database for PostgreSQL comes in three deployment models: Single server, Flexible server, and Hyperscale. The Flexible server deployment model was introduced in preview. Single server deployment handles most of the database management functions, whereas Flexible server gives us the power to customize the database configuration as per our needs. Though all the deployment models can be used for applications, the Hyperscale model is well suited for multi-tenant applications.

    Azure Synapse Analytics

    Azure Synapse Analytics is an enterprise-grade analytics service offered by Microsoft Azure that reduces the time needed to gather insights from data stored in various data warehouses and big data systems. It is integrated with various Azure services such as Azure Data Lake, Azure Databricks, and Azure ML, to name a few. It supports multiple languages commonly used for analytics, such as Python, R, SQL, Java, and Scala.

    Azure Databricks

    Azure Databricks is a data analytics platform created jointly by Microsoft and the team that created Apache Spark. Azure Databricks is built on top of Apache Spark. It integrates easily with various Azure services such as Azure Blob Storage and SQL databases, to name a few. It supports multiple languages, including Scala, Java, and Python. Its workspace lets data teams collaborate in real time. It can be easily integrated with visualization tools like Power BI and Tableau.

    Azure Data Lake

    Azure Data Lake is a highly scalable cloud-based data lake offered by Microsoft Azure as its enterprise big data analytics solution for the cloud. It is built on top of Azure Blob Storage. We can store data of any type and size, arriving at any ingestion speed, in Azure Data Lake without transforming it. It is a Hadoop-compatible service and is highly cost-effective. It works with an array of Azure services that we can use to perform data ingestion and analytics as well as to build visualizations.

    Azure Data Factory

    Azure Data Factory is a fully managed, cloud-based service offered by Microsoft Azure to perform large-scale ELT, ETL, and integration tasks. It provides a code-free UI where we can create our data workflow by using various activities to transform and transfer data to desired destinations. It has more than 90 built-in connectors to connect with various sources like Google BigQuery and Amazon Redshift, to name a few. We can use Azure Data Factory to perform various kinds of tasks, including data integration, transformation, and transfer. For example, we can use Azure Data Factory to migrate data from an AWS S3 bucket to Azure Blob Storage.
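
    Behind the code-free UI, a Data Factory pipeline is stored as a JSON document. The Python sketch below builds what such a definition might look like for the S3 to Blob Storage copy mentioned above. It is a rough sketch under stated assumptions rather than a deployable pipeline: the pipeline, activity, and dataset names are hypothetical, the datasets and linked services they refer to are not shown, and the source and sink type names should be checked against the current connector documentation before use.

```python
import json


def build_copy_pipeline(source_dataset, sink_dataset):
    """Return a pipeline definition with a single Copy activity, as a plain dict."""
    return {
        "name": "CopyS3ToBlob",  # hypothetical pipeline name
        "properties": {
            "activities": [
                {
                    "name": "CopyFromS3",  # hypothetical activity name
                    "type": "Copy",
                    # Datasets are referenced by name and must be defined separately.
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        # Assumed type names for a binary copy from S3 to Blob Storage.
                        "source": {
                            "type": "BinarySource",
                            "storeSettings": {"type": "AmazonS3ReadSettings", "recursive": True},
                        },
                        "sink": {
                            "type": "BinarySink",
                            "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
                        },
                    },
                }
            ]
        },
    }


# Hypothetical dataset names, used only to show the shape of the document.
pipeline = build_copy_pipeline("S3BinaryDataset", "BlobBinaryDataset")
pipeline_json = json.dumps(pipeline, indent=2)
```

    Keeping the definition as data makes it easy to store in source control and review alongside other code.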

    Azure Storage Account

    An Azure Storage account is a suite of cloud storage services. It comprises five core storage services: Azure Blob, Queue, Files, Disks, and Table storage. We can leverage an Azure Storage account to meet most of our storage needs without worrying about the underlying hardware. We can store data of various formats like images, videos, or logs in a storage
