SQL Server 2019 AlwaysOn: Supporting 24x7 Applications with Continuous Uptime
About this ebook
Get a fast start to using AlwaysOn, the SQL Server solution for high availability and disaster recovery. This third edition is newly updated to cover the 2019 editions of both SQL Server and Windows Server and includes strong coverage of implementing AlwaysOn Availability Groups on both Windows and Linux operating systems. The book provides a solid and accurate understanding of how to implement systems requiring consistent and continuous uptime, as well as how to troubleshoot those systems in order to keep them running and reliable. This edition is updated to account for all major new functionality and also includes coverage of implementing atypical configurations, such as clusterless and domain-independent Availability Groups, distributed Availability Groups, and Availability Groups on Azure.
The book begins with an introduction to high-availability and disaster recovery concepts such as Recovery Point Objectives (RPOs), Recovery Time Objectives (RTOs), availability levels, and the cost of downtime. You’ll then move into detailed coverage of implementing and configuring the AlwaysOn feature set in order to meet the business objectives set by your organization. Content includes coverage on implementing clusters, building AlwaysOn failover clustered instances, and configuring AlwaysOn Availability Groups.
SQL Server 2019 AlwaysOn is chock-full of real-world advice on how to build and configure the most appropriate topology to meet the high-availability and disaster recovery requirements you face, as well as how to use AlwaysOn Availability Groups to scale out read-only workloads. This is a practical, hands-on book that gets you started quickly with one of the most talked-about SQL Server feature sets.
What You Will Learn
- Understand high availability and disaster recovery in SQL Server 2019
- Build and configure a Windows Cluster in Windows Server 2019
- Create and configure an AlwaysOn failover clustered instance
- Implement AlwaysOn Availability Groups and appropriately configure them
- Implement AlwaysOn Availability Groups on Linux servers
- Configure Availability Groups on Azure IaaS
- Administer AlwaysOn technologies post implementation
- Understand atypical configurations, such as clusterless and distributed Availability Groups
Who This Book Is For
For Microsoft SQL Server database administrators who are interested in growing their knowledge and skills in SQL Server’s high-availability and disaster recovery feature set.
SQL Server 2019 AlwaysOn - Peter A. Carter
© Peter A. Carter 2020
P. A. Carter, SQL Server 2019 AlwaysOn, https://doi.org/10.1007/978-1-4842-6479-9_1
1. High Availability and Disaster Recovery Concepts
Peter A. Carter, Botley, UK
In today’s 24×7 environments that are running mission-critical applications, businesses rely heavily on the availability of their data. Although servers and their software are generally reliable, there is always the risk of a hardware failure or a software bug, each of which could bring a server down. To mitigate these risks, business-critical applications often rely on redundant hardware to provide fault tolerance. If the primary system fails, then the application can automatically fail over to the redundant system. This is the underlying principle of high availability (HA).
Even with the implementation of HA technologies, there is always a small risk of an event that causes the application to become unavailable. This could be a major incident, such as the loss of a data center through a natural disaster or an act of terrorism. It could also be caused by data corruption or human error, resulting in the application’s data becoming lost or damaged beyond repair.
In these situations, some applications may rely on restoring the latest backup to recover as much data as possible. However, more critical applications may require a redundant server to hold a synchronized copy of the data in a secondary location. This is the underpinning concept of disaster recovery (DR). This chapter discusses the concepts behind HA and DR.
Level of Availability
The amount of time that a solution is available to end users is known as the level of availability, or uptime. To provide a true picture of uptime, a company should measure the availability of a solution from a user’s desktop. In other words, even if your SQL Server has been running uninterrupted for over a month, users may still experience outages to their solution caused by other factors. These factors can include network outages or an application server failure.
In some instances, however, you have no choice but to measure the level of availability at the SQL Server level. This may be because you lack holistic monitoring tools within the enterprise. Most often, however, the requirement to measure the level of availability at the instance level is political, as opposed to technical. In the IT industry, it has become a trend to outsource the management of data centers to third-party providers. In such cases, the provider responsible for managing the SQL servers may not necessarily be the provider responsible for the network or application servers. In this scenario, you need to monitor uptime at the SQL Server level to accurately judge the performance of the service provider.
The level of availability is measured as a percentage of the time that the application or server is available. Companies often strive to achieve 99%, 99.9%, 99.99%, or 99.999% availability. As a result, the level of availability is often referred to in 9s. For example, five 9s of availability means 99.999% uptime and three 9s means 99.9% uptime.
Table 1-1 details the amount of acceptable downtime per week, per month, and per year for each level of availability.
Table 1-1
Levels of Availability
All values are rounded down to the nearest second.
To calculate other levels of availability, you can use the script in Listing 1-1. Before running this script, replace the value of @Uptime to represent the level of uptime that you wish to calculate. You should also replace the value of @UptimeInterval to reflect uptime per week, month, or year.
DECLARE @Uptime DECIMAL(5,3) ;
--Specify the uptime level to calculate
SET @Uptime = 99.9 ;

DECLARE @UptimeInterval VARCHAR(5) ;
--Specify WEEK, MONTH, or YEAR
SET @UptimeInterval = 'YEAR' ;

DECLARE @SecondsPerInterval FLOAT ;
--Calculate the number of seconds in the interval
SET @SecondsPerInterval =
(
    SELECT CASE
        WHEN @UptimeInterval = 'YEAR'
            THEN 60*60*24*365.243
        WHEN @UptimeInterval = 'MONTH'
            THEN 60*60*24*30.437
        WHEN @UptimeInterval = 'WEEK'
            THEN 60*60*24*7
    END
) ;

DECLARE @DowntimeSeconds DECIMAL(12,4) ;
--Calculate the allowable downtime for the specified uptime level
SET @DowntimeSeconds = @SecondsPerInterval * (100 - @Uptime) / 100 ;

--Format the results
SELECT
      CONVERT(VARCHAR(12), FLOOR(@DowntimeSeconds /60/60/24)) + ' Day(s), '
    + CONVERT(VARCHAR(12), FLOOR(@DowntimeSeconds /60/60 % 24)) + ' Hour(s), '
    + CONVERT(VARCHAR(12), FLOOR(@DowntimeSeconds /60 % 60)) + ' Minute(s), '
    + CONVERT(VARCHAR(12), FLOOR(@DowntimeSeconds % 60)) + ' Second(s).' ;
Listing 1-1
Calculating the Level of Availability
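The same arithmetic can be cross-checked outside T-SQL. The following Python snippet is an illustrative sketch (not from the book) that mirrors Listing 1-1, using the same day counts per interval and rounding down to the nearest second:

```python
def allowable_downtime(uptime_pct, interval="YEAR"):
    """Return allowable downtime as (days, hours, minutes, seconds),
    rounded down to the nearest second, mirroring Listing 1-1."""
    seconds_per_interval = {
        "YEAR": 60 * 60 * 24 * 365.243,
        "MONTH": 60 * 60 * 24 * 30.437,
        "WEEK": 60 * 60 * 24 * 7,
    }[interval]
    # Downtime is the fraction of the interval NOT covered by the uptime level
    downtime = int(seconds_per_interval * (100 - uptime_pct) / 100)
    days, rem = divmod(downtime, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds

# Three 9s of availability over a year
print(allowable_downtime(99.9, "YEAR"))  # → (0, 8, 45, 56)
```

For example, five 9s per year works out to just 5 minutes and 15 seconds of allowable downtime, which illustrates why each additional 9 is dramatically harder to achieve.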
Actual Availability
Now that we understand how to calculate the level of availability required by an application, we should also understand how to calculate the actual availability of an application. We can do this by discovering the MTBF (Mean Time Between Failures) and MTTR (Mean Time to Recover) metrics.
The MTBF metric describes the average length of time between failures. For example, imagine that we were reviewing the service logs for the past week and discovered that the Foo application has suffered three outages. There are 168 hours in a week and there have been three failures. We simply need to divide the number of hours within our time period by the number of failures. This will produce our MTBF. In this case, we have an MTBF of 56 hours.
MTTR can actually have two different meanings: Mean Time to Repair or Mean Time to Recover. When there is no HA or DR in place, the MTTR metric describes the average length of time it takes for something that is broken to be repaired. This could potentially be an extended period of time, as we may need to wait for a service engineer to replace faulty hardware. When thinking about HA and DR, however, we use MTTR to mean Mean Time to Recover, recording the duration of the outage. For example, if we have a three-node cluster and one of the nodes experiences a hardware failure, we of course need to fix that server; from an application downtime perspective, however, the outage lasts only the seconds or minutes it takes for the service to fail over. Because the cluster still has resilience, the application’s nonfunctional requirements are not impacted. Therefore, in this example, I will assume that MTTR refers to Mean Time to Recover.
We can calculate the MTTR by taking a sum of the total downtime duration within our period and dividing it by the number of failures. In this case, during our 168-hour period, we have had three failures and the total duration of downtime has been 12 minutes. Therefore, the MTTR for our Foo application is 4 minutes.
Now that we know our application’s MTBF and MTTR, we can use these metrics to calculate the actual availability of our application. The formula for this is (MTBF/(MTBF+MTTR))*100. So in this case, we first need to convert our MTTR value to hours, so that we have hours as our consistent unit. Four minutes is 0.06667 hours. Therefore, our calculation would be (56/(56+0.06667))*100. This makes our actual application availability 99.8811%.
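The chain of calculations above can be sketched in code. This is an illustrative Python snippet (not from the book), using the Foo application's figures from the preceding paragraphs:

```python
def actual_availability(mtbf_hours, mttr_hours):
    """Actual availability (%) from MTBF and MTTR, in consistent units."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100

# The Foo application: 3 failures in a 168-hour week, 12 minutes total downtime
period_hours = 168
failures = 3
total_downtime_minutes = 12

mtbf = period_hours / failures                 # 56 hours between failures
mttr = total_downtime_minutes / failures / 60  # 4 minutes = ~0.0667 hours

print(round(actual_availability(mtbf, mttr), 4))  # → 99.8811
```

Note that both inputs must be in the same unit (hours here); feeding MTTR in minutes would badly understate availability.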
Service-Level Agreements and Service-Level Objectives
When a third-party provider is responsible for managing servers, the contract usually includes service-level agreements (SLAs). These SLAs define many parameters, including how much downtime is acceptable, the maximum length of time a server can be down in the event of failure, and how much data loss is acceptable if failure occurs. Normally, there are financial penalties for the provider if these SLAs are not met.
In the event that servers are managed in-house, DBAs still have the concept of customers. These are usually the end users of the application, with the primary contact being the business owner. An application’s business owner is the stakeholder within the business who commissioned the application and who is responsible for signing off on funding enhancements, among other things.
In an in-house scenario, it is still possible to define SLAs, and in such a case, the IT Infrastructure or Platform departments may be liable for charge-back to the business teams if these SLAs are not being met. However, in internal scenarios, it is much more common for IT departments to negotiate service-level objectives (SLOs) with the business teams, as opposed to SLAs. SLOs are very similar in nature to SLAs, but their use implies that the business does not impose financial penalties on the IT department in the event that they are not met.
Proactive Maintenance
It is important to remember that downtime is not only caused by failure but also by proactive maintenance. For example, if you need to patch the operating system, or SQL Server itself, with the latest service pack, then you must have some downtime during installation.
Depending on the upgrade you are applying, the downtime in such a scenario could be substantial – several hours for a stand-alone server. In this situation, high availability is essential for many business-critical applications – not to protect against unplanned downtime, but to avoid prolonged outages during planned maintenance.
Recovery Point Objective and Recovery Time Objective
The recovery point objective (RPO) of an application indicates how much data loss is acceptable in the event of a failure. For a data warehouse that supports a reporting application, for example, this may be an extended period, such as 24 hours, given that it may only be updated once per day by an ETL process and all other activity is read-only reporting. For highly transactional systems, however, such as an OLTP database supporting trading platforms or web applications, the RPO will be zero. An RPO of zero means that no data loss is acceptable.
Applications may have different RPOs for high availability and for disaster recovery. For example, for reasons of cost or application performance, an RPO of zero may be required for a failover within the site. If the same application fails over to a DR data center, however, five or ten minutes of data loss may be acceptable. This is because of technology differences used to implement intra-site availability and inter-site recovery.
The recovery time objective (RTO) for an application specifies the maximum amount of time an application can be down before recovery is complete and users can reconnect. When calculating the achievable RTO for an application, you need to consider many aspects. For example, it may take less than a minute for a cluster to fail over from one node to another and for the SQL Server service to come back up; however, it may take far longer for the databases to recover. The time it takes for databases to recover depends on many factors, including the size of the databases, the quantity of databases within an instance, and how many transactions were in-flight when the failover occurred. This is because all noncommitted transactions need to be rolled back.
Just like RPO, it is common for there to be different RTOs depending on whether you have an intra-site or inter-site failover. Again, this is primarily due to differences in technologies, but it also factors in the amount of time you need to bring up the entire estate in the DR data center if the primary data center is lost.
The RPO and RTO of an application may also vary in the event of data corruption. Depending on the nature of the corruption and the HA/DR technologies that have been implemented, data corruption may result in you needing to restore a database from a backup.
If you must restore a database, the worst-case scenario is that the achievable point of recovery may be the time of the last backup. This means that you must factor a hard business requirement for a specific RPO into your backup strategy. If only part of the database is corrupt, however, you may be able to salvage some data from the live database and restore only the corrupt data from the restored database.
Data corruption is also likely to have an impact on the RTO. One of the biggest influencing factors is whether backups are stored locally on the server or must be retrieved from tape. Retrieving backup files from tape, or even from off-site locations, is likely to add significant time to the recovery process.
Note
Backups directly to tape from SQL Server are deprecated. When this section refers to retrieving backups from tape, it assumes a tape drive as the target of an enterprise backup solution to which your database backups have been offloaded.
Another influencing factor is what caused the corruption. If it was caused by a faulty IO subsystem, then you may need to factor in time for the Windows administrators to run the check disk command (CHKDSK) against the volume, and potentially more time for disks to be replaced. If the corruption was caused by a user accidentally truncating a table or deleting a data file, however, then this is not a concern.
Cost of Downtime
If you ask any business owners how much downtime is acceptable for their applications and how much data loss is acceptable, the answers invariably come back as zero and zero, respectively. Of course, it is never possible to guarantee zero downtime, and once you begin to explain the costs associated with the different levels of availability, it starts to get easier to negotiate a mutually acceptable level of service.
The key factor in deciding how many 9s you should try to achieve is the cost of downtime. Two categories of cost are associated with downtime: tangible costs and intangible costs. Tangible costs are usually fairly straightforward to calculate. Let’s use a sales application as an example. In this case, the most obvious tangible cost is lost revenue because the sales staff cannot take orders. Intangible costs are more difficult to quantify but can be far more expensive. For example, if a customer is unable to place an order with your company, they may place their order with a rival company and never return. Other intangible costs can include loss of staff morale, which leads to higher staff turnover, or even loss of company reputation. Because intangible costs, by their very nature, can only be estimated, the industry rule of thumb is to multiply the tangible costs by three and use this figure to represent your intangible costs.
Once you have an hourly figure for the total cost of downtime for your application, you can scale this figure out, across the predicted life cycle of your application, and compare the costs of implementing different availability levels. For example, imagine that you calculate that your total cost of downtime is $2,000/hour and the predicted life cycle of your application is three years. Table 1-2 illustrates the cost of downtime for your application, comparing the costs that you have calculated for implementing each level of availability, after you have factored in hardware, licenses, power, cabling, additional storage, and additional supporting equipment, such as new racks, administrative costs, and so on. This is known as the total cost of ownership (TCO) of a solution.
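The arithmetic above can be sketched as follows. This is an illustrative Python snippet (not from the book); the 3x rule-of-thumb multiplier for intangible costs and the $2,000/hour and three-year figures are the chapter's examples, and the per-level TCO figures you would compare against are assumed to come from your own estimates:

```python
def total_cost_of_downtime(tangible_hourly_cost):
    """Tangible hourly cost plus estimated intangible cost (3x tangible)."""
    return tangible_hourly_cost + 3 * tangible_hourly_cost

def downtime_cost(availability_pct, hourly_cost, years):
    """Projected cost of downtime over the application's life cycle."""
    hours_per_year = 24 * 365.243
    downtime_hours = hours_per_year * years * (100 - availability_pct) / 100
    return downtime_hours * hourly_cost

# $500/hour tangible cost implies a $2,000/hour total cost of downtime
hourly = total_cost_of_downtime(500)

# Compare the expected downtime cost at each level over a three-year life cycle;
# weigh each figure against the TCO of implementing that availability level
for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}%: ${downtime_cost(nines, hourly, 3):,.0f}")
```

Comparing these projected downtime costs against the TCO of each candidate topology is what turns the "zero downtime, zero data loss" conversation into a concrete business decision.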
Table 1-2
Cost