SQL Server High Availability – DBCC 2021 – Part 1

Any one of my customers has asked, or should have asked, “do I need a certain level of high availability for my SQL server or one of my SQL servers?” Regrettably, these deliberations often come to a halt when it comes to the budget… Naturally, such HA-solutions are more expensive, and depending on the solution chosen and the environment’s configuration, appropriate licenses must be made available. However, if one calculates the costs that will occur in the “worst-case” scenario during or as a result of an outage, it should be clear that these costs are reversible.

I had the opportunity to give a speech about it last week at the Data Blaster Community Conference 2021 (thank you, SQLPASS Deutschland), and here is a summary of what I spoke about.

I’ve shown right from the start how high such costs can be, and I’ve done so with the help of a Percona online calculator. The SQL Server goes down for 1-2 hours, 5 employees are assigned to the Restore, each earning $100,000 per year, and 100 additional employees cannot work correctly, earning an average of $50,000 per year. In the following screenshot, you can see the numbers that can occur during an emergency, during recovery, or as a result of the trouble.

Why does SQLServer High Verfuegbarkeit matter? - Percona Calculator

With total costs of about 8.000.000 Euros for a 2 hour SQL Server outage, I’d consider spending an additional 250.000 Euros for the installation to add a second server and equip it with the required licenses. But, as I’ll explain in the following sections, such high availability can be seen in the SQL Server setup dialog, but first, let’s clear up the terminology… 😉

Disaster Recovery vs. High Availability

On wikipedia.de, the term “high availability” is described as follows:

High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than standard period.

https://en.wikipedia.org/wiki/High_availability

In comparison, the concept of disaster recovery.

Disaster Recovery assumes that the primary site is not recoverable (at least for some time) and represents a process of rebuilding data and services to a secondary survived site, which is opposite to restoring back to its original place.

https://en.wikipedia.org/wiki/Disaster_recovery

When it comes to high availability, it’s all about keeping the time of an outage or downtime as low as possible so that all connected systems can keep running as quickly as possible without data loss or manual intervention. Naturally, before the installation, you should think about the process design, the goals to be achieved, and the need.

  • What is the purpose of the business?
  • What would it cost the company if they are unable to work?
  • Is there, on the whole, a compromise between technology and business? Does it really have to be the minimum downtime (<10 seconds), or is it possible that maximum downtimes of less than a minute are sufficient?
  • What is it that technology can accomplish?
  • Knowledge, processes, and all those must all be followed to the letter of the law?

And only after these questions have been answered can you think about the actual solution for high availability and how it can/should be implemented.

The AlwaysOn Failover-Cluster (FCI)

High Availability - SQL Server - AlwaysOn Failover Cluster

We now come to one of the possible solutions for achieving high availability of SQL Servers inside your own datacenter, the Windows Failover Cluster (basically identical to a Linux Failover Cluster, but you should refer to the documentation of the respective distribution and cluster software).

Initially, a Windows Failover Cluster is created from two Windows Servers with the additional installed Failover-Cluster feature, or, to put it another way, both servers are logically (and in AD) connected. Both are familiar with one another and are now aware that they belong together, exchanging several relevant health-status information. This Windows Failover Cluster is given its own IP address and DNS name and a Cluster-Named-Object (CNO) in Active Directory, which later “manages” the cluster.

I can’t say anything about the required storage requirements because it varies depending on the environment… Whether it’s a SAN, NAS, NFS, HCI, or something else, it’s always about a storage system that can give both servers access. Microsoft adds the following:

Contrary to the availability group, an FCI must use shared storage between all nodes of the FCI for database and log storage. The shared storage can be in the form of WSFC cluster disks, disks on a SAN, Storage Spaces Direct (S2D), or file shares on an SMB. This way, all nodes in the FCI have the same view of instance data whenever a failover occurs. This does mean, however, that the shared storage has the potential of being the single point of failure, and FCI depends on the underlying storage solution to ensure data protection.

https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/always-on-failover-cluster-instances-sql-server?view=sql-server-ver15&WT.mc_id=DP-MVP-5002576#FCIelements

Now you can start installing SQL Servers; on Node 1, you’ll set up the foundation for a SQL Server Failover Cluster (Installation of a Failover-Cluster), while on Node 2, you’ll only add a node to an existing SQL Server Cluster.

Depending on how this cluster is set up – whether it’s only one instance or many – SQL Server licenses must be acquired. You can choose between two operating modes: active/passive or active/active. When using active/passive, SQL Server instances can only run on one cluster node, but they can run on both nodes for a limited time in the event of a failure. When using active/active, all nodes can be used equally and for an unlimited amount of time, which is especially useful when dealing with several instances because it allows some kind of load-balancing.

If a node in the cluster breaks, the second node can no longer communicate with the other side and tries to get the broken services into his side as quickly as possible and start them there.

Unfortunately, a hybrid scenario is not possible due to the lack of scalable resources on-premise and in the cloud. However, a Failover-Cluster can be built in the Azure Cloud as an alternative. To do so, you’ll need a Proximity Placement Group, Managed Disks with the “Shared Storage” feature enabled, and the two virtual machines described above, whose installation and operation are essentially identical.

High Availability - Azure SQL Server Failover Cluster - DBCC2021

More on SQL Server high availability will be covered in a next blog post.

Azure SQL VM – no possibility to configure the disk usage

A customer had asked me the last few days to help him optimize a SQL server on an Azure VM. The customer always had slight performance problems with this SQL server, so he first asked the application manufacturer or supervisor, but they didn’t want to or could help. The only answer they received was that a disk latency of 5 ms is recommended for the DATA disks of the SQL Server… Officially, in Azure, this can only be achieved with ULTRA disks. In this blog post, I don’t want to detail how this might be fulfilled. Just show you an error that I noticed during the analysis.

Starting with a small intro, I talked to the customer and showed him my calculation of what it would cost to equip the server with 2 TB ultra disks and whether it might not be more reasonable and adequate to put the SQL server through its paces and optimize it (also the application or its T-SQL) and in the last step to go to ultra disks if necessary.

The customer agreed, and we have taken the SQL Server under review. Even if they are using a marketplace image from Microsoft (thanks to the product team!), which are already configured very well, there might be some more screws to adjust.

Azure SQL VM and the data disks

If you follow the recommendations for Azure SQL in a virtual machine, you can read the recommendations there:

Stripe multiple Azure data disks using Storage Spaces to increase I/O bandwidth up to the target virtual machine’s IOPS and throughput limits.

https://docs.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-storage

In a first effort, we replaced the 1TB disk with several small disks to increase the maximum throughput from 200MB to 625MB. So far, this has been accomplished without any downtime, the customer has been delighted so far. But in this activity, I also came across a message in the Azure portal on this virtual machine or the SQL Server configuration overview, which confused me a little; Google wasn’t a real help.

In the Azure portal, there is the possibility to roll out several extensions for all SQL servers within an Azure VM, which are intended to support the admin-team in managing the VM or the SQL server via Powershell, Azure CLI, or the portal and provide an overview on data from the logs and metrics. Microsoft uses the “SQL Server IaaS Extension Query Service” or “SQL Server IaaS Agent extension” for this purpose. Both collecting data from the virtual machine and the SQL Server and display it in a user-friendly way in the portal. Unfortunately, that was the problem with this customer VM, and these metrics were not 100% available, as the following screenshot shows.

Azure SQL VM - No data in the portal for disk usage

No data on the storage could be determined in the further “Configure” overview, although all extensions were rolled out on the server and did not display any data even after one or more reboots. So I went looking for the cause…

Extension-Log and fn_trace_gettable

As the Microsoft documentation states, there are different modes for different requirements, and of course, you can also notice a local error log, which you can use for troubleshooting. This error log is an XE file and a trace file, neither of which can be read with Notepad only with SQL Server Management Studio… so the easiest way to access the trace file with a T-SQL system function to determine that the system user apparently has no or too few authorisations. Usually, the extension is located directly on the C drive in the directory C:\SQLIaaSExtension, so you have to use this path accordingly in the SQL query.

SELECT * FROM fn_trace_gettable('C:\SQLIaaSExtension\Log\log.trc', default);  
GO  

By executing this T-SQL statement to examine the extension log file, I received the following line, among other things:

alter server role [sysadmin] add member [NT Service\SQLIaaSExtensionQuery]

When seeking for the user, I discovered that the customer had deleted this user – for whatever reason (later I found out that this apparently did not happen “by accident”, on other SQL servers in Azure, the user is also missing)

USE [master]
GO

/****** Object:  Login [NT SERVICE\SqlIaaSExtensionQuery]    Script Date: 5/10/2021 3:35:08 PM ******/
DROP LOGIN [NT SERVICE\SqlIaaSExtensionQuery]
GO

/****** Object:  Login [NT SERVICE\SqlIaaSExtensionQuery]    Script Date: 5/10/2021 3:35:08 PM ******/
CREATE LOGIN [NT SERVICE\SqlIaaSExtensionQuery] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]
GO

ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SqlIaaSExtensionQuery]
GO

As a test, I recreated this user, authorized it, and checked in the Azure portal whether there was a change in the representations… Yes, the SQL Server IaaS Agent Extension can now access the SQL Server and read out the necessary and relevant data.

Azure SQL VM - Data in the portal for disk usage

Azure SQL – Change your Skills – DataSaturday #6 Malta

Yesterday I was able to give my speech as part of DataSaturday #6 in Malta. It was about changes in the life of a DBA when he is confronted with the fact that his SQL servers and/or databases should be migrated to the Microsoft Cloud Azure. With this session, I wanted to minimize the fears and show that Azure SQL is “only” a SQL server or an SQL database.

In contrast to all other events, Dennes (and his team) had come up with something new in the process. The session did not have 60 minutes but 90 minutes. The first 15 minutes were used as an interview or introduction, the host introduced the speaker, a small video was produced for this, and individual questions were prepared on the respective curriculum vitae of the speaker. Then the speech took place over a full 60 minutes, followed by another 15 minutes for Q&A from the participants, or the host could ask the relevant questions himself. A successful solution led the talk more like a conversation and reduced the “pressure” a little, and I (hopefully also as a participant) felt more part of it.

My host was Deepthi Goguri; I had a lot of fun holding this session with her. Even if this wasn’t my first lecture in English, I am still a little more nervous … I find the right words at the right moment, I pronounce everything correctly … Thanks to the calm and friendly nature of Deepthi, I became more relaxed and could prepare myself a little better for the session. Didn’t help to get over the fact that I wanted to tell and show more than I had time; I have to work on it urgently! Two years ago, I talked to Rob Sewell that I primarily wanted to leave my comfort zone => not only to give lectures in German but start with English and whether he could support me in my start – as a mentor, so to speak. And then Corona came, and everything turned out differently … I just jumped and dared … which brings us back to the topic. 😉

It is just a SQLServer - dont be afraid of moving you databases into Azure

Just dare and try

Deepthi asked me yesterday what I would recommend to the participants if the client or employer wanted to start migrating to the cloud. I can only recommend anyone interested not to be afraid, just because the SQL Server is no longer in your own data center. Still, now in a Microsoft data center, this does not mean that Microsoft will take over all the work and you – as the database administrator – has nothing more to do anymore… ok, you have to learn something new and think a little bit different, but it’s still a SQL Server with its databases!
The first steps could be the following:

  • create an Azure Free Account
  • just deploy a free Linux VM and try out how to install a SQL server there 
  • or an Azure SQL Database to see which steps you have to take to achieve this 
  • and of course, all the great resources of Microsoft Learn – Learning Paths and Modules with lots of content and exercises
    Azure Fundamentals – AZ-900
    Azure Data Fundamentals – DP-900
  • If you have additional questions about the particular topics, you can also read the handy documentation, which often helps me better understand specific issues.

Various learning guides for Azure SQL available

Microsoft – mainly in Anna Hoffman and Bob Ward – also provides many interactive learning materials that can be formed in workshops, labs, or YouTube videos! There is material for several weeks of continuous learning about the SQL Server and/or Azure SQL … regardless of whether Azure SQL Database or the Managed Instance or as a SQL Server VM in Azure. Here you just have to start and try it out, don’t be afraid … it’s not magic. It’s actually quite simple because you can put everything together yourself as you need it. 😉

Microsoft Learn: Azure SQL fundamentals learning path
aka.ms/azuresqlfundamentals
Select the Azure SQL Workshop
aka.ms/sqlworkshops
How to choose tool
aka.ms/chooseazuresql
Azure SQL documentation
aka.ms/azuresqldocs
More videos from our team
aka.ms/dataexposed

Change your Skills - Learning - Ressourcen-Overview

Here are my slides from this talk:
Azure SQL – Change your skills to become a cloud DBA

And if you want to read something about Azure SQL Database, you can, of course either do so in my blog here or in the relatively new blog post on Azure SQL by Deepthi. 😉

Aha effect when setting SQL instance parameters with dbatools

In the last week, I installed several SQL Servers for one customer and had to install and configure them all identically. What is closer here than to do this with a Powershell script, so I made the “effort” and the individual steps of the SQL Server configuration with dbatools to realize. I do not want to go into all the steps of the configuration in this post, but show only a part of it, which brought me a specific “aha effect”.

As part of the SQL Server configuration, the default values of “MaxDop” and “Cost Threshold for Parallelism” should be adjusted. Setting MaxDoP using Powershell or dbatools is relatively easy as there is a separate command for this, but for the “Cost Threshold for Parallelism” dbatools has a “work-around”. Unfortunately, there is not (yet) direct command for this.

Set-DbaMaxDop -SqlInstance sql2008, sql2012

This command-line sets the value of Max DOP (Maximum Degree of Parallelism) to the recommended value for SQL Server instances SQL2008 and SQL2012. Always in connection with this configuration parameter is the “Cost Threshold”, which still set to 5 by default.

All SQL Server instances at once …

To configure all SQL Server instances relatively quickly and easily, I have the command “Get-DbaRegisteredServer” (as an alias of Get-DbaRegServer) made. As preparation for this, I have created the necessary servers (here 2 servers with 3 instances each) as “Registered Server” on all servers in the SQL Server Management Studio and was then able to access it with Powershell aka dbatools.

According to dbatools documentation, this command retrieves a list of SQL Server objects stored in locally registered groups in both the Azure Data Studio and the central management server.

originell from dbatools.io - Many thanks to Chrissy

With this command and the possibility to pass the objects from the result object as a pipeline, you can do beautiful things like configuring the value for “MaxDoP” on all servers or instances in a command-line…

Get-DbaRegisteredServer | Set-DbaMaxDop

However, now to my aha effect with another command line 😉

Cost Threshold For Parallelism

As indicated above, the adjustment is only a work-around with dbatools and not with a dbatools command, this I use now “Set-DbaSpConfigure”. Of course, I could also configure the “MaxDoP” with this command, but then I have to provide for the previous determination and calculation of the respective value for MaxDoP, so determine the existing cores, these match the Best Practice and the value of one Pass variable to the set command. I set these values to 40 in 98% of all instances (unless the customer or the application wants something else), so I do not need any logic here.

Following my above logic or the documentation, I tried it with the following command line:

Get-DbaRegisteredServer | Set-DbaSpConfigure -Name 'CostThresholdForParallelism' -Value 40

This command brought me to an (at first sight) incomprehensible error (even my attempt to pass the value as a string was not successful):

WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server1\Instanz1 ( <-> )
WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server2\Instanz1 ( <-> )
WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server1\Instanz2 ( <-> )
WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server2\Instanz2 ( <-> )
WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server1\Instanz3 ( <-> )
WARNING: [13:02:23][Set-DbaSpConfigure] Value out of range for Server2\Instanz3 ( <-> )

Unfortunately, I have not found the reason for this, and maybe someone else can bring me closer to the phenomenon… maybe this is by design or even a “bug”…

However, I’ve been so successful with pipelining before that I’ve used that too… so let’s first find out all the SQL instances, then find the current parameter for the “Cost Threshold For Parallelism” on those instances, and then set it to the new value 40.

Get-DbaRegisteredServer | Get-DbaSpConfigure -Name 'CostThresholdForParallelism' | Set-DbaSpConfigure -Value 40
ComputerName  : Server1
InstanceName  : Instanz1
SqlInstance   : Server1\Instanz1
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

ComputerName  : Server2
InstanceName  : Instanz1
SqlInstance   : Server2\Instanz1
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

ComputerName  : Server1
InstanceName  : Instanz2
SqlInstance   : Server1\Instanz2
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

ComputerName  : Server2
InstanceName  : Instanz2
SqlInstance   : Server2\Instanz2
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

ComputerName  : Server1
InstanceName  : Instanz3
SqlInstance   : Server1\Instanz3
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

ComputerName  : Server2
InstanceName  : Instanz3
SqlInstance   : Server2\Instanz3
ConfigName    : CostThresholdForParallelism
PreviousValue : 5
NewValue      : 40

Also, already I have found out something great for myself and am more enriched by experience in dealing with dbatools!

I love this Powershell module, which allows me to customize, optimize, and automate many (almost anything!) Things around and around SQL Server. I like to use it (as you can see my other blog posts) and meanwhile with all my clients. THANKS to @Chrissy and the many other contributors who are taking the trouble to make this community tool!

Photo by Ben White on Unsplash