A customer had asked me the last few days to help him optimize a SQL server on an Azure VM. The customer always had slight performance problems with this SQL server, so he first asked the application manufacturer or supervisor, but they didn’t want to or could help. The only answer they received was that a disk latency of 5 ms is recommended for the DATA disks of the SQL Server… Officially, in Azure, this can only be achieved with ULTRA disks. In this blog post, I don’t want to detail how this might be fulfilled. Just show you an error that I noticed during the analysis.
Starting with a small intro, I talked to the customer and showed him my calculation of what it would cost to equip the server with 2 TB ultra disks and whether it might not be more reasonable and adequate to put the SQL server through its paces and optimize it (also the application or its T-SQL) and in the last step to go to ultra disks if necessary.
The customer agreed, and we have taken the SQL Server under review. Even if they are using a marketplace image from Microsoft (thanks to the product team!), which are already configured very well, there might be some more screws to adjust.
Azure SQL VM and the data disks
If you follow the recommendations for Azure SQL in a virtual machine, you can read the recommendations there:
Stripe multiple Azure data disks using Storage Spaces to increase I/O bandwidth up to the target virtual machine’s IOPS and throughput limits.https://docs.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-storage
In a first effort, we replaced the 1TB disk with several small disks to increase the maximum throughput from 200MB to 625MB. So far, this has been accomplished without any downtime, the customer has been delighted so far. But in this activity, I also came across a message in the Azure portal on this virtual machine or the SQL Server configuration overview, which confused me a little; Google wasn’t a real help.
In the Azure portal, there is the possibility to roll out several extensions for all SQL servers within an Azure VM, which are intended to support the admin-team in managing the VM or the SQL server via Powershell, Azure CLI, or the portal and provide an overview on data from the logs and metrics. Microsoft uses the “SQL Server IaaS Extension Query Service” or “SQL Server IaaS Agent extension” for this purpose. Both collecting data from the virtual machine and the SQL Server and display it in a user-friendly way in the portal. Unfortunately, that was the problem with this customer VM, and these metrics were not 100% available, as the following screenshot shows.
No data on the storage could be determined in the further “Configure” overview, although all extensions were rolled out on the server and did not display any data even after one or more reboots. So I went looking for the cause…
Extension-Log and fn_trace_gettable
As the Microsoft documentation states, there are different modes for different requirements, and of course, you can also notice a local error log, which you can use for troubleshooting. This error log is an XE file and a trace file, neither of which can be read with Notepad only with SQL Server Management Studio… so the easiest way to access the trace file with a T-SQL system function to determine that the system user apparently has no or too few authorisations. Usually, the extension is located directly on the C drive in the directory C:\SQLIaaSExtension, so you have to use this path accordingly in the SQL query.
SELECT * FROM fn_trace_gettable('C:\SQLIaaSExtension\Log\log.trc', default); GO
By executing this T-SQL statement to examine the extension log file, I received the following line, among other things:
alter server role [sysadmin] add member [NT Service\SQLIaaSExtensionQuery]
When seeking for the user, I discovered that the customer had deleted this user – for whatever reason (later I found out that this apparently did not happen “by accident”, on other SQL servers in Azure, the user is also missing)
USE [master] GO /****** Object: Login [NT SERVICE\SqlIaaSExtensionQuery] Script Date: 5/10/2021 3:35:08 PM ******/ DROP LOGIN [NT SERVICE\SqlIaaSExtensionQuery] GO /****** Object: Login [NT SERVICE\SqlIaaSExtensionQuery] Script Date: 5/10/2021 3:35:08 PM ******/ CREATE LOGIN [NT SERVICE\SqlIaaSExtensionQuery] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english] GO ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\SqlIaaSExtensionQuery] GO
As a test, I recreated this user, authorized it, and checked in the Azure portal whether there was a change in the representations… Yes, the SQL Server IaaS Agent Extension can now access the SQL Server and read out the necessary and relevant data.