Create an automation runbook to force a SQL Server AG failover
This article is the third part of a series that describes the lab environment I prepared for my demo at the TCDIT2017 conference. The other parts of the series are available here:
- Environment implementation
- Securing lab environment
- Create an automation runbook to force a SQL Server AG failover
In this post I’ll summarize the steps you need to implement in order to create an Azure Automation runbook that forces a SQL Server Availability Group failover to a disaster recovery data center.
I took the steps provided in this article and updated them to the Resource Manager model, in order to run a custom script on the DR SQL Server instance.
Basically, you need to:
- Upload a custom PowerShell script into a storage account, inside a blob storage container
- Create an automation account
- Create a runbook which picks up the custom PS script, injects it into the DR VM, and runs it
Then, you simply need to simulate a failover condition and run the runbook to test it.
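For example, here’s a minimal sketch of one way to simulate the loss of the main data center, by deallocating the primary SQL VMs; the primary node names below are hypothetical, so adjust them to your environment:

# Deallocate the primary data center SQL VMs to simulate a disaster.
# VM names are hypothetical examples, not the actual lab names.
$primaryRg = 'tcdit17-sqlhademo'
'tcdit-sql-0', 'tcdit-sql-1' | ForEach-Object {
    Stop-AzureRmVM -ResourceGroupName $primaryRg -Name $_ -Force
}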
Let’s start from the beginning! 🙂
Create a custom PS script and upload it to a storage account
You can find ForceSqlDrFailover.ps1 in my GitHub repo. It expects to be run on a DR cluster node, without any quorum vote, which is unable to communicate with the main data center nodes; in this condition, the cluster service isn’t able to bring the cluster online.
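For reference, here’s a minimal sketch of how the DR node can be deprived of its quorum vote ahead of time (run from an elevated prompt on a cluster node; the node name is the one used throughout this post):

# Remove the quorum vote from the DR node, so it cannot keep
# the cluster alive on its own.
Import-Module FailoverClusters
(Get-ClusterNode -Name 'tcdit-sql-dr0').NodeWeight = 0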
The script forces the cluster service to stop, then restarts it with the ForceQuorum option; it then waits for the cluster to reach an operating state and changes its quorum configuration. Once these operations are completed, it forces the SQL Availability Group failover with possible data loss.
Here’s the code:
Param(
    [string]$drNodeName,
    [string]$sqlInstance='DEFAULT',
    [string]$agName
)

Import-Module sqlps
Import-Module FailoverClusters

$timeout = New-Object System.TimeSpan -ArgumentList 0, 0, 120

#Force cluster start with single node online
Write-Output "Stopping cluster service on $($drNodeName)..."
$clusSvc = Get-Service -Name "Cluster Service" -ComputerName $drNodeName
Stop-Service -InputObject $clusSvc -ErrorAction SilentlyContinue
$clusSvc.WaitForStatus("Stopped", $timeout)
Write-Output "Cluster service stopped on $($drNodeName)."

Write-Output "Starting cluster service on $($drNodeName)..."
Start-ClusterNode -Name $drNodeName -ForceQuorum
$clusSvc = Get-Service -Name "Cluster Service" -ComputerName $drNodeName
$clusSvc.WaitForStatus("Running", $timeout)
While((Get-ClusterNode $drNodeName).State -ne "Up")
{
    Start-Sleep 30
    Write-Output "Waiting for cluster node to be online..."
}
Write-Output "Cluster node online."

Set-ClusterQuorum -NoWitness

#Fix node weight on DR node
$clNode = Get-ClusterNode $drNodeName
While($clNode.NodeWeight -ne 1)
{
    try
    {
        $clNode.NodeWeight = 1
    }
    catch
    {
        Write-Output "Trying to change cluster node weight..."
    }
}
Write-Output "Cluster node weight changed."
Write-Output "Cluster service started with ForceQuorum on $($drNodeName)."

#Force AG failover with possible data loss
Write-Output "Forcing failover for $($agName) AG..."
$SQLAvailabilityGroupPath = "SQLSERVER:\Sql\$($drNodeName)\$($sqlInstance)\AvailabilityGroups\$($agName)"
Switch-SqlAvailabilityGroup -Path $SQLAvailabilityGroupPath -AllowDataLoss -Force
Write-Output "AG online."
You then need to upload it to a storage account, inside a blob storage container. In my demo I used the SQL Server storage account placed in the DR data center.
You can do it manually or use the following code to do it via PowerShell:
$resourceGroupName = 'tcdit17-sqlhademo'
$saName = 'tcditsqlsadr'
$scriptFile = '.\forceSqlDrFailover.ps1'

Add-AzureRmAccount

$keys = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $saName
$storageContext = New-AzureStorageContext -StorageAccountName $saName -StorageAccountKey $keys[0].Value

# Reuse the container if it already exists, otherwise create it
# (-ErrorAction avoids a spurious error when it doesn't exist yet)
$container = Get-AzureStorageContainer -Name 'script-container' -Context $storageContext -ErrorAction SilentlyContinue
if(!$container)
{
    $container = New-AzureStorageContainer -Name 'script-container' -Context $storageContext
}

$UploadFile = @{
    Context = $storageContext
    Container = $container.Name
    File = $scriptFile
}
Set-AzureStorageBlobContent @UploadFile
Create the Automation Account
Log in to the Azure portal and create a new resource of type “Automation”:
Type a name, put it into an appropriate resource group, and choose to create a Run As account:
Done! Now you should have all the resources you need to create your first runbook.
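If you prefer scripting this step, here’s a minimal sketch, assuming a hypothetical account name (the Run As account is still easier to create from the portal, as shown above):

# Create the Automation account; name and location are assumptions,
# adjust them to your environment.
New-AzureRmAutomationAccount -ResourceGroupName 'tcdit17-sqlhademo' `
                             -Name 'tcdit-automation' `
                             -Location 'West Europe'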
Create a new failover runbook
From the Automation Account properties, choose the Runbooks tab and add a new one:
Assign it a name and create a PowerShell Workflow runbook:
In the Edit pane, you need to insert the PowerShell Workflow script.
Since I used a workflow ready to be called by an ASR Recovery Plan, I had to handle the $RecoveryPlanContext object as a parameter, which is normally passed by the Recovery Plan itself. I didn’t remove this part because I plan to reuse it for other projects; I simply added the creation of a fake $RecoveryPlanContext when the parameter is empty.
The workflow executes the following steps:
- Gain access to the subscription using the Run As account credentials
- Retrieve the failover script from the storage account
- Inject it into the DR VM and run it via the CustomScriptExtension
Here’s the code:
workflow ForceSqlDrFailover
{
    [OutputType([string])]

    param (
        [Object]$RecoveryPlanContext
    )

    #$VerbosePreference="Continue"

    if(!$RecoveryPlanContext)
    {
        $RecoveryPlanContext = InlineScript {
            $obj = New-Object PSObject
            $obj | Add-Member -MemberType NoteProperty -Name "FailoverType" -Value "pfo/ufo"
            $obj
        }
        Write-Verbose "RecoveryPlanContext parameter object not present. A fake parameter object was created for demo purposes."
    }

    $connectionName = "AzureRunAsConnection"
    try
    {
        # Get the connection "AzureRunAsConnection"
        $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

        Write-Verbose "Logging in to Azure..."
        Add-AzureRmAccount `
            -ServicePrincipal `
            -TenantId $servicePrincipalConnection.TenantId `
            -ApplicationId $servicePrincipalConnection.ApplicationId `
            -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
    }
    catch
    {
        if (!$servicePrincipalConnection)
        {
            $ErrorMessage = "Connection $connectionName not found."
            throw $ErrorMessage
        }
        else
        {
            Write-Error -Message $_.Exception
            throw $_.Exception
        }
    }

    InlineScript {
        $resourceGroupName = 'tcdit17-sqlhademo'
        $saName = 'tcditsqlsadr'
        $scriptFile = 'forceSqlDrFailover.ps1'
        $sqlDrNode = 'tcdit-sql-dr0'
        $agName = 'alwayson-ag'

        Write-Verbose "Acquiring encryption keys for $($saName) Storage Account..."
        $saKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $saName).Key1
        Write-Verbose "Keys acquired."

        Write-Verbose "Creating storage context..."
        $storageContext = New-AzureStorageContext -StorageAccountName $saName -StorageAccountKey $saKey
        Write-Verbose "Storage context created."

        Write-Verbose "failovertype $($Using:RecoveryPlanContext.FailoverType)"

        if ($Using:RecoveryPlanContext.FailoverType -eq "Test")
        {
            Write-Verbose "tfo: Skipping SQL Failover"
        }
        else
        {
            Write-Verbose "pfo/ufo"

            $VM = Get-AzureRmVM -Name $sqlDrNode -ResourceGroupName $resourceGroupName
            $AGArgs = "-drNodeName $($sqlDrNode) -agName $($agName)"

            Write-Verbose "Starting AG Failover to $($sqlDrNode)"
            Set-AzureRmVMCustomScriptExtension -ResourceGroupName $resourceGroupName `
                -VMName $VM.name `
                -Location $VM.location `
                -Name "ForceSqlDrFailover" `
                -TypeHandlerVersion "1.1" `
                -StorageAccountName $saName `
                -StorageAccountKey $saKey `
                -FileName $scriptFile `
                -Run $scriptFile `
                -Argument $AGArgs `
                -ContainerName "script-container"
            Write-Verbose "Completed AG Failover"
        }
    }
}
Once inserted, you can test it in the test pane; then you need to publish it:
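Publishing can also be scripted; here’s a minimal sketch, again assuming the hypothetical Automation account name used earlier:

# Publish the draft runbook so it can be started.
Publish-AzureRmAutomationRunbook -ResourceGroupName 'tcdit17-sqlhademo' `
                                 -AutomationAccountName 'tcdit-automation' `
                                 -Name 'ForceSqlDrFailover'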
Then you simply have to start it. In my lab environment, it took about 3-5 minutes to complete the failover between data centers.
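For reference, a minimal sketch of starting it from PowerShell instead of the portal (same hypothetical account name):

# Start the published runbook and keep the returned job object.
$job = Start-AzureRmAutomationRunbook -ResourceGroupName 'tcdit17-sqlhademo' `
                                      -AutomationAccountName 'tcdit-automation' `
                                      -Name 'ForceSqlDrFailover'
# Check the job status while the failover is in progress.
Get-AzureRmAutomationJob -ResourceGroupName 'tcdit17-sqlhademo' `
                         -AutomationAccountName 'tcdit-automation' `
                         -Id $job.JobId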
That’s all! If you need any additional info, just contact me 🙂