Create an automation runbook to force a SQL Server AG failover

Published by Marco Obinu

This article is the third part of a series describing the lab environment I prepared for my demo at the TCDIT2017 conference. The other parts of the series are available here:

Environment implementation
Securing lab environment
Create an automation runbook to force a SQL Server AG failover

In this post I’ll summarize the steps needed to create an Azure Automation runbook that forces a SQL Server Availability Group failover to a disaster recovery data center.

I started from the steps provided in this article, updating them to the Resource Manager model, in order to run a custom script on the DR SQL Server instance.

Basically, you need to:

  • Upload a custom PowerShell script into a storage account, inside a blob storage container
  • Create an automation account
  • Create a runbook which picks up the custom PS script, injects it into the DR VM, and runs it

Then you simply need to simulate a failover condition and run the runbook to test it.
Let’s start from the beginning! 🙂

Create a custom PS script and upload it into a storage account

You can find ForceSqlDrFailover.PS1 in my GitHub repo. It expects to be run on a DR cluster node that has no quorum vote and is unable to communicate with the main data center nodes; in this condition, the cluster service isn’t able to bring the cluster online.

The script stops the cluster service, then restarts it with the ForceQuorum option; it then waits for the cluster to reach an operational state and changes its quorum configuration. Once these operations are complete, it forces the SQL Server Availability Group failover with possible data loss.

Here’s the code:

Param(
    [string]$drNodeName,
    [string]$sqlInstance = 'DEFAULT',
    [string]$agName
)

Import-Module SQLPS
Import-Module FailoverClusters

# Wait up to 120 seconds for service state changes
$timeout = New-Object System.TimeSpan -ArgumentList 0, 0, 120

# Force cluster start with a single node online
Write-Output "Stopping cluster service on $($drNodeName)..."
# Note: the service name is ClusSvc; "Cluster Service" is only its display name
$clusSvc = Get-Service -Name ClusSvc -ComputerName $drNodeName
Stop-Service -InputObject $clusSvc -ErrorAction SilentlyContinue
$clusSvc.WaitForStatus("Stopped", $timeout)
Write-Output "Cluster service stopped on $($drNodeName)."

Write-Output "Starting cluster service on $($drNodeName)..."
Start-ClusterNode -Name $drNodeName -ForceQuorum

$clusSvc = Get-Service -Name ClusSvc -ComputerName $drNodeName
$clusSvc.WaitForStatus("Running", $timeout)

While((Get-ClusterNode $drNodeName).State -ne "Up") {
    Start-Sleep 30
    Write-Output "Waiting for cluster node to be online..."
}
Write-Output "Cluster node online."

# Remove the witness from the quorum configuration
Set-ClusterQuorum -NoWitness

# Fix node weight on the DR node; retry until the cluster accepts the change
$clNode = Get-ClusterNode $drNodeName
While($clNode.NodeWeight -ne 1)
{
    try {
        $clNode.NodeWeight = 1
    }
    catch {
        Write-Output "Trying to change cluster node weight..."
        Start-Sleep 5
    }
}
Write-Output "Cluster node weight changed."
Write-Output "Cluster service started with ForceQuorum on $($drNodeName)."

# Force AG failover with possible data loss
Write-Output "Forcing failover for $($agName) AG..."
$SQLAvailabilityGroupPath = "SQLSERVER:\Sql\$($drNodeName)\$($sqlInstance)\AvailabilityGroups\$($agName)"
Switch-SqlAvailabilityGroup -Path $SQLAvailabilityGroupPath -AllowDataLoss -Force
Write-Output "AG online."

You then need to upload it to a storage account, inside a blob storage container. In my demo I used the SQL Server storage account placed in the DR data center.
You can do it manually, or use the following code to do it via PowerShell:

$resourceGroupName = 'tcdit17-sqlhademo'
$saName = 'tcditsqlsadr'
$scriptFile = '.\forceSqlDrFailover.ps1'

# Log in to the subscription
Add-AzureRmAccount

# Get the storage account keys and build a storage context
$keys = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $saName
$storageContext = New-AzureStorageContext -StorageAccountName $saName -StorageAccountKey $keys[0].Value

# Create the container if it doesn't exist yet
$container = Get-AzureStorageContainer -Name 'script-container' -Context $storageContext -ErrorAction SilentlyContinue
if(!$container) {
    $container = New-AzureStorageContainer -Name 'script-container' -Context $storageContext
}

# Upload the script as a blob
$uploadFile = @{
    Context   = $storageContext
    Container = $container.Name
    File      = $scriptFile
}
Set-AzureStorageBlobContent @uploadFile
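
To double-check that the upload succeeded, you can list the blobs in the container, reusing the same storage context:

# List the blobs in the container to verify the upload
Get-AzureStorageBlob -Container 'script-container' -Context $storageContext |
    Select-Object Name, Length, LastModified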

Create the Automation Account

Log in to the Azure portal and create a new resource of type “Automation”:

Type a new name, put it into an appropriate resource group, and choose to create a Run As account:

Done! Now you should have all the resources you need to create your first runbook.
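
If you prefer scripting, the account itself can also be created with the AzureRM Automation cmdlets; a minimal sketch, assuming my lab’s resource group, a hypothetical account name of 'tcdit-automation', and a 'West Europe' location (the Run As account still has to be created from the portal, or with a more elaborate script):

# Create the Automation account ('tcdit-automation' is just an example name);
# the Run As account is created separately
New-AzureRmAutomationAccount -ResourceGroupName 'tcdit17-sqlhademo' `
    -Name 'tcdit-automation' `
    -Location 'West Europe'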

Create a new failover runbook

From the Automation Account properties, choose the Runbooks tab and add a new one:

Assign it a new name, and create a PowerShell Workflow runbook:

In the Edit pane, you need to insert the PowerShell Workflow script.

Since I used a workflow ready to be called by an ASR Recovery Plan, I had to handle the $RecoveryPlanContext object as a parameter, which is normally passed by the Recovery Plan itself. I didn’t remove this part because I plan to reuse it for other projects; I simply added the creation of a fake $RecoveryPlanContext if the parameter is empty.

The workflow executes the following steps:

  • Gain access to the subscription using the Run As account credentials
  • Retrieve the failover script from the storage account
  • Inject and run it on the DR VM via the Custom Script Extension

Here’s the code:

workflow ForceSqlDrFailover
{
    [OutputType([string])]

    param (
        [Object]$RecoveryPlanContext
    )

    #$VerbosePreference="Continue"

    # Build a fake context object when the workflow isn't started by an ASR Recovery Plan
    if(!$RecoveryPlanContext)
    {
        $RecoveryPlanContext = InlineScript {
            $obj = New-Object PSObject
            $obj | Add-Member -MemberType NoteProperty -Name "FailoverType" -Value "pfo/ufo"
            $obj
        }
        Write-Verbose "RecoveryPlanContext parameter object not present. A fake parameter object was created for demo purposes."
    }

    $connectionName = "AzureRunAsConnection"
    try
    {
        # Get the Run As connection "AzureRunAsConnection"
        $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

        Write-Verbose "Logging in to Azure..."
        Add-AzureRmAccount `
            -ServicePrincipal `
            -TenantId $servicePrincipalConnection.TenantId `
            -ApplicationId $servicePrincipalConnection.ApplicationId `
            -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
    }
    catch {
        if (!$servicePrincipalConnection)
        {
            $ErrorMessage = "Connection $connectionName not found."
            throw $ErrorMessage
        } else {
            Write-Error -Message $_.Exception
            throw $_.Exception
        }
    }

    InlineScript {

        $resourceGroupName = 'tcdit17-sqlhademo'
        $saName = 'tcditsqlsadr'
        $scriptFile = 'forceSqlDrFailover.ps1'
        $sqlDrNode = 'tcdit-sql-dr0'
        $agName = 'alwayson-ag'

        Write-Verbose "Acquiring access keys for $($saName) Storage Account..."
        $saKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $saName)[0].Value
        Write-Verbose "Keys acquired."

        Write-Verbose "Creating storage context..."
        $storageContext = New-AzureStorageContext -StorageAccountName $saName -StorageAccountKey $saKey
        Write-Verbose "Storage context created."

        Write-Verbose "failovertype $($Using:RecoveryPlanContext.FailoverType)"

        # Skip the forced failover during an ASR test failover
        if ($Using:RecoveryPlanContext.FailoverType -eq "Test")
        {
            Write-Verbose "tfo: Skipping SQL Failover"
        }
        else
        {
            Write-Verbose "pfo/ufo"

            $VM = Get-AzureRmVM -Name $sqlDrNode -ResourceGroupName $resourceGroupName

            $AGArgs = "-drNodeName $($sqlDrNode) -agName $($agName)"

            Write-Verbose "Starting AG Failover to $($sqlDrNode)"

            # Run the blob-hosted script on the DR node via the Custom Script Extension
            Set-AzureRmVMCustomScriptExtension -ResourceGroupName $resourceGroupName `
                -VMName $VM.Name `
                -Location $VM.Location `
                -Name "ForceSqlDrFailover" `
                -TypeHandlerVersion "1.1" `
                -StorageAccountName $saName `
                -StorageAccountKey $saKey `
                -FileName $scriptFile `
                -Run $scriptFile `
                -Argument $AGArgs `
                -ContainerName "script-container"

            Write-Verbose "Completed AG Failover"
        }
    }
}

Once the script is inserted, you can try it out in the Test pane; then you need to publish it:
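
These steps can also be scripted; a sketch with the AzureRM Automation cmdlets, assuming the hypothetical 'tcdit-automation' account name from before and the workflow saved locally as ForceSqlDrFailover.ps1:

# Import the workflow as a PowerShell Workflow runbook, then publish it
# ('tcdit-automation' is an example account name)
Import-AzureRmAutomationRunbook -ResourceGroupName 'tcdit17-sqlhademo' `
    -AutomationAccountName 'tcdit-automation' `
    -Name 'ForceSqlDrFailover' `
    -Path '.\ForceSqlDrFailover.ps1' `
    -Type PowerShellWorkflow

Publish-AzureRmAutomationRunbook -ResourceGroupName 'tcdit17-sqlhademo' `
    -AutomationAccountName 'tcdit-automation' `
    -Name 'ForceSqlDrFailover'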

Then you simply have to start it. In my lab environment, it took about 3-5 minutes to complete the failover between data centers.
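
Starting the runbook from PowerShell and tailing the job output looks like this (again assuming the hypothetical 'tcdit-automation' account name):

# Start the runbook and capture the resulting job
$job = Start-AzureRmAutomationRunbook -ResourceGroupName 'tcdit17-sqlhademo' `
    -AutomationAccountName 'tcdit-automation' `
    -Name 'ForceSqlDrFailover'

# Poll until the job reaches a terminal state
while ((Get-AzureRmAutomationJob -ResourceGroupName 'tcdit17-sqlhademo' `
        -AutomationAccountName 'tcdit-automation' `
        -Id $job.JobId).Status -notin 'Completed', 'Failed', 'Suspended', 'Stopped') {
    Start-Sleep 30
}

# Show the job output streams
Get-AzureRmAutomationJobOutput -ResourceGroupName 'tcdit17-sqlhademo' `
    -AutomationAccountName 'tcdit-automation' `
    -Id $job.JobId -Stream Any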


That’s all! If you need any additional info, just contact me 🙂


Marco Obinu

Curious by nature, a talkative geek who can speak in front of an audience or a camera, in love with technology, especially SQL Server. I want to understand how things work and solve problems by myself, at work as well as in my many hobbies.
