SQL Server Availability Groups in GCP

I wanted to share some tips on how to create a SQL Server Availability Group in GCP. If you have tried before and put both nodes in the same subnet you may have noticed that it will not work. In order to make it work you need to put the nodes in different subnets.
You should get familiar with CDIR notation. Don’t need to be an expert but just understand the difference between a /16 netmask (255.255.0.0) vs /24 netmask (255.255.255.0) for example. You can follow the step-by-step instructions found at [Configuring SQL Server Availability Groups]. In this post I just want to provide some extra information in case you need to customize the steps for your own environment.

Summary of the network layout for the two nodes
Hostname Subnet – CIDR Hostname IP Guest OS CIDR IPs for WSFC & Listener
cluster-sql1 wsfcesubnet1 – 10.0.0.0/24 10.0.0.4 10.0.0.0/16 10.0.1.4; 10.0.1.5
cluster-sql2 wsfcesubnet2 – 10.1.0.0/24 10.1.0.4 10.1.0.0/16 10.1.1.4; 10.1.1.5

This is a summary of the steps:

  1. Create a VPC network with 3 subnets. Only 2 are absolutely required, the third one is for the domain controller which could also be in one of the two subnets.
    1. wsfcsubnet1 – 10.0.0.0/24 for Node 1
    2. wsfcsubnet2 – 10.1.0.0/24 for Node 2
    3. wsfcsubnet3 – 10.2.0.0/24 for Domain Controller
  2. Create firewall rules to allow all traffic between the 3 subnets and also for RDP.
  3. Create a Windows instance to be the domain controller in the subnet wsfcsubnet3 and create a Windows AD Domain.
  4. Create the first instance cluster-sql1 in the subnet wsfcsubnet1 and assign the IP 10.0.0.4.
  5. Create the second instance cluster-sql2 in the subnet wsfcsubnet2 and assign the IP 10.1.0.4.
  6. Connect to both instances using RDP. You first need to create a Windows username/password.
  7. Now comes the most critical part of the process:
    1. Change the IP to be static instead of DHCP.
    2. The same command will change the subnet mask in the Windows guest OS to /16 (255.255.0.0). Notice that /16 has more IP addresses than /24 which is the subnet where the node resides. Later on we will choose IPs outside the /24 range but inside the /16 range for the Windows Failover Cluster and Listener.
    3. The same command above will make you lose the connection to the server, so you will need to re-connect again.
    4. Set the local DNS to be the domain controller (10.2.0.100).
    5. Open the firewall ports for Availability Groups and SQL Server.
  8. In GCP create four routes. The routes will send traffic intended for the Windows Failover Cluster or Listener back to the corresponding node. Every node has essentially two extra IPs assigned to them.
  9. Create the Windows Failover Cluster
  10. Create the SQL Server Availability Group

Below is a diagram on how the two nodes are setup.

The most critical part is to set the netmask of the Windows guest OS wider than the netmask of the GCP subnet (/16 has more IPs than /24). Then for the WSFC and Listener we choose IPs outside the /24 range but within the /16 range. The GCE Advanced Router will use the 4 routes that we created to route traffic to the corresponding node.

I created some PowerShell scripts to do the creation of the Availability Group automatically. You can find the scripts in the Google Cloud Platform GitHub repository. You can also watch a demo that I did during Google Cloud 2017 conference with Feng Min back when I was a Googler.
If you have any questions or ideas about SQL Server on Google Cloud send me a note. On a future post I plan to explain how to run the PowerShell scripts that I mentioned earlier. I also plan do do another post about setting up Availability Groups using Alias IP Ranges to assign the IPs for the cluster and listener without having to modify the netmask of the nodes.