Architecting a Hybrid Database Strategy with Microsoft SQL Server : Introduction to Microsoft SQL AlwaysOn
   
Introduction to Microsoft SQL AlwaysOn
Traditionally, running Microsoft SQL Server on the VMware vSphere® platform offered multiple database availability and disaster recovery options using VMware features, such as VMware vSphere High Availability and VMware vSphere Distributed Resource Scheduler™ (DRS), with Microsoft features, such as failover clustering. The Microsoft SQL Server high availability features have also been combined with vSphere to create a flexible availability and recovery solution, applying the most efficient and appropriate tools for each specific use case.
All of Microsoft SQL Server availability technologies are supported on the core vSphere platform, including:
Log shipping
Database mirroring
AlwaysOn failover cluster instances
AlwaysOn Availability Groups
It is important to recognize that SQL database availability involves more than the technologies just described. When architecting this type of solution, you must also consider all the design factors for local site and remote site availability options, backup and restore options, and best practices and deployment considerations.
While the objective of this paper is to address use cases available with the AlwaysOn Availability Groups mechanism, to provide a complete picture of the available options with AlwaysOn technology, the following table lists all of Microsoft SQL Server 2016 availability options and their potential ability to meet Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
Table 1. Microsoft SQL Server High Availability Technologies
Technology
Granularity
Storage Type
RPO
RTO
Log shipping
Per database
Non-shared storage architecture
Last shipped transaction log
DBA recovery only
Database mirroring
Per database
Non-shared storage architecture
None*
< 3 seconds or administrator recovery
AlwaysOn Failover Cluster Instances
Per instance
Shared storage only
None
~30 seconds
AlwaysOn Availability Groups
Per database
Non-shared storage architecture
None**
~3 seconds or administrator recovery
* With high safety mode.
** With synchronous commit mode.
The potential RPO and RTO values shown in the previous table are typical examples of achievable values in real world implementations. However, architecture, hardware, and configuration are significant factors that can skew these estimations significantly.
The focus of this document is the SQL Server 2016 AlwaysOn Availability Groups feature, which provides flexible design choices for selecting an appropriate high availability and disaster recovery solution for applications, and supports availability at either the application database or instance level. The AlwaysOn Availability Groups feature was developed for applications that require high uptime, need protection against failures within a data center, and require adequate redundancy against failures across data centers. This solution architecture helps protect application databases from planned and unplanned downtime without the need for shared storage between nodes.
Figure 1. Windows Server Failover Cluster with AlwaysOn Availability Group
 
The Availability Groups feature provides an enterprise-level alternative to database mirroring. The availability group supports the failover environment for a set of read-write primary databases, and one to four sets of corresponding secondary databases. Additionally, you can make secondary databases available for read-only access or backup operations. With AlwaysOn Availability Groups automatic failover mode enabled, if the primary replica fails, a secondary replica automatically takes over servicing client requests and becomes the new primary replica.
The AlwaysOn Availability Groups feature provides support for the following functions:
The ability to offload backup jobs to a secondary node
The ability to provide read-only access to secondary replica nodes
Provide a single primary database replica and up to four secondary database replica targets
Provide a mixture of synchronous and asynchronous data replication between primary and multiple secondary replicas
Faster application failover is provided by availability group listeners
Multiple databases can be configured in a single failover unit
Flexible failover policies through the Windows Server Failover Clustering (WSFC) feature
This AlwaysOn Availability Group mechanism is also different from traditional SQL failover cluster Instances because a failover cluster can provide availability only at the entire Microsoft SQL Server instance level, not for individual databases.
As mentioned previously, the AlwaysOn Availability Groups feature is built on a non-shared disk architecture. Each availability group replica has its own copy of the database, which can be deployed on VMFS or RDMs. Log streams are used to replicate data from the primary to the secondary replica nodes synchronously or asynchronously, with up to two synchronous replicas allowed. Availability groups sit on top of WSFC technology to provide a failover policy or quorum management. WSFC is required by AlwaysOn Availability Groups, with each availability group replica also being a WSFC node, unlike AlwaysOn Failover Cluster Instances, for which there is no requirement to use shared disks.
While VMware does not support VMware vSphere vMotion® or DRS on clustered SQL Server virtual machines within a shared disk architecture (such as AlwaysOn Failover Cluster Instances), this restriction does not apply to AlwaysOn Availability Groups that are built on a non-shared disk architecture. VMware fully supports using vSphere HA, vSphere vMotion, and DRS with AlwaysOn Availability Groups.
Figure 2. Windows Server Failover Cluster with AlwaysOn Availability Group Configuration
 
With vSphere vMotion, a VMware ESXi™ host can be powered down for planned maintenance at any time without interruption to the client requests. vSphere HA can quickly reboot a Microsoft SQL Server virtual machine, which can rejoin the AlwaysOn Availability Group session if a hardware failure occurs.
In this planned maintenance scenario, vSphere vMotion is employed to proactively live migrate an availability group replica to a different host to allow the servicing of the hardware to take place without requiring an AlwaysOn Availability Group failover event. There is no disruption to the Microsoft SQL Server service during the vSphere vMotion operation and no interruption to the client’s application connections or any in-flight transactions. By coupling together vSphere vMotion with AlwaysOn Availability Groups technologies, you can eliminate the need to fail over the Microsoft cluster and reduce service interruptions for operational hardware maintenance or renewal.
In the unplanned failure scenario, the SQL Server environment could be vulnerable if further host failures were to occur during the time between the loss of a replica and its restoration because, depending on available bandwidth and network conditions, the resynchronization of the replica could take a significant period of time to complete. vSphere HA helps to alleviate this issue by restarting the failed replica virtual machine on another available host in the vSphere cluster. This facilitates a significantly quicker restore to full protection of the database, and reduces the amount of time spent by the availability group in the failover state. In the event of an unplanned physical host failure, you do not need to wait for the physical host to be serviced and brought back online to bring back the AlwaysOn Availability Group replica. Instead, vSphere HA automatically detects the host failure and immediately reboots the AlwaysOn Availability Group replica virtual machine onto a different available ESXi host.
With these integrated VMware mechanisms for high availability, it is perfectly possible, if not expected, to achieve better levels of uptime running AlwaysOn Availability Groups with vSphere than on physical hardware. This provides a high availability and disaster recovery solution for Tier 1 virtualized business critical databases to maximize continuous availability of the application it services and for business continuity during disaster scenarios. Furthermore, this solution architecture:
Demonstrates business-critical levels of high performance and availability between the IT organizations private data centers and VMware Cloud Providers’ facilities.
Promotes resiliency to help meet recovery time and point objectives when faced with application, storage, network, or compute node failures.
Demonstrates how to achieve business continuity SLAs in partnership with a VMware Cloud Provider to lower risk and operational costs.
Figure 3. Solution Logical Architecture
 
For the solution shown in the previous figure, which includes a cross-site IPSec VPN and Active Directory Sites and Services, consider the following:
For an IPSec VPN, the endpoint networks cannot be the same.
The IP Sec VPN mapping must use the correct VMware Cloud Provider Program edge gateway firewall rules to allow traffic to pass.
Networks segments in the VMware Cloud Provider Program data center must be defined as a new site with the Sites and Services configuration of an Active Directory domain controller.
SQL Server 2016 AlwaysOn provides flexible design choices for selecting an appropriate solution for your application with protection against failures within a data center and adequate redundancy across data centers. For a further overview of the AlwaysOn design options for high availability and disaster recovery, see AlwaysOn Availability Groups (SQL Server) at https://technet.microsoft.com/en-us/library/hh510230.