Architecting a Hybrid Messaging Strategy with Microsoft Exchange 2013 : Introduction to Microsoft Exchange Server DAGs
   
Introduction to Microsoft Exchange Server DAGs
With Microsoft Exchange 2013, the data protection methods of Microsoft Exchange 2007 and 2010 have evolved into the latest version of the database availability group (DAG), which is the building block for highly available and disaster-recoverable Microsoft Exchange solutions.
A DAG consists of up to 16 mailbox servers that host a set of replicated databases and provide automatic database-level recovery from failures or outages that affect individual mailbox servers or databases. Microsoft recommends minimizing the number of deployed DAGs to simplify administration. However, certain design factors can require multiple DAGs. For instance:
If your design requires deployment of more than 16 mailbox servers.
If you have active mailbox users in multiple sites (active-active site configuration).
If you require separate DAG-level administrative boundaries for operational reasons.
If you have mailbox servers in separate Active Directory domains (DAGs are domain bound).
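The design factors above can be sketched as a quick sizing check. This is a minimal illustration, not a Microsoft or VMware sizing tool: the function name `min_dags_required` and its parameters are hypothetical, and the result is only a lower bound because each factor is evaluated independently. The 16-member limit comes from the text.

```python
import math

MAX_DAG_MEMBERS = 16  # per-DAG mailbox server limit stated in the text


def min_dags_required(mailbox_servers, ad_domains=1, admin_boundaries=1,
                      active_active_sites=False):
    """Return a lower bound on the number of DAGs a design needs.

    Hypothetical helper: each argument maps to one design factor in the
    list above. Real designs can need more DAGs than this bound.
    """
    if mailbox_servers < 1:
        raise ValueError("at least one mailbox server is required")
    # More than 16 mailbox servers cannot fit in a single DAG.
    by_size = math.ceil(mailbox_servers / MAX_DAG_MEMBERS)
    # An active-active site configuration uses at least two DAGs.
    by_sites = 2 if active_active_sites else 1
    # DAGs are domain bound, and each administrative boundary needs its own DAG.
    return max(by_size, by_sites, ad_domains, admin_boundaries)


if __name__ == "__main__":
    # Example: 12 mailbox servers, one domain, active users in both sites.
    print(min_dags_required(12, active_active_sites=True))
```

For the architecture described in this document, the active-active factor is the one that drives the DAG count, not the number of servers.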
In this solution architecture, due to the large number of mobile and remote workers being employed by the business, the IT organization is deploying an active-active site configuration, with active users connecting globally from multiple remote offices and through a range of mobile devices.
For this reason, the Microsoft Exchange architecture requires at least two DAGs, with each DAG spanning both sites. Each DAG has its own file share witness at its primary site, so one witness is located in the on-premises data center and one in the VMware Cloud Provider Program facility. This placement prevents database copies from becoming active in both sites if the network between the two facilities fails.
For instance, if there is a network outage between the two data center facilities, the primary site for that DAG will get two votes (the DAG member and the file share witness) as opposed to the single vote by the DAG member at the secondary data center. The majority vote retains quorum and the mailbox databases remain mounted.
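The vote counting in this example can be sketched in a few lines. This is a simplified model under stated assumptions, not the WSFC implementation: it assumes every DAG member and the file share witness each hold one vote, that the DAG has an even number of members so the witness participates, and that a site keeps quorum only with a strict majority of all votes. The function names are hypothetical.

```python
def has_quorum(votes_reachable, total_votes):
    """A partition keeps quorum only with a strict majority of all votes."""
    return votes_reachable > total_votes // 2


def partition_outcome(primary_members, secondary_members,
                      witness_at_primary=True):
    """Model a network split between the two sites of one DAG.

    Assumption: one vote per DAG member plus one vote for the file
    share witness, which is reachable only from the site that hosts it.
    """
    total = primary_members + secondary_members + 1  # +1 for the witness
    primary_votes = primary_members + (1 if witness_at_primary else 0)
    secondary_votes = total - primary_votes
    return {
        "primary": has_quorum(primary_votes, total),
        "secondary": has_quorum(secondary_votes, total),
    }


if __name__ == "__main__":
    # One DAG member per site, witness at the primary site: 2 votes vs. 1.
    print(partition_outcome(1, 1))
```

Because only one side of the split can ever hold a strict majority, the same databases cannot be mounted in both sites at once.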
Figure 1. File Share Witness Majority Vote Architecture
 
The Microsoft Exchange DAG feature is built on a non-shared disk architecture, with each server having its own copy of the database. Each copy can be deployed on either VMFS datastores or raw device mappings (RDMs). Continuous replication, in which transaction logs are shipped from the active copy and replayed on the passive copies, keeps the passive nodes up to date.
DAGs are built on top of Windows Server Failover Clustering (WSFC) technology, which provides a failover policy and quorum management. Although WSFC is required by DAGs, unlike traditional Microsoft Exchange failover cluster instances, there is no requirement to use shared disks. While VMware does not support VMware vSphere® vMotion® or VMware vSphere Distributed Resource Scheduler™ on clustered Microsoft Exchange Server virtual machines with a shared disk architecture, such as failover cluster instances, this restriction does not apply to DAGs that are built on a non-shared disk architecture. Therefore, using VMware vSphere High Availability, vSphere vMotion, and DRS with DAGs is fully supported by VMware.
This means that with vSphere vMotion, Microsoft Exchange Server virtual machines can be live migrated off a VMware ESXi™ host, which can then be powered down for planned maintenance without interruption to client requests. In the event of an unplanned hardware failure, vSphere HA can quickly restart a Microsoft Exchange Server virtual machine on another host, after which it rejoins the DAG.
In the planned maintenance scenario, vSphere vMotion can be used to proactively live migrate a DAG member virtual machine to a different host, allowing hardware maintenance without a DAG failover event. With vSphere vMotion, there is no disruption of Microsoft Exchange mailbox server services during the migration and no interruption to clients' email sync connections or to messages in transit. By coupling vSphere vMotion with DAG technologies, you can eliminate the need to fail over the Microsoft cluster and reduce service interruptions for operational hardware maintenance or renewal.
In the unplanned hardware failure scenario, the Microsoft Exchange Server environment is vulnerable to further host failures during the period between the loss of a passive database copy and its restoration. Depending on available bandwidth and network conditions, resynchronizing the passive node can take a significant amount of time. vSphere HA helps alleviate this issue by restarting the failed passive DAG virtual machine on another available host in the VMware vSphere cluster. This restores full protection of the mailbox database sooner and reduces the time the DAG spends in a degraded state. You do not need to wait for the physical host to be serviced and brought back online before the passive database copy is available again. Instead, vSphere HA automatically detects the host failure and restarts the passive DAG virtual machine on a different available ESXi host.
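To give a feel for why resynchronization time matters, the following back-of-the-envelope sketch estimates how long it takes to copy a given amount of database data over a link. It is illustrative arithmetic only: the function name, the efficiency factor, and the sample figures are assumptions, not measurements from this architecture, and real resynchronization time also depends on disk throughput and Exchange-specific behavior.

```python
def resync_hours(data_gb, link_mbps, efficiency=0.7):
    """Estimate hours needed to copy data_gb over a link_mbps link.

    efficiency is a hypothetical fraction of the nominal bandwidth that
    is actually usable for replication traffic (0 < efficiency <= 1).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    megabits = data_gb * 8 * 1000          # GB -> megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 500 GB database copy over a 1 Gbps link.
    print(f"{resync_hours(500, 1000):.2f} hours")
```

The longer this window, the longer the DAG runs with reduced redundancy, which is the exposure that a vSphere HA restart of the passive virtual machine helps shorten.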
With these integrated VMware mechanisms for high availability, it is possible to achieve better levels of service uptime using DAGs with vSphere than on physical hardware.
 
 
The primary objective of this document is to demonstrate how to reduce the impact of hardware and software failures by running virtualized Microsoft Exchange Server 2013 across an on-premises data center and a VMware Cloud Provider facility, providing a high availability and disaster recovery solution for tier 1, business-critical mailbox databases. This maximizes continuous availability of the applications being serviced and also provides business continuity during disaster scenarios. This solution architecture also aims to:
Demonstrate business-critical levels of high performance and availability between the IT organization’s private data center and the VMware Cloud Provider’s facilities.
Provide resiliency that can meet recovery time objectives (RTO) and recovery point objectives (RPO) when faced with application, storage, network, or compute node failures.
Demonstrate how to achieve business-continuity SLAs in partnership with a VMware Cloud Provider to lower risk and operational costs.
Figure 2. Solution Architecture Overview