Architecting a vSphere Compute Platform : Planning Host Deployment : 6.3 Boot from SAN
   
6.3 Boot from SAN
When you configure a host to boot from a SAN, the hypervisor's boot image is stored on a single LUN in the SAN-attached storage system. When the host is powered on, it boots from that LUN across the SAN rather than from any local media. A boot from SAN environment can provide numerous benefits to the infrastructure, including a completely stateless compute environment. However, it can be complex to support, and in certain use cases boot from SAN should not be used for ESXi hosts, for instance, where vSAN is employed on the same hardware. Before you decide whether boot from SAN is appropriate for your environment, consider the advantages and drawbacks outlined in the following table.
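One quick way to confirm that a host is actually booting from the SAN is to inspect its storage device list: a device that is reported as non-local and flagged as the boot device is a SAN boot LUN. The following is a minimal sketch run from the ESXi shell; the device identifier `naa.xxxx` is a placeholder, and the exact field names can vary slightly between ESXi releases:

```shell
# List all storage devices and keep only the fields relevant to boot from SAN.
# A device showing "Is Local: false" and "Is Boot Device: true" is a SAN boot LUN.
esxcli storage core device list \
  | grep -E "^naa|Is Local:|Is Boot Device:"

# Check that more than one path exists to the boot LUN (redundant fabric).
# Replace naa.xxxx with the device identifier reported by the command above.
esxcli storage core path list -d naa.xxxx
```

Seeing multiple active paths to the boot LUN confirms the fabric redundancy that several of the drawbacks below depend on.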
 
 
Table 10. Advantages and Drawbacks of Boot from SAN
Advantages:

- Less power, less heat, less state – Removing internal hard drives from servers means they consume less power and generate less heat. That means they can be packed more densely, and the need for localized cooling is reduced. Without local storage, the servers effectively become "stateless" compute resources that can be pulled and replaced without having to worry about locally stored data.
- Less server CapEx – Boot from SAN enables organizations to purchase less expensive diskless servers. Further savings can be made through reduced storage controller costs, although servers still need bootable HBAs.
- More efficient use of storage – Whatever the footprint of a server's operating system, internal storage is always over-provisioned to accommodate it. Using boot from SAN, the boot device can be configured to match the capacity the operating system actually requires. That means a large number of servers running a range of operating systems can boot from a far smaller number of physical disks.
- High availability – Spinning hard drives, with their moving internal components, are a common point of failure, so removing the reliance on internal hard drives improves server availability. The servers still rely on hard drives, but SAN storage arrays are much more robust and reliable, with far more redundancy built in, so that servers can boot.
- Rapid disaster recovery – Data, including boot information, can easily be replicated from one SAN at a primary site to another SAN at a remote disaster recovery site. That means that in the event of a failure, servers are up and running at the remote site very rapidly.
- Lower OpEx through more centralized server management – Boot from SAN provides the opportunity for greatly simplified management of operating system patching and upgrades. For example, upgraded operating system images can be prepared and cloned on the SAN, and then individual servers can be stopped, directed to their new boot images, and rebooted, with very little downtime. New hardware can also be brought up from SAN-based images without any Ethernet networking requirements. LUNs can be cloned and used to test upgrades, service packs, and other patches, or to troubleshoot applications.
- Better performance – In some circumstances, the rapidly spinning, high-performance disks in a SAN may provide better operating performance than is available from a lower-performance local disk.

Drawbacks:

- Compatibility problems – Some operating systems, system BIOS, and especially HBAs, might not support boot from SAN. Upgrading these components might change the economics in favor of local boot or vSphere Auto Deploy.
- Single point of failure – If a server hard drive fails, only that system is unable to boot, but if a SAN or its fabric experiences major problems, it is possible that no servers will be able to boot. Although the likelihood of this happening is relatively small because of the built-in redundancy in most SAN systems, it is nevertheless worth considering.
- Boot overload potential – If a large number of servers try to boot at the same time, after a power failure, for example, this might overwhelm the fabric connection. In these circumstances, booting might be delayed or, if timeouts occur, some servers might fail to boot completely. This can be prevented by ensuring that boot LUNs are distributed across as many storage controllers as possible and that individual fabric connections are never loaded beyond vendor limits.
- Boot dependencies – The SAN and array infrastructure must be operational to boot ESXi hosts. After a complete data center outage, these components must be started and operational before hosts can be restarted.
- Configuration issues – Diskless servers can easily be pulled and replaced, but their HBAs have to be configured to point to their SAN-based boot devices before they boot. Unexpected problems can occur if a hot-swappable HBA is replaced in a running server. Unless the HBA is configured for boot from SAN, the server will continue to run but fail to boot the next time it is restarted.
- LUN presentation problems – Depending on your hardware, you might find that some servers can only boot from SAN from a specific LUN number (LUN 0). If that is the case, you must have a mechanism in place to present the unique LUN that you use to boot a given server as the LUN it (and other similar servers) expects to see. This is now considered a legacy issue that does not affect a new implementation.
- Additional complexity – There is no doubt that boot from SAN is far more complex than common local booting, and that adds an element of operational risk. As IT staff become accustomed to the procedure, this risk diminishes. However, do not discount the potential for problems in the early stages of boot from SAN adoption. For example, boot-from-SAN configurations require individual fabric zoning for each server and potentially a much more complex HBA/CNA configuration.
- Cost – SAN storage is typically more expensive than local storage, so any savings on server storage can be offset by the cost of the extra SAN disks.
- Storage team overhead – A SAN LUN must be provisioned and managed for every server, which can create significant additional work for a storage team.
- Performance – Periods of heavy VMkernel disk-swapping I/O can affect virtual machine disk performance, because they share the same disk I/O channels.
- Microsoft clustering – In vSphere 4, virtual machines configured with Microsoft Clustering (MSCS or failover clustering) are not supported on boot from SAN configurations.
- Scratch partitions – ESXi does not automatically create a scratch partition in a boot from SAN environment because it sees the disk as remote. A scratch partition can easily be configured manually or by script.
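The advanced option that controls the scratch location is `ScratchConfig.ConfiguredScratchLocation`. One common approach is to point each host at its own directory on a shared VMFS datastore. A minimal sketch from the ESXi shell follows; the datastore name and directory are illustrative, not prescribed values:

```shell
# Create a dedicated, per-host scratch directory on a shared datastore.
# "shared-ds01" and ".locker-esxi01" are example names - substitute your own.
mkdir -p /vmfs/volumes/shared-ds01/.locker-esxi01

# Point the host at the new scratch location; the change takes effect after a reboot.
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/shared-ds01/.locker-esxi01
```

After the reboot, the read-only setting `ScratchConfig.CurrentScratchLocation` reports the location actually in use. The same option can also be applied at scale through a host profile or a PowerCLI script, which is the more practical route in a large boot from SAN estate.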