3.2.3 Deduplication and Compression
Enabling deduplication and compression can reduce the amount of storage consumed by a factor of up to seven. The actual reduction varies, depending primarily on the types of data present, the number of duplicate blocks, how well the data compresses, and how the unique blocks are distributed. For example, video files do not compress well, while documents and spreadsheets typically yield more favorable results. The following figure shows deduplication and compression being enabled in an all-flash vSAN environment.
Deduplication and compression is a cluster-wide setting that is disabled by default and can be enabled using a simple drop-down menu, illustrated here. Enabling it requires a rolling reformat of every disk group on every host in the vSAN cluster, which can take a considerable amount of time. However, this process does not incur virtual machine downtime. Deduplication and compression are controlled by a single parameter, so it is not possible to enable one without the other.
Figure 6. Features: Deduplication and Compression
Deduplication occurs nearline, when data is destaged from the cache tier to the capacity tier of an all-flash disk group. The deduplication algorithm uses a fixed 4K block size and runs within each disk group independently. In other words, redundant copies of a block within the same disk group are reduced to one copy, but redundant blocks across multiple disk groups are not deduplicated. Deduplicating at the disk group level with a 4K block size provides a good balance between space efficiency and performance.
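The disk group scoping described above can be illustrated with a minimal Python sketch. It is not vSAN code: the function names, the in-memory dictionaries, and the use of SHA-1 as the block fingerprint are all assumptions made for illustration. It only shows that identical 4K blocks collapse to one copy within a disk group, while the same block in two disk groups is stored twice.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed 4K block size, as stated in the text


def dedupe_disk_group(data: bytes) -> dict:
    """Map each unique 4K block in one disk group to a single stored copy.

    The digest algorithm (SHA-1) is an illustrative assumption.
    """
    unique = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        # Keep only the first copy of each distinct block.
        unique.setdefault(digest, block)
    return unique


def stored_blocks(disk_groups: dict) -> int:
    """Count blocks stored when each disk group is deduplicated on its own."""
    return sum(len(dedupe_disk_group(data)) for data in disk_groups.values())


# Hypothetical data: block A appears twice in dg1 and once in dg2.
block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
groups = {"dg1": block_a + block_a + block_b, "dg2": block_a}
print(stored_blocks(groups))  # 3: A and B in dg1, plus A again in dg2
```

Four blocks were written in total, and three are stored: the duplicate inside dg1 is removed, but the copy of block A in dg2 is kept because deduplication does not span disk groups.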
The compression algorithm is applied after deduplication has occurred, just before the data is written to the capacity tier flash devices. Considering the additional compute resource and allocation map overhead of compression, vSAN only stores compressed data if a unique 4K block can be reduced to 2K or less. Under all other circumstances, the block is written uncompressed.
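The 2K threshold rule can be sketched as follows. This is an illustration under stated assumptions, not the vSAN implementation: zlib stands in for whatever compressor vSAN actually uses, and the function name and return shape are hypothetical.

```python
import os
import zlib
from typing import Tuple

BLOCK_SIZE = 4096          # unique 4K block produced by deduplication
COMPRESS_THRESHOLD = 2048  # store compressed only if the result is 2K or less


def store_block(block: bytes) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload) for one deduplicated 4K block.

    zlib is a stand-in compressor chosen for this sketch.
    """
    compressed = zlib.compress(block)
    if len(compressed) <= COMPRESS_THRESHOLD:
        return True, compressed
    # Not enough savings to justify the overhead: write the block as-is.
    return False, block


# A block of zeros compresses far below 2K; random bytes do not compress.
zero_flag, zero_payload = store_block(b"\x00" * BLOCK_SIZE)
rand_flag, rand_payload = store_block(os.urandom(BLOCK_SIZE))
print(zero_flag, len(zero_payload) <= COMPRESS_THRESHOLD)
print(rand_flag, len(rand_payload))
```

The all-or-nothing threshold means a block occupies either 2K or less, or the full 4K, which keeps the allocation map simple at the cost of forgoing small savings.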
There is a storage policy implication to be aware of when deduplication and compression are enabled, especially when upgrading from a previous version of vSAN in which one or more storage policies contain an object space reservation rule with a value other than 0 or 100 percent. When deduplication and compression are enabled, object space reservation rules must be set to 0 or 100 percent; values from 1 to 99 percent are not supported. An object that is assigned a storage policy containing an object space reservation rule of 100 percent is analyzed for deduplication, but no space savings are realized because capacity for the entire object is reserved. Before upgrading vSAN, set every policy that contains an explicit object space reservation rule to 0 or 100 percent.
Note: The implicit default value for this rule is 0 percent, so there is no need to adjust a policy that does not have the object space reservation explicitly defined.
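As a pre-upgrade check, the rule above could be expressed as a small Python sketch. The policy names and the dictionary layout are hypothetical, and the sketch does not call any vSAN or vSphere API. It assumes the object space reservation values have already been exported, with None meaning the rule is not explicitly defined.

```python
from typing import Dict, List, Optional

SUPPORTED_OSR = {0, 100}  # only 0 or 100 percent is supported
IMPLICIT_DEFAULT_OSR = 0  # implicit default when the rule is not defined


def unsupported_policies(policies: Dict[str, Optional[int]]) -> List[str]:
    """Return names of policies whose reservation must change before upgrade."""
    flagged = []
    for name, osr in policies.items():
        effective = IMPLICIT_DEFAULT_OSR if osr is None else osr
        if effective not in SUPPORTED_OSR:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical export: policy name -> object space reservation percent.
exported = {"gold": 100, "silver": 50, "thin": 0, "default": None}
print(unsupported_policies(exported))  # ['silver']
```

Only the policy with a 50 percent reservation is flagged; the policy with no explicit rule falls back to the implicit 0 percent and needs no change.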
Naturally, deduplication and compression on any storage platform require additional CPU cycles and can potentially impact performance in terms of latency and maximum IOPS. vSAN is no exception. However, because deduplication and compression are supported only in all-flash vSAN configurations, these effects are negligible in the majority of use cases, given the high levels of performance available from modern enterprise flash devices.