Data Deduplication and Hyper-V
So far, I have covered some of the reasons that Windows Server 2016 is a great storage
platform. Storage Spaces with its thin provisioning and autorecovery, Storage Spaces
Direct to use local storage in servers, Storage Replica to provide DR, improved ChkDsk
error correction, iSCSI and SMB 3 and above servers, and VHDX are amazing features.
There are many other features, such as the new ReFS filesystem and industry-leading
NFS implementation, but I want to touch on one more feature that I would not have
covered in Windows Server 2012, but is now applicable to virtualization: data
deduplication.
Windows Server 2012 introduced the block-level data deduplication capability as an
optional role service available within the File and iSCSI Services collection of role services. In
Windows Server 2012, data deduplication did not work on any file that had an
exclusive lock open, which was the case for a virtual hard disk used by a virtual
machine. This meant that the data deduplication feature was useful only for reducing
space for archived virtual machines or libraries of content.
In Windows Server 2012 R2, the data deduplication functionality was improved to
work on exclusively locked files. It can therefore deduplicate virtual hard disks used
by Hyper-V virtual machines. For the Windows Server 2012 R2 release, though,
deduplication is supported for only a single scenario: the deduplication of VDI
deployment virtual machines, primarily personal desktop deployments that often
result in a very large amount of duplicated content. When you leverage the data
deduplication capability at the filesystem level, all the duplicated blocks within a
virtual hard disk and between different virtual hard disks are single-instanced,
resulting in huge disk savings. Windows Server 2012 R2 also adds support for
deduplication for Cluster Shared Volumes, which means that deduplication can be
used on shared cluster disks and on the storage of scale-out file servers. Note that
while ReFS is a possible filesystem for Hyper-V in Windows Server 2016, it does not
support deduplication. If you need deduplication, you need to use NTFS, which is the
guidance for all scenarios except Storage Spaces Direct anyway.
Data deduplication works by periodically scanning the filesystem and computing a
hash value for each block on disk. If blocks are found with the same hash value, it
means the content is the same, and the block is moved to a single-instance store.
The old locations now point to the single-instance store copy. The
block size used is variable to achieve the greatest level of deduplication. It is common
to see disk space savings of up to 95 percent in VDI environments, because most of
the content of each virtual hard disk is the same as the other virtual hard disk
instances. Rather than having the negative performance impact that might be
expected, deduplication actually speeds up VDI environments because of
improvements in caching.
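The hash-and-single-instance idea described above can be illustrated with a small Python sketch. This is a conceptual model only, not how Windows Server implements the feature: the names split_chunks and ChunkStore, the toy rolling value used to pick variable-size block boundaries, the size limits, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def split_chunks(data, min_size=32, max_size=256, mask=0x3F):
    """Split data into variable-size chunks using a toy content-defined rule.

    Hypothetical helper: the boundary rule and sizes are illustrative only.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Toy rolling value; older bytes shift out of the 32-bit window.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Hypothetical single-instance store: one copy per unique chunk hash."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk bytes

    def add_file(self, data):
        """Store a file and return its recipe (ordered list of chunk hashes)."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is not stored a second time.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read_file(self, recipe):
        """Rebuild the original bytes by following the recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Bytes actually held in the store after single-instancing."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    # Two "virtual hard disks" that share a common base image.
    base_image = hashlib.sha256(b"seed").digest() * 256
    disk_a = base_image + b"user A data"
    disk_b = base_image + b"user B data"

    store = ChunkStore()
    recipe_a = store.add_file(disk_a)
    recipe_b = store.add_file(disk_b)

    print("logical bytes:", len(disk_a) + len(disk_b))
    print("stored bytes: ", store.stored_bytes())
    print("round trip ok:", store.read_file(recipe_a) == disk_a
          and store.read_file(recipe_b) == disk_b)
```

Each file is reduced to a list of hashes that point into the shared store, which mirrors the way the old locations point to the single-instance copy. The more content the virtual hard disks have in common, the smaller the store is relative to the logical size.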
It should be noted that while in Windows Server 2012 R2 the data deduplication is