virtual disk metadata. Also note that the SBC uses 5 GB of memory per 1 TB of caching
devices in the node, which means that it is important to plan resource utilization
accordingly, especially in hyperconverged scenarios where the storage is hosted by the
same nodes hosting virtual machines. A key point is that the SBC is agnostic to storage
pools and disks; it is a node-level resource that is used across all the storage pools
and disks present on a system and is not tied to any specific pool or disk.
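To make the planning point concrete, the following is a minimal sketch of the arithmetic, assuming only the 5 GB per 1 TB ratio stated above. The function names, the example node size, and the cache capacity are hypothetical and chosen purely for illustration.

```python
def sbc_memory_gb(cache_capacity_tb: float, gb_per_tb: float = 5.0) -> float:
    """Estimate the memory the SBC uses on one node.

    Uses the 5 GB of memory per 1 TB of cache devices ratio from the text.
    """
    if cache_capacity_tb < 0:
        raise ValueError("cache capacity cannot be negative")
    return cache_capacity_tb * gb_per_tb


def memory_left_for_vms_gb(node_memory_gb: float, cache_capacity_tb: float) -> float:
    """Memory remaining for virtual machines after the SBC overhead.

    Ignores any other host overhead; this is an illustrative simplification.
    """
    return node_memory_gb - sbc_memory_gb(cache_capacity_tb)


# Hypothetical hyperconverged node: 256 GB of RAM and 4 TB of cache devices.
print(sbc_memory_gb(4))                 # 20.0
print(memory_left_for_vms_gb(256, 4))   # 236.0
```

In this hypothetical case the cache consumes 20 GB of the node's memory, which is memory that cannot be assigned to virtual machines on the same node.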
Data stored in the cache is resilient because the SBC operates at the node level and sits
below the virtual disk (see Figure 4.8), and it is the virtual disk that defines the
required resiliency. Writes are distributed to the number of nodes that the resiliency
setting requires; for example, a write might be sent to three nodes, and then on
each of those nodes, the write would be persisted to the cache and then destaged to
the hot and cold tiers as required. No additional local resiliency is required on the
cache storage, as the resiliency is defined at the virtual disk level and achieved
through the multiple copies over multiple nodes. If a cache device has an error, the
write is still available on other nodes in the cluster, which will be used to rehydrate
the data on the node that experienced the problem. This is the same as for the hot and
cold tiers; because the data is stored multiple times across disks in different nodes,
resiliency is not at a single node level but is based on the resiliency defined for the
virtual disk.
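The write path described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the Node class, the function names, and the "first N nodes" placement rule are hypothetical and do not reflect how Storage Spaces Direct actually places or repairs data. It shows only the idea that each copy lands in a node's cache, is later destaged, and can be rebuilt from another node's copy.

```python
class Node:
    """Toy model of a cluster node with a cache and capacity devices."""

    def __init__(self, name):
        self.name = name
        self.cache = {}     # block id -> data held on cache devices
        self.capacity = {}  # block id -> data held on capacity devices

    def write(self, block, data):
        # A write is first persisted to the node's cache.
        self.cache[block] = data

    def destage(self):
        # Move cached data down to the capacity devices.
        self.capacity.update(self.cache)
        self.cache.clear()

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        return self.capacity.get(block)


def replicated_write(nodes, block, data, copies=3):
    """Send one write to `copies` nodes (placement rule is an assumption)."""
    if len(nodes) < copies:
        raise ValueError("not enough nodes for the requested resiliency")
    targets = nodes[:copies]
    for node in targets:
        node.write(block, data)
    return targets


def rehydrate(failed, peers, block):
    """Restore a block on `failed` from the first peer that still has it."""
    for peer in peers:
        data = peer.read(block)
        if data is not None:
            failed.write(block, data)
            return True
    return False


nodes = [Node(f"node{i}") for i in range(4)]
targets = replicated_write(nodes, "blk1", b"payload", copies=3)
targets[0].cache.clear()                      # simulate a cache device error
restored = rehydrate(targets[0], targets[1:], "blk1")
print(restored, targets[0].read("blk1"))
```

Because the other two nodes still hold the write, the node with the simulated cache error gets its copy back without any local redundancy on the cache device itself.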
The automatic nature of using disks does not stop at the storage pool. The virtual
disks created in the pool can use a mixture of resiliency types to offer a blend of best
performance and capacity. Traditionally, there are two types of resiliency:
Mirror: This provides the best performance. However, it is the least efficient, as all
data is duplicated multiple times (for example, a three-way mirror has three copies
of the data, which means that only 33 percent of the disk footprint is usable for
storage).
Parity: This provides the most capacity at the expense of performance, as
computations must be performed to calculate parity values. With dual-parity, 50
percent of the disk footprint is used for data storage. (LRC erasure coding is used,
which is the same scheme used for Azure Storage. Details can be found at
http://research.microsoft.com/en-us/um/people/chengh/papers/LRC12.pdf.)
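The efficiency figures quoted for the two resiliency types can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the dual-parity layout of two data symbols plus two parity symbols is an assumption chosen because it reproduces the 50 percent figure above. Actual efficiency depends on how the stripe is laid out across the available disks and nodes.

```python
from fractions import Fraction


def mirror_efficiency(copies: int) -> Fraction:
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    return Fraction(1, copies)


def parity_efficiency(data_symbols: int, parity_symbols: int) -> Fraction:
    """Usable fraction when each stripe holds data plus parity symbols."""
    return Fraction(data_symbols, data_symbols + parity_symbols)


def usable_tb(raw_tb: float, efficiency: Fraction) -> float:
    """Usable capacity in TB for a given raw capacity and efficiency."""
    return float(Fraction(raw_tb) * efficiency)


three_way = mirror_efficiency(3)       # 1/3, roughly 33 percent
dual_parity = parity_efficiency(2, 2)  # assumed 2 data + 2 parity layout

print(f"three-way mirror: {float(three_way):.0%}")   # three-way mirror: 33%
print(f"dual parity:      {float(dual_parity):.0%}") # dual parity:      50%
print(usable_tb(12, three_way), usable_tb(12, dual_parity))  # 4.0 6.0
```

With a hypothetical 12 TB of raw capacity, the three-way mirror yields 4 TB of usable space and the assumed dual-parity layout yields 6 TB.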
Windows Server 2016 introduces mixed-resiliency virtual disks, which, as the name
suggests, use a mix of mirroring and parity to provide high performance (mirroring)
for the hot data, and high capacity (parity) for the cold data. Mixed-resiliency virtual
disks are enabled through a combination of Storage Spaces and ReFS. To use a mixed-
resiliency disk, you must create a virtual disk with storage in both a mirror tier and a
parity tier. ReFS then provides the real-time tiering capability, which works as follows:
1. Data is written to the disk, and those writes always go to the mirror (performance)
tier. Note that the Storage Bus Cache is used for the first write, and the source of
the write is acknowledged. If the data is an update to existing data in the parity tier,
the existing data in the parity tier is invalidated and the data is written to the