Citrix XenDesktop and PVS: A Write Cache Performance Study

If you’re unfamiliar, PVS (Citrix Provisioning Server) is a vDisk deployment mechanism available for use within a XenDesktop or XenApp environment that uses streaming for image delivery. Shared read-only vDisks are streamed to virtual or physical targets on which users access random pooled or static desktop sessions. Random desktops are reset to a pristine state between logoffs, while users requiring static desktops have their changes persisted within a Personal vDisk pinned to their own desktop VM. Any changes that occur during a user session are captured in a write cache. This is where the performance-demanding write IOs occur, and it is where PVS offers a great deal of flexibility as to where those writes land. Write cache destinations are defined via PVS vDisk access modes, which can dramatically change the performance characteristics of your VDI deployment.

While PVS does add a degree of complexity to the overall architecture, since it requires its own infrastructure, it is worth considering because it can reduce the amount of physical compute horsepower required for your VDI desktop hosts. The following diagram illustrates the relationship of PVS to Machine Creation Services (MCS) in the larger architectural context of XenDesktop. Keep in mind that PVS is frequently used to deploy XenApp servers as well.
[Diagram: PVS and MCS within the XenDesktop architecture]
PVS 7.1 supports the following write cache destination options (from the Citrix documentation linked in the References below):
  • Cache on device hard drive - Write cache can exist as a file in NTFS format, located on the target device’s hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
  • Cache on device hard drive persisted (experimental phase only) - The same as Cache on device hard drive, except cache persists. At this time, this write cache method is an experimental feature only, and is only supported for NT6.1 or later (Windows 7 and Windows 2008 R2 and later).
  • Cache in device RAM - Write cache can exist as a temporary file in the target device’s RAM. This provides the fastest method of disk access since memory access is always faster than disk access.
  • Cache in device RAM with overflow on hard disk - When RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first.
  • Cache on a server - Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
  • Cache on server persistent - This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read only vDisk image.
Many of these were available in previous versions of PVS, including cache to RAM, but what makes v7.1 more interesting is the ability to cache to RAM with overflow to HDD. This provides the best of both worlds: extreme RAM-based IO performance without the risk, since the cache can now overflow to HDD if RAM fills. Previously you had to be very careful to ensure your RAM cache never filled completely, as that could end in catastrophe. Granted, if overflow does occur, affected user VMs will be at the mercy of your available HDD performance, but that is still better than the alternative (a BSOD).
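To make the overflow behavior concrete, here is a minimal Python sketch of the idea: writes land in a fixed RAM budget first and only the spillover touches disk. This is a toy model for illustration only; PVS manages its cache at the block level internally, and the sizes used here are made up.

```python
# Toy model of "cache in device RAM with overflow on hard disk".
# Illustration of the concept only, not how PVS implements it internally.

class WriteCache:
    def __init__(self, ram_budget_mb):
        self.ram_budget = ram_budget_mb * 1024 * 1024  # bytes available in RAM
        self.ram_used = 0
        self.disk_used = 0

    def write(self, size_bytes):
        """Absorb a write: RAM first, overflow to disk once RAM is full."""
        ram_free = self.ram_budget - self.ram_used
        to_ram = min(size_bytes, ram_free)
        self.ram_used += to_ram
        self.disk_used += size_bytes - to_ram  # only the overflow hits the HDD

    def stats(self):
        return {"ram_mb": self.ram_used / 2**20, "disk_mb": self.disk_used / 2**20}


# Example: a session that generates 700 MB of writes against a 512 MB RAM cache.
cache = WriteCache(ram_budget_mb=512)
for _ in range(700):
    cache.write(1024 * 1024)   # 1 MB writes
print(cache.stats())           # ~512 MB absorbed in RAM, ~188 MB spilled to disk
```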

Results

Even when caching directly to HDD, PVS shows lower IOPS per user than MCS does on the same hardware. We decided to take things a step further by testing a number of different caching options. We ran tests on both Hyper-V and ESXi using our three standard user VM profiles against the LoginVSI low, medium, and high workloads. For reference, below are the standard user VM profiles we use in all Dell Wyse Datacenter enterprise solutions:

| Profile Name | vCPUs per Virtual Desktop | Nominal RAM (GB) per Virtual Desktop | Use Case |
|---|---|---|---|
| Standard | 1 | 2 | Task Worker |
| Enhanced | 2 | 3 | Knowledge Worker |
| Professional | 2 | 4 | Power User |
We tested three write caching options across all user and workload types: cache on device HDD, RAM + Overflow (256MB), and RAM + Overflow (512MB). Doubling the RAM cache on the more intensive workloads paid off big, driving host IOPS down to nearly zero; almost 100% of user-generated IO was absorbed completely by RAM. We didn’t capture the IOPS generated in RAM here using PVS, but as the fastest medium available in the server, and from previous work with other in-RAM technologies, I can tell you that 1600MHz RAM is capable of tens of thousands of IOPS per host. We also tested thin vs thick provisioning using our high-end profile when caching to HDD, just for grins. Ironically, thin provisioning outperformed thick on ESXi, while the opposite proved true for Hyper-V. To achieve these impressive IOPS numbers on ESXi it is important to enable intermediate buffering (see the links at the bottom). I’ve highlighted the more impressive RAM + overflow results below. Note: IOPS per user below indicates IO observed at the disk layer of the compute host; it does not mean these sessions generated close to no IOPS overall.
| Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User |
|---|---|---|---|---|---|---|---|
| ESXi | Device HDD only | Standard | 170 | 95% | 1.2 | 5 | 109 |
| ESXi | 256MB RAM + Overflow | Standard | 170 | 76% | 1.5 | 0.4 | 113 |
| ESXi | 512MB RAM + Overflow | Standard | 170 | 77% | 1.5 | 0.3 | 124 |
| ESXi | Device HDD only | Enhanced | 110 | 86% | 2.1 | 8 | 275 |
| ESXi | 256MB RAM + Overflow | Enhanced | 110 | 72% | 2.2 | 1.2 | 284 |
| ESXi | 512MB RAM + Overflow | Enhanced | 110 | 73% | 2.2 | 0.2 | 286 |
| ESXi | HDD only, thin provisioned | Professional | 90 | 75% | 2.5 | 9.1 | 250 |
| ESXi | HDD only, thick provisioned | Professional | 90 | 79% | 2.6 | 11.7 | 272 |
| ESXi | 256MB RAM + Overflow | Professional | 90 | 61% | 2.6 | 1.9 | 255 |
| ESXi | 512MB RAM + Overflow | Professional | 90 | 64% | 2.7 | 0.3 | 272 |
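To put the per-user figures in host-level terms, the quick arithmetic below multiplies the measured average IOPS per user by session density for a few rows of the ESXi table above. It is just back-of-the-envelope math on the published numbers, nothing more.

```python
# Host-level disk IOPS implied by the ESXi results above:
# density (sessions per host) x average IOPS per user.
esxi_results = {
    # (workload, cache type): (density, avg IOPS/user)
    ("Enhanced", "Device HDD only"):          (110, 8.0),
    ("Enhanced", "256MB RAM + Overflow"):     (110, 1.2),
    ("Enhanced", "512MB RAM + Overflow"):     (110, 0.2),
    ("Professional", "HDD only, thin"):       (90, 9.1),
    ("Professional", "512MB RAM + Overflow"): (90, 0.3),
}

for (workload, cache), (density, iops_per_user) in esxi_results.items():
    host_iops = density * iops_per_user
    print(f"{workload:12s} {cache:24s} ~{host_iops:5.0f} IOPS at the host disk layer")

# Enhanced: ~880 IOPS with HDD-only caching vs ~22 IOPS with a 512 MB RAM cache,
# i.e. roughly 97% of the user write IO never reaches the disk.
```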
For Hyper-V we observed a similar story, except that we did not enable intermediate buffering, at the recommendation of Citrix. This is important! Citrix strongly recommends not using intermediate buffering on Hyper-V as it degrades performance. Most other numbers are well in line with the ESXi results, save for the cache-to-HDD numbers being slightly higher.
| Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User |
|---|---|---|---|---|---|---|---|
| Hyper-V | Device HDD only | Standard | 170 | 92% | 1.3 | 5.2 | 121 |
| Hyper-V | 256MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.3 | 104 |
| Hyper-V | 512MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.2 | 110 |
| Hyper-V | Device HDD only | Enhanced | 110 | 85% | 1.7 | 9.3 | 323 |
| Hyper-V | 256MB RAM + Overflow | Enhanced | 110 | 80% | 2 | 0.8 | 275 |
| Hyper-V | 512MB RAM + Overflow | Enhanced | 110 | 81% | 2.1 | 0.4 | 273 |
| Hyper-V | HDD only, thin provisioned | Professional | 90 | 80% | 2.2 | 12.3 | 306 |
| Hyper-V | HDD only, thick provisioned | Professional | 90 | 80% | 2.2 | 10.5 | 308 |
| Hyper-V | 256MB RAM + Overflow | Professional | 90 | 80% | 2.5 | 2.0 | 294 |
| Hyper-V | 512MB RAM + Overflow | Professional | 90 | 79% | 2.7 | 1.4 | 294 |

Implications

So what does it all mean? If you’re already a PVS customer this is a no-brainer: upgrade to v7.1 and turn on “cache in device RAM with overflow on hard disk” now. Your storage subsystems will thank you. The benefits are clear on both ESXi and Hyper-V. If you’re deploying XenDesktop soon and debating MCS vs PVS, this is a very strong mark in the “pro” column for PVS. The fact of life in VDI is that we always run out of CPU first, but that doesn’t mean we get to ignore or undersize IO performance; it matters too. Letting RAM absorb the vast majority of user write cache IO allows us to stretch our HDD subsystems even further, since their burden is diminished. Cut your local disk costs by two thirds, or stretch those shared arrays two to three times further. PVS cache in RAM + overflow allows you to design your storage around capacity requirements, with less need to overprovision spindles just to meet IO demands (and waste capacity in the process).
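As a rough illustration of that sizing impact, the sketch below estimates how many local spindles a host would need just to satisfy steady-state write-cache IO, using the per-user IOPS measured above. The 75 IOPS-per-spindle rating is my own generic assumption for a 10K SAS drive (and it ignores RAID write penalties), so substitute your own figures.

```python
import math

def spindles_needed(users, iops_per_user, iops_per_spindle=75):
    """Rough count of disks needed to absorb steady-state write-cache IO.

    iops_per_spindle is an assumed rating for a single 10K SAS drive;
    swap in your own drive rating and RAID write-penalty math.
    """
    return max(1, math.ceil(users * iops_per_user / iops_per_spindle))

# Enhanced workload, 110 users per host (figures from the ESXi table above):
print(spindles_needed(110, 8.0))   # HDD-only cache      -> ~12 spindles just for IO
print(spindles_needed(110, 0.2))   # 512MB RAM + overflow -> 1 spindle, sized for capacity
```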
References:
  • DWD Enterprise Reference Architecture
  • http://support.citrix.com/proddocs/topic/provisioning-7/pvs-technology-overview-write-cache-intro.html
  • When to Enable Intermediate Buffering for Local Hard Drive Cache
