This Blog Post is based on the work of Chhandomay Mandal, Michael Cooney and myself.

This is the first part of a two-part series blog post around evolution of VDI over the past years and how EMC XtremIO works across all of them. So let’s start.

How many of us remember Terminal Services?

Terminal services were quite popular in late nineties through early 2000s. It provided the ability to host multiple, simultaneous client sessions on a single Windows Server OS instance.

It was cost effective, easy to deliver, mature technology reaching north of 80 million users in its heyday.

 However, it had limited use cases with application compatibility issues. The most problematic of all was the fact that the application instances were shared. If somebody could crash Office, it was down for everyone on the system. If someone could get the OS to panic, then everybody was presented with the blue screen of death.

On the storage side, the hard disk drive or HDD arrays of the day with ho-hum performance and limited data services were sufficient to host the Terminal Services.

As server virtualization gained strong foothold in the data center, desktop virtualization became the next logical candidate. We could decouple the software from the hardware dependencies, and run desktops as VMs.

As the data center grade storage costs an order of magnitude higher than that of desktop storage, we were introduced to the concepts of gold images and differential data to make the storage economics work. Instead of everybody having their own desktop image, all desktops shared the same read-only gold image and the individual desktop writes destined to the OS got written to the unique differential data space of a desktop.

 So, in this model, you need, say, 40 GB for gold image, and 2 GB for differential data per desktop. For a 10,000 desktop deployment, your capacity requirement is roughly 20TB, which is reasonable.

 On the flip side, differential data get discarded when users log off or desktops reboot. So we solved the capacity problem at the expense of user personalization. This type of desktops are commonly referred as non-persistent desktops as they don’t retain changes made by the user, and are commonly deployed for task worker use cases like call centers.

During the same time, hybrid arrays came to the market that added SSDs into HDD arrays. The hybrid arrays leverage SSDs as a cache or a tier in front of HDDs to provide some performance acceleration. Now, VDI desktops create periodic I/O storms. For example, the gold image gets hammered with read I/O requests when many desktops boot simultaneously. Although hybrid arrays provide some relief to the IO storms because of SSDs in their stack, they neither scale for thousands of desktops nor have the agile, efficient data services needed in a modern data center.

Persistent desktop, where you take everything you have in your physical desktop – OS, applications, user settings, data – and just make it a VM, is an intuitive solution.

In this model, users are happy because they retain all their personalization including the applications they installed. Moreover, desktop admins are happy too because all the existing desktop management tools like System Center Configuration Manager, Patch Manager and other agents continue to work exactly as before.

Now, persistent desktops generate higher IOPS per desktop than their non-persistent counterparts, but, most importantly, this VDI model has extremely large capacity needs. Continuing with the same example as before, for 10,000 users at 40GB per persistent desktop, your capacity need is 400TB. So the capacity requirement increased 20x, from 20TB in non-persistent desktops to 400TB in persistent desktops for the same number of desktops. A lot of these are data are common, as Windows OS and applications constitute the bulk of it. A highly efficient data reduction technology at the heart of the storage layer is the most critical component for any persistent desktop model to be remotely economically feasible.

On the storage side, around 2012, All-Flash arrays started to come to the market. Some of them provided high performance with zero data services; others provided performance improvements beyond what the hybrid arrays could deliver along with limited data services including some data reduction. However, as these All-Flash arrays are based on scale-up architectures, their controllers become the bottleneck for performance much earlier than their backend SSDs. Moreover, data reduction service, like data deduplication, is a post-process activity for these All-Flash arrays. So a typical scale-up architecture based All-Flash array in the market can’t deliver the performance or the high data reduction efficiencies that are critical for a successful persistent desktop deployment at large scale.

In recent months, newer technologies are coming up in the VDI platform software side, that has the promise of delivering a no-compromise, all-inclusive desktop experience, when paired with the right storage platform. We can now deliver desktops at scale that can easily handle graphics-rich applications, seamlessly work across all use cases and can be better than physical desktops for both user experience and cost.

Storage is the critical component in delivering the promise of next-gen desktops. A scale-out, truly N-way active-active architecture is needed to deliver the high performance at a consistently low latency needed to scale your virtual desktop environment while maintaining a great end-user experience. Inline, all-the time data reduction technologies are critical to reduce the capacity footprint. XtremIO is the only solution in the market today that can not only satisfy all the requirements of non-persistent and persistent desktops at scale but also deliver on the emerging VDI platform software technologies.

As many of you can attest, XtremIO can run non-persistent desktops very well. It delivers high performance with consistently low latency, thereby enabling you to host a large number of desktops on a single X-Brick, which is the basic building of an XtremIO cluster with two controllers and a DAE that can have 25 SSDs. You can run storage-intensive operations like desktop refresh and recompose in a non-persistent desktop environment while other active desktops are running, and the array will continue to deliver high level of user experience.

However, if you are running non-persistent desktops only because of capacity savings you need, then XtremIO has good news for you.

The same single X-Brick can host a high number of persistent desktops as well. Your users can have a highly responsive desktop with all their personalization and your desktop admins can continue to use the same set of desktop administration tools while enjoying the superior storage capacity reduction from XtremIO to make the solution very cost-effective.

What makes XtremIO unique for VDI?

  1. In a nutshell, XtremIO’s unique content-based metadata engine coupled with its scale-out architecture delivers the unique value for VDI. You grow an XtremIO cluster by non-disruptively adding additional X-Bricks, linearly increasing both capacity and performance at the same time.

2. The volumes are always thin-provisioned, the only writes performed to the disks are for data that are globally unique across the entire cluster.

3. All the data services are inline, all the time. So as the writes come in, the metadata engine looks at the content of the data blocks, performing the writes to the SSDs only if the cluster didn’t see the data before; for duplicate data, it just acknowledges the writes to the host with appropriate updates to in-memory metadata without actually performing any writes to the SSDs. This inline deduplication saves tremendous capacity for persistent desktops without affecting performance at all.

4. Inline compression adds to XtremIO’s data reduction efficiencies.

5. XtremIO has a proprietary Flash-based data protection algorithm that offers better than RAID-10 performance with RAID 5 capacity savings.

6. Finally, XtremIO’s differentiated agile copy data service helps you provision and deploy desktops at a much faster rate than any other Flash-based solution in the market.

XtremIO’s architectural advantages translate into very significant savings for our VDI customers. Today we have over 2.5 million virtual desktops running on XtremIO, and our customers typically see 10:1 or more data reduction ratios for persistent desktops with 50% less cost per desktop than traditional storage.

A single X-Brick can host up to 2500 persistent desktops and up to 3500 non-persistent desktops.

I want to add that beyond storage, XtremIO helps in reducing server infrastructure footprint as well for VDI. As an example, we have had customers who could reduce the RAM allocation from 2 GB/desktop to 1.5 GB/desktop, that’s a 25% reduction in RAM, and XtremIO handled the resulting increased IOPS due to more swapping by the OS with the same latency for the same number of hosted desktops on the array.

With XtremIO, Desktop Virtualization has experienced some breakthroughs including:

  • 10:1 Reduction in capacity requirements per desktop
  • 50% Reduction in Storage Cost Per Desktop
  • 25% Lower RAM requirements
    • Based on reducing RAM needed per desktop from 2GB to 1.5GB
      • XtremIO delivers the performance experience of a 2GB desktop using only 1.5GB of RAM
      • Not all IO performance comes from RAM. Flash is faster than the desktop O/S, Desktop kernel & Hypervisor

expects disk to be. Previously adding desktop RAM solved some IO problems. This is no longer as necessary.

The disk performance now comes directly from the XtremIO disk, not virtual desktop “RAM trickery”.

  • 40% Reduction in Server Infrastructure.
    • If you need to support all desktop services at scale
    • Example: Suspend & Resume desktop services for 100% of desktops = a lot of extra storage sitting around

being idle 99% of the time. With XtremIO, admins can oversubscribe the storage needed for desktop

services & provision less XtremIO storage (don’t need 1:1 mapping).

  • Each X-Brick can support up to 2,500 persistent virtual desktops and up to 3,500 non-persistent desktops.

Now let’s see how a single X-Brick handle a boot storm of 2,500 VDI VMs!

now lets see how the same single X-Brick handle the load of 2,500 Full clones VMs (Office is installed locally)

note that im using LoginVSI 4.1 Knowledge Worker which creates additional load across the board (everyone else out there are still using the “medium” workload which is lighter on the CPUs and the storage array)

see the full differences here:

Now let’s take a look at some of the emerging trends in VDI. First, just-in-time desktops.

In a traditional desktop model, applications, user data, profile settings, OS are all intermingled together. When you deliver persistent desktops using this traditional model, it is the old wine in a new bottle and you miss out the potential of improving the desktop management workflow itself.

Instead, what if you could virtualize each application or a set of related applications separately in their own containers? Alongside, each user gets their own container for their specific data & applications. Think of it, each of these containers being a vmdk file. And then, you can put together everything at run time by collecting the appropriate set of containers for a specific user as and when he needs to have access to his desktop. Let’s illustrate this further.

You have a pool of stateless desktops with nothing but OS. Applications have their own containers; same for each user specific data.

Office worker Joe comes in. He gets one of the stateless desktops, Adobe & Office applications get added, Joe’s personal data along with the applications he installed on his desktop get mounted from Joe’s user data volume, and he gets his own customized desktop.

Now let’s consider designer Bob. When he logs in, it is the same process but this time, based on Bob’s profile, the graphics-intensive design applications get added too for Bob’s personalized desktop.

So this combines the best of both non-persistent and persistent world. Users get customizable desktops and apps with consistent experience across sessions. IT can easily update and deliver apps, secure the data and enjoy the better economics.

The chart shows the results of application response times, as measured by the industry-standard real world VDI load generator tool LoginVSI, of 2,500 desktops running on a single 10 TB X-Brick. First, LoginVSI generated the VDI load for 2,500 persistent (full clone) desktops. The application response times – as the number of desktops are increased – are shown in blue. Then the 2,500 persistent (full clone) desktops were converted into layered desktops. In this specific example, VMware App Volumes were used to create the application containers and delivery of applications. When the same 2,500 layered desktops were run, the application response times showed similar patterns but were roughly 15% higher throughout the test than before.

The takeaways here are:

  1. The layered desktops create more IOPS/desktop than their persistent full clone counterparts. So a platform that can deliver high IOPS with consistently low latency is more important than ever in this type of next-gen desktops.
  2. The I/O profiles change from the typical VDI I/O patterns that we are used to. For example, application container volumes will see high I/Os as the application volumes are effectively read-only. However, there will be variations of the read I/O intensity; e.g. the Microsoft Office container volume will likely see higher read I/Os than a specialty application as there are many more users of Office than a department-specific specialty application. On the other hand, containers for user specific information will see high write I/O profile. Unless the storage array balances the load uniformly and automatically across all controllers and all SSDs in a true N-way active-active scale-out architecture with radically simple array management, the SAN administrators will be hard pressed to optimize the storage platform to deliver best user experience with these next-gen desktops.
  3. Finally, you can see there were some high utilization spikes in persistent (full clone) desktops in blue; the spikes with layered desktops (in purple) were less in magnitude but higher in numbers (existing pretty consistently throughout the tests). It shows that the underlying storage platform needs to deliver high performance with consistently low latencies irrespective of the VDI I/O load for these next-gen desktops.

High IOPS is a critical need for any VDI deployment. Here is an example of what XtremIO can deliver for VDI. In this example, the XMS dashboard is showing that the X-Brick is delivering nearly 125K IOPS when 1,000 persistent (full-clone) desktops are booted simultaneously.

High throughput is another critical need for any VDI deployment. Time needed to clone many desktops from a template is dependent on the throughput that an array can deliver. Here is an example of what XtremIO can do for VDI. In this example, the XMS dashboard is showing that the X-Brick is delivering nearly 45 GB/s throughput when 1,000 persistent (full clone) desktops are cloned simultaneously from a template. 45GB/s is an insanely high throughput, the array is able to deliver this throughput at the target side due to XtremIO’s in-memory metadata architecture, and unique in-memory metadata based implementation of VMware’s VAAI Copy Offload API.

Consistently low-latency is another important criteria for the array to deliver highly-responsive, better-than-physical user experience to all the users, all the time, at any scale. Here is another example of what XtremIO can deliver for VDI. In this example, the XMS dashboard is showing that the X-Brick is delivering sub-millisecond latency for 1,000 persistent (full clone) desktops at steady state.

Below you can see a testing I ran with Horizon 6 + App volumes for 2,500 users running on a single 10TB X-Brick!

in part 2, i will discuss Desktop As A Service (DaSS) and virtualizing GPUs


Leave a ReplyCancel reply