CITRIX XENDESKTOP 5 POC on a Vblock 300 - The Results Are In!

Hi,

During July, a lot of people gathered somewhere in Boston to run a pretty interesting POC: a pretty heavy 2,000-user workload running in a XENDESKTOP 5 environment, hosted on a Vblock 300 series. While I have already published some VNX results for CITRIX XENDESKTOP, I would like to use this post to also share some real-world considerations about the deployment.

The Environment Overview:

A Vblock 300 GX – a full public overview can be seen at the link below:

VCE VBLOCK 300 SERIES OVERVIEW

ESX CPU Performance Considerations:

The ESX hosts running the VDI VMs were Cisco B200 blades with 96GB of RAM and 2.93 GHz CPUs. Each of these servers was able to run 70 users with a heavy workload without any problem at all. Each VM ran with 1 vCPU and 2GB of RAM; the ESX balloon driver worked hard, but that's what it is supposed to do. In other words, we performed heavy memory overcommitment, which worked great and really provided added value on a server that theoretically doesn't have enough RAM (96GB) to host 70 users (70 VMs x 2GB RAM each, plus guest overhead). No VMkernel disk swap-out was noted.
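
To put that overcommitment into numbers, here is a minimal back-of-the-envelope sketch (Python, just for illustration). The ~138MB per-VM memory overhead is the figure I quote in the comments below for a 1 vCPU / 2GB VM; TPS and ballooning are what close the gap in practice.

```python
# Back-of-the-envelope memory sizing for one B200 host (96GB RAM, 70 desktops).
vms         = 70
vm_ram_gb   = 2.0      # configured guest RAM per desktop
overhead_gb = 0.138    # ~138MB VMkernel overhead for a 1 vCPU / 2GB VM
host_ram_gb = 96.0

demand_gb  = vms * (vm_ram_gb + overhead_gb)   # what the VMs could ask for
overcommit = demand_gb / host_ram_gb

print(f"Worst-case demand : {demand_gb:.1f} GB")   # ~149.7 GB
print(f"Physical RAM      : {host_ram_gb:.0f} GB")
print(f"Overcommit ratio  : {overcommit:.2f}x")    # ~1.56x, absorbed by TPS + ballooning
```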

At 80 users, this server configuration started to show some high %RDY% values. LoginVSI didn't complain about them, but I thought they would be too high for a real-world implementation, so 70 remained the sweet spot.
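
For reference, a common community rule of thumb (not an official VMware limit) is that sustained %RDY above roughly 10% per vCPU is where users start to feel it. Below is a minimal sketch of how you could flag desktops from exported esxtop data; the threshold and sample values are illustrative, not the POC's actual numbers.

```python
# Flag desktops whose average %RDY crosses a rule-of-thumb threshold.
RDY_WARN_PER_VCPU = 10.0  # percent; sustained values above this usually mean contention

def flag_contended(samples_by_vm, vcpus=1, threshold=RDY_WARN_PER_VCPU):
    """samples_by_vm: dict of VM name -> list of %RDY samples.
    esxtop reports a VM group's %RDY summed across its worlds, so we
    normalize by the vCPU count (trivial here, since each desktop has 1 vCPU)."""
    flagged = {}
    for vm, samples in samples_by_vm.items():
        avg = sum(samples) / len(samples) / vcpus
        if avg > threshold:
            flagged[vm] = round(avg, 1)
    return flagged

# Illustrative samples: one happy desktop, one that is clearly CPU-starved.
print(flag_contended({"win7-001": [3.1, 4.0, 2.8], "win7-042": [14.5, 18.2, 12.9]}))
```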

Below: the environment running 2,000 users, 70 users per B200. The reason you see 2,500 users is that at some point someone decided to stress the environment even further... I love this job!

Below: the RAM usage.

Below: a B200 with 70 users running the heavy workload profile. Note that %RDY% is still within the "OK" limit.

Below: a B200 with 80 users running the heavy workload profile. The %RDY% is starting to look bad...

We then started to experiment with a B230 server that has far more RAM (192GB to be exact), which did carry a higher user workload - around 83 users - but the TCO for this type of server vs. the B200 isn't worth it, so B200 it is!
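
To make the TCO point concrete, the comparison really boils down to cost per desktop rather than users per blade. A minimal sketch with completely hypothetical blade prices (the measured densities, 70 and ~83, are the only real numbers here):

```python
# Cost-per-desktop comparison, B200 vs B230.
# Blade prices below are HYPOTHETICAL placeholders; plug in your own quotes.
def cost_per_desktop(blade_cost, users_per_blade):
    return blade_cost / users_per_blade

b200 = cost_per_desktop(blade_cost=9_000,  users_per_blade=70)  # hypothetical price
b230 = cost_per_desktop(blade_cost=14_000, users_per_blade=83)  # hypothetical price

print(f"B200: ${b200:,.0f} per desktop")  # ~$129 with these made-up prices
print(f"B230: ${b230:,.0f} per desktop")  # ~$169 - twice the RAM buys only ~13 more users
```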

Anti-Virus Considerations:

OK, this is a very hot topic to discuss. There are a lot of new "VDI-specific" AV solutions out there; the one we used here wasn't based around the vShield Endpoint API, and boy, you could tell.

Attached below, you can see a B200 that was perfectly capable of running 70 users before AV, now running 70 users with AV turned on; %RDY% reared its ugly head again. So PLEASE, PLEASE, PLEASE make sure you evaluate a proper AV solution, or you are going to pay (literally speaking) a lot for the overhead that a non-VDI AV will bring to your environment. To be fair to the AV vendor that was used here, I won't mention the company name, as the customer turned to them and they said they are working on a better solution.

Storage Considerations:

So, it's one thing to preach to your customers about EMC FAST Cache:

CITRIX XenDesktop 5 on EMC VNX – Match made in Heaven (Part 1)

and here:

CITRIX XenDesktop 5 on EMC VNX – Match made in Heaven (Part 2)

and it's another thing to actually eat your own dog food, with no net, and see the numbers in real action. So let's start.

Reads / Writes:

VDI workloads (contrary to common belief) tend to have a very high write percentage. How much exactly? Well, it depends; I've seen the numbers vary from 40-60% writes, so make sure your storage array cache supports both read AND write caching.

In the figure below, you can see writes peaking at 40-60% during the test.
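
Why that read/write mix matters so much: writes get amplified by the RAID write penalty before they hit the spindles, so a write-heavy VDI workload costs far more back-end IOPS than the front-end number suggests. A minimal sketch; the RAID types and the per-user IOPS figure are illustrative assumptions, not this POC's actual pool layout.

```python
# Front-end vs back-end IOPS for a write-heavy VDI workload.
# Random-write penalty: 2 for RAID 1/0, 4 for RAID 5, 6 for RAID 6.
def backend_iops(frontend_iops, write_ratio, write_penalty):
    reads  = frontend_iops * (1 - write_ratio)
    writes = frontend_iops * write_ratio
    return reads + writes * write_penalty

users, iops_per_user = 2000, 10  # 10 IOPS/user is an illustrative planning figure
frontend = users * iops_per_user

for raid, penalty in (("RAID 1/0", 2), ("RAID 5", 4)):
    print(f"{raid}: {frontend:,} front-end IOPS at 50% writes -> "
          f"{backend_iops(frontend, 0.5, penalty):,.0f} back-end IOPS")
```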

Let's see some more numbers from the FAST Cache perspective:

Booting 2,000 VMs simultaneously on the VNX5700 (16:32):

80K IOPS in total (40K per SP).

A great response time!

Great FAST Cache utilization (yes, 1.000 actually means nearly 100% of the writes were cached, while almost 87% of the reads were absorbed by FAST Cache). Yep, I know it sounds crazy, but in a good way!
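
To translate those hit rates into spindle terms, here is a minimal sketch. The 80K front-end IOPS and the ~100% / ~87% write/read hit rates are the ones shown above; the 80/20 read/write split of the boot storm is an assumption, not a measured value, and destaging of cached writes is ignored.

```python
# Roughly how much of the 80K-IOPS boot storm ever reached the back-end disks.
frontend_iops = 80_000
read_ratio    = 0.80   # ASSUMPTION: boot storms are typically read-heavy
read_hit      = 0.87   # from the FAST Cache stats above
write_hit     = 1.00   # "1.000" = effectively all writes absorbed

reads  = frontend_iops * read_ratio
writes = frontend_iops * (1 - read_ratio)
missed = reads * (1 - read_hit) + writes * (1 - write_hit)

print(f"IOPS absorbed by FAST Cache : {frontend_iops - missed:,.0f}")  # ~71,700
print(f"IOPS hitting the spindles   : {missed:,.0f}")                  # ~8,300
```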

Total SP utilization remained well below 80% (great!). In fact, if you take a closer look, you can see that the average utilization was more in the region of 60-65%. Also, as a real-life consideration, it is very rare that all users will work at 100% concurrency, not to mention that LoginVSI simulates users doing heavy tasks again and again and again... My bottom line is that this VNX is far more capable in a real-life scenario, and you can probably host far more users.

Network Considerations:

Network Performance Analysis – Ethernet/IP Network.

We used the LAN topology below for Vblock internal and external connectivity.

Key Highlights:

Overview:

  1. All links that were used, inside and outside of the Vblock, were running at 10 Gbps.
  2. Connectivity between the blade server chassis and the UCS 6100s was 80 Gbps, formed from a total of 8 uplinks of 10 Gbps using FCoE (Ethernet and Fibre Channel on the same link). All links were active. We used copper SFP+ cables.
  3. Connectivity between the two UCS 6100s and the two Nexus 5000s was 80 Gbps in total: 4 uplinks of 10 Gbps, Ethernet only, from each UCS 6100 to a Nexus 5000. We used copper SFP+ cables.
  4. From the Nexus 5000s to the Nexus 7000 we used a total of 4 uplinks of 10 Gbps Ethernet, 2 from each Nexus 5000. We used a single Nexus 7010 and optical cables.
  5. To monitor traffic between the Vblock and the external "user network", we engineered all Ethernet traffic onto one of the uplinks only (the red dot on the topology map above). This interface is "Ethernet 1/37" on "Nexus 5020-A-2".

A snip showing 640 Mbps from the test:

Key findings and conclusions:

  1. The maximum traffic observed on this link, during the heaviest test with 2,000 VDIs, was 700 Mbps sustained. This load represents an average of 350 Kbps of traffic per VM (a quick sizing check follows after this list).
  2. During all tests, the internal links between the chassis and the UCS 6100s, and between the UCS 6100s and the Nexus 5000s, did not exceed 10% load on any specific link. This means that the internal network, carrying LAN and SAN traffic, is adequate for current and future needs.
  3. These findings mean that the customer's plan to connect the Vblock to its current network with a total of 8 links of 1 Gbps, using link aggregation, should be adequate for the expected traffic needs. Nevertheless, it is strongly advised to upgrade the customer's LAN switches to support 10 Gbps connectivity to the Vblock.
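
Here is the quick sizing check referenced in finding 1, using the 700 Mbps peak from the test and the customer's planned 8 x 1 Gbps aggregate (nominal numbers; LACP hashing still caps any single flow at 1 Gbps):

```python
# Per-desktop bandwidth and headroom on the planned 8 x 1 Gbps uplink bundle.
peak_mbps = 700        # sustained peak observed with 2,000 VDIs
desktops  = 2000
lag_mbps  = 8 * 1000   # 8 x 1 Gbps link aggregation, nominal capacity

per_vm_kbps = peak_mbps * 1000 / desktops
utilization = peak_mbps / lag_mbps

print(f"Average per desktop : {per_vm_kbps:.0f} Kbps")  # 350 Kbps
print(f"LAG utilization     : {utilization:.1%}")       # ~8.8% of the bundle at peak
```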

Network Performance Analysis – Fibre Channel/Storage Area Network.

Pushing the envelope...

At some point we wanted to push the storage to hold more than 2,000 users, so we loaded the Vblock with 2,500 users:

Now, one would expect that adding a quarter of the original load would add at least 25% to the SP utilization, right?

Wrong!

Below, you can see the SP utilization: only a slight increase over the original 2,000-user workload. This has to do with Mr. FAST Cache.

Below, you can see the FAST Cache utilization, almost 100%... now that's really cool (at least in my mind).
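
The reason the SP graph barely moves is that, with hit rates this high, most of the extra I/O never reaches the spindles. A minimal sketch of the marginal math; the per-user IOPS figure and the combined hit rate are assumptions for illustration, not measured values.

```python
# Why +25% users adds only a sliver of back-end disk load when FAST Cache is this hot.
iops_per_user = 10     # ASSUMPTION: illustrative steady-state planning figure
hit_rate      = 0.95   # ASSUMPTION: combined FAST Cache read+write hit rate

def disk_iops(users):
    return users * iops_per_user * (1 - hit_rate)

base, pushed = disk_iops(2000), disk_iops(2500)
print(f"2,000 users -> ~{base:,.0f} back-end disk IOPS")
print(f"2,500 users -> ~{pushed:,.0f} back-end disk IOPS (+{pushed - base:,.0f})")
# A 25% jump in front-end load adds only ~250 disk IOPS under these assumptions.
```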

So, in this post I wasn't trying to cover every aspect, just to show you some of the highlights that a Vblock can offer your VDI environment, be it CITRIX or VMware; both of them run a very similar workload, CITRIX MCS works very similarly to View Linked Clones, and the user / CPU core ratio works out the same. The end result is that a Vblock will ALWAYS generate the same results, over and over again, in the same way you know that buying a given car model from one place or another will work the same. Different Vblocks can also be managed from one console (UIM), which allows you to quickly deploy different Vblocks from one interface. Imagine this: you have 6 Vblocks that are all used for VDI with similar workloads, running in different sites. You can create one service offering and basically clone it to the remote Vblocks, let UIM do the heavy lifting for you (SAN, NAS and ESXi deployments), and you are done!

Credits:

This type of work is never a one-man mission; a lot of people were involved in making this POC a successful one. I would like to give credit to:

Miri Weiss Korn – EMC TC

Max (Hi Guys, this is max speaking!) Fishman – EMC TC

Gadi Feldman – CITRIX Consultant

Until next time...


Comments

    1. 28 for the VMs' workload (not including management servers, failover capacity, etc.).

  1. Duncan Epping (duncan@yellow-bricks) says:

    I wonder if the OS used was 32- or 64-bit, and how much of the memory saving was due to TPS?

    1. Hi Duncan, the OS was Win7 64-bit. Let's do the math together: 70 VMs, each with 2GB of RAM = 140GB. Add each VM's CPU/RAM overhead, which for a 1 vCPU / 2048MB VM is about 137.81MB: 137.81 x 70 = 9,646.7MB. So even without the ESX VMkernel overhead, we are talking about pushing roughly 150GB of RAM workload into a B200 with 96GB of RAM.

      1. Duncan Epping (duncan@yellow-bricks) says:

        Do you have the esxtop memory data? Would be cool to see those as well 🙂

    1. Hi,
      We used MCS (the CITRIX version of "linked clones") and it worked great! This was the first time we (and CITRIX) tested MCS with more than 2,000 users, and FAST Cache helped here a lot. We didn't use XenApp or App-V; all the applications were installed in the replica (and therefore "in" the MCS VMs).

  2. Hi itzikr, great post, thanks for the information. Just wondering, was Intel SpeedStep disabled on the processors? UCS blades have it enabled by default, and we are currently having problems trying to disable it on B230s; B200s are no problem to disable. See this blog: http://www.unidesk.com/blog/speedstep-and-vdi-it-good-thing-not-me . We have seen exactly the same as that blog describes on our B230s and B200s regarding performance. Cheers (IE crashed as I posted, so I hope this does not appear twice) 🙂

    1. MCS. Traditionally speaking, MCS requires more IOPS, but when using EMC FAST Cache this goes away, and you end up with the administrative benefits of MCS vs. PVS.

  3. Hey Itzik,

    Sent you a mail regarding VMware View RAs etc. recently. I had been briefed that the customer wanted to deploy VMware View, but when visiting the customer I noted that they actually wanted CITRIX XenDesktop with vSphere, provided on Vblock infrastructure... Wanted to thank you for the above, as it really helped me deliver a great presentation...

    🙂 Thanks again

    1. Hi,
      vSphere 5.0 won't change the results much; however, when CITRIX XD takes advantage of CBRC...
