Host Configuration for VMware® vSphere On EMC XtremIO
Hi,
There have been a lot of questions lately about best practices when using XtremIO with vSphere, so attached below is an extract of the vSphere section of our user guide. Why am I posting it online then? Because user guides can be somewhat difficult to find if you don't know where to look, and Google is your best friend.
Please note that this section refers to a vSphere cluster that is exclusively connected to the XtremIO array. If you are using a mixed cluster environment, some of these parameters will be different; a later post will follow up on that scenario.
Note: XtremIO Storage Array supports both ESX and ESXi. For simplification, all references to ESX server/host apply to both ESX and ESXi, unless stated otherwise.
Note: In hosts running a hypervisor, such as VMware ESX or Microsoft Hyper-V, it is important to ensure that the logical unit numbers of XtremIO volumes are consistent across all hosts in the hypervisor cluster. Inconsistent LUNs may affect operations such as VM online migration or VM power-up.
Note: When using Jumbo Frames with VMware ESX, the correct MTU size must be set on the virtual switch as well.
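For example (a minimal sketch; the vSwitch and VMkernel interface names below are placeholders, not from the guide), the MTU can be set from the ESX command line:
# esxcli network vswitch standard set -v vSwitch1 -m 9000
# esxcli network ip interface set -i vmk1 -m 9000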
Fibre Channel HBA Configuration
When using Fibre Channel with XtremIO, the following FC Host Bus Adapters (HBA) issues should be addressed for optimal performance.
To install one or more EMC-approved HBAs on an ESX host, follow the procedures in one of these documents, according to the FC HBA type:
For Qlogic and Emulex HBAs – Typically the driver for these HBAs is preloaded with ESX. Therefore, no further action is required. For details, refer to the vSphere and HBA documentation.
For Cisco UCS fNIC HBAs (vSphere 5.x and above) – Refer to the Virtual Interface Card Drivers section in the Cisco UCS Manager Install and Upgrade Guides for complete driver installation instructions
(http://www.cisco.com/en/US/partner/products/ps10281/prod_installation_guides_list.html).
Note: Changing the HBA queue depth is designed for advanced users. Increasing queue depth may cause hosts to over-stress other arrays connected to the ESX host, resulting in performance degradation while communicating with them. To avoid this, in mixed environments with multiple array types connected to the ESX host, compare the XtremIO recommendations with those of other platforms before applying them.
This section describes the required steps for adjusting I/O throttle and queue depth settings for Qlogic, Emulex, and Cisco UCS fNIC. Follow one of these procedures according to the vSphere version used.
The queue depth setting controls the number of outstanding I/O requests per single path. On vSphere, the HBA queue depth can be adjusted through the ESX CLI.
Execution throttle settings control the amount of outstanding I/O requests per HBA port.
The HBA execution throttle should be set to the maximum value. This can be done on the HBA firmware level using the HBA BIOS or CLI utility provided by the HBA vendor:
Qlogic – Execution Throttle – This setting is no longer read by vSphere and is therefore not relevant when configuring a vSphere host with Qlogic HBAs.
Emulex – lpfc_hba_queue_depth – No need to change the default (and maximum) value (8192).
For Cisco UCS fNIC, the I/O Throttle setting determines the total number of outstanding I/O requests per virtual HBA.
For optimal operation with XtremIO storage, it is recommended to adjust the queue depth of the FC HBA. With Cisco UCS fNIC, it is also recommended to adjust the I/O Throttle setting to 1024.
Note: For further information on adjusting HBA queue depth with ESX, refer to VMware KB article 1267 on the VMware website (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1267).
Note: If the execution throttle in the HBA level is set to a value lower than the queue depth, it may limit the queue depth to a lower value than set.
Note: The setting adjustments in this section for the Cisco UCS fNIC HBA apply to VMware vSphere only. Since these settings are global to the UCS chassis, they may impact other blades in the UCS chassis running a different OS (e.g., Windows).
To adjust HBA I/O throttle of the Cisco UCS fNIC HBA:
Note: For more details on Cisco UCS fNIC FC adapter configuration, refer to the Cisco UCS fNIC adapter documentation on the Cisco website.
To adjust the HBA queue depth on a host running vSphere 5.x or above:
HBA Vendor – Command
Qlogic – esxcli system module list | egrep "ql|Loaded"
Emulex – esxcli system module list | egrep "lpfc|Loaded"
Cisco UCS fNIC – esxcli system module list | egrep "fnic|Loaded"
Example (for a host with Emulex HBA):
# esxcli system module list | egrep "lpfc|Loaded"
Name      Is Loaded  Is Enabled
lpfc      true       true
lpfc820   false      true
In this example the native lpfc module for the Emulex HBA is currently loaded on ESX.
Note: The commands displayed in the table refer to the Qlogic qla2xxx/qlnativefc, Emulex lpfc and Cisco UCS fNIC modules. Use an appropriate module name based on the output of the previous step.
HBA Vendor – Command
Qlogic (vSphere 5.x) – esxcli system module parameters set -p ql2xmaxqdepth=256 -m qla2xxx
Qlogic (vSphere 6.x) – esxcli system module parameters set -p qlfxmaxqdepth=256 -m qlnativefc
Emulex – esxcli system module parameters set -p lpfc0_lun_queue_depth=128 -m lpfc
Cisco UCS fNIC – esxcli system module parameters set -p fnic_max_qdepth=128 -m fnic
Note: The command for Emulex HBA adjusts the HBA queue depth for the lpfc0 Emulex HBA. If another Emulex HBA is connected to the XtremIO storage, change lpfc0_lun_queue_depth accordingly. For example, if lpfc1 Emulex HBA is connected to XtremIO, replace lpfc0_lun_queue_depth with lpfc1_lun_queue_depth.
Note: If all Emulex HBAs on the host are connected to the XtremIO storage, replace lpfc0_lun_queue_depth with lpfc_lun_queue_depth.
esxcli system module parameters list -m <driver>
Note: When using the command, replace <driver> with the module name, as received in the output of step 2 (for example, lpfc, qla2xxx and qlnativefc).
Examples:
# esxcli system module parameters list -m qla2xxx | grep ql2xmaxqdepth
ql2xmaxqdepth int 256 Max queue depth to report for target devices.
# esxcli system module parameters list -m qlnativefc | grep qlfxmaxqdepth
qlfxmaxqdepth int 256 Maximum queue depth to report for target devices.
# esxcli system module parameters list -m lpfc | grep lpfc0_lun_queue_depth
lpfc0_lun_queue_depth int 128 Max number of FCP commands we can queue to a specific LUN
If queue depth is adjusted for all Emulex HBAs on the host, run the following command instead:
# esxcli system module parameters list -m lpfc | grep lun_queue_depth
Host Parameters Settings
This section details the ESX host parameters settings necessary for optimal configuration when using XtremIO storage.
Note: The following setting adjustments may cause hosts to over-stress other arrays connected to the ESX host, resulting in performance degradation while communicating with them. To avoid this, in mixed environments with multiple array types connected to the ESX host, compare these XtremIO recommendations with those of other platforms before applying them.
When using XtremIO storage with VMware vSphere, it is recommended to set the following parameters to their maximum values:
Disk.SchedNumReqOutstanding – Determines the maximum number of active storage commands (I/Os) allowed at any given time at the VMkernel. The maximum value is 256.
Note: When using vSphere 5.5 or above, the Disk.SchedNumReqOutstanding parameter can be set on a specific volume rather than on all volumes presented to the host. Therefore, it should be set only after XtremIO volumes are presented to the ESX host using ESX command line.
Disk.SchedQuantum – Determines the maximum number of consecutive “sequential” I/Os allowed from one VM before switching to another VM (unless this is the only VM on the LUN). The maximum value is 64.
In addition, the following parameter setting is required:
Disk.DiskMaxIOSize – Determines the maximum I/O request size passed to storage devices. With XtremIO, it is required to change it from 32767 (default setting of 32MB) to 4096 (4MB). This adjustment allows a Windows VM to EFI boot from XtremIO storage with a supported I/O size of 4MB.
Note: For details on adjusting the maximum I/O block size in ESX, refer to VMware KB article 1003469 on the VMware website (http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1003469).
These setting adjustments should be carried out on each ESX host connected to the XtremIO cluster, via either the vSphere Client or the ESX command line.
To adjust ESX host parameters for XtremIO storage, follow one of these procedures:
Using the vSphere WebUI client:
Note: Do not apply step 5 in a vSphere 5.5 (or above) host, where the parameter is set on a specific volume using ESX command line.
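The WebUI steps themselves are not reproduced in this extract; as a general pointer (not the guide's exact procedure), these parameters are edited per host under Manage > Settings > Advanced System Settings in the vSphere Web Client, by searching for Disk.SchedQuantum, Disk.SchedNumReqOutstanding and Disk.DiskMaxIOSize.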
Using the ESX host command line (for vSphere 5.0 and 5.1), run the commands that set the SchedQuantum, SchedNumReqOutstanding, and DiskMaxIOSize parameters, respectively:
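The exact command listing is not reproduced in this extract. As a minimal sketch using esxcfg-advcfg with the values recommended above (verify against the full user guide before applying):
# esxcfg-advcfg -s 64 /Disk/SchedQuantum
# esxcfg-advcfg -s 256 /Disk/SchedNumReqOutstanding
# esxcfg-advcfg -s 4096 /Disk/DiskMaxIOSize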
Using the ESX host command line (for vSphere 5.5 or above):
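The original steps are not included in this extract. As a hedged sketch for vSphere 5.5 and above: Disk.SchedQuantum and Disk.DiskMaxIOSize are still set globally, while SchedNumReqOutstanding is set per XtremIO device using its NAA identifier (the naa value below is a placeholder):
# esxcfg-advcfg -s 64 /Disk/SchedQuantum
# esxcfg-advcfg -s 4096 /Disk/DiskMaxIOSize
# esxcli storage core device set -d naa.<xtremio_naa_id> -O 256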
vCenter Server Parameter Settings
The maximum number of concurrent full cloning operations should be adjusted, based on the XtremIO cluster size. The vCenter Server parameter config.vpxd.ResourceManager.maxCostPerHost determines the maximum number of concurrent full clone operations allowed (the default value is 8). Adjust the parameter according to the XtremIO cluster size as follows:
10TB Starter X-Brick (5TB) and a single X-Brick – 8 concurrent full clone operations
Two X-Bricks – 16 concurrent full clone operations
Four X-Bricks – 32 concurrent full clone operations
Six X-Bricks – 48 concurrent full clone operations
To adjust the maximum number of concurrent full cloning operations:
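The procedure itself is not included in this extract; as a general pointer (not the guide's exact steps), config.vpxd.ResourceManager.maxCostPerHost is added or edited under the vCenter Server object's Advanced Settings in the vSphere Web Client, using the value from the list above.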
vStorage API for Array Integration (VAAI) Settings
VAAI is a vSphere API that offloads vSphere operations such as virtual machine provisioning, storage cloning and space reclamation to storage arrays that support VAAI. The XtremIO Storage Array fully supports VAAI.
To ensure optimal performance of XtremIO storage from vSphere, VAAI must be enabled on the ESX host before using XtremIO storage from vSphere. Failing to do so may expose the XtremIO cluster to the risk of datastores becoming inaccessible to the host.
This section describes the necessary settings for configuring VAAI for XtremIO storage.
When using vSphere version 5.x and above, VAAI is enabled by default. Before using the XtremIO storage, confirm that VAAI features are enabled on the ESX host.
To confirm that VAAI is enabled on the ESX host:
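The guide's confirmation steps use the WebUI; as an alternative sketch, the same three VAAI primitives can be checked from the ESX command line (a returned value of 1 means the primitive is enabled):
# esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
# esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
# esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking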
If any of the above parameters are not enabled, adjust them by clicking the Edit icon and clicking OK.
If the VAAI setting is enabled after a datastore was created on XtremIO storage, the setting does not automatically propagate to the corresponding XtremIO volumes. It must be configured manually to avoid data unavailability to the datastore.
Perform the following procedure on all datastores that were created on XtremIO storage before VAAI was enabled on the ESX host.
To manually set VAAI setting on a VMFS-5 datastore created on XtremIO storage with VAAI disabled on the host:
# vmkfstools -Ph -v1 <path to datastore> | grep public
In the following example, a datastore volume is configured as "public":
# vmkfstools -Ph -v1 /vmfs/volumes/datastore1 | grep public
Mode: public
In the following example, a datastore volume is configured as "public ATS-only":
# vmkfstools -Ph -v1 /vmfs/volumes/datastore2 | grep public
Mode: public ATS-only
# vmkfstools --configATSOnly 1 <path to datastore>
By default, vSphere instructs the storage array to copy data in 4MB chunks. To optimize VAAI XCOPY operation with XtremIO, it is recommended to adjust the chunk size to 256KB. The VAAI XCOPY chunk size is set using the MaxHWTransferSize parameter.
To adjust the VAAI XCOPY chunk size, run the following CLI commands according to the vSphere version running on your ESX host:
For vSphere version earlier than 5.5:
esxcli system settings advanced list -o /DataMover/MaxHWTransferSize
esxcli system settings advanced set --int-value 0256 --option /DataMover/MaxHWTransferSize
For vSphere version 5.5 and above:
esxcfg-advcfg -s 0256 /DataMover/MaxHWTransferSize
Disabling VAAI in ESX
In some cases (mainly for testing purposes) it may be necessary to temporarily disable VAAI.
As a rule, VAAI should be enabled on an ESX host connected to XtremIO. Therefore, avoid disabling VAAI, or disable it only temporarily if required.
Note: For further information about disabling VAAI, refer to VMware KB article 1033665 on the VMware website (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033665).
As noted in the Impact/Risk section of VMware KB 1033665, disabling the ATS (Atomic Test and Set) parameter can cause data unavailability in ESXi 5.5 for volumes created natively as VMFS5 datastore.
To disable VAAI on the ESX host:
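The exact steps are not reproduced in this extract. As a sketch consistent with VMware KB 1033665 (and keeping in mind the ATS caveat above), the three VAAI primitives can be disabled by setting them to 0 from the ESX command line:
# esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove
# esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit
# esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking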
Multipathing Software Configuration
Note: You can use EMC Virtual Storage Integrator (VSI) Path Management to configure path management across EMC platforms, including XtremIO. For information on using this vSphere Client plug-in, refer to the EMC VSI Path Management Product Guide.
XtremIO supports the VMware vSphere Native Multipathing (NMP) technology. This section describes the procedure required for configuring native vSphere multipathing for XtremIO volumes.
For best performance, it is recommended to do the following:
Set the native round robin path selection policy on XtremIO volumes presented to the ESX host.
Note: With NMP in vSphere versions below 5.5, clustering is not supported when the path policy is set to Round Robin. For details, see vSphere MSCS Setup Limitations in the Setup for Failover Clustering and Microsoft Cluster Service guide for ESXi 5.0 or ESXi/ESX 4.x. In vSphere 5.5, Round Robin PSP (PSP_RR) support is introduced. For details, see MSCS support enhancements in vSphere 5.5 (VMware KB 2052238).
Set the vSphere NMP Round Robin path switching frequency to XtremIO volumes from the default value (1000 I/O packets) to 1.
These settings ensure optimal distribution and availability of load between I/O paths to the XtremIO storage.
Note: Use the ESX command line to adjust the path switching frequency of vSphere NMP Round Robin.
To set vSphere NMP Round-Robin configuration, it is recommended to use the ESX command line for all the XtremIO volumes presented to the host. Alternatively, for an XtremIO volume that was already presented to the host, use one of the following methods:
Per volume, using vSphere Client (for each host where the volume is presented)
Per volume, using ESX command line (for each host where the volume is presented)
The following procedures detail each of these three methods.
To configure vSphere NMP Round Robin as the default pathing policy for all XtremIO volumes, using the ESX command line:
Note: Use this method when no XtremIO volume is presented to the host. XtremIO volumes already presented to the host are not affected by this procedure (unless they are unmapped from the host).
esxcli storage nmp satp rule add -c tpgs_off -e "XtremIO Active/Active" -M XtremApp -P VMW_PSP_RR -O iops=1 -s VMW_SATP_DEFAULT_AA -t vendor -V XtremIO
This command also sets the vSphere NMP Round Robin path switching frequency for newly defined XtremIO volumes to one (1).
Note: Using this method does not impact any non-XtremIO volume presented to the ESX host.
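As a quick sanity check (not part of the guide's procedure), the newly added claim rule can be listed afterwards:
# esxcli storage nmp satp rule list | grep XtremIO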
To configure vSphere NMP Round Robin on an XtremIO volume in an ESX host, using vSphere WebUI Client:
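The WebUI steps are not reproduced here; in general terms, this is done from the device's Manage Paths (Edit Multipathing) dialog by selecting Round Robin (VMware) as the Path Selection Policy.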
To configure vSphere NMP Round Robin on an XtremIO volume in an ESX host, using ESX command line:
# esxcli storage nmp path list | grep XtremIO -B1
esxcli storage nmp device set --device <naa_id> --psp VMW_PSP_RR
Example:
# esxcli storage nmp device set --device naa.514f0c5e3ca0000e --psp VMW_PSP_RR
Note: When using this method, it is not possible to adjust the vSphere NMP Round Robin path switching frequency. Adjusting the frequency changes the NMP PSP policy for the volume from round robin to iops, which is not recommended with XtremIO. As an alternative, use the first method described in this section.
For details, refer to VMware KB article 1017760 on the VMware website (http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1017760).
Note: For the most updated information on PowerPath support with XtremIO storage, refer to the XtremIO Simple Support Matrix.
XtremIO supports multipathing using EMC PowerPath/VE on VMware vSphere. PowerPath provides array-customized LAMs (native class support) for XtremIO volumes. PowerPath array-customized LAMs feature optimal failover and load balancing behaviors for the XtremIO volumes managed by PowerPath.
For details on the PowerPath/VE releases supported for your VMware vSphere host, refer to the XtremIO Simple Support Matrix.
For details on native class support with XtremIO for your host, refer to the EMC PowerPath/VE release notes document for the PowerPath/VE version you are installing.
For details on installing and configuring PowerPath/VE with XtremIO native class on your host, refer to the EMC PowerPath on VMware vSphere Installation and Administration Guide for the PowerPath/VE version you are installing. This guide provides the required information for placing XtremIO volumes under PowerPath/VE control.
When host configuration is completed, you can use the XtremIO storage from the host. For details on creating, presenting and managing volumes that can be accessed from the host via either GUI or CLI, refer to the XtremIO Storage Array User Guide that matches the version running on your XtremIO cluster.
EMC Virtual Storage Integrator (VSI) Unified Storage Management version 6.2 and above can be used to provision from within vSphere Client Virtual Machine File System (VMFS) datastores and Raw Device Mapping volumes on XtremIO. Furthermore, EMC VSI Storage Viewer version 6.2 (and above) extends the vSphere Client to facilitate the discovery and identification of XtremIO storage devices allocated to VMware ESX/ESXi hosts and virtual machines.
For further information on using these two vSphere Client plug-ins, refer to the VSI Unified Storage Management product guide and the VSI Storage Viewer product guide.
When creating volumes in XtremIO for a vSphere host, the following considerations should be made:
Disk logical block size – The only logical block (LB) size supported by vSphere for volumes presented to ESX is 512 bytes.
Note: In XtremIO version 4.0.0 (and above), the Legacy Windows option is not supported.
Disk alignment – Unaligned disk partitions may substantially impact I/O to the disk.
With vSphere, data stores and virtual disks are aligned by default as they are created. Therefore, no further action is required to align these in ESX.
With virtual machine disk partitions within the virtual disk, alignment is determined by the guest OS. For virtual machines that are not aligned, consider using tools such as UBERalign to realign the disk partitions as required.
Note: When using iSCSI software initiator with ESX and XtremIO storage, it is recommended to use only lower case characters in the IQN to correctly present the XtremIO volumes to ESX. For more details, refer to VMware KB article 2017582 on the VMware website (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2017582).
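For example (the adapter name and IQN below are placeholders, not from the guide), the software iSCSI initiator's IQN can be listed and, if needed, re-set to an all-lowercase value from the ESX command line:
# esxcli iscsi adapter list
# esxcli iscsi adapter set -A vmhba64 -n iqn.1998-01.com.vmware:esx-host01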
When adding Initiator Groups and Initiators to allow ESX hosts to access XtremIO volumes, specify ESX as the operating system for newly-created Initiators, as shown in the figure below.
Note: Refer to the XtremIO Storage Array User Guide that matches the version running on your XtremIO cluster.
Following a cluster upgrade from XtremIO version 3.0.x to version 4.0 (or above), make sure to modify the operating system for each initiator that is connected to an ESX host.
It is recommended to create the file system using its default block size (using a non-default block size may lead to unexpected behavior). Refer to your operating system and file system documentation.
This section details the considerations and steps that should be performed when using LUN 0 with vSphere.
Notes on the use of LUN numbering:
In XtremIO version 4.0.0 (or above), volumes are numbered by default starting from LUN id 1 (and not 0 as was the case in previous XtremIO versions).
Although possible, it is not recommended to manually adjust the LUN id to 0, as it may lead to issues with some operating systems.
When a cluster is updated from XtremIO version 3.0.x to 4.0.x, an XtremIO volume with a LUN id 0 remains accessible following the upgrade.
With XtremIO version 4.0.0 (or above), no further action is required if volumes are numbered starting from LUN id 1.
By default, an XtremIO volume with LUN0 is inaccessible to the ESX host.
Note: Performing the described procedure does not impact access to XtremIO volumes with LUNs other than 0.
When native multipathing is used, do not use LUN 0, or restart the ESX host if the rescan fails to find LUN 0.
For optimal performance, it is recommended to format virtual machines on XtremIO storage using Thick Provision Eager Zeroed. With this format, the required space for the virtual machine is allocated and zeroed at creation time. However, with native XtremIO data reduction, thin provisioning, and VAAI support, no actual physical capacity allocation occurs.
Thick Provision Eager Zeroed format advantages are:
Logical space is allocated and zeroed at virtual machine provisioning time, rather than scattered across each I/O sent by the virtual machine to the disk (as with the Thick Provision Lazy Zeroed format).
Thin provisioning is managed in the XtremIO Storage Array rather than in the ESX host (when Thin Provision format is used).
To format a virtual machine using Thick Provision Eager Zeroed:
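The GUI steps are not included in this extract; as an alternative command-line sketch (size and path are placeholders), an eager-zeroed virtual disk can be created with vmkfstools, which corresponds to selecting Thick Provision Eager Zeroed in the vSphere Client:
# vmkfstools -c 40G -d eagerzeroedthick /vmfs/volumes/XtremIO_DS_1/vm01/vm01.vmdk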
This section provides a comprehensive list of capacity management steps for achieving optimal capacity utilization on the XtremIO array, when connected to an ESX host.
Data space reclamation helps to achieve optimal XtremIO capacity utilization. Space reclamation is a vSphere function that enables reclaiming used space by sending zeros to a specific address of the volume, after the file system notifies that the address space has been deleted.
Unlike traditional operating systems, ESX is a hypervisor running guest operating systems on its file system (VMFS). As a result, space reclamation is divided into guest OS and ESX levels.
ESX level space reclamation should be run only when deleting multiple VMs, and space is reclaimed from the ESX datastore. Guest level space reclamation should be run as a periodic maintenance procedure to achieve optimal capacity savings.
The following figure displays a scenario in which VM2 is deleted while VM1 and VM3 remain.
In VSI environments, every virtual server should be treated as a unique object. When using VMDK devices, T10 trim commands are blocked; therefore, space reclamation must be run manually. RDM devices pass through T10 trim commands.
There are two types of VDI provisioning that differ by their space reclamation guidelines:
Temporary desktop (Linked Clones) – Normally, temporary desktops are deleted once the end users log off. Therefore, running space reclamation on the guest OS is not relevant, and only ESX level space reclamation should be used.
Persistent desktop (Full Clones) – Persistent desktop contains long-term user data. Therefore, space reclamation should be run on guest OS level first, and only then on ESX level.
On large-scale VSI/VDI environments, it is recommended to divide the VMs to groups to avoid overloading the SAN fabric.
ESX 5.1 and below
In versions prior to ESX 5.5, the vmkfstools command is used for space reclamation. This command supports datastores of up to 2TB.
The following example describes running vmkfstools on datastore XtremIO_DS_1, reclaiming 99% of the free space and leaving 1% free to allow user writes.
# cd /vmfs/volumes/XtremIO_DS_1
# vmkfstools -y 99
VMFS reclamation may fail when T10 commands are blocked (for example, behind VPLEX). In such cases, it is required to manually copy zeroes to the relevant free space.
The following example describes running a manual script on the X41-VMFS-3 datastore (refer to the ESX shell reclaim script below).
# ./reclaim_space.sh X41-VMFS-3
ESX 5.5 and above
ESX 5.5 introduces a new command for space reclamation that supports datastores larger than 2TB.
The following example describes running space reclamation on a datastore XtremIO_DS_1:
# esxcli storage vmfs unmap --volume-label=XtremIO_DS_1 --reclaim-unit=20000
The reclaim-unit argument is an optional argument, indicating the number of vmfs blocks to UNMAP per iteration.
VMFS reclamation may fail when T10 commands are blocked (for example, behind VPLEX). In such cases, it is required to manually copy zeroes to the relevant free space.
The following example describes running a manual script on the X41-VMFS-3 datastore (refer to the ESX shell reclaim script below):
# ./reclaim_space.sh X41-VMFS-3
The following example describes an ESX shell reclaim script.
for i in $1
do
  # free space (MB) and mount point of the datastore matching $1
  size=$(df -m | grep $i | awk '{print $4}')
  name=$(df -m | grep $i | awk '{print $NF}')
  # zero out 95% of the free space via a temporary balloon file
  reclaim=$(echo $size | awk '{printf "%.f\n", $1 * 95 / 100}')
  echo $i $name $size $reclaim
  dd count=$reclaim bs=1048576 if=/dev/zero of=$name/zf
  sleep 15
  /bin/sync
  # remove the balloon file so the zeroed space is returned as free
  rm -rf $name/zf
done
Note: While increasing the percentage leads to elevated precision, it may increase the probability of receiving a 'no free space' SCSI error during the reclamation.
TPSTUN is a VAAI primitive that enables the array to notify vSphere when a LUN is running out of space due to thin provisioning over-commit. The command causes all virtual machines on that LUN to be suspended. XtremIO supports this VAAI primitive.
A virtual machine provisioned on a LUN that is approaching full capacity usage becomes suspended, and the following message appears:
At this point, the VMware administrator can resolve the out-of-space situation on the XtremIO cluster and prevent the guest OS in the VMs from crashing.
Hi Itzik ,
This is a really really good tutorial.
Does this config also come in handy with VPLEX Gen2 with XIO underneath?
Regards
Elad Mako
Hi Elad,
some of the practices won't be relevant; best to consult with a VPLEX expert.
Any news on the UNMAP implementation on VPLEX Metro cluster?
Also, I have seen that when running the nightly backups, once the "backup snapshots" are removed by the backup software, the space is not reclaimed by ESX on XtremIO through VPLEX.
This will also be true when VPLEX implements the UNMAP feature, since it is ESX's "fault": it is not issuing the UNMAP command when deleting a snapshot.
Any roadmap for the “automatic unmap” implementation in ESX?
Thanks.
Matteo