A guest post by Jodey Hogeland
Introduction
Nearly two years ago Dell Technologies took the storage array industry by storm with the introduction of PowerStore. In fact, the PowerStore architecture […]
Since the release of PowerStore, Dell Technologies has rapidly accelerated the pace of innovation. This is proof that PowerStore’s container-based architecture is delivering features to market faster, as promised. Since PowerStore launched in May 2020, there have been two major software version updates as well as a hardware update. The latest software update (3.0) was announced at Dell Technologies World 2022 and included a massive performance increase for customers using the VMware VAAI (vStorage APIs for Array Integration) primitive known as XCOPY.
More speed is always a plus, but how do improvements in PowerStore’s XCOPY performance translate into business value and how can customers take advantage of it? Let’s discuss.
Customer Use Cases
VMware ESXi is one of the hypervisors most widely adopted by PowerStore customers, and they leverage VMware’s VAAI XCOPY primitive in a couple of different ways:
VM deploy and VM clone
Deploying a virtual machine from a template creates a virtual machine that is a copy of that template. The new virtual machine has the virtual hardware, installed software, and other properties that are configured for the template.
One of the steps in the VM deploy process is allocating space on a datastore (which can be configured as either VMFS or vVol) and copying data from the template’s datastore space. This copy operation is offloaded to PowerStore as an XCOPY VAAI command, which performs an internal copy from the source datastore space to the destination datastore space.
Copying data is a time-consuming step of the VM deploy process, so the performance of the XCOPY operation determines VM deploy throughput (i.e., the number of VMs deployed per minute).
The VM clone operation follows the same approach: the new virtual machine is configured with the same virtual hardware, installed software, and other properties as the original virtual machine. In some cases admins clone a small number of VMs; in others, hundreds or even thousands. Again, the throughput of the VM clone operation is determined by the performance of the XCOPY operation.
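Because the copy step dominates, a back-of-the-envelope model shows why XCOPY bandwidth translates directly into deploy and clone throughput. This is a simplified sketch that assumes copying the template's used capacity is the only cost of a deploy and that XCOPY sustains a constant bandwidth; the function name and the input numbers are hypothetical, not PowerStore measurements.

```python
def vms_per_minute(xcopy_mb_per_s: float, template_used_mb: float) -> float:
    """Rough ceiling on VM deploys (or clones) per minute.

    Hypothetical model: copying the template is the only cost of a deploy,
    and XCOPY sustains a constant bandwidth throughout.
    """
    if xcopy_mb_per_s <= 0 or template_used_mb <= 0:
        raise ValueError("bandwidth and template size must be positive")
    seconds_per_vm = template_used_mb / xcopy_mb_per_s
    return 60.0 / seconds_per_vm


# Hypothetical inputs: a 20,000 MB template copied at 1,000 MB/s.
print(vms_per_minute(1_000, 20_000))   # 3.0
print(vms_per_minute(10_000, 20_000))  # 30.0, ten times the bandwidth
```

In this model the deploy rate scales linearly with XCOPY bandwidth; the other steps of a real deploy workflow add time that the model ignores, so actual gains will be smaller.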
Storage vMotion
Storage vMotion is a component of VMware vSphere that allows live migration of a running virtual machine’s files from a source datastore to a destination datastore. The migration can take place between datastores on the same PowerStore array or from one array to another. Depending on the size of the VMDK, this process can take a long time to complete, consume significant network bandwidth, and impact the performance of other VMs on the same physical server and/or network.
One way to speed up the process and lower its performance impact is to leverage the XCOPY operations supported by PowerStore when performing Storage vMotion between datastores within a PowerStore array. With this functionality, vSphere can reduce the time required to complete Storage vMotion by orders of magnitude and also lower the load that Storage vMotion places on the ESXi server.
Copy Offload (XCOPY)
XCOPY is one of the VAAI primitives used to offload tasks to the storage array. For example, XCOPY can offload operations such as migrating or cloning virtual machines to the array instead of consuming vSphere resources. XCOPY enables the storage array to make full copies of data within the array without having the host HBAs read and write the data. This reduces the time and network/SAN load when cloning virtual machines, provisioning from a template, or migrating with Storage vMotion.
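The difference in host and SAN load can be expressed as a simple byte count. The sketch below is a simplification that ignores command and metadata traffic: a host-based copy moves every byte into the host and back out, while an offloaded copy leaves the payload inside the array. The function name is hypothetical.

```python
GIB = 1024 ** 3


def host_hba_payload_bytes(vmdk_bytes: int, offloaded: bool) -> int:
    """Payload bytes crossing the host HBAs to copy one VMDK.

    Simplified model: command traffic and metadata are ignored.
    """
    if offloaded:
        # XCOPY: the host sends copy descriptors; the data stays in the array.
        return 0
    # Host-based copy: each byte is read into the host, then written back out.
    return 2 * vmdk_bytes


print(host_hba_payload_bytes(100 * GIB, offloaded=False) // GIB)  # 200
print(host_hba_payload_bytes(100 * GIB, offloaded=True))          # 0
```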
PowerStore XCOPY Implementation
PowerStore implements XCOPY as a metadata-copy operation instead of a full data-copy operation. That is, instead of reading and writing the data, which involves going through the full software stack and hardware pipelines, PowerStore only updates the mapping pointers of Logical Block Addresses (LBAs): the destination (write) LBAs are updated to point to the same data as the source (read) LBAs. This has a couple of advantages:
It allows PowerStore to achieve XCOPY performance that is orders of magnitude higher than a standard data copy (which involves reading the source and writing to the destination). All of this is accomplished without exhausting drive-level bandwidth (BW), generating SSD wear, or consuming large amounts of CPU cycles. The gain shows up both as increased bandwidth and as lower latency.
XCOPY operations are completed without writing any data to the drives. In effect, PowerStore deduplicates these blocks inline, because the copy is a metadata operation and a natural part of the data path.
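To make the pointer-update idea concrete, here is a toy block map in which XCOPY only remaps destination LBAs to the source's physical blocks and bumps reference counts. It is an illustration under simplifying assumptions (a flat dictionary, non-overlapping source and destination ranges, no inline deduplication of host writes), not PowerStore's internal data structures; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlockMap:
    """Hypothetical LBA map: (volume, LBA) -> physical block, with refcounts."""

    lba_to_pblock: dict[tuple[str, int], int] = field(default_factory=dict)
    refcount: dict[int, int] = field(default_factory=dict)
    next_pblock: int = 0
    drive_writes: int = 0  # physical blocks written to the "drives"

    def host_write(self, vol: str, lba: int) -> None:
        """A host write always lands in a newly allocated physical block here."""
        self._unmap(vol, lba)
        pblock = self.next_pblock
        self.next_pblock += 1
        self.drive_writes += 1
        self.lba_to_pblock[(vol, lba)] = pblock
        self.refcount[pblock] = 1

    def xcopy(self, src: str, src_lba: int, dst: str, dst_lba: int, nblocks: int) -> None:
        """Metadata-only copy: point destination LBAs at the source's blocks.

        Assumes source and destination ranges do not overlap.
        """
        for i in range(nblocks):
            pblock = self.lba_to_pblock.get((src, src_lba + i))
            if pblock is not None:
                self.refcount[pblock] += 1  # take the new reference first
            self._unmap(dst, dst_lba + i)   # drop whatever dst pointed to before
            if pblock is not None:
                self.lba_to_pblock[(dst, dst_lba + i)] = pblock
            # No data is read or written, so drive_writes is unchanged.

    def _unmap(self, vol: str, lba: int) -> None:
        pblock = self.lba_to_pblock.pop((vol, lba), None)
        if pblock is not None:
            self.refcount[pblock] -= 1
            if self.refcount[pblock] == 0:
                del self.refcount[pblock]  # block is free for reuse


vm_map = ToyBlockMap()
for lba in range(4):
    vm_map.host_write("template", lba)
vm_map.xcopy("template", 0, "clone", 0, 4)
print(vm_map.drive_writes)  # 4: the copy itself wrote nothing
vm_map.host_write("clone", 0)
print(vm_map.drive_writes)  # 5: only the clone's new write hits the drives
```

Copying four blocks adds no drive writes; only the later host write to the clone allocates a new block, at which point the template and the clone diverge for that LBA.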
Another challenge generally associated with servicing XCOPY operations is their impact on regular host IO serviced by the storage array (i.e., the noisy-neighbor problem). This is because XCOPY operations received from ESXi servers are large and bursty. PowerStore’s autonomous Quality-of-Service (QoS) functionality regulates XCOPY operations to keep their impact on regular host IOs to a minimum. In addition, XCOPY operations are split into smaller pieces that are still large enough to be very efficient but small enough to complete very quickly, so host IOs can be served when needed. With this autonomous QoS functionality and an efficient XCOPY implementation, PowerStore achieves high XCOPY throughput while keeping the noisy-neighbor problem in check.
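The splitting idea can be sketched as two small steps: cut a large XCOPY into chunks, then let queued host IOs run between chunks. This is only an illustration: the 64MB chunk size and the fixed "one host IO per chunk" ratio are hypothetical stand-ins, since PowerStore's autonomous QoS is adaptive and its actual policy is not described in this post.

```python
from collections import deque

MIB = 1024 * 1024


def split_xcopy(total_bytes: int, chunk_bytes: int) -> list[int]:
    """Split one large XCOPY into chunks; the last chunk may be smaller."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    full, rest = divmod(total_bytes, chunk_bytes)
    return [chunk_bytes] * full + ([rest] if rest else [])


def interleave(chunks: list[int], host_ios: list[str], host_per_chunk: int) -> list[str]:
    """Service order when up to host_per_chunk queued host IOs run before each chunk.

    A fixed ratio is a hypothetical stand-in for PowerStore's adaptive QoS.
    """
    pending = deque(host_ios)
    order: list[str] = []
    for i, size in enumerate(chunks):
        for _ in range(min(host_per_chunk, len(pending))):
            order.append(pending.popleft())
        # Chunk size shown in whole MB (rounded down) for readability.
        order.append(f"xcopy[{i}]:{size // MIB}MB")
    order.extend(pending)  # drain any host IO still queued
    return order


chunks = split_xcopy(240 * MIB, 64 * MIB)  # hypothetical 64MB pieces
print([c // MIB for c in chunks])         # [64, 64, 64, 48]
print(interleave(chunks, ["read-A", "write-B", "read-C"], host_per_chunk=1))
```

Without the split, all three host IOs would wait behind a single 240MB copy; with it, each waits behind at most one chunk.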
XCOPY performance measurements in PowerStoreOS 3.0 are demonstrating an incredible 10x improvement in bandwidth when compared to PowerStoreOS 2.0.
ESXi host settings to optimize PowerStore performance
PowerStore supports the SCSI T10-based VMW_VAAIP_T10 plug-in, which ESXi servers need in order to use XCOPY. PowerStore is set to support a maximum of 8 segments, with a maximum segment size of 32MB minus one block (0xffff 512-byte blocks).
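The segment limit is easier to read in bytes. Assuming 512-byte logical blocks and binary megabytes (1MB = 2^20 bytes), 0xffff blocks comes to exactly one block short of 32MB:

```python
BLOCK_BYTES = 512      # assumed logical block size
MB = 1024 * 1024       # assumed binary megabyte

max_segment_blocks = 0xFFFF
max_segment_bytes = max_segment_blocks * BLOCK_BYTES

print(max_segment_blocks)                  # 65535
print(max_segment_bytes)                   # 33553920
print(32 * MB - max_segment_bytes)         # 512, i.e. exactly one block short of 32MB
```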
To enable the multi-segment XCOPY behavior described below, a claim rule of the VAAI class needs to be created. If no claim rule is set, ESXi defaults to using a single segment. This single-segment default means that the XCOPY transfer size will be 4MB for all devices unless a device reports a smaller size when queried during device configuration.
The claim rule recommended for PowerStore is created with the arguments:
These settings enable the following:
ESXi will query PowerStore for its XCOPY settings
PowerStore will report support for a maximum of 8 segments with a maximum segment size of 32MB minus one block (0xffff blocks)
ESXi will then send PowerStore XCOPY commands with up to 8 segments, each with a maximum size of 30MB, since 30MB is currently the maximum segment size supported on the ESXi side. The largest possible XCOPY request is therefore a command carrying the maximum number of segments (8), each of the maximum size (30MB): 8 × 30MB = 240MB.
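The effect of the claim rule on command count can be estimated with integer ceiling division. The sketch compares the 1 × 4MB single-segment default with the 8 × 30MB maximum; the 40GB copy size and the function name are hypothetical. The results are lower bounds, since ESXi does not always send maximum-sized commands.

```python
MB = 1024 * 1024  # binary megabyte, matching the 0xffff-block arithmetic


def min_xcopy_commands(copy_bytes: int, segments_per_cmd: int, segment_bytes: int) -> int:
    """Fewest XCOPY commands that could cover copy_bytes (integer ceiling)."""
    per_cmd = segments_per_cmd * segment_bytes
    return -(-copy_bytes // per_cmd)


vmdk = 40 * 1024 * MB  # hypothetical 40GB of used VMDK data

print(8 * 30)                                  # 240 (MB per command, maximum)
print(min_xcopy_commands(vmdk, 1, 4 * MB))     # 10240 with the 1 x 4MB default
print(min_xcopy_commands(vmdk, 8, 30 * MB))    # 171 with 8 x 30MB segments
```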
It’s worth noting that ESXi sends whatever command sizes it needs at any given point. With the settings we want to use, the average amount of data per XCOPY command will be a good bit lower than 240MB. Some command sequences will issue four or five 240MB commands back to back, while others will be “small”.
PowerStoreOS 3.0 is delivering big across the board. The improvement in VMware XCOPY performance is massive, letting customers who use PowerStore for VMware storage enjoy blazing-fast VM deploy, clone, and Storage vMotion times. Up to 10x faster performance is incredible, and stay tuned, because there is more to come!