So far, we have provided a high-level overview of the PowerStore 2.0 release and a deep dive into the DRE enhancements. Today, we officially released support for NVMe-oF on PowerStore. This is an important step forward for PowerStore, which now has full end-to-end support for NVMe.
So, first of all, what is NVMe?
NVMe is a non-uniform memory access (NUMA) optimized and highly scalable storage protocol that connects a host to a solid-state memory subsystem. Since NVMe is NUMA-optimized, multiple CPU cores can share ownership of queues and of individual read and write commands. As such, NVMe SSDs can scatter/gather and map commands, and process them in an optimized order to lower command completion latency.
NVMe over FC
NVMe-oF stands for Non-Volatile Memory Express over Fabrics.
NVMe over FC (FC-NVMe) is a communications standard that defines how NVMe commands can be encapsulated inside FC frames and transported over an FC network. This is similar to the way that SCSI-FCP allows SCSI frames to be encapsulated and sent over that same network. Once encapsulated, NVMe frames can be routed over an FC network exactly the same way that SCSI frames are routed today. If an FC switch is successfully configured to connect to a storage device, then no additional configuration on the FC switch is required to enable NVMe over FC connectivity to that storage device.
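Because FC-NVMe frames are routed exactly like SCSI-FCP frames, zoning an FC-NVMe host is done the same way as zoning a SCSI FC host. As an illustration only, here is what a zone for an ESXi host HBA and a PowerStore front-end port might look like on a Brocade FOS switch; the zone name, configuration name, and both WWPNs are hypothetical placeholders:

```shell
# Hypothetical Brocade FOS zoning -- zone name, config name, and WWPNs are placeholders.
# Note there is no NVMe-specific switch configuration: this is standard FC zoning.
zonecreate "esx01_powerstore_a", "10:00:00:10:9b:aa:bb:cc; 58:cc:f0:90:49:21:0f:01"
cfgadd "prod_cfg", "esx01_powerstore_a"
cfgenable "prod_cfg"
```

The same zone serves both SCSI-FCP and FC-NVMe traffic between those two ports.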
NVMe Initiator (Host)
This is the HBA used by the host to discover NVMe-oF storage devices.
NVMe Controller
This is the target manager on the array. A controller is associated with one or several NVMe namespaces and provides an access path between the ESXi host and the namespaces in the storage array. To access the controller, the host can use two mechanisms: controller discovery and controller connection.
NVMe Namespace
In the NVMe storage array, a namespace is a storage volume backed by some quantity of non-volatile memory. In the context of ESXi, the namespace is analogous to a storage device, or LUN. After your ESXi host discovers the NVMe namespace, a flash device that represents the namespace appears on the list of storage devices in the vSphere Client. You can use the device to create a VMFS datastore and store virtual machines.
Namespace ID (NSID)
The namespace ID is used as an identifier for a namespace from any given controller. Once again, this equates to a Logical Unit Number (LUN) in SCSI-based storage.
Asymmetric Namespace Access (ANA)
Asymmetric Namespace Access (ANA) is an NVMe standard implemented as a way for the target to inform an initiator of the optimal way to access a given namespace.
Multipathing
Multipathing is the establishment of multiple physical routes between a server and the storage device that supports it. This is done to prevent a single point of failure and achieve continuous operations.
NVMe Qualified Name (NQN)
The NVMe Qualified Name (NQN) is used to uniquely identify the remote target or initiator; it is similar to an iSCSI Qualified Name (IQN).
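On an ESXi host, the host NQN can be displayed with esxcli. The NQN in the example output below is an illustrative placeholder following the `nqn.yyyy-mm.reverse-domain:identifier` format defined by the NVMe specification:

```shell
# Display the NVMe host identity, including the Host NQN.
esxcli nvme info get
# Example output (illustrative placeholder):
#    Host NQN: nqn.2014-08.com.example:nvme:esx01
```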
Users can set up an NVMe host through:
- PowerStore Manager
- PowerStore CLI
Set up Fibre Channel front-end ports (zoned)
Create a Host or Host Group and select NVMe as the protocol
- Add initiator(s)
- The NQN is the NVMe identifier, similar to the IQN for iSCSI
Create a Volume/Thin Clone or Volume Group
- Not supported with vVols
Map the NVMe host to the volume(s)
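Besides PowerStore Manager and the CLI, the same sequence can be scripted against the PowerStore REST API. Treat the following as a hedged sketch only: the endpoint paths and field names (`/host`, `/volume`, `port_type: NVMe`, the `/volume/{id}/attach` action) are my best-effort reading of the PowerStore REST API and should be verified against the API reference for your PowerStoreOS version, and the management address, credentials, NQN, names, and IDs are all placeholders:

```shell
# Sketch only -- verify endpoints/fields against your PowerStoreOS REST API reference.
BASE="https://powerstore.example.local/api/rest"   # placeholder management address
AUTH="-k -u admin:Password123!"                    # placeholder credentials

# 1. Register an NVMe host by its NQN (port_type NVMe instead of iSCSI/FC).
curl $AUTH -H "Content-Type: application/json" -X POST "$BASE/host" \
  -d '{"name": "esx01", "os_type": "ESXi",
       "initiators": [{"port_name": "nqn.2014-08.com.example:nvme:esx01", "port_type": "NVMe"}]}'

# 2. Create a 100 GB volume.
curl $AUTH -H "Content-Type: application/json" -X POST "$BASE/volume" \
  -d '{"name": "esx01-nvme-vol01", "size": 107374182400}'

# 3. Map the volume to the host, using the IDs returned by the calls above.
curl $AUTH -H "Content-Type: application/json" -X POST "$BASE/volume/<volume_id>/attach" \
  -d '{"host_id": "<host_id>"}'
```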
Starting with PowerStoreOS 2.0, PowerStore systems with the 32 Gb Fibre Channel I/O module support NVMe over Fibre Channel. NVMe over Fibre Channel support with PowerStore requires 32 Gb speeds, and the Fibre Channel I/O module must be configured with 32 Gb SFPs to support this feature. The NVMe over Fabrics functionality and configuration are fully integrated into PowerStore Manager, which allows registering NVMe hosts using their NQN and mapping volumes using the NVMe protocol.
To create a new NVMe host in PowerStore, click the Add Host button and select the NVMe initiator type.
Next, select the host NQN from the list of all available initiators.
Next, create a volume and map it to the host you've just created.
Upon volume creation, NVMe unique IDs are allocated (in addition to the SCSI WWN):
- NSID – the volume ID from the host's perspective
- NGUID – NVMe Globally Unique Identifier (equivalent to the SCSI WWN)
- Both IDs are assigned internally by the system
With NVMe there is no need for rescans: NVMe has asynchronous events, which allow the array to instantly inform a host of new storage, a resize, and so on.
In vSphere, you will automatically see the new volume appear as a namespace. In the figure below, details on devices, paths, namespaces, and controllers are available.
You can also see more information about the volume under the standard storage device pane.
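The same information is available from the ESXi command line, which is a quick way to verify that the new namespace arrived on the host:

```shell
# List NVMe adapters, the controllers they discovered, and the namespaces behind them.
esxcli nvme adapter list
esxcli nvme controller list
esxcli nvme namespace list

# The namespace also shows up as a regular storage device.
esxcli storage core device list
```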
Now, the volume can be used to create a VMFS datastore.
Pluggable Storage Architecture (PSA)
To manage storage multipathing, ESX/ESXi uses a special VMkernel layer, the Pluggable Storage Architecture (PSA). The PSA is an open, modular framework that coordinates the simultaneous operation of multiple multipathing plug-ins (MPPs). The PSA is a collection of VMkernel APIs that allow third-party hardware vendors to insert code directly into the ESX storage I/O path. This allows third-party software developers to design their own load-balancing techniques and failover mechanisms for a particular storage array. The PSA coordinates the operation of the NMP and any additional third-party MPPs.
Native Multipathing Plugin (NMP)
The VMkernel multipathing plugin that ESX/ESXi provides by default is the VMware Native Multipathing Plugin (NMP). The NMP is an extensible module that manages subplugins. There are two types of NMP subplugins: Storage Array Type Plugins (SATPs) and Path Selection Plugins (PSPs). SATPs and PSPs can be built in and provided by VMware, or can be provided by a third party.
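To see which SATPs are installed on a given host and which PSP each one defaults to, esxcli can list both subplugin types:

```shell
# List installed Storage Array Type Plugins and their default PSPs.
esxcli storage nmp satp list

# List the available Path Selection Plugins.
esxcli storage nmp psp list
```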
By default, the native multipathing plug-in (NMP) supplied by VMware is used to manage I/O for non-FC-NVMe devices. The NMP is not supported for FC-NVMe; for those devices, VMware uses a different plug-in, the High-Performance Plug-in (HPP).
High-Performance Plug-in (HPP)
VMware provides the High-Performance Plug-in (HPP) to improve the performance of storage devices on your ESXi host.
The HPP replaces the NMP for high-speed devices, such as NVMe. The HPP is the default plug-in that claims NVMe-oF targets. Within ESXi, the NVMe-oF targets are emulated and presented to users as SCSI targets. The HPP supports only active/active and implicit ALUA targets.
Path Selection Schemes (PSS)
The High-Performance Plug-in uses Path Selection Schemes (PSS) to manage multipathing, just as the NMP uses PSPs. The HPP offers the following PSS options:
- Fixed – uses a specific preferred path.
- LB-RR (Load Balance – Round Robin) – the default PSS. After 1000 I/Os or 10485760 bytes (whichever comes first), the path is switched in a round-robin fashion. This is the equivalent of the NMP Round Robin PSP.
- LB-IOPS (Load Balance – IOPs) – when 1000 I/Os are reached (or a set number), VMware switches to the path that has the least number of outstanding I/Os.
- LB-BYTES (Load Balance – Bytes) – when 10 MB are reached (or a set number), VMware switches to the path that has the least number of outstanding bytes.
- LB-Latency (Load Balance – Latency) – the same mechanism available with the NMP: VMware evaluates the paths and selects the one with the lowest latency.
The HPP can be managed in the vSphere Client as well as with esxcli commands. We recommend keeping the default LB-RR policy and changing the IOPS per path from 1000 to 1:
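Applied with esxcli, the recommendation above can be sketched as follows. The device identifier is a placeholder, and the flag names are from the vSphere 7.0 `esxcli storage hpp` namespace, so verify them on your build before scripting:

```shell
# Check the current PSS for an HPP-claimed device (device ID is a placeholder).
esxcli storage hpp device list -d eui.700153example0001

# Keep the default LB-RR scheme, but switch paths after every I/O instead of every 1000.
esxcli storage hpp device set -d eui.700153example0001 --pss LB-RR --iops 1
```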
VMware Limitations for NVMe datastores:
| Shared Storage Functionality | SCSI over Fabric Storage | NVMe over Fabric Storage |
| --- | --- | --- |
| Core dump | Supported | Not supported |
| SCSI-2 reservations | Supported | Not supported |
| Clustered VMDK | Supported | Not supported |
| Shared VMDK with multi-writer flag | Supported | Supported (in vSphere 7.0 Update 1 and later) |
| Hardware acceleration with VAAI plug-ins | Supported | Not supported |
| Default MPP | NMP | HPP (NVMe-oF targets cannot be claimed by NMP) |
| Limits | LUNs=1024, Paths=4096 | Namespaces=32, Paths=128 (maximum 4 paths per namespace on a host) |
VMware vSphere Storage APIs Array Integration (VAAI) Support:
| Feature | SCSI Command | NVMe Command |
| --- | --- | --- |
| Hardware Accelerated Locking | COMPARE AND WRITE | Compare and Write |
| Hardware Accelerated Init | WRITE SAME | Write Zeroes |
| Dead Space Reclamation (Block Delete) | UNMAP | Deallocate |
| Hardware Accelerated Copy | XCOPY | Not supported with NVMe-oF |
XCOPY is currently not supported with NVMe-oF. This isn't a limitation of PowerStore or VMware, but rather of the NVMe specification: not all offloading capabilities have been translated from SCSI to NVMe.
Below you can see a demo showing how it all works:
A guest post by Tomer Nahumi