Dell EMC PowerScale OneFS 9.2, Part 3 – Simplicity at any scale

In the first post, we covered the richness of the PowerScale platform. In the second post, we went in depth on our new platform, the F900, and in the third post, we covered S3 support. Now it’s time to look at the power of OneFS 9.2 and what it can do for you. Before we go there, let’s review the core elements of OneFS.

OneFS features: Manage what matters

Throughout the data lifecycle, OneFS provides critical file system data services through a robust suite of capabilities that handle enterprise file system needs.

One of the primary design goals of the PowerScale family is simplified management.

The key to accomplishing this is to remove abstractions and manage by policy. That is precisely how OneFS handles features such as Snapshots, Replication, Compliance Locking, Quotas, and Tiering.

In OneFS, you don’t have the traditional concepts of RAID groups, pools, and other abstractions, so you simply focus on managing files and folders and setting a policy.

Just like with data protection, you select a folder at any level, set a policy, and let the system take care of everything else.

InsightIQ provides powerful performance monitoring and reporting tools to help you maximize the performance of your PowerScale platform.

DataIQ enables in-depth analytics and cluster troubleshooting. It also provides monitoring, reporting, and analysis for OneFS functionality such as quota analysis, tiering analysis, and file system analytics.

Any Data: Enterprise-class unstructured data services with simultaneous multi-protocol support


Our unstructured storage portfolio empowers your business to overcome the challenges previously discussed and build a data-first foundation for your file, object, and streaming data. The three core pillars of the foundation are our PowerScale, ECS and Streaming Data Platform, or SDP, product lines.

Taken together, this next-gen portfolio delivers:

  • Simplicity at scale by making it extremely simple to manage and expand capacity as needed
  • Multiprotocol flexibility via the ability to read and write data across numerous protocols and API constructs, including NFS, SMB, HDFS, REST, HTTP, NDMP, FTP, and S3.
  • And finally, the capability to store any unstructured data anywhere across edge, core, and cloud environments

This flexibility allows any user to get to the data they need to create, share, collaborate, and develop using an incredibly powerful, multi-lingual data platform.

PowerScale technology gives you the ability to innovate faster and unlock the potential of your data.

What is new in OneFS 9.2

  • S3 protocol – ETag consistency
  • NFSv3 over RDMA support
  • In-line data reduction
  • Full IPv6 support
  • External key management for SEDs
  • Cluster configuration export/import
  • CELOG WebUI
  • Cluster upgrade improvements – drain-based upgrade
  1. S3 protocol – OneFS S3 ETag consistency with the AWS S3 specification
  • Objects created by the PUT Object, POST Object, or Copy operation have ETags that are an MD5 digest of their data. Clients can provide the MD5 digest in the “Content-MD5” HTTP header, or the server can calculate it.
  • Before OneFS 9.2.0, OneFS did not calculate the MD5 digest for an object if the HTTP request did not contain the “Content-MD5” header.
  • Starting with OneFS 9.2.0, OneFS exposes options to calculate or validate the MD5 digest for objects.
  • Users can enable or disable these options as needed.

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html

New CLI and WebUI options
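For reference, the AWS specification defines how an object’s data, its ETag, and the Content-MD5 request header relate to each other. The Python sketch below illustrates that spec and is not OneFS code; the function names are hypothetical. It assumes a single-part upload without SSE-KMS encryption, the case in which AWS defines the ETag as the hex MD5 of the data. Content-MD5 itself carries the base64-encoded binary digest (RFC 1864).

```python
import base64
import binascii
import hashlib


def compute_etag(body: bytes) -> str:
    """S3-style ETag for a single-part object: hex MD5 of the data, in double quotes."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def content_md5_header(body: bytes) -> str:
    """Value a client would send in Content-MD5: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def validate_content_md5(body: bytes, header_value: str) -> bool:
    """Check a received Content-MD5 header against the body that actually arrived."""
    try:
        expected = base64.b64decode(header_value, validate=True)
    except binascii.Error:
        return False  # header is not valid base64
    return expected == hashlib.md5(body).digest()
```

Loosely, the “calculate” option corresponds to always returning something like compute_etag(body), even when no header was sent. The “validate” option corresponds to rejecting a request for which validate_content_md5 returns False.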

  2. NFSv3 over RDMA support

Remote Direct Memory Access (RDMA) originated with InfiniBand and gradually evolved to Ethernet network environments. Protocols that support RDMA on Ethernet include Internet Wide Area RDMA Protocol (iWARP) and RDMA over Converged Ethernet (RoCE); refer to the RoCE documentation for more details.
NFS over RDMA is defined in RFC 8267. Starting with OneFS 9.2.0, OneFS supports NFSv3 over RDMA by leveraging the RoCEv2 (also known as Routable RoCE or RRoCE) network protocol. Note that neither RoCEv1 nor NFSv4 over RDMA is supported in the OneFS 9.2 release. With NFSv3 over RDMA, OneFS and NFSv3 clients exchange data through direct memory access. This consumes fewer client CPU resources and improves OneFS cluster network performance, with lower latency, lower CPU load, and higher throughput.

OneFS uses RDMA only for NFSv3 data transfer. The auxiliary protocols (mount, NLM, NSM, and the RPC portmapper) still run over TCP/UDP. Before NFSv3 clients can access OneFS cluster data over RDMA, you must add the RoCEv2-capable front-end network interfaces of the PowerScale nodes to an IP pool.

 

NFS over RDMA Network Stack

Cluster requirement:

  • Node type: All Gen6 (F800/F810/H600/H500/H400/A200/A2000), F200, F600, F900
  • Front-end network: Mellanox ConnectX-3 Pro, ConnectX-4, and ConnectX-5 network adapters that deliver 25/40/100 GbE speeds.
  • To check whether a cluster node’s interfaces support NFSv3 over RDMA, look for the SUPPORTS_RDMA_RRoCE NIC flag reported by the isi network interfaces list command.

Client requirement:

  • RoCEv2 capable NICs: Mellanox ConnectX-3 Pro, ConnectX-4, ConnectX-5, and ConnectX-6
  • NFS over RDMA drivers: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) or the OS distribution’s inbox driver. Installing the Mellanox OFED driver is recommended for the best performance. RDMA traffic cannot be captured with tcpdump because the OS kernel is not involved. Instead, use the ibdump tool, which is included in the Mellanox OFED driver package.
  • Make sure your client is running RoCEv2. For details, refer to your OS documentation and the Mellanox RoCE Mode documentation.

Note: As of MLNX_OFED v4.7, the NFSoRDMA driver is no longer installed by default. To install it over a supported kernel, add the “--with-nfsrdma” installation option when installing the MLNX_OFED driver.
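On a Linux client, you can confirm that an export is really mounted over RDMA rather than TCP by checking the transport option recorded for the mount. The sketch below assumes the standard /proc/mounts layout, in which an NFSv3 RDMA mount lists vers=3 and proto=rdma among its options. The function name, host name, and paths are hypothetical. It checks the NFS transport only. Whether the NIC runs in RoCEv2 mode is a separate setting, covered by the OS and Mellanox documentation mentioned above.

```python
def rdma_nfs_mounts(mounts_text: str):
    """Return mount points of NFSv3 mounts that use the RDMA transport.

    mounts_text follows the Linux /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        opts = options.split(",")
        if fs_type == "nfs" and "vers=3" in opts and "proto=rdma" in opts:
            mount_points.append(mount_point)
    return mount_points


# Hypothetical sample: one RDMA mount, one TCP mount, one local filesystem.
SAMPLE = (
    "cluster.example.com:/ifs/data /mnt/rdma nfs rw,vers=3,proto=rdma,port=20049 0 0\n"
    "cluster.example.com:/ifs/home /mnt/tcp nfs rw,vers=3,proto=tcp 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
```

On a real client, pass in the live file contents, for example rdma_nfs_mounts(pathlib.Path("/proc/mounts").read_text()).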

Performance:

  • The primary advantage of NFSv3 over RDMA shows up with one or a few threads (sequential reads), where throughput gets a big boost over TCP.
  • Lower CPU consumption on both client and cluster when using NFSv3 over RDMA.
  3. In-line data reduction – PowerScale F900


    The in-line data reduction write path comprises three main phases: zero block removal, in-line deduplication, and in-line compression.



  • Zero block removal detects blocks containing only zeros and prevents them from being written to disk.
    • Reduces disk space requirements
    • Avoids unnecessary writes to SSD, increasing drive longevity.
  • Zero block removal occurs first in the in-line data reduction pipeline.
    • Reduces the amount of work that in-line dedupe and compression need to perform.
  • Checking for zero data does incur some overhead.
    • To minimize impact, check is terminated on the first non-zero data found in a block.
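The zero check with early exit, described above, can be sketched in Python. This is an illustration under stated assumptions and not the OneFS write path. The 8 KB block size is an assumption, and the helper names are hypothetical. Python’s any() stops at the first non-zero byte, which mirrors the rule of terminating the check on the first non-zero data found in a block.

```python
BLOCK_SIZE = 8 * 1024  # assumed file system block size


def is_zero_block(block: bytes) -> bool:
    """True if every byte is zero; any() stops scanning at the first non-zero byte."""
    return not any(block)


def remove_zero_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and drop the all-zero ones.

    Returns (to_write, zero_indexes): to_write holds (block_index, block) pairs
    that still go down the pipeline to dedupe and compression; zero_indexes lists
    the blocks that never need to be written to disk.
    """
    to_write, zero_indexes = [], []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        if is_zero_block(block):
            zero_indexes.append(index)
        else:
            to_write.append((index, block))
    return to_write, zero_indexes
```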


  • In-line deduplication occurs in real time as data is written to the cluster.
  • Dedupe is performed in software.
  • Data is scanned for identical blocks as it is received.
  • When a duplicate is found, a single copy of the block is moved to a shadow store.
  • Shadow stores are containers that allow common blocks to be shared.
  • Files can contain both data and pointers to shared blocks in shadow stores.
  • Each node has an in-memory hash index against which it compares block ‘fingerprints’.
    • The index lives in RAM and is accessed directly with physical addresses.
    • Avoids traversing virtual memory mappings.
    • Minimizes performance impact.
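To make the shadow store idea concrete, here is a toy Python model of the steps above. Everything in it is an illustrative assumption: SHA-256 stands in for the block fingerprint, a dict stands in for the per-node in-RAM hash index, and the class and method names are hypothetical. It does show the behavior described in the list. The first time a duplicate is found, the single copy moves to a shadow store, and both files end up holding pointers to it. The byte-for-byte comparison before sharing is a safety step added here; the post does not say whether OneFS performs it in-line.

```python
import hashlib


class ToyInlineDedupe:
    """Toy in-line dedupe with a fingerprint index and a shadow store.

    Each file is a list of entries: ("data", bytes) for a block stored in the
    file itself, or ("shadow", slot) for a pointer into the shared shadow store.
    """

    def __init__(self):
        self.files = {}         # file name -> list of block entries
        self.shadow_store = []  # shared block copies, addressed by slot number
        self.index = {}         # fingerprint -> ("file", name, pos) or ("shadow", slot)

    def write_file(self, name, blocks):
        if name in self.files:
            raise ValueError(f"{name} already exists (this toy model is write-once)")
        layout = self.files[name] = []
        for block in blocks:
            fingerprint = hashlib.sha256(block).digest()
            hit = self.index.get(fingerprint)
            if hit is not None and hit[0] == "shadow":
                slot = hit[1]
                if self.shadow_store[slot] == block:  # verify before sharing
                    layout.append(("shadow", slot))
                    continue
            elif hit is not None:
                _, owner, pos = hit
                if self.files[owner][pos][1] == block:
                    # First duplicate: move the single copy into the shadow store
                    # and turn both references into pointers to it.
                    slot = len(self.shadow_store)
                    self.shadow_store.append(block)
                    self.files[owner][pos] = ("shadow", slot)
                    self.index[fingerprint] = ("shadow", slot)
                    layout.append(("shadow", slot))
                    continue
            # New block, or a fingerprint match whose bytes differ: store it in the file.
            if hit is None:
                self.index[fingerprint] = ("file", name, len(layout))
            layout.append(("data", block))

    def read_file(self, name):
        """Reassemble a file by following shadow store pointers."""
        return b"".join(
            self.shadow_store[ref] if kind == "shadow" else ref
            for kind, ref in self.files[name]
        )
```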


  • In-line compression: the F900, F600, F200, and H5600 use the software igzip compression algorithm.
    • The F810 uses zlib compression offloaded to an FPGA.
  • When a file is written using in-line compression, its logical space is divided into equal-sized chunks.
    • Compaction creates 128 KB compression chunks.
    • 128 KB equals the OneFS stripe unit size, which avoids packing.
  • Efficiency savings must be at least 8 KB for a chunk to be compressed.
    • If savings are less than 8 KB, the chunk or file is passed over and remains uncompressed.
  • Once compressed, a file is then FEC protected.
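The chunking and 8 KB savings rule can be sketched with Python’s standard zlib module. This is not the OneFS implementation. zlib stands in for igzip and for the F810’s FPGA zlib, savings are measured in raw bytes without rounding to block boundaries, and FEC protection is not modeled. The function names are hypothetical.

```python
import zlib

CHUNK_SIZE = 128 * 1024  # 128 KB compression chunk, per the post
MIN_SAVINGS = 8 * 1024   # chunks that save less than 8 KB stay uncompressed


def compress_file(data: bytes):
    """Return a list of (offset, stored_bytes, is_compressed) tuples, one per chunk."""
    chunks = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        packed = zlib.compress(chunk)
        if len(chunk) - len(packed) >= MIN_SAVINGS:
            chunks.append((offset, packed, True))
        else:
            chunks.append((offset, chunk, False))  # passed over: keep the raw bytes
    return chunks


def decompress_file(chunks) -> bytes:
    """Rebuild the logical file from stored chunks."""
    return b"".join(
        zlib.decompress(stored) if compressed else stored
        for _, stored, compressed in chunks
    )
```

A highly repetitive 128 KB chunk easily clears the threshold, and random data never does. A short tail chunk cannot save 8 KB even when it compresses well, so it stays uncompressed.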

    Controlling in-line data reduction

  • In-line compression and dedupe configuration is binary: on or off across a cluster.
  • Compression is enabled by default on new PowerScale F900, F600, and F200 and Isilon F810 and H5600 node pools.
    • To disable:

  • In-line dedupe and single instancing are disabled by default.
    • To enable:

  4. Full IPv6 support
  • OneFS 9.2 introduces support to meet the USGv6 requirements for United States Government deployments. The USGv6 feature implements Router Advertisements and Duplicate Address Detection.
  • The isi network interfaces list command is enhanced. The state shown is fetched live from each node. It now includes:
    • MTU and IPv4/IPv6 Gateway information
    • SmartConnect Service IPs and IPv6 Link Local (by request)
    • Information is reported by Interface and VLAN
    • New query parameters (owner, VLAN, address type)
    • NIC flags (ACCEPT_ROUTER_ADVERT, SUPPORTS_RDMA_RoCE, SUPPORTS_RDMA_RRoCE)


  5. External key management for SEDs

    OneFS data-at-rest encryption uses self-encrypting drives (SEDs): data is encrypted during writes and decrypted during reads.

  • Data stored on the SEDs is encrypted and decrypted with a 256-bit AES encryption key, referred to as the Data Encryption Key (DEK).
  • OneFS takes the standard SED encryption further:
    • DEK for each SED is wrapped in an Authentication Key (AK)
    • AKs for each drive are placed in a Key Manager (KM)
    • KM is stored securely in an encrypted database, the Key Manager Database (KMDB)
    • KMDB is encrypted with a 256-bit Master Key (MK)


256-bit Master Key (MK) is stored in KMIP server


  • PowerScale OneFS release 9.2 provides support for an external key manager by storing the 256-bit Master Key (MK) in a Key Management Interoperability Protocol (KMIP) compliant key manager server.
  6. Cluster configuration export/import
  • Supports the following components:
    • HTTP
    • quota
    • snapshot
    • NFS
    • SMB
    • S3
    • NDMP

By default, configuration backup and restore files reside at:

  • Backup JSON file: /ifs/data/Isilon_Support/config_mgr/backup/<JobID>/<component>_<JobID>.json
  • Restore JSON file: /ifs/data/Isilon_Support/config_mgr/restore/<JobID>/<component>_<JobID>.json

To track the progress of a backup or restore, check the configuration manager log file at /var/log/config_mgr.log.

Example – Export and view NFS and SMB config

# isi cluster config exports create --components=nfs,smb --verbose

# isi cluster config exports view Deccan-20210131105345

Example – Import and view NFS and SMB config

# isi cluster config imports create Deccan-20210131105345 --components=smb,nfs

# isi cluster config imports view Deccan-20210131110659
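If you script around these jobs, you can build the default backup and restore paths listed earlier from a job ID and a component name. The following is a minimal Python sketch that assumes those default locations have not been changed. The helper names are hypothetical. The post does not document the JSON schema, so the loader simply returns the parsed document.

```python
import json
from pathlib import PurePosixPath

# Default configuration manager location, as listed in the post.
CONFIG_MGR_ROOT = PurePosixPath("/ifs/data/Isilon_Support/config_mgr")


def config_json_path(kind: str, job_id: str, component: str) -> PurePosixPath:
    """Build <root>/<kind>/<JobID>/<component>_<JobID>.json for a backup or restore job."""
    if kind not in ("backup", "restore"):
        raise ValueError("kind must be 'backup' or 'restore'")
    return CONFIG_MGR_ROOT / kind / job_id / f"{component}_{job_id}.json"


def load_component_config(kind: str, job_id: str, component: str):
    """Parse one component's JSON file (schema not documented in the post)."""
    with open(config_json_path(kind, job_id, component), encoding="utf-8") as f:
        return json.load(f)
```

On the cluster, load_component_config("backup", "Deccan-20210131105345", "nfs") would read the NFS file produced by the export example above.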

  7. New CELOG WebUI – OneFS Cluster Event Log

CELOG provides a single source for the logging of events that occur in an Isilon cluster. Events are used to communicate a picture of cluster health for various components. CELOG provides a single point from which notifications about the events are generated, including sending alert emails and SNMP traps.

  • Enable or disable maintenance mode in the WebUI

  • While maintenance mode is enabled, events continue to be logged, but no alert notifications are sent

# isi event groups list --maintenance-mode=true

ID  Started      Ended        Causes Short                            Lnn  Events  Severity
-------------------------------------------------------------------------------------------
16  02/09 11:49  --           HW_CLUSTER_ONEFS_VERSION_NOT_SUPPORTED  1    1       critical
17  02/09 12:05  02/09 12:19  HW_ONEFS_VERSION_NOT_SUPPORTED          1    1       critical

  • The ability to review all alert notifications once maintenance mode is disabled:
    • View details
    • Ignore
    • Resolve

  8. Drain-based upgrade

    Drain-based upgrade supports the following scenarios and is available through the WebUI, CLI, and PAPI:

  • SMB protocol
  • OneFS upgrades
  • Firmware upgrades
  • Cluster reboots
  • Combined upgrades (OneFS and Firmware)

Three options are available for existing connections:

  • Wait: wait until the number of SMB connections reaches 0 or the drain timeout expires.
  • Delay: add the node to the delay list to postpone client draining.
  • Skip: Stop waiting for clients to migrate away from the draining node and reboot immediately.
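At its core, the Wait option is a poll-until-zero-or-timeout loop. The following is a minimal Python sketch of that logic and is not OneFS code. get_smb_connections is a hypothetical callback, and the clock and sleep functions are injectable so the loop can be tested without real waiting. drain_timeout_s plays the role of the --drain-timeout value passed to isi upgrade start.

```python
import time


def wait_for_drain(get_smb_connections, drain_timeout_s, poll_interval_s=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until no SMB connections remain or the drain timeout expires.

    Returns "drained" if the count reached 0, otherwise "timeout".
    """
    deadline = clock() + drain_timeout_s
    while True:
        if get_smb_connections() == 0:
            return "drained"
        if clock() >= deadline:
            return "timeout"
        sleep(poll_interval_s)
```

The sketch leaves it to the caller to decide what happens after either result (for example, rebooting the node). Skip corresponds to not calling the loop at all.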

# isi upgrade start --parallel --skip-optional --install-image-path=/ifs/data/<installation-file-name> --drain-timeout=60m --alert-timeout=45m


In addition to drain-based upgrade, we also introduced a combined upgrade that needs only a single reboot:

  • Combines the OneFS upgrade and firmware upgrade workflows (one reboot per node instead of two).

    # isi upgrade start --fw-pkg=/ifs/path/fw.pkg --install-image-path=/ifs/path/install.tar.gz --parallel
