Dell EMC PowerStore 2.0 – Part 2, DRE: Double Drive Failure

In the first post of this series, which you can read here, we covered, at a high level, the features of the PowerStore 2.0 release. Starting with this post, we will go deeper into those features, beginning with the enhancements we have made to the PowerStore Dynamic Resiliency Engine (DRE).

Before PowerStoreOS 2.0, an appliance could handle only a single drive failure within a resiliency set at any given time, until the internal rebuild finished. I blogged about it here, and you know what? Since PowerStore went GA one year ago, not a single customer production system has had a dual drive failure, which shows the design was right. But as always, we strive to satisfy our customers' requirements, whatever they are, so here we go:



We’ve added a dual parity option to give customers additional protection for mission-critical data within the appliance. Our 100% software-based redundancy and sparing method has always provided a more efficient and automated way to protect data within your array – and now customers with strict dual-parity requirements can benefit as well. DRE is a smarter approach to enterprise-class availability, protecting against simultaneous multi-drive failures while intelligently managing both performance and efficiency.

  • It gives you more flexibility with up to 98% less management effort compared to traditional RAID.
  • It automates all the complex processes associated with drive configuration, redundancy, and sparing.
  • You can add drives one at a time and mix drive sizes to meet cost goals.
  • Everything is handled by intelligent software, right down to replenishing spare capacity if a drive fails, so you never have a lapse in protection levels.

PowerStore gives you superior resiliency at a lower cost.

Starting with PowerStoreOS 2.0, the initial configuration wizard provides an option to enable the Double Drive Failure Tolerance Level feature when more than 7 drives are initially inserted.

Once enabled, a resiliency set within an appliance can handle up to two drive failures at the same time. The drive failure tolerance level can be set for each appliance individually, either during the initial configuration or when adding an appliance to an existing cluster.
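To make the drive-count rule concrete, here is a minimal sketch of the check described above. The function name and constant are hypothetical and exist only for illustration. This is not a PowerStore API or CLI call, just the "more than 7 drives" rule written down as code.

```python
from typing import List

# Per the post: the double drive failure option is offered only when
# MORE than 7 drives are initially inserted (so 8 or more).
MIN_DRIVES_EXCLUSIVE = 7  # hypothetical constant name


def allowed_tolerance_levels(initial_drive_count: int) -> List[str]:
    """Return the tolerance levels the wizard could offer (illustrative only)."""
    levels = ["single"]  # single drive failure is always available
    if initial_drive_count > MIN_DRIVES_EXCLUSIVE:
        levels.append("double")
    return levels
```

With 7 drives this sketch offers only the single level; with 8 or more it offers both.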

With PowerStore’s Dynamic Resiliency Engine (DRE), all drives in an appliance are consumed automatically and the appropriate amount of redundancy is applied:

  • Proprietary algorithms are used to store and protect data within the system
  • Resiliency Sets are used as fault domains to improve reliability while minimizing spare space overhead
    • Resiliency Sets are also known as fault resiliency sets
    • A maximum of 25 drives in each Resiliency Set (RS) in PowerStore 1.0, and for an appliance with single parity DRE
  • Having multiple failure domains increases the reliability of the system

Drive fault tolerance is the number of concurrent drive failures, per RS, that a system can sustain without causing a Data Unavailable/Data Loss (DU/DL) situation.

The tolerance level for an appliance, set during the initial configuration, defines how many simultaneous failures a resiliency set can tolerate at any given time.
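The "per RS" part of that definition is easy to miss, so here is a small sketch of it. The function and the dictionary layout are assumptions made for illustration, not how PowerStore tracks failures internally, and the sketch ignores rebuilds, which return a set to full protection once they complete.

```python
from typing import Dict


def exceeds_tolerance(failed_drives_per_rs: Dict[str, int], tolerance: int) -> bool:
    """Return True if any single Resiliency Set has more concurrent
    failures than the appliance's tolerance level (a DU/DL risk).

    The count is evaluated per Resiliency Set, not per appliance.
    """
    return any(failed > tolerance for failed in failed_drives_per_rs.values())


# Two failures in DIFFERENT sets stay within a tolerance of 1 ...
print(exceeds_tolerance({"rs0": 1, "rs1": 1}, tolerance=1))  # False
# ... while two failures in the SAME set exceed it ...
print(exceeds_tolerance({"rs0": 2, "rs1": 0}, tolerance=1))  # True
# ... unless the appliance was configured for double drive failure.
print(exceeds_tolerance({"rs0": 2, "rs1": 0}, tolerance=2))  # False
```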

In the PowerStoreOS 2.0 release, the user chooses which fault tolerance level to assign to the appliance. The drive tolerance level can be set to single drive failure or double drive failure during the initial configuration of an appliance.

Initial configuration means either initial cluster creation or adding the appliance to an existing cluster.

Configuring the drive tolerance level sets the data protection tolerance level for all Resiliency Sets created within the appliance.

The drive tolerance level is set for the lifetime of the appliance and cannot be changed without a non-data-in-place factory reset.

  • Pre-2.0 systems utilize single drive failure protection

Different appliances within a cluster can have different tolerance levels.

Tolerance Level Comparison: Single Drive Failure vs. Double Drive Failure

Single Drive Failure

  • Default value
  • DRE single parity protection is used within each Resiliency Set for user data
  • Up to one simultaneous drive failure per Resiliency Set without encountering DU/DL

Double Drive Failure

  • DRE dual parity protection is used within each Resiliency Set for user data
  • Up to two simultaneous drive failures per Resiliency Set without encountering DU/DL
  • Parity overhead for dual parity DRE is higher than for single parity in smaller configurations. For instance, appliances with fewer than 19 drives have a higher parity overhead, appliances with 19 to 25 drives have the same parity overhead as single parity DRE, and appliances with more than 25 drives offer better usable capacity with dual parity DRE due to the lower spare overhead.

Resiliency Sets & Supported Configurations Comparison

Single Drive Failure

  • Resiliency Set maximum drive count = 25 drives (Same as previous releases)
  • Each Resiliency Set reserves one drive worth of spare space

Double Drive Failure

  • Resiliency Set maximum drive count = 50 drives
  • Each Resiliency Set reserves one drive worth of spare space
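The spare-space numbers above explain why larger appliances can come out ahead with dual parity. Here is a rough sketch of that arithmetic. It assumes the appliance creates the minimum number of Resiliency Sets for a given drive count, which the post does not state, and it deliberately leaves out parity overhead. The function names are hypothetical.

```python
import math

# Figures from the post: maximum drives per Resiliency Set for each
# tolerance level, and one drive's worth of spare space reserved per set.
MAX_DRIVES_PER_RS = {"single": 25, "double": 50}
SPARE_DRIVES_PER_RS = 1


def resiliency_set_count(drive_count: int, tolerance: str) -> int:
    """Minimum number of sets needed (assumption: sets are filled to the maximum)."""
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return math.ceil(drive_count / MAX_DRIVES_PER_RS[tolerance])


def spare_drive_equivalents(drive_count: int, tolerance: str) -> int:
    """Spare space, in whole-drive equivalents, reserved across all sets."""
    return resiliency_set_count(drive_count, tolerance) * SPARE_DRIVES_PER_RS


for drives in (20, 40, 60):
    print(drives,
          spare_drive_equivalents(drives, "single"),
          spare_drive_equivalents(drives, "double"))
# 20 1 1
# 40 2 1
# 60 3 2
```

Under these assumptions, a 40-drive appliance reserves two drives' worth of spare space with single drive failure tolerance but only one with double, which is the "lower spare overhead" mentioned in the comparison.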

Below you can see a demonstration video showing a double failure scenario:

A guest post by Tomer Nahumi
