In the first post of this series, which you can read here, we covered, at a high level, the features of the PowerStore 2.0 release. Starting with this post, we will go deeper into those features, beginning with the enhancements we have made to the PowerStore Dynamic Resiliency Engine (DRE).

Before PowerStoreOS 2.0, a cluster could handle only a single drive failure at a time until the internal rebuild finished. I blogged about it here, and you know what? Since the GA of PowerStore (one year ago), not a single customer production system has had a dual drive failure, which proved the design was RIGHT. But as always, we strive to satisfy our customers’ requirements, whatever they are, so here we go:





We’ve added a dual parity option to give customers additional protection for mission-critical data within the appliance. Our 100% software-based redundancy and sparing method has always provided a more efficient and automated way to protect data within your array – and now customers with strict dual-parity requirements can benefit as well. DRE is a smarter approach to enterprise-class availability, protecting against simultaneous multi-drive failures while intelligently managing both performance and efficiency.

It gives you more flexibility with up to 98% less management effort compared to traditional RAID.

It automates all the complex processes associated with drive configuration, redundancy, and sparing.

You can add drives one at a time and mix drive sizes to meet cost goals.

Everything’s handled by intelligent software, right down to replenishing spare capacity if a drive fails, so you never have a lapse in protection levels.

PowerStore gives you superior resiliency at a lower cost.

Starting with PowerStoreOS 2.0, the initial configuration wizard provides an option to enable the Double Drive Failure Tolerance Level feature when at least 7 drives are initially inserted (the minimum for the 4+2 RAID width).

Once enabled, the cluster can handle up to two drive failures at the same time. The drive failure tolerance level can be set for each appliance individually, either during the initial configuration or when adding an appliance to an existing cluster.

Within PowerStore’s Dynamic Resiliency Engine (DRE), all drives within an appliance are automatically consumed and the appropriate amount of redundancy is applied.

Proprietary algorithms are used to store and protect data within the system.

RAID Resiliency Sets (RRS) are used as fault domains to improve reliability while minimizing spare space overhead.

RAID Resiliency Sets are also known as fault resiliency sets.

Each RRS holds a maximum of 25 drives in PowerStore 1.0.

Having multiple failure domains increases the reliability of the system.

Drive fault tolerance is the number of concurrent drive failures, per RRS, that a system can sustain without causing a Data Unavailable/Data Loss (DU/DL) situation.

The RAID protection scheme within the RRS defines how many failures can occur.
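To make the per-RRS definition concrete, here is a minimal Python sketch. The function name `causes_du_dl` and the drive-to-RRS layout are hypothetical and only for illustration; they are not PowerStore APIs. The sketch counts concurrent failures in each RRS and compares that count with the tolerance level.

```python
from collections import Counter

def causes_du_dl(failed_drives, drive_to_rrs, tolerance):
    """Return True if any RRS has more concurrent failures than it tolerates.

    failed_drives: iterable of drive IDs that are failed at the same time
    drive_to_rrs:  dict mapping drive ID -> RRS ID (hypothetical layout)
    tolerance:     failures each RRS can sustain (1 = single, 2 = double)
    """
    failures_per_rrs = Counter(drive_to_rrs[d] for d in failed_drives)
    return any(count > tolerance for count in failures_per_rrs.values())

# Hypothetical appliance: drives 0-24 in RRS "A", drives 25-49 in RRS "B".
layout = {d: ("A" if d < 25 else "B") for d in range(50)}

print(causes_du_dl([3, 30], layout, tolerance=1))  # False: one failure per RRS
print(causes_du_dl([3, 4], layout, tolerance=1))   # True: two failures in RRS "A"
print(causes_du_dl([3, 4], layout, tolerance=2))   # False: within double tolerance
```

The first call shows why multiple fault domains help: two failures that land in different RRSs are survivable even with single drive tolerance.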





In the PowerStoreOS 2.0 release, the user can choose which fault tolerance level to assign to the appliance. The drive tolerance level can be set to single drive failure or double drive failure during the initial configuration of an appliance.

Initial configuration means either initial cluster creation or adding the appliance to an existing cluster.

Configuring the drive tolerance level sets the data protection tolerance level for all RRSs created within the appliance.

The drive tolerance level is set for the lifetime of the appliance and cannot be changed without a non-data-in-place factory reset.

Pre-2.0 systems use single drive failure protection.

Different appliances within a cluster can have different tolerance levels.
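These rules can be modeled in a few lines of Python. This is only an illustrative sketch: the `Appliance` and `ToleranceLevel` names are made up for the example and do not correspond to any PowerStore interface. A frozen dataclass stands in for the "set for the lifetime of the appliance" rule.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum

class ToleranceLevel(Enum):
    SINGLE = 1  # default; the only level on pre-2.0 systems
    DOUBLE = 2  # selectable starting with PowerStoreOS 2.0

@dataclass(frozen=True)  # frozen: the level cannot be changed after creation
class Appliance:
    name: str
    tolerance: ToleranceLevel = ToleranceLevel.SINGLE

# A cluster may mix appliances with different tolerance levels.
cluster = [
    Appliance("appliance-1"),                         # single drive tolerance
    Appliance("appliance-2", ToleranceLevel.DOUBLE),  # double drive tolerance
]

try:
    cluster[0].tolerance = ToleranceLevel.DOUBLE
except FrozenInstanceError:
    print("tolerance level is fixed; changing it requires a factory reset")
```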

Tolerance Level Comparison: Single Drive Failure vs. Double Drive Failure



Single Drive Failure



Default value

DRE single parity protection is used within each RRS for user data

Up to one simultaneous drive failure per RRS without encountering DU/DL

Metadata and other mirrored data use 2-way mirroring

Double Drive Failure



DRE dual parity protection is used within each RRS for user data

Up to two simultaneous drive failures per RRS without encountering DU/DL

Metadata and other mirrored data use 3-way mirroring

Reduced capacity in similar configurations when compared to single drive failure tolerance

RAID Resiliency Sets & Supported RAID configurations Comparison



Single Drive Failure



RRS maximum drive count = 25 drives (Same as previous releases)

Each RRS reserves one drive worth of spare space

4+1 and 8+1 RAID widths supported; RRS width set based on drive count at initialization

4+1 requires a minimum of 6 drives

8+1 requires a minimum of 10 drives



Double Drive Failure



RRS maximum drive count = 50 drives

Each RRS reserves one drive worth of spare space

4+2, 8+2, and 16+2 RAID widths supported; RRS width set based on drive count at initialization

4+2 requires a minimum of 7 drives

8+2 requires a minimum of 11 drives

16+2 requires a minimum of 19 drives



RAID widths do not change as drives are added to the system
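The width minimums and RRS limits above can be captured in a small lookup. The sketch below is not PowerStore code: the function names are hypothetical, and the rule "pick the widest width whose minimum drive count is met" is an assumption that happens to match the two 12-drive examples in this post. `min_rrs_count` only computes the fewest RRSs that could hold a given number of drives; it does not claim to describe how the system actually splits them.

```python
# (data, parity, minimum drive count) per tolerance level, from the lists above.
WIDTHS = {
    "single": [(4, 1, 6), (8, 1, 10)],
    "double": [(4, 2, 7), (8, 2, 11), (16, 2, 19)],
}
MAX_RRS_DRIVES = {"single": 25, "double": 50}

def select_raid_width(drive_count, tolerance):
    """Pick the widest width whose minimum is met (assumed selection rule)."""
    eligible = [(d, p) for d, p, minimum in WIDTHS[tolerance] if drive_count >= minimum]
    if not eligible:
        raise ValueError(f"{drive_count} drives is too few for {tolerance} tolerance")
    return max(eligible)

def min_rrs_count(drive_count, tolerance):
    """Fewest RRSs that can hold the drives, given the per-RRS maximum."""
    return -(-drive_count // MAX_RRS_DRIVES[tolerance])  # ceiling division

print(select_raid_width(12, "single"))  # (8, 1)
print(select_raid_width(12, "double"))  # (8, 2)
print(min_rrs_count(30, "single"), min_rrs_count(30, "double"))  # 2 1
```

Because widths do not change as drives are added, `drive_count` here means the number of drives present at initialization, not the current count.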

Tolerance Level: Single Drive example

Below is an example of the configuration of a RAID Resiliency Set on an appliance initialized with 12 drives, with single drive failure selected as the tolerance level. The system selected a RAID width of 8+1, based on the tolerance level selection and the number of drives within the system at initialization. Capacity from 9 drives is used to create the 8+1 protection. Spare space is also reserved from the drives within the RRS for rebuilds if a failure occurs.

Note: This example shows the capacity blocks for each User Data stripe selected in an orderly fashion. This is only for demonstration purposes. Internal algorithms will spread the User Data blocks out efficiently, making the block layout more random.



Tolerance Level: Double Drive example

Below is an example of the configuration of a RAID Resiliency Set on an appliance initialized with 12 drives, with double drive failure selected as the tolerance level. The system selected a RAID width of 8+2, based on the tolerance level selection and the number of drives within the system at initialization. Capacity from 10 drives is used to create the 8+2 protection. Spare space is also reserved from the drives within the RRS for rebuilds if a failure occurs.

Note: This example shows the capacity blocks for each User Data stripe selected in an orderly fashion. This is only for demonstration purposes. Internal algorithms will spread the User Data blocks out efficiently, making the block layout more random.
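To get a feel for the capacity difference between the two 12-drive examples, here is a rough back-of-the-envelope calculation in Python. The model is a simplifying assumption, not a PowerStore formula: it subtracts one drive's worth of spare space and then applies the data/(data+parity) stripe ratio, ignoring metadata mirroring, system overhead, and data reduction. The function name `usable_fraction` is made up for this sketch.

```python
from fractions import Fraction

def usable_fraction(drive_count, data, parity, spare_drives=1):
    """Rough share of raw capacity left for user data in a single RRS.

    Simplified model (an assumption, not a PowerStore formula): subtract the
    reserved spare space, then apply the data/(data+parity) stripe ratio.
    Metadata mirroring and system overhead are ignored.
    """
    after_spare = Fraction(drive_count - spare_drives, drive_count)
    return after_spare * Fraction(data, data + parity)

single = usable_fraction(12, data=8, parity=1)  # 8+1 example
double = usable_fraction(12, data=8, parity=2)  # 8+2 example

print(f"single: {float(single):.1%}")  # single: 81.5%
print(f"double: {float(double):.1%}")  # double: 73.3%
```

Under these assumptions, the extra parity block per stripe is what accounts for the reduced capacity of double drive failure tolerance in an otherwise identical configuration.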





Below you can see a demonstration video showing a double failure scenario:

A guest post by Tomer Nahumi


