Dell EMC PowerStore 2.0 – Part 2, DRE: Double Drive Failure
In the first post of this series, which you can read here, we covered the features of the PowerStore 2.0 release at a high level. Starting with this post, we will go deeper into those features, beginning with the enhancements we have made to the PowerStore Dynamic Resiliency Engine (DRE).
Before PowerStoreOS 2.0, a cluster could handle only a single drive failure at a time, until the internal rebuild had finished. I blogged about it here and, you know what? Since PowerStore went GA a year ago, not a single customer production system has had a dual drive failure, which proved the design was RIGHT. But as always, we strive to satisfy our customers’ requirements, whatever they are, so here we go:
We’ve added a dual parity option to give customers additional protection for mission-critical data within the appliance. Our 100% software-based redundancy and sparing method has always provided a more efficient and automated way to protect data within your array – and now customers with strict dual-parity requirements can benefit as well. DRE is a smarter approach to enterprise-class availability, protecting against simultaneous multi-drive failures while intelligently managing both performance and efficiency.
PowerStore gives you superior resiliency at a lower cost.
Starting with PowerStoreOS 2.0, the initial configuration wizard provides an option to enable the Double Drive Failure Tolerance Level feature when more than 7 drives are initially inserted.
Once it is enabled, the appliance can sustain up to two simultaneous drive failures. The drive failure tolerance level is set for each appliance individually, either during initial configuration or when the appliance is added to an existing cluster.
With PowerStore’s Dynamic Resiliency Engine (DRE), all drives in an appliance are consumed automatically and the appropriate amount of redundancy is applied.
Drive fault tolerance is the number of concurrent drive failures, per RAID Resiliency Set (RRS), that a system can sustain without causing a Data Unavailable/Data Loss (DU/DL) situation.
In PowerStoreOS 2.0, the user chooses which fault tolerance level to assign to each appliance: single drive failure or double drive failure, set during the initial configuration of the appliance.
Initial configuration means either the initial cluster creation or the addition of the appliance to an existing cluster.
Configuring the drive tolerance level sets the data protection tolerance level for all RRSs created within the appliance.
The drive tolerance level is set for the lifetime of the appliance and cannot be changed without a non-data-in-place factory reset.
Different appliances within a cluster can have different tolerance levels.
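To make the per-RRS definition of drive fault tolerance concrete, here is a minimal Python sketch. It is a toy model, not PowerStore code: the `Appliance` and `ToleranceLevel` names, the drive-slot numbering, and the RRS-to-drive mapping are all hypothetical, and the two six-drive RRSs are chosen only to show the per-RRS rule, not to mirror a real PowerStore layout. An appliance is treated as surviving a set of concurrent failures if no RRS loses more drives than the appliance's tolerance level, and a frozen dataclass loosely echoes the rule that the level is fixed once set.

```python
from dataclasses import dataclass
from enum import Enum


class ToleranceLevel(Enum):
    """Concurrent drive failures each RRS can absorb without DU/DL."""
    SINGLE = 1
    DOUBLE = 2


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Appliance:
    name: str
    tolerance: ToleranceLevel
    # Hypothetical mapping: RRS name -> drive slots that the RRS spans.
    rrs_drives: dict[str, frozenset[int]]

    def survives(self, failed: set[int]) -> bool:
        """True if no RRS has more concurrent failures than the tolerance level."""
        limit = self.tolerance.value
        return all(len(drives & failed) <= limit
                   for drives in self.rrs_drives.values())


# Two appliances in the same cluster with different tolerance levels.
single = Appliance("appliance-1", ToleranceLevel.SINGLE,
                   {"rrs-a": frozenset(range(0, 6)), "rrs-b": frozenset(range(6, 12))})
double = Appliance("appliance-2", ToleranceLevel.DOUBLE,
                   {"rrs-a": frozenset(range(12))})

print(single.survives({2}))     # True: one failure in rrs-a
print(single.survives({2, 8}))  # True: one failure in each RRS
print(single.survives({2, 3}))  # False: two failures in rrs-a
print(double.survives({2, 3}))  # True: two failures are within the DOUBLE limit
```

Because the limit is applied per RRS, two failures that land in different RRSs on a single-tolerance appliance do not exceed it. The model only covers concurrent failures; it ignores rebuilds, after which an RRS can absorb a further failure.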
[Table: Tolerance Level Comparison, Single Drive Failure vs. Double Drive Failure]
[Table: RAID Resiliency Sets and Supported RAID Configurations, Single Drive Failure vs. Double Drive Failure]
Tolerance Level: Single Drive Example
Below is an example of a RAID Resiliency Set configuration on an appliance that was initialized with 12 drives, with single drive failure selected as the tolerance level. Here the system selected a RAID width of 8+1, based on the tolerance level and the number of drives present at initialization, so capacity from 9 drives is used to create each 8+1 stripe. Spare space is also reserved on the drives within the RRS, to be used for rebuilds if a failure occurs. Note: This example shows the capacity blocks for each User Data stripe laid out in an orderly fashion for demonstration purposes only. Internal algorithms spread the User Data blocks out efficiently, making the actual block layout more random.
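The arithmetic behind the 8+1 example can be sketched in a few lines of Python. The `layout_stripes` function is a hypothetical stand-in that puts each stripe's nine blocks on nine distinct drives chosen at random; it only illustrates the idea of a spread-out layout and is not PowerStore's internal algorithm. The `usable_fraction` calculation ignores the reserved spare space.

```python
import random
from collections import Counter
from fractions import Fraction


def layout_stripes(num_drives: int, data: int, parity: int,
                   num_stripes: int, seed: int = 0) -> list[list[int]]:
    """Place each stripe's data + parity blocks on distinct, randomly chosen drives.

    Hypothetical placement policy for illustration only.
    """
    width = data + parity
    if width > num_drives:
        raise ValueError("RAID width cannot exceed the number of drives")
    rng = random.Random(seed)
    return [rng.sample(range(num_drives), width) for _ in range(num_stripes)]


def usable_fraction(data: int, parity: int) -> Fraction:
    """Share of each stripe's raw capacity that holds user data (spare space ignored)."""
    return Fraction(data, data + parity)


# 12 drives with an 8+1 RAID width, as in the single drive failure example.
stripes = layout_stripes(num_drives=12, data=8, parity=1, num_stripes=1200)
blocks_per_drive = Counter(d for stripe in stripes for d in stripe)

print(usable_fraction(8, 1))             # 8/9, roughly 88.9% of stripe capacity
print(sorted(blocks_per_drive.items()))  # blocks land on all 12 drives, roughly evenly
```

For the 8+2 width used with double drive failure tolerance, `usable_fraction(8, 2)` returns `Fraction(4, 5)`, i.e. 80%: the extra parity block is the capacity cost of the higher tolerance level.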
Tolerance Level: Double Drive Example
Below is an example of a RAID Resiliency Set configuration on an appliance that was initialized with 12 drives, with double drive failure selected as the tolerance level. Here the system selected a RAID width of 8+2, based on the tolerance level and the number of drives present at initialization, so capacity from 10 drives is used to create each 8+2 stripe. Spare space is also reserved on the drives within the RRS, to be used for rebuilds if a failure occurs. Note: This example shows the capacity blocks for each User Data stripe laid out in an orderly fashion for demonstration purposes only. Internal algorithms spread the User Data blocks out efficiently, making the actual block layout more random.
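Why an 8+2 width survives any two concurrent drive failures, however the blocks are spread, can be checked exhaustively. This Python sketch assumes a hypothetical random placement (not PowerStore's algorithm) in which each stripe puts its blocks on distinct drives. Two failed drives can then remove at most two blocks from any stripe, which two parity blocks can rebuild. `worst_case` tries every combination of failed drives out of 12.

```python
import random
from itertools import combinations


def random_layout(num_drives: int, width: int, num_stripes: int,
                  seed: int = 0) -> list[frozenset[int]]:
    """Hypothetical placement: each stripe lands on `width` distinct random drives."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(num_drives), width)) for _ in range(num_stripes)]


def survives(layout: list[frozenset[int]], parity: int, failed: set[int]) -> bool:
    """A stripe is recoverable if it lost no more blocks than it has parity blocks."""
    return all(len(stripe & failed) <= parity for stripe in layout)


def worst_case(num_drives: int, data: int, parity: int,
               failures: int, num_stripes: int = 64) -> bool:
    """True if every possible set of `failures` concurrent drive failures is survivable."""
    layout = random_layout(num_drives, data + parity, num_stripes)
    return all(survives(layout, parity, set(combo))
               for combo in combinations(range(num_drives), failures))


print(worst_case(12, 8, 2, failures=2))  # True: 8+2 survives any two failed drives
print(worst_case(12, 8, 1, failures=2))  # False: some stripe loses two blocks
```

The result does not depend on the random seed. With 8+2, a stripe can never lose more than two blocks to two failures. With 8+1, any stripe spans nine drives, so some pair of failed drives always hits the same stripe twice.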
Below you can see a demonstration video of a double drive failure scenario:
A guest post by Tomer Nahumi
