Dell EMC PowerStore: Dynamic Resiliency Engine (DRE)
A guest post by Dan Cummins, Shuyu Lee & Kiruthika Gopal
PowerStore has been well received since its launch, which I blogged about here. The product contains so much innovation that it can be hard to go deep into each of its architectural strategies and understand how they translate into real user value. One example is its drive protection strategy. We may have oversimplified the message around it, so it is time to go deeper and explain our motivation and how it translates into real user value, which matters even more these days given the uncertainties that surround us all.
Let’s talk Resiliency
Enterprise-class storage systems require high levels of reliability and protection from data loss and latent drive failures. Traditional data protection schemes are based on RAID groups with a fixed layout that protect a volume's data. In this traditional design, bandwidth is limited by the number of drives participating in the group, and rebuild speed is limited by the number of drive failures that the RAID level tolerates. For example, a 6+2 group could achieve the read speed of 6 drives but sustain only 2 drives' worth of rebuild speed in the case of a dual drive failure.
The reliability of the data being protected depends heavily on the Bit Error Rate (BER) of the drive, the amount of data that must be rebuilt, and the number of drives. As capacity and drive counts increase it becomes more difficult to maintain reliability when traditional RAID protection schemes are used.
Furthermore, economics and reliability are key buying decisions. The cost of a storage solution is driven by the cost of the drives and, ultimately, by its price/performance and effective capacity ($/IOP and $/Effective Capacity). Both become harder to balance as storage and drive capacity grow.
PowerStore’s DRE Architecture is the answer to these challenges.
Dell EMC PowerStore Dynamic Resiliency Engine is a patented technology, built from the ground up to intelligently and flexibly address the reliability, performance, and system cost challenges as the drive landscape (drive sizes, drive technologies, and so on) continues to evolve.
Overview
PowerStore implements proprietary algorithms where every drive is partitioned into multiple virtual segments and redundancy extents are created by utilizing the segments across several drives.
It automatically consumes the drives within an appliance and creates appropriate redundancy using all the drives. This improves overall performance and allows performance to scale as more drives are added to the appliance.
Data written to a volume can be spread across any number of drives within an appliance. As new drives are added, the data is automatically re-balanced.
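To make the idea of segments and redundancy extents more concrete, here is a minimal sketch in Python. It is not PowerStore's proprietary algorithm. The function names, the segment counts, and the "pick the drives with the most free segments" policy are all assumptions chosen for illustration. It only shows the general technique: split every drive into segments, then build each extent from segments that sit on different drives.

```python
def partition_drives(drive_count, segments_per_drive):
    """Return a free-segment pool: drive id -> list of free segment indices.

    Hypothetical model; real segment sizes and counts are not given in the post.
    """
    return {d: list(range(segments_per_drive)) for d in range(drive_count)}


def allocate_extent(free, width):
    """Build one redundancy extent from `width` segments on distinct drives.

    Assumed policy: prefer the drives with the most free segments so that
    usage stays balanced across the appliance.
    """
    candidates = sorted(
        (d for d in free if free[d]),
        key=lambda d: (-len(free[d]), d),
    )
    if len(candidates) < width:
        raise ValueError("not enough drives with free segments")
    # One segment per chosen drive, so no drive appears twice in an extent.
    return [(d, free[d].pop(0)) for d in candidates[:width]]


if __name__ == "__main__":
    free = partition_drives(drive_count=10, segments_per_drive=4)
    for _ in range(4):
        print(allocate_extent(free, width=5))
```

In this toy model every extent touches five different drives, and successive extents rotate across all ten drives, which is why adding drives adds both capacity and parallelism.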
Data placement in PowerStore
Distributed Sparing:
Unlike traditional RAID protection strategies, PowerStore does not require dedicated spare drives. Spare space is distributed across the entire appliance: small chunks of reserved capacity from each drive are used for sparing in the event of a drive failure. PowerStore dynamically provisions one drive's worth of spare capacity per Resiliency Set.
When a drive fails, only the portion of the drive that holds written data is rebuilt. This manages spare capacity efficiently, because only the required space is consumed, and it shortens rebuild time, because only data that has been written to the drive needs to be rebuilt.
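The sparing behavior described above can be sketched roughly as follows. This is an illustrative model, not the actual PowerStore rebuild logic: the `rebuild_plan` function, its inputs, and the "send each segment to the survivor with the most spare left" rule are assumptions. It also ignores the real-world constraint that a rebuilt segment must not land on a drive already holding another segment of the same extent.

```python
def rebuild_plan(failed, written, spare_free):
    """Spread the failed drive's written segments over surviving drives.

    failed:      id of the failed drive
    written:     drive id -> number of segments that actually hold data
    spare_free:  drive id -> number of free spare segments on that drive
    Returns:     surviving drive id -> number of segments rebuilt onto it
    """
    survivors = {d: n for d, n in spare_free.items() if d != failed}
    to_rebuild = written[failed]  # unwritten capacity is never rebuilt
    if to_rebuild > sum(survivors.values()):
        raise ValueError("insufficient distributed spare capacity")

    plan = {d: 0 for d in survivors}
    for _ in range(to_rebuild):
        # Assumed policy: pick the survivor with the most spare remaining,
        # breaking ties by lowest drive id, so writes fan out in parallel.
        target = max(survivors, key=lambda d: (survivors[d], -d))
        survivors[target] -= 1
        plan[target] += 1
    return plan


if __name__ == "__main__":
    written = {0: 8, 1: 5, 2: 5, 3: 5, 4: 5}
    spare = {d: 3 for d in range(5)}
    print(rebuild_plan(failed=0, written=written, spare_free=spare))
```

Because the work is proportional to `written[failed]` rather than to the drive's raw capacity, a lightly filled drive rebuilds quickly, and every survivor absorbs a share of the writes.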
Distributed Sparing
Resiliency Sets:
PowerStore implements Resiliency Sets to improve reliability while minimizing spare and protection overhead. Having multiple failure domains (Resiliency Sets) increases the reliability of the system, because the appliance can tolerate a drive failure within each of these Resiliency Sets even if the failures occur at the same time.
Tolerance for drive failure within multiple resiliency sets simultaneously
The appliance can also tolerate multiple drive failures within the same Resiliency Set if the failures occur at different times (that is, the second drive fails after the rebuild of the first failed drive is complete).
What's non-obvious is that, because of the dynamic spare capacity mapping, the volume distribution strategy, and the speed of the distributed rebuild, you can survive multiple successive drive failures in a single Resiliency Set while still achieving the required reliability. The key is that the Resiliency Set limits the fault domain, and with it the probability of multiple drives within the set failing at the same time before consistency can be restored.
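The tolerance rules above can be expressed as a small check. This is only a sketch under stated assumptions: the function name `can_tolerate`, the drive-to-set mapping, and the default of one un-rebuilt failure per Resiliency Set are taken from the description in this post, not from any PowerStore interface.

```python
from collections import Counter


def can_tolerate(pending_failures, set_of_drive, per_set_tolerance=1):
    """Return True if the failures that are not yet rebuilt are survivable.

    pending_failures:  drive ids that have failed and are not yet rebuilt
    set_of_drive:      drive id -> Resiliency Set label (hypothetical mapping)
    per_set_tolerance: assumed limit of concurrent failures per set
    """
    counts = Counter(set_of_drive[d] for d in pending_failures)
    return all(n <= per_set_tolerance for n in counts.values())


if __name__ == "__main__":
    sets = {0: "A", 1: "A", 2: "B", 3: "B"}
    print(can_tolerate([0, 2], sets))  # one failure in each set
    print(can_tolerate([0, 1], sets))  # two concurrent failures in set A
    print(can_tolerate([1], sets))     # drive 0 already rebuilt, then 1 fails
```

The third call models successive failures: once the first drive's rebuild completes it leaves the pending list, so a later failure in the same set is again within tolerance.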
In the current release, a Resiliency Set contains up to 25 drives. The number of Resiliency Sets increases dynamically as more drives are added. For example, if a 26th drive is added, the Resiliency Set dynamically splits into two.
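As a quick worked example of the 25-drive limit, the sketch below computes how many Resiliency Sets a given drive count would need. The 25-drive maximum comes from this post. The even split between sets is an assumption made for illustration, since the post does not say how drives are divided after a split, and `resiliency_set_sizes` is a hypothetical helper.

```python
import math

MAX_SET_SIZE = 25  # per-set limit in the release described in this post


def resiliency_set_sizes(drive_count, max_size=MAX_SET_SIZE):
    """Return assumed set sizes for an appliance with `drive_count` drives.

    Uses the fewest sets that respect `max_size`, then splits the drives
    as evenly as possible (the even split is an assumption).
    """
    if drive_count <= 0:
        return []
    sets = math.ceil(drive_count / max_size)
    base, extra = divmod(drive_count, sets)
    return [base + 1] * extra + [base] * (sets - extra)


if __name__ == "__main__":
    for n in (24, 25, 26, 51):
        print(n, resiliency_set_sizes(n))
```

With 25 drives there is a single set. Adding a 26th drive pushes the count to two sets, matching the split described above.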
Resiliency Sets can span across physical enclosures based on the number of drives in the appliance and can have mixed drive sizes.
Key Benefits:
Enterprise Class Availability
Parallel rebuild of single drive to distributed spare space
Intelligent Infrastructure
Rebuild data chunks to spare space after drive failure and replenish spare space with unused user space
Flexible Configurations
The ability to grow in single-drive increments and to accommodate different drive sizes is very relevant these days, when we don't want to commit in advance to a potentially unneeded CAPEX investment.
Single Drive Expansion
In summary, I hope we were able to share some deeper information on one of the unique values of Dell PowerStore. I encourage you to read about many more of its core features and ecosystem support on my blog: https://volumes.blog/?s=powerstore