A guest post by Dan Cummins, Shuyu Lee & Kiruthika Gopal
Since the launch of PowerStore, which I blogged about here, the product has been well received. It contains so much innovation that it can sometimes be hard to go deeper into each of its architectural strategies and understand how they translate into real user value. One example is its drive protection strategy: we may have oversimplified the message around it, and now it’s time to go deeper and explain our motivation and how it translates into real user value, which is even more amplified these days given the uncertainties that surround us all.
Let’s talk Resiliency
Enterprise class storage systems require high levels of reliability and protection from data loss and latent drive failures. Traditional data protection schemes are based on RAID groups of a fixed layout that protect a volume’s data. In this traditional design, bandwidth is limited by the number of drives participating in the group, and rebuild speed is limited by the number of drive failures that the RAID level tolerates. For example, a 6+2 RAID group can achieve the read speed of 6 drives but can only sustain 2 drives’ worth of rebuild speed in the case of a dual drive failure.
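To make the bandwidth argument concrete, here is a minimal first-order sketch, assuming rebuild time is simply the data to rebuild divided by the aggregate write bandwidth of the drives absorbing it. The drive size, per-drive bandwidth and the 23 surviving drives are hypothetical numbers chosen for illustration, and the model ignores read limits and host IO.

```python
# Hypothetical first-order model: rebuild time = data to rebuild / rebuild write bandwidth.
# It ignores read limits, host IO and parity math; all numbers are illustrative assumptions.

def rebuild_hours(data_tb: float, writers: int, mbps_per_drive: float) -> float:
    """Hours needed to rewrite `data_tb` TB when `writers` drives absorb the rebuild writes."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10**6 MB
    seconds = total_mb / (writers * mbps_per_drive)
    return seconds / 3600

LOST_TB = 2 * 7.68   # assumed: two failed 7.68 TB drives
DRIVE_MBPS = 500     # assumed sustained write bandwidth per drive, in MB/s

# Traditional 6+2 group: rebuild writes funnel into 2 replacement spare drives.
traditional = rebuild_hours(LOST_TB, writers=2, mbps_per_drive=DRIVE_MBPS)

# Distributed sparing: assume 23 surviving drives share the same rebuild writes.
distributed = rebuild_hours(LOST_TB, writers=23, mbps_per_drive=DRIVE_MBPS)

print(f"traditional: {traditional:.2f} h, distributed: {distributed:.2f} h")
```

Under this model the speed-up is just the ratio of drives absorbing writes (23/2 here), which is why spreading spare space across every drive matters more than any single drive’s speed.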
The reliability of the data being protected depends heavily on the Bit Error Rate (BER) of the drive, the amount of data that must be rebuilt, and the number of drives. As capacity and drive counts increase, it becomes more difficult to maintain reliability when traditional RAID protection schemes are used.
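One way to see why BER and data volume interact is the textbook estimate of hitting at least one unrecoverable read error while reading everything a rebuild needs. This sketch assumes independent bit errors at a fixed BER (a simplification real drives do not strictly follow); in a traditional group the amount read grows with the number of surviving drives times their capacity.

```python
import math

def p_unrecoverable_read(ber: float, tb_read: float) -> float:
    """Probability of at least one unrecoverable bit error while reading `tb_read` TB,
    assuming independent bit errors at rate `ber` (errors per bit read)."""
    bits = tb_read * 8e12  # 1 TB = 8 x 10**12 bits
    # 1 - (1 - ber)**bits, computed stably with log1p/expm1 for tiny BER values
    return -math.expm1(bits * math.log1p(-ber))

# Hypothetical rebuild reads at an assumed BER of 1e-15
for tb in (1, 10, 50):
    print(f"{tb:>3} TB read: P(error) = {p_unrecoverable_read(1e-15, tb):.3f}")
```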
Furthermore, economics and reliability are key factors in buying decisions. The cost of a storage solution is driven by the cost of the drives and, ultimately, by its price/performance and effective capacity ($/IOP and $/Effective Capacity). As storage and drive capacity grows:
- Performance scales
- Relative cost of the controller diminishes
- Uncertainty around the number of drives you will need on day 2 AND day 20
- The probability of encountering drive failures increases
- Protection and system metadata overhead increases, impacting effective storage capacity
- Higher rebuild speeds are needed to maintain reliability
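Two of the trends in the list above can be sketched numerically: the chance of seeing at least one drive failure grows with drive count, while the controller’s share of $/effective capacity shrinks. The failure rate, prices and overhead fraction below are hypothetical inputs, and failures are assumed independent.

```python
def p_any_failure(n_drives: int, afr: float) -> float:
    """Probability that at least one of `n_drives` fails within a year, assuming
    independent failures at annualized failure rate `afr`."""
    return 1 - (1 - afr) ** n_drives

def cost_per_effective_tb(controller_cost: float, n_drives: int, drive_cost: float,
                          drive_tb: float, overhead: float) -> float:
    """$ per effective TB; `overhead` is the fraction of raw capacity lost to
    protection, spare space and system metadata (data reduction is ignored)."""
    effective_tb = n_drives * drive_tb * (1 - overhead)
    return (controller_cost + n_drives * drive_cost) / effective_tb

# Hypothetical inputs: 1% AFR, $50k controller, $1k per 4 TB drive, 20% overhead.
for n in (6, 25, 96):
    print(f"{n:>2} drives: P(>=1 failure/yr)={p_any_failure(n, 0.01):.2f}, "
          f"$/effective TB={cost_per_effective_tb(50_000, n, 1_000, 4.0, 0.20):,.0f}")
```

Note that `overhead` sits in the denominator for every drive, which is why keeping protection and spare overhead low becomes more valuable as the appliance grows.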
PowerStore’s Dynamic Resiliency Engine (DRE) architecture is the answer to these challenges.
- It avoids the pitfalls of traditional RAID schemes by dynamically mapping and distributing protection and spares. Every drive contributes to performance and there is no need for a dedicated spare.
- New writes are dynamically mapped to available capacity and are fully protected at the time of the write, which avoids the traditional RAID write hole and improves reliability.
- Implements a concept called Resiliency Sets, each of which defines a protection domain. Volume data stripes are written fully consistent within a single Resiliency Set, and a volume’s stripes are distributed across Resiliency Sets. This enables DRE to minimize protection overhead while maintaining the required reliability.
- Rebuild speeds are extremely fast: volume data and spare space are distributed, so every drive contributes to rebuild performance. DRE distributes missing segments to available capacity and only needs to rebuild inconsistent stripes, spreading the rebuild load and rebuilding only what needs to be rebuilt.
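The bullets above describe dynamic mapping and full-stripe protection at write time. The sketch below illustrates the general idea under explicit assumptions: the `Drive` class, the "most free segments first" placement policy and single XOR parity are hypothetical stand-ins, not PowerStore’s proprietary algorithms. Writing the data and its parity together to freshly allocated segments, rather than patching a stripe in place, is a common way to avoid the write hole.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    drive_id: int
    free_segments: int

def allocate_stripe(drives: list[Drive], width: int) -> list[int]:
    """Choose `width` distinct drives for one full stripe (data + parity),
    preferring the drives with the most free segments (hypothetical policy)."""
    candidates = [d for d in drives if d.free_segments > 0]
    if len(candidates) < width:
        raise RuntimeError("not enough drives with free segments for a full stripe")
    chosen = sorted(candidates, key=lambda d: d.free_segments, reverse=True)[:width]
    for d in chosen:
        d.free_segments -= 1  # exactly one segment per drive in this stripe
    return sorted(d.drive_id for d in chosen)

def xor_parity(chunks: list[bytes]) -> bytes:
    """Single-parity (RAID 5 style) protection: byte-wise XOR of equal-length chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

# Usage: a 4+1 stripe on an 8-drive pool. The new data and its parity are written
# together to freshly allocated segments, never as an in-place partial update.
pool = [Drive(i, free_segments=10 + i) for i in range(8)]
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\x0f"]
placement = allocate_stripe(pool, width=len(data) + 1)
parity = xor_parity(data)
print(placement, parity.hex())
```

Because the XOR of all data chunks and the parity is zero, any single missing chunk can be recomputed from the others, which is what a rebuild does per stripe.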
Dell EMC PowerStore Dynamic Resiliency Engine is a patented technology, built from the ground up to intelligently and flexibly address reliability, performance and system cost challenges as the drive landscape (drive sizes, drive technologies, etc.) continues to evolve.
PowerStore implements proprietary algorithms where every drive is partitioned into multiple virtual segments and redundancy extents are created by utilizing the segments across several drives.
DRE automatically consumes the drives within an appliance and creates appropriate redundancy using all of them. This improves overall performance and allows performance to scale as more drives are added to the appliance.
Data written to a volume can be spread across any number of drives within an appliance. As new drives are added, the data is automatically rebalanced.
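As a rough illustration of automatic rebalancing, here is a greedy sketch that moves one segment at a time from the fullest drive to the emptiest until usage is even. The drive names and counts are hypothetical, and a real engine must also keep every stripe’s segments on distinct drives, which this toy ignores.

```python
def rebalance(used: dict[str, int]) -> tuple[dict[str, int], list[tuple[str, str]]]:
    """Greedy rebalance: move one segment at a time from the fullest drive to the
    emptiest until their used-segment counts differ by at most one.
    Returns (final usage, list of (source, destination) moves)."""
    used = dict(used)  # leave the caller's mapping untouched
    moves: list[tuple[str, str]] = []
    while True:
        src = max(used, key=used.get)
        dst = min(used, key=used.get)
        if used[src] - used[dst] <= 1:
            return used, moves
        used[src] -= 1
        used[dst] += 1
        moves.append((src, dst))

# Usage: three populated drives plus a newly added, empty drive "d3".
final, moves = rebalance({"d0": 10, "d1": 10, "d2": 10, "d3": 0})
print(final, len(moves))
```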
Data placement in PowerStore
Unlike traditional RAID protection strategies, PowerStore does not require dedicated spare drives. Spare space is distributed across the entire appliance: small chunks of reserved capacity segments from each drive are used for sparing in the event of a drive failure. PowerStore dynamically provisions a drive’s worth of spare capacity per Resiliency Set.
When a drive fails, only the portion of the drive that has data written to it is rebuilt. This way, spare capacity is managed efficiently by consuming only the required space. It also shortens rebuild time, since only data that has been written to the drive needs to be rebuilt.
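The two ideas above, one drive’s worth of spare spread across the set and rebuilding only written segments, can be sketched as follows. Sizing "a drive’s worth" by the largest drive in the set and the 10-segment map are assumptions made for illustration.

```python
def spare_slice_per_drive(drive_tb: list[float]) -> float:
    """Reserve one drive's worth of spare capacity for a Resiliency Set, spread evenly.
    Assumption: "one drive's worth" is sized by the largest drive in the set."""
    return max(drive_tb) / len(drive_tb)

def segments_to_rebuild(segment_written: list[bool]) -> list[int]:
    """Only segments that actually hold data on the failed drive are rebuilt."""
    return [i for i, written in enumerate(segment_written) if written]

# Usage: 25 x 7.68 TB drives; the failed drive has 4 of its 10 segments written.
drives = [7.68] * 25
slice_tb = spare_slice_per_drive(drives)
survivor_spare_tb = slice_tb * (len(drives) - 1)  # the failed drive's own slice is gone
to_rebuild = segments_to_rebuild([True, False, True, False, False, True, False, True, False, False])
print(f"{slice_tb:.4f} TB per drive, {survivor_spare_tb:.2f} TB spare on survivors, "
      f"rebuild segments {to_rebuild}")
```

Since only written segments are rebuilt, the spare left on the surviving drives is typically more than enough, even though it is slightly less than a full drive.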
PowerStore implements Resiliency Sets to improve reliability while minimizing spare and protection overhead. Having multiple failure domains (Resiliency Sets) increases the reliability of the system because the appliance can tolerate a drive failure within each Resiliency Set, even when those failures occur at the same time.
Tolerance for drive failure within multiple resiliency sets simultaneously
The appliance can tolerate multiple drive failures even within the same Resiliency Set if the failures occur at different times (that is, the second drive fails after the rebuild of the first failed drive is complete).
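The tolerance rules above can be expressed as an interval check: failures in different sets never conflict, and failures in the same set are fine as long as each one starts after the previous rebuild completed. This sketch assumes single-drive-failure tolerance per Resiliency Set; the event tuples and set names are hypothetical.

```python
from collections import defaultdict

def survives(events: list[tuple[str, float, float]]) -> bool:
    """events: (resiliency_set, failure_time, rebuild_complete_time) per failed drive.
    Assumes one tolerated concurrent failure per Resiliency Set: data survives as long
    as no two failure windows in the same set overlap."""
    by_set: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for rs, failed_at, rebuilt_at in events:
        by_set[rs].append((failed_at, rebuilt_at))
    for windows in by_set.values():
        windows.sort()  # by failure time; checking neighbours is enough to find any overlap
        for (_, prev_end), (start, _) in zip(windows, windows[1:]):
            if start < prev_end:  # next failure before the previous rebuild finished
                return False
    return True

events_ok = [("RS1", 0.0, 2.0), ("RS2", 0.5, 2.5),  # simultaneous failures, different sets
             ("RS1", 3.0, 5.0)]                       # second RS1 failure after its rebuild
events_bad = [("RS1", 0.0, 2.0), ("RS1", 1.0, 3.0)]  # overlapping failures in one set
print(survives(events_ok), survives(events_bad))
```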
What’s non-obvious is that, because of the dynamic spare capacity mapping, the volume distribution strategy, and the speed of the distributed rebuild, you can survive multiple successive drive failures in a single Resiliency Set while still achieving the required reliability. The key is that the Resiliency Set limits both the fault domain and the probability of multiple drives within the set failing at the same time, before consistency can be restored.
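A back-of-the-envelope way to see why both a bounded set size and a fast rebuild matter: estimate the chance that another drive in the same set fails during the rebuild window. This assumes independent, exponentially distributed drive lifetimes derived from a hypothetical annualized failure rate, which is a simplification of real failure behavior.

```python
import math

def p_second_failure(drives_in_set: int, afr: float, rebuild_hours: float) -> float:
    """Probability that another drive in the same Resiliency Set fails during the
    rebuild window, assuming independent exponential lifetimes with annualized
    failure rate `afr`."""
    rate_per_hour = -math.log1p(-afr) / 8760  # AFR -> constant hourly failure rate
    survivors = drives_in_set - 1
    return -math.expm1(-survivors * rate_per_hour * rebuild_hours)

# Hypothetical 1% AFR: smaller sets and shorter rebuilds both shrink the risk window.
for n, hours in ((25, 1), (25, 10), (96, 10)):
    print(f"{n} drives, {hours} h rebuild: {p_second_failure(n, 0.01, hours):.2e}")
```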
In the current release, Resiliency Sets contain up to 25 drives. The number of Resiliency Sets increases dynamically as more drives are added. For example, if a 26th drive is added, the Resiliency Set dynamically splits into two.
Resiliency Sets can span physical enclosures, depending on the number of drives in the appliance, and can contain mixed drive sizes.
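Here is a small sketch of how the Resiliency Set count grows with drive count, using the 25-drive limit and the 6 to 96 drive range quoted in this post. How drives are divided once a split happens is not specified here, so the even split below is an assumption.

```python
import math

MIN_DRIVES, MAX_DRIVES = 6, 96  # appliance limits quoted in this post
MAX_SET_SIZE = 25               # "up to 25 drives" per Resiliency Set in the current release

def resiliency_sets(n_drives: int) -> list[int]:
    """Drive count of each Resiliency Set for an appliance with `n_drives` drives.
    Assumption: drives are split as evenly as possible across the minimum number of sets."""
    if not MIN_DRIVES <= n_drives <= MAX_DRIVES:
        raise ValueError(f"appliance supports {MIN_DRIVES}-{MAX_DRIVES} drives")
    n_sets = math.ceil(n_drives / MAX_SET_SIZE)
    base, extra = divmod(n_drives, n_sets)
    return [base + 1] * extra + [base] * (n_sets - extra)

for n in (6, 25, 26, 51, 96):
    print(n, resiliency_sets(n))
```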
Enterprise Class Availability
Faster rebuild times with distributed sparing
- Rebuild smaller chunks of the drive simultaneously to multiple drives in the appliance.
Parallel rebuild of single drive to distributed spare space
Automatically allocate unused user space to replenish spare space to handle multiple failures
- DRE dynamically transfers unused user capacity to replenish spare capacity if there is sufficient unused capacity available on the appliance.
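The replenishment step can be sketched as a simple capacity transfer. The all-or-nothing rule ("only if free space covers the whole shortfall") is our reading of "if there is sufficient unused capacity", and the TB figures are hypothetical.

```python
def replenish_spare(spare_tb: float, target_tb: float, free_user_tb: float) -> tuple[float, float]:
    """Top the spare pool back up to `target_tb` from unused user capacity.
    Assumption: the transfer happens only if free space covers the whole shortfall.
    Returns (new spare capacity, new free user capacity)."""
    shortfall = max(0.0, target_tb - spare_tb)
    if shortfall == 0.0 or free_user_tb < shortfall:
        return spare_tb, free_user_tb
    return spare_tb + shortfall, free_user_tb - shortfall

# After a rebuild consumed all 4 TB of spare space:
print(replenish_spare(0.0, 4.0, 10.0))  # enough free space: spare restored
print(replenish_spare(0.0, 4.0, 1.0))   # not enough: nothing moves
```

Restoring the spare pool this way is what lets the set absorb another, later failure without waiting for a replacement drive.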
Intelligently vary the rebuild speed based on incoming IO traffic while maintaining availability
- PowerStore uses built-in intelligence to automatically adjust the rebuild rate and prioritize host IO, optimizing performance and reliability in the presence of a drive failure. So instead of providing a fixed-rate rebuild, we examine the performance envelope of the array and act accordingly!
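The post does not describe the actual control logic, so here is only a minimal sketch of the idea: scale rebuild bandwidth down as host IO load rises, with a floor so the rebuild always makes progress. The linear policy, the 0.0 to 1.0 load metric and the default limits are all hypothetical.

```python
def rebuild_rate(host_busy: float, min_mbps: float = 50.0, max_mbps: float = 2000.0) -> float:
    """Rebuild bandwidth in MB/s given host IO load (0.0 idle .. 1.0 saturated).
    Hypothetical linear policy: yield bandwidth to hosts, but never drop below a floor."""
    host_busy = min(max(host_busy, 0.0), 1.0)  # clamp out-of-range load readings
    return max(min_mbps, max_mbps * (1.0 - host_busy))

for load in (0.0, 0.5, 0.9, 1.0):
    print(f"host load {load:.1f}: rebuild at {rebuild_rate(load):.0f} MB/s")
```

The floor is the design choice that ties performance to reliability: a rebuild that is fully starved by host IO would leave the set exposed to a second failure for longer.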
Rebuild data chunks to spare space after drive failure and replenish spare space with unused user space
Lower TCO with the ability to expand storage by adding single drives
- An appliance can have a minimum of 6 drives and can scale up to 96 drives. Capacity can be added non-disruptively, and customers have the flexibility to expand their storage by adding one or more drives based on their needs.
The ability to grow in single-drive increments AND to accommodate different drive sizes is very relevant these days, when we don’t want to commit in advance to a potentially unneeded CAPEX investment.
Flexible options to add different drive sizes based on storage need
- PowerStore implements proprietary algorithms to manage drives with different sizes by optimizing the distribution of redundancy segments across multiple drives.
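To show why mixed drive sizes need careful segment distribution, here is a generic capacity model, not PowerStore’s proprietary algorithm: if every stripe of `width` segments must land on distinct drives, S stripes fit exactly when the sum over drives of min(capacity, S) is at least width × S, because no drive can hold more than one segment of any stripe. The segment counts below are hypothetical.

```python
def max_stripes(capacities: list[int], width: int) -> int:
    """Largest number of stripes of `width` segments that fit when each stripe must
    use `width` distinct drives. `capacities` are free segments per drive.
    S stripes fit iff sum(min(c, S)) >= width * S."""
    if width <= 0:
        raise ValueError("width must be positive")
    if width > len(capacities):
        return 0

    def fits(s: int) -> bool:
        return sum(min(c, s) for c in capacities) >= width * s

    lo, hi = 0, sum(capacities) // width  # upper bound: total segments / width
    while lo < hi:  # binary search: feasibility is monotone in the stripe count
        mid = (lo + hi + 1) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Four small drives plus one drive four times larger (hypothetical segment counts).
mixed = [100, 100, 100, 100, 400]
print(max_stripes(mixed, 5), max_stripes(mixed, 4))
```

With a stripe width equal to the drive count, the large drive’s extra capacity is stranded (100 stripes); narrowing the stripe lets more of it be used (133 stripes), which is the kind of trade-off a segment-distribution algorithm has to balance.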
Single Drive Expansion
In summary, I hope we were able to share deeper insight into one of the unique values of Dell PowerStore. I encourage you to read about many more of its core features and ecosystem support on my blog: https://volumes.blog/?s=powerstore