During Dell Technologies World 2022, we announced PowerStore 3.0.
Starting with PowerStoreOS 3.0, asynchronous file replication is available. Asynchronous replication can be used to protect against a storage-system outage by creating a copy of the data on a remote system. Replication is a software feature that synchronizes data to a remote system within the same site or at a different location. Replicating data provides data redundancy and safeguards against storage system failures at the main production site. Having a remote disaster recovery (DR) site protects against system and site-wide outages. It also provides a remote location that can resume production and minimize downtime after a disaster. The PowerStore platform offers many data-protection solutions that can meet disaster recovery needs in various environments.
Asynchronous replication for PowerStore is designed to have minimal impact on host I/O latency. Host writes are acknowledged once they are saved to the local storage resource, and no additional writes are needed for change tracking. Because write operations are not immediately replicated to a destination resource, all writes are tracked on the source. This data is replicated during the next synchronization. With protection policies, asynchronous replication uses the concept of a recovery point objective (RPO). The RPO is the acceptable amount of data loss, measured in units of time, that an outage may cause. This time delta determines how much data must be replicated during the next synchronization, and it also reflects the potential data loss in a disaster scenario.
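To make the link between the RPO and the amount of data per synchronization concrete, here is a rough back-of-the-envelope sketch. The function names, the steady change rate, and the sync window are illustrative assumptions, not PowerStore parameters, and the estimate is only an upper bound because it ignores repeated overwrites of the same blocks and any data reduction.

```python
def data_per_sync_mib(change_rate_mib_s: float, rpo_minutes: float) -> float:
    """Upper bound on changed data accumulated over one RPO interval, in MiB."""
    return change_rate_mib_s * rpo_minutes * 60


def min_link_rate_mib_s(change_rate_mib_s: float, rpo_minutes: float,
                        sync_window_minutes: float) -> float:
    """Throughput needed to ship one interval's delta within the sync window."""
    delta = data_per_sync_mib(change_rate_mib_s, rpo_minutes)
    return delta / (sync_window_minutes * 60)


if __name__ == "__main__":
    # Assumed workload: 5 MiB/s of changes, 30-minute RPO, 10-minute sync window.
    print(data_per_sync_mib(5, 30))        # 9000 MiB of changes to replicate
    print(min_link_rate_mib_s(5, 30, 10))  # 15.0 MiB/s
```

A shorter RPO shrinks both the potential data loss and the size of each synchronization, at the cost of more frequent synchronization cycles.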
Replication itself uses an optimized, Dell-proprietary, TCP-based replication protocol over Ethernet (LAN) connections. All configuration and management operations in this section are shown in PowerStore Manager, but the PowerStore CLI and REST API may also be used.
In PowerStore, native asynchronous replication is supported on the following storage resources:
- Volumes
- Thin clones
- Volume groups
- NAS servers
- File systems
Asynchronous replication operates in the same way for volumes, thin clones, and now file resources on PowerStore. When asynchronous replication is configured on a resource in PowerStore Manager, a single replication session is created, and the destination storage resource is created with the same size and type as the source storage resource.
File system and NAS server replication sessions are created by assigning a protection policy with a replication rule to a NAS server. Once the policy is applied, the NAS server and all underlying file systems are replicated to the destination system. An individual replication session is created for the NAS server itself and for each file system associated with it. File replication can only be applied, managed, and removed at the NAS server level. It is not possible to modify the replication state at the individual file system level. Any file system created on or deleted from the NAS server automatically has a replication session created or deleted, as applicable. While user operations and management for file replication are handled at the NAS server level, each file system has its own replication session. This is a key distinction between how a NAS server and its file systems replicate compared to a volume group and its member volumes.
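The session layout described above can be sketched as a small model. This is an illustration only: `NasServer` and `replication_sessions` are hypothetical names, not PowerStore objects or API calls.

```python
from dataclasses import dataclass, field


@dataclass
class NasServer:
    """Hypothetical stand-in for a NAS server and the file systems it owns."""
    name: str
    file_systems: list[str] = field(default_factory=list)


def replication_sessions(nas: NasServer) -> list[tuple[str, str]]:
    """One session for the NAS server, plus one per underlying file system."""
    sessions = [("nas_server", nas.name)]
    sessions += [("file_system", fs) for fs in nas.file_systems]
    return sessions


nas = NasServer("nas01", ["fs_home", "fs_projects"])
print(len(replication_sessions(nas)))  # 3

# Adding a file system implies an additional session, with no per-file-system setup.
nas.file_systems.append("fs_scratch")
print(len(replication_sessions(nas)))  # 4
```

The model has no way to add or remove a session for a single file system, which mirrors the rule that replication is managed only at the NAS server level.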
For file replication, the NAS server replication session is shown at the top level. This session can be expanded by clicking the expansion icon to the left of the session. Once expanded, all underlying file system replication sessions for that NAS server are shown. All operations are performed at the NAS server level and are applied to every underlying file system replication session.
For a more detailed view of the replication state, use the state shown for an individual session to open the selected replication session. The window below displays a Session Summary, and the local storage resource is always tagged with Current System.
All replication sessions on the system can be viewed from the Replication page. To view this page in PowerStore Manager, under Protection click Replication. This page shows the information regarding each session and includes the following details:
- Replication Session Status
- Source System, including the source system name and the source storage resource
- Destination System, including the destination system name and the destination storage resource
- Resource Type
- Protection Policy
- ETA (estimated time) when the current synchronization will be finished (this shows — when the session is not actively synchronizing)
The Replication page for the source NAS server shows the following buttons:
- PAUSE – to pause the replication
- SYNCHRONIZE – to initiate a manual synchronization between regular RPO cycles
- PLANNED FAILOVER – to manually initiate a failover during a planned maintenance window
A PLANNED FAILOVER operation allows a controlled failover while also replicating the latest acknowledged host data on the source resource. When initiating the operation, a dialog offers the optional Reprotect after failover setting. When a planned failover starts, the replication session fails over after completing a synchronization between the source and destination resources. The synchronization before failover ensures that all data written since the last RPO-triggered or manual synchronization is replicated. The planned failover option is available on the source storage resource when the replication session is “Operating Normally” or a synchronization is in progress. Data is unavailable for a short period during the failover operation. Before issuing a planned failover, it is suggested to run a manual synchronization first, which reduces the amount of data to copy during the synchronization that is part of the failover. It is also suggested to quiesce I/O to the source resource before performing a planned failover.
After the planned failover completes, the destination storage resource is available for production I/O and the original source no longer allows read/write I/O. If host access is configured on the destination resource, hosts can then access the data. If Reprotect after failover is not selected when initiating the planned failover, replication does not resume in either direction.
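The outcome of a planned failover can be summarized in a toy model. This is a sketch of the behavior described above, not PowerStore code: the `Session` class and its fields are hypothetical, and it assumes that selecting reprotect resumes replication in the reversed direction.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical replication session, reduced to what matters here."""
    source: str
    destination: str
    replicating: bool = True
    pending_changes: int = 0  # writes tracked on the source, not yet replicated


def planned_failover(s: Session, reprotect: bool = False) -> Session:
    """Synchronize first, then swap roles; resume replication only on reprotect."""
    s.pending_changes = 0                               # final sync before failover
    s.source, s.destination = s.destination, s.source   # destination becomes production
    s.replicating = reprotect                           # otherwise stays stopped both ways
    return s


s = planned_failover(Session("siteA", "siteB", pending_changes=42))
print(s.source, s.replicating)  # siteB False
```

Because the final synchronization drains the pending changes before the roles swap, no acknowledged writes are lost in this path.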
The unplanned failover option is only available on the destination of the replication session. This failover type fails over to the latest available common base image that exists at the target, without any synchronization occurring beforehand. An unplanned failover assumes that a disaster has occurred on the production system, and the destination image is made available for read/write access. When FAILOVER is selected on a destination resource of a replication session, read/write access is removed from the original source if the source is available to receive management commands. The replication session also pauses and does not automatically switch the direction of replication. The replication session is left in this state until the user issues another replication operation. If I/O occurs to the original destination resource while in this state, the data must be replicated to the original source when the source becomes available. For file resources, FAILOVER is not supported on the destination resource if the source system and production NAS server are still online. If the source is still functioning, issue a PLANNED FAILOVER from the source instead.
PowerStore allows initiating an unplanned failover during a disaster scenario, or even when the replication session is in a Paused, Failing Over, or Failed Over state. Any changes made on the source system while the session is in these states might not be replicated to the destination. Because no final synchronization is performed, an unplanned failover can result in data inconsistency or data loss. It should only be initiated when the source system is no longer available. Use a planned failover whenever possible.
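The guidance on choosing between the two failover types can be condensed into a small decision helper. This is a minimal sketch for file resources under stated assumptions: the function name is hypothetical, and the "Synchronizing" state label is an assumed spelling for a session with a synchronization in progress.

```python
def recommended_failover(source_online: bool, session_state: str) -> str:
    """Suggest a failover type for a file replication session (illustrative only)."""
    # States in which the text says a planned failover is offered on the source.
    # "Synchronizing" is an assumed label for "a synchronization is in progress".
    planned_ok_states = {"Operating Normally", "Synchronizing"}

    if source_online and session_state in planned_ok_states:
        return "PLANNED FAILOVER from the source"
    if not source_online:
        # No final synchronization happens: changes since the last sync may be lost.
        return "FAILOVER from the destination"
    # Source is online but the session is in another state; the text gives no rule.
    return "UNDETERMINED"


print(recommended_failover(True, "Operating Normally"))
print(recommended_failover(False, "Paused"))
```

The helper prefers the planned path whenever the source is reachable, matching the advice to reserve unplanned failover for a true outage.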
In the example below from the vSphere Client event logs, we can see that the ESXi hosts lost access to the NFS datastore during the failover, and that the connection was restored a few seconds later, once the file share became accessible again on the remote PowerStore cluster.
The demo below shows how it all works.
A post by Tomer Eitan