A guest post by Alan NG
Over the years, backup has evolved into what is now known as Data Protection. However, Data Protection is more than just backing up data. It is a crucial component of data lifecycle management and a fundamental feature that most storage vendors offer as part of their product portfolio.
Without proper data protection, personal information and sensitive data can be vulnerable to unauthorized access, theft, and misuse by hackers. Organizations are also at risk of falling victim to ransomware attacks, resulting in the loss of critical data, disruptions to business operations, and significant financial losses, not to mention reputational damage.
Even though everyone seems to understand the importance of data protection, when it comes to designing new solutions, data protection is often considered last. This is partly because the IT department has a limited budget and prefers to prioritize the rollout of the main system before implementing a backup system. While this approach is not necessarily wrong, it does create a window of vulnerability where the main system may be compromised and result in data loss.
PowerFlex Native Data Protection
To close this temporary gap, PowerFlex offers its own native data protection capabilities, such as replication and snapshots, which can be used while customers finalise their overall IT data protection strategy.
PowerFlex supports native asynchronous replication for all consumption models.
PowerFlex replication is a data service that allows you to replicate one or more volumes from a source PowerFlex cluster to a target PowerFlex cluster over an IP network. The replication is asynchronous, meaning that the source volume does not wait for the target volume to acknowledge a write operation before completing it. This minimizes the impact on performance and latency for the source workload.
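To make the asynchronous behavior concrete, here is a minimal Python sketch. It is a toy model, not a description of PowerFlex internals: the class and method names (`AsyncReplicaPair`, `write`, `ship`, `lag`) are hypothetical, and the in-memory `journal` list only stands in for the source-side journal that PowerFlex keeps. The point it illustrates is that `write` acknowledges immediately, while the target only catches up when buffered writes are shipped.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncReplicaPair:
    """Toy model of asynchronous replication (hypothetical, not PowerFlex internals)."""
    source: dict = field(default_factory=dict)   # block address -> data on the source volume
    target: dict = field(default_factory=dict)   # block address -> data on the target volume
    journal: list = field(default_factory=list)  # writes not yet shipped to the target

    def write(self, address: int, data: bytes) -> str:
        # The write is applied locally and recorded in the journal...
        self.source[address] = data
        self.journal.append((address, data))
        # ...and acknowledged immediately, without waiting for the target.
        return "ack"

    def ship(self) -> int:
        # Periodically, buffered writes are applied to the target in order.
        shipped = len(self.journal)
        for address, data in self.journal:
            self.target[address] = data
        self.journal.clear()
        return shipped

    def lag(self) -> int:
        # Number of writes the target is still behind the source.
        return len(self.journal)


pair = AsyncReplicaPair()
print(pair.write(0, b"A"))        # ack
pair.write(1, b"B")
print(pair.lag(), pair.target)    # 2 {}
print(pair.ship(), pair.target)   # 2 {0: b'A', 1: b'B'}
```

In a real system, shipping runs continuously in the background, and the gap reported by `lag()` is roughly what the RPO is meant to bound.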
PowerFlex replication uses Replication Consistency Groups (RCGs) to manage the attributes and behavior of the replication of one or more volume pairs. An RCG defines:
– The target cluster where the replicated data will be stored
– The Recovery Point Objective (RPO), which is the maximum acceptable amount of data loss in case of a disaster
– The source and target protection domains, which are logical groups of storage nodes (SDSs) within each cluster that provide the capacity for the replicated volumes
You can create multiple RCGs based on your application requirements, data retention, data type, or related application quiescing procedures to enable read-consistent snapshots. For example, you can create a separate RCG for each application and assign all of that application's volumes to it, or create RCGs for different RPOs (such as 15 seconds or 5 minutes).
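The RCG attributes listed above can be pictured as a simple record. The sketch below is a hypothetical Python model, not a PowerFlex API: `ReplicationConsistencyGroup`, `group_by_rpo`, and the volume and system names are illustrative. It shows one of the grouping strategies mentioned above, placing volumes that share an RPO into the same RCG.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReplicationConsistencyGroup:
    """Hypothetical model of the attributes an RCG defines (not a PowerFlex API)."""
    name: str
    target_system: str
    rpo_seconds: int
    source_protection_domain: str
    target_protection_domain: str
    # (source volume, target volume) pairs replicated together
    volume_pairs: list = field(default_factory=list)


def group_by_rpo(volumes, target_system, source_pd, target_pd):
    """Place volumes that share an RPO in the same RCG.

    `volumes` is a list of (source_volume, target_volume, rpo_seconds) tuples.
    """
    by_rpo = defaultdict(list)
    for src, dst, rpo in volumes:
        by_rpo[rpo].append((src, dst))
    return [
        ReplicationConsistencyGroup(
            name=f"rcg-rpo-{rpo}s",
            target_system=target_system,
            rpo_seconds=rpo,
            source_protection_domain=source_pd,
            target_protection_domain=target_pd,
            volume_pairs=pairs,
        )
        for rpo, pairs in sorted(by_rpo.items())
    ]


rcgs = group_by_rpo(
    [("db-data", "db-data-dr", 15), ("db-log", "db-log-dr", 15), ("files", "files-dr", 300)],
    target_system="site-b", source_pd="pd1", target_pd="pd1",
)
for rcg in rcgs:
    print(rcg.name, rcg.volume_pairs)
# rcg-rpo-15s [('db-data', 'db-data-dr'), ('db-log', 'db-log-dr')]
# rcg-rpo-300s [('files', 'files-dr')]
```

Grouping by application instead would simply key the dictionary on an application name rather than on the RPO.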
Supported Topologies for 3.5.x and 3.6.x
It has been a while since PowerFlex replication was first introduced in 3.5.x, so it is important to know which topologies are supported across versions. PowerFlex systems running version 3.5.x can act as active replication peers with systems running version 3.6.x, so the source and destination systems do not need to run the same code version over the long term. Unless stated otherwise, the maximum acceptable difference is two major code versions. If the systems’ code versions are incompatible, however, the newer system rejects the connection from the older system, and the peer relationship remains disconnected. In addition, some replication features of the newer version may not be enabled if the peer system runs a lower code version. The same reasoning applies to 4.x.x.
New Supported Topologies for 4.0
PowerFlex 4.0 supports a maximum of five (5) systems in any arbitrary topology. If a replication topology has more than two systems, pre-4.0 systems cannot be included; this is strictly enforced. A single PowerFlex 4.0 system can have at most four peers, and this limit is also enforced. The peering for a Replication Consistency Group (RCG) is always between exactly two systems, and this is enforced when the RCG is created. PowerFlex systems running 3.5.x or 3.6.x can act as active replication peers with a system running 4.x. However, a mixed-version topology (3.5.x, 3.6.x, or 4.x) is limited to two peer systems, regardless of their versions.
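The 4.0 limits above can be expressed as a small planning checker. This is a hypothetical Python sketch only; PowerFlex enforces these rules itself. The function name and input shapes are assumptions, versions are reduced to their major number (3 for 3.5.x/3.6.x, 4 for 4.x), and the general two-major-version compatibility rule is not modelled.

```python
from collections import Counter


def check_topology(versions, peerings, rcgs):
    """Check a planned topology against the PowerFlex 4.0 limits described above.

    versions: dict of system name -> major version (e.g. 3 or 4)
    peerings: list of (system_a, system_b) peer relationships
    rcgs:     list of (source_system, target_system) tuples, one per RCG
    Returns a list of human-readable violations (empty if none were found).
    """
    problems = []
    if len(versions) > 5:
        problems.append("more than five systems in the topology")
    if len(versions) > 2 and any(v < 4 for v in versions.values()):
        problems.append("pre-4.0 systems are only allowed in two-system topologies")

    # Deduplicate peerings so each relationship is counted once per system.
    peered = {frozenset(p) for p in peerings if len(set(p)) == 2}
    peer_count = Counter(system for pair in peered for system in pair)
    for system, count in sorted(peer_count.items()):
        if count > 4:
            problems.append(f"{system} has {count} peers (maximum is four)")

    # Every RCG must connect exactly two distinct, peered systems.
    for src, dst in rcgs:
        if src == dst or frozenset((src, dst)) not in peered:
            problems.append(f"RCG {src}->{dst} is not between two peered systems")
    return problems


print(check_topology({"A": 4, "B": 3}, [("A", "B")], [("A", "B")]))
# []
print(check_topology({"A": 4, "B": 4, "C": 3}, [("A", "B"), ("A", "C")], [("A", "C")]))
# ['pre-4.0 systems are only allowed in two-system topologies']
```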
Why Use PowerFlex Replication?
PowerFlex replication offers several advantages over traditional storage-based replication solutions:
– Flexibility: Any volume within a PowerFlex cluster can be replicated to another PowerFlex cluster that has been peered with it (that is, the two clusters have exchanged certificates). The hardware configuration and storage pool type do not need to match between the source and target clusters, nor do the volumes need to have the same properties (e.g., thick vs. thin, compressed vs. non-compressed, medium granularity vs. fine granularity). Volumes can be resized independently on both sides without affecting replication, but the target volume should always be expanded first.
– Scalability: The source and target clusters can be scaled up or down independently without affecting replication. Volumes can also be added to or removed from an RCG dynamically without disrupting replication.
– Efficiency: Compression features can be used on both the source and target volumes. However, it is important to note that the data replicated over the network is not compressed. Thin provisioning can also be used on both sides to optimize capacity utilization.
– Simplicity: All aspects of replication can be managed through PowerFlex Manager, which offers both graphical user interface (GUI) and command-line interface (CLI) options. Integration with VMware Site Recovery Manager (SRM) is also possible through the PowerFlex Storage Replication Adapter (SRA), which automates failover and failback operations.
How to Design and Deploy PowerFlex Replication?
Before starting to replicate data with PowerFlex, it is important to consider factors that may affect design and deployment decisions:
Network bandwidth: It is important to ensure that there is sufficient network bandwidth between the source and target sites to support the desired RPOs. Factors such as network latency, packet loss, and congestion should also be taken into account as they may impact replication performance.
Data change rate: It is important to estimate the rate at which data changes on the source volumes. This will determine the amount of data that needs to be replicated during each RPO interval. If the rate of data change exceeds the available network bandwidth, replication may experience backlog or lag.
Data consistency: It is important to ensure that your replicated data is consistent with your application state at all times. This may require coordinating with, or quiescing, the application before taking snapshots or initiating failover/failback operations. In PowerFlex, this is managed through RCGs, which are crash-consistent and flexible, and do not impose any special requirements on the storage platform.
Data recovery: It is important to define the recovery point objective (RPO) and recovery time objective (RTO) for each application or workload that will be protected by replication. This will help in selecting the appropriate RCG settings and test scenarios.
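The bandwidth and change-rate considerations above can be turned into rough numbers. The sketch below is a back-of-the-envelope model, not Dell's sizing methodology: the function name, the 20% headroom factor, and the assumption that a link outage or paused RCG accumulates roughly "change rate × outage duration" of journal data are all illustrative simplifications. Real sizing should also account for latency, bursty writes, and overwrite patterns.

```python
def replication_sizing(change_mb_s, link_mb_s, rpo_s, outage_s, headroom=1.2):
    """Rough replication sizing (a simplification, not Dell's sizing formula).

    change_mb_s: average rate at which replicated data changes on the source (MB/s)
    link_mb_s:   usable replication bandwidth between sites (MB/s)
    rpo_s:       target RPO in seconds
    outage_s:    how long a link outage (or a paused RCG) should be absorbed, in seconds
    headroom:    assumed safety factor applied to the change rate
    """
    needed = change_mb_s * headroom
    # If changes arrive faster than the link can ship them, a backlog builds up.
    backlog_growth = max(0.0, change_mb_s - link_mb_s)
    return {
        "bandwidth_ok": link_mb_s >= needed,
        "required_link_mb_s": needed,
        "data_per_rpo_interval_mb": change_mb_s * rpo_s,
        "backlog_growth_mb_s": backlog_growth,
        # Rough upper bound on journal data accumulated while nothing is shipped.
        "journal_for_outage_mb": change_mb_s * outage_s,
    }


plan = replication_sizing(change_mb_s=40, link_mb_s=100, rpo_s=30, outage_s=3600)
print(plan["bandwidth_ok"], plan["data_per_rpo_interval_mb"], plan["journal_for_outage_mb"])
# True 1200 144000
```

With 40 MB/s of change, a 100 MB/s link, a 30-second RPO, and one hour of outage to absorb, the sketch reports about 1.2 GB changing per RPO interval and roughly 144 GB of journal capacity to ride out the outage.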
Replication Configuration Walkthrough
Replication in PowerFlex is configured and provisioned through RCGs. Let’s walk through how easy it is to deploy and configure PowerFlex replication between two clusters in just a few clicks.
To deploy PowerFlex replication, follow these steps:
1. Peer two PowerFlex clusters by exchanging certificates between them as shown below.
2. Once the source cluster has been added, repeat the same steps on the target storage cluster. Upon completion, the systems are peered in both directions and are ready to start pairing replicated volumes.
3. Next, create an RCG on the PowerFlex cluster. Enter the following information:
Source protection domain
Target protection domain
4. Add replication pairs.
5. In the last step, the summary page displays the selected volume pairs for review. At this point, the administrator can either add the pairs without activating synchronization, or add the pairs and activate synchronization at the same time. This provides flexibility in planning and execution. Once the RCG is activated, data starts to flow.
After pairing, the volume pairs may show an inconsistent state, and the status may indicate an RPO violation. This usually resolves within a few seconds. If the error persists, check the capacity of the journal volume.
6. Once the RCG is created, it is displayed in the Remote Protection : Overview dashboard as shown. It appears the same on both clusters, ready to start replicating.
7. Under Remote Protection : RCGs, the RCG shows as inactive and partially consistent on both clusters, because it was created and its pairs were added but it has not yet been activated. It can now be activated.
8. Once activated, the status changes to OK, with 100% compliance and a consistent state.
9. More options are now available in the drop-down menu: you can pause replication, create snapshots, fail over, run a test failover, terminate, or remove the RCG.
10. The same is shown on the other cluster, and the same actions can be taken from either side.
Creating a replication pair is the easy part, but understanding and using the various features that come with it can be challenging. PowerFlex replication offers a range of features that help you get the most out of the system. Below is a brief description of each.
Pause :- This action pauses replication between the source and target. Journals are not shipped to the target cluster until replication is resumed, but writes to the replicated volumes are still collected in the source journal volumes.
Create snapshots :- Generates a snapshot of each volume in the RCG on the target system. This can be useful for remote application testing or DR activities. There is no RCG menu option to manage or delete these snapshots, so they must be mapped/unmapped and deleted manually from the target system’s snapshot list.
Failover :- Forces a failover event, passing primary ownership of the volumes within the RCG to the target system. This also switches the Host Access profile of the source-side volumes to read-only and that of the target-side volumes to read/write. For planned failovers, you can then select the Reverse command to resume protection of the RCG volumes, only now in the opposite direction. To abort the failover operation, select the Restore option to return to the original replication state and direction.
Test Failover :- Automatically creates a snapshot on the target system and replaces the original target volume mapping with a mapping to the snapshot. This lets you perform write testing on the volume while preventing the source volume from being corrupted by the test activity.
Activate/Terminate :- These options were introduced in PowerFlex version 3.6. If an RCG was created but not activated, or has been placed into an inactive state, it can be activated here. Activation starts all of the replication-related processes and begins the flow of I/O through the SDR on the source system. Terminating an RCG not only stops the flow of replication data between sites, but also releases the SDR from proxying the I/O and writing to the journal. A terminated or inactive RCG consumes no additional system resources and is merely a configuration placeholder.
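The actions above can be summarized as a simplified state machine. This Python sketch is a hypothetical illustration: the state names, the `TRANSITIONS` table, and the `RCG` class are assumptions. Only the transitions described in this post are modelled (Test Failover and snapshot creation are left out), and PowerFlex itself tracks more states than this.

```python
# Hypothetical, simplified model of the RCG actions described above.
# State names and allowed transitions are illustrative assumptions.
TRANSITIONS = {
    ("inactive", "activate"): "active",
    ("active", "pause"): "paused",
    ("paused", "resume"): "active",
    ("active", "failover"): "failed_over",
    ("failed_over", "restore"): "active",   # back to the original direction
    ("failed_over", "reverse"): "active",   # protection resumes in the opposite direction
    ("active", "terminate"): "inactive",
    ("paused", "terminate"): "inactive",
}


class RCG:
    def __init__(self, source: str, target: str):
        self.state = "inactive"  # created but not yet activated
        self.source, self.target = source, target

    def apply(self, action: str) -> str:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} an RCG that is {self.state}")
        if action == "reverse":
            # After a planned failover, the former target becomes the new source.
            self.source, self.target = self.target, self.source
        self.state = TRANSITIONS[key]
        return self.state


rcg = RCG("site-a", "site-b")
for step in ["activate", "pause", "resume", "failover", "reverse"]:
    rcg.apply(step)
print(rcg.state, rcg.source, "->", rcg.target)  # active site-b -> site-a
```

Keeping the transitions in a lookup table makes invalid sequences, such as pausing an RCG that was never activated, fail loudly instead of silently changing state.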