Dell EMC PowerScale OneFS 9.2, Part 5 – Introduction to SyncIQ

In the first post, we covered the richness of the PowerScale platform. In the second post, we went in depth on our new platform, the F900. In the third post, we covered S3 support, and in the fourth post, we covered the new OneFS 9.2 features.

PowerScale SyncIQ offers powerful, flexible, and easy-to-manage asynchronous replication for collaboration, disaster recovery, business continuity, disk-to-disk backup, and remote disk archiving.

SyncIQ delivers unique, highly parallel replication performance that scales with the dataset to provide a solid foundation for disaster recovery. SyncIQ can send and receive data on every node in a PowerScale cluster, taking advantage of any available network bandwidth, so replication performance increases as the data store grows. Data replication starts and remains a simple process because both the replication source and target can scale to multiple petabytes without fragmentation into multiple volumes or file systems.

A simple and intuitive web-based user interface allows administrators to easily organize SyncIQ replication job rates and priorities to match business continuity priorities. Typically, a SyncIQ recurring job is defined to protect the data required for each major Recovery Point Objective (RPO) in the disaster recovery plan. For example, an administrator may choose to sync every 6 hours for customer data, every 2 days for HR data, and so on. A directory, file system or even specific files may be configured for more- or less-frequent replication based on their business criticality. In addition, administrators can create remote archive copies of non-current data that needs to be retained, reclaiming valuable capacity in a production system.
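To make the mapping from RPO to schedule concrete, here is a minimal Python sketch. The dataset names, the fixed-interval schedule model, and the helpers `next_run` and `meets_rpo` are hypothetical illustrations, not SyncIQ's scheduling API. The RPO check is a rough rule of thumb that assumes a change made just after one job's snapshot is protected only when the next job completes.

```python
from datetime import datetime, timedelta

# Hypothetical recurring intervals per dataset, mirroring the example above.
SCHEDULES = {
    "customer-data": timedelta(hours=6),
    "hr-data": timedelta(days=2),
}


def next_run(anchor: datetime, interval: timedelta, now: datetime) -> datetime:
    """First scheduled run strictly after `now` for a job first run at `anchor`."""
    if now < anchor:
        return anchor
    periods = (now - anchor) // interval + 1  # timedelta // timedelta -> int
    return anchor + periods * interval


def meets_rpo(interval: timedelta, job_duration: timedelta, rpo: timedelta) -> bool:
    """Rough worst case: a change just missed by one snapshot waits a full
    interval for the next snapshot, then for that job to finish."""
    return interval + job_duration <= rpo


if __name__ == "__main__":
    anchor = datetime(2021, 1, 1)
    now = datetime(2021, 1, 1, 7)
    for name, interval in SCHEDULES.items():
        print(name, "next run:", next_run(anchor, interval, now))
```

The worst-case estimate is why a 6-hour schedule alone does not guarantee a 6-hour RPO: the job's own run time adds to the exposure window.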

SyncIQ can be tailored to use as much or as little system resources and network bandwidth as necessary, and sync jobs can be scheduled to run at any time to minimize the impact of replication on production systems.

SyncIQ deployment topologies

SyncIQ provides an array of configuration options, giving administrators the flexibility to satisfy a wide range of workflows simply.

In each deployment, the configuration can cover the entire cluster or a specified source directory. Additionally, a deployment can have a single policy configured between the clusters or several policies, each with different options aligned to RPO and Recovery Time Objective (RTO) requirements.

One-to-one

In the most common SyncIQ deployment scenario, data replication is configured between a single source cluster and a single target cluster.

One-to-many

SyncIQ supports data replication from a single source cluster to many target clusters, allowing the same dataset to exist in multiple locations. A one-to-many deployment can also be described as a hub-and-spoke deployment, with a central source cluster as the hub and each remote location representing a spoke. In this example, SyncIQ is set up to replicate data from the LA office (source cluster) to NJ, NY, and UK (target clusters).

Many-to-one

The many-to-one deployment topology is essentially the flipped version of the one-to-many explained in the previous section. Several source clusters replicate to a single target cluster. The many-to-one topology may also be referred to as a hub-and-spoke configuration. However, in this case, the target cluster is the hub, and the spokes are source clusters.
In this example, SyncIQ is set up to replicate data from NJ, NY, and UK (source clusters) to LA (target cluster).

Local target

A local target deployment allows a single PowerScale cluster to replicate within itself, bringing SyncIQ's powerful configuration options to a single cluster. If a local target deployment is used for disaster readiness or archiving, the cluster protection scheme and storage pools must be considered, because the source data and its replica reside on the same cluster.

Cascaded

A cascaded deployment replicates a dataset through a series of clusters. It allows a primary cluster to replicate to a secondary cluster, the secondary to a tertiary cluster, and so on. Essentially, each cluster replicates to the next in the chain. For a cascaded SyncIQ implementation, consider how the replication start times are configured on the second and subsequent clusters: ensure that each job does not start before the SyncIQ job from the previous cluster has completed. In this example, a cluster in NJ replicates to a cluster in NY, the NY cluster replicates to a cluster in LA, and finally the LA cluster replicates to a cluster in the UK.
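The start-time rule for cascaded deployments comes down to simple arithmetic: each hop's earliest safe start is the previous hop's start plus its expected duration, optionally with a safety margin. This minimal Python sketch assumes job durations can be estimated in advance (for example, from past job reports); the function names and the `margin` parameter are hypothetical, and this is a planning aid rather than a SyncIQ feature.

```python
from datetime import datetime, timedelta
from typing import List


def cascade_start_times(first_start: datetime, durations: List[timedelta],
                        margin: timedelta = timedelta(0)) -> List[datetime]:
    """Earliest safe start for each hop of a cascade.

    durations[i] is the expected run time of hop i (cluster i -> cluster i+1);
    hop i+1 waits for hop i to finish, plus an optional safety margin.
    """
    starts = []
    start = first_start
    for duration in durations:
        starts.append(start)
        start = start + duration + margin
    return starts


def conflicts(starts: List[datetime], durations: List[timedelta]) -> List[int]:
    """Indices of hops scheduled to start before the previous hop should finish."""
    return [i for i in range(1, len(starts))
            if starts[i] < starts[i - 1] + durations[i - 1]]


if __name__ == "__main__":
    hops = ["NJ->NY", "NY->LA", "LA->UK"]
    durations = [timedelta(hours=2), timedelta(minutes=90), timedelta(hours=3)]
    plan = cascade_start_times(datetime(2021, 1, 1, 1), durations,
                               margin=timedelta(minutes=30))
    for hop, start in zip(hops, plan):
        print(hop, start.time())
```

The margin absorbs runs that take longer than expected, for example when a large batch of changes accumulates on the first cluster.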

Custom

A custom deployment combines the previous deployments. For example, a primary cluster replicates to a secondary cluster, and the secondary then replicates to a set of clusters. Essentially, this implementation is a combination of the ‘Cascaded’ and ‘One-to-many’ deployments. In this example, a cluster in NJ replicates data to a cluster in NY, and the NY cluster then replicates data to a cluster in LA and a cluster in the UK.
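One way to see how these topologies relate is to treat each source-to-target relationship as a directed edge between clusters. The hypothetical `classify` function below is a toy heuristic, not a OneFS feature: it names a topology from how many relationships leave and arrive at each cluster, using the example sites from the text.

```python
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[str, str]  # (source cluster, target cluster)


def classify(edges: Iterable[Edge]) -> str:
    """Name the deployment topology formed by a set of replication relationships.

    Several policies between the same two clusters count as one relationship.
    """
    edges = set(edges)
    if edges and all(src == dst for src, dst in edges):
        return "local target"
    out_deg = Counter(src for src, _ in edges)  # relationships leaving each cluster
    in_deg = Counter(dst for _, dst in edges)   # relationships arriving at each cluster
    chained = bool(set(out_deg) & set(in_deg))  # some cluster is both target and source
    if len(edges) == 1:
        return "one-to-one"
    if not chained and len(out_deg) == 1:
        return "one-to-many"
    if not chained and len(in_deg) == 1:
        return "many-to-one"
    if chained and max(out_deg.values()) == 1 and max(in_deg.values()) == 1:
        return "cascaded"
    return "custom"


if __name__ == "__main__":
    print(classify([("NJ", "NY"), ("NY", "LA"), ("LA", "UK")]))
    print(classify([("NJ", "NY"), ("NY", "LA"), ("NY", "UK")]))
```

A custom deployment shows up as a chain in which at least one intermediate cluster fans out to several targets.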

Use cases

PowerScale SyncIQ offers powerful, efficient, and easy-to-manage data replication for disaster recovery, business continuity, remote collaboration, disk-to-disk backup, and remote disk archive.

Disaster recovery

Disaster recovery requires quick and efficient replication of critical business data to a secondary site. SyncIQ delivers high-performance, asynchronous replication of data, providing protection from both local site and regional disasters to satisfy a range of recovery objectives. SyncIQ has a robust policy-driven engine that allows replication datasets to be customized to minimize system impact while still meeting data protection requirements. SyncIQ's automated data failover and failback reduce the time, complexity, and risk involved in transferring operations between a primary and secondary site, helping an organization meet its recovery objectives. This functionality can be crucial to the success of a disaster recovery plan.

Business continuance

By definition, a business continuity solution needs to meet the most aggressive recovery objectives for the most timely, critical data. SyncIQ's highly efficient architecture provides performance that scales to maximize usage of any available network bandwidth, giving administrators the best-case replication time for aggressive RPOs. SyncIQ can also be used in concert with Dell EMC PowerScale SnapshotIQ software, which stores point-in-time snapshots to support secondary activities such as backup to tape.

Disk-to-disk backup and restore

Enterprise IT organizations face increasingly complex backup environments with costly operations, shrinking backup and restore windows, and stringent service-level agreement (SLA) requirements. Backups to tape are traditionally slow and hard to manage as they grow, a problem compounded by the size and rapid growth of digital content and unstructured data. As a superior disk-to-disk backup and restore solution, SyncIQ delivers scalable performance and simplicity, enabling IT organizations to reduce backup and restore times and costs, eliminate complexity, and minimize risk. With PowerScale scale-out network-attached storage (NAS), petabytes of backup storage can be managed within a single system, as one volume and one file system, which can serve as the disk backup target for multiple PowerScale clusters.

Remote archive

For data that is too valuable to throw away but not accessed frequently enough to justify keeping it on production storage, replicate it with SyncIQ to a secondary site and reclaim the space on the primary system. Using a SyncIQ copy policy, data can be deleted on the source without affecting the target, leaving a remote archive for disk-based tertiary storage applications or staging data before it moves to offline storage. Remote archiving is ideal for intellectual property preservation, long-term records retention, or project archiving.
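The difference between a copy policy and a sync policy comes down to how deletions are handled. This minimal Python model treats the target directory as a dictionary mapping paths to contents; the `apply_changes` helper and the change-tuple format are hypothetical, and a real replication job also carries metadata and other details omitted here.

```python
from typing import Dict, List, Tuple

# (kind, path, content); kind is "new", "modified", or "deleted".
# Content is ignored for deletions.
Change = Tuple[str, str, str]


def apply_changes(target: Dict[str, str], changes: List[Change], action: str) -> Dict[str, str]:
    """Return the target's contents after replicating `changes`.

    action="sync": the target mirrors the source, so deletions propagate.
    action="copy": new and modified files propagate but deletions do not,
    which is what lets the target act as a remote archive.
    """
    if action not in ("sync", "copy"):
        raise ValueError(f"unknown action: {action}")
    result = dict(target)  # leave the caller's dictionary untouched
    for kind, path, content in changes:
        if kind == "deleted":
            if action == "sync":
                result.pop(path, None)
        else:
            result[path] = content
    return result
```

With a copy action, files removed from production storage to reclaim space remain available on the archive target.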

Architecture and processes

SyncIQ leverages the full complement of resources in a PowerScale cluster and the scalability and parallel architecture of the Dell EMC PowerScale OneFS file system. SyncIQ uses a policy-driven engine to execute replication jobs across all nodes in the cluster.

Multiple policies can be defined to allow for high flexibility and resource management. The replication policy is created on the source cluster, and data is replicated to the target cluster. When the source and target clusters are defined, source and target directories are also selected, specifying which data is replicated from the source cluster and where it is written on the target cluster. Policies can either run on a user-defined schedule or be started manually. This flexibility allows administrators to replicate datasets based on predicted cluster usage, network capabilities, and data availability requirements.
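The elements of a policy described above (source and target clusters, source and target directories, and an optional schedule) can be captured in a small data structure. This Python dataclass is a hypothetical model for illustration, not the OneFS API; its only OneFS-specific assumption is that replicated directories live under the /ifs file system root.

```python
from dataclasses import dataclass
from datetime import timedelta
from pathlib import PurePosixPath
from typing import Optional


def is_onefs_path(path: str) -> bool:
    """OneFS presents its single file system under /ifs."""
    return PurePosixPath(path).parts[:2] == ("/", "ifs")


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical model of a replication policy; field names are illustrative."""
    name: str
    source_cluster: str
    source_dir: str
    target_cluster: str
    target_dir: str
    interval: Optional[timedelta] = None  # None means the policy is started manually

    def __post_init__(self) -> None:
        # Reject directories outside the OneFS file system root.
        for path in (self.source_dir, self.target_dir):
            if not is_onefs_path(path):
                raise ValueError(f"{path!r} is not under /ifs")

    @property
    def is_manual(self) -> bool:
        return self.interval is None
```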

Once the replication policy starts, a replication job is created on the source cluster. Within a cluster, many replication policies can be configured.

During the initial run of a replication job, the target directory is set to read-only and is updated only by jobs associated with the configured replication policy. When write access to the target directory is required, the replication policy between the source and target must be broken. Once write access is no longer required, the next job requires an initial or differential replication to re-establish the sync between the source and target clusters.
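The life cycle of a target directory can be summarized as a small state machine: unprotected before the first job, read-only while the policy owns it, and writable once the policy is broken, after which the next job must re-establish the sync. The Python class below is a hypothetical model of that behavior; the state names and the "incremental" label for routine jobs are illustrative, not OneFS terminology.

```python
from enum import Enum


class TargetState(Enum):
    UNPROTECTED = "unprotected"  # no job has run against this directory yet
    READ_ONLY = "read-only"      # owned by the replication policy
    WRITABLE = "writable"        # policy broken; open for local writes


class TargetDirectory:
    """Toy model of a target directory's life cycle (hypothetical)."""

    def __init__(self) -> None:
        self.state = TargetState.UNPROTECTED

    def run_job(self) -> str:
        """Run a policy job and return the kind of replication it needs."""
        if self.state is TargetState.UNPROTECTED:
            kind = "initial"
        elif self.state is TargetState.WRITABLE:
            # The policy was broken, so the sync must be re-established.
            kind = "initial or differential"
        else:
            kind = "incremental"
        self.state = TargetState.READ_ONLY  # only the policy may update it now
        return kind

    def write(self, path: str) -> None:
        """Simulate a local write; refused while the policy owns the target."""
        if self.state is TargetState.READ_ONLY:
            raise PermissionError(f"{path}: target is read-only while the policy is active")

    def break_policy(self) -> None:
        self.state = TargetState.WRITABLE
```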

When a SyncIQ job is initiated, from either a scheduled or manually applied policy, the system first takes a snapshot of the data to be replicated. SyncIQ compares this to the snapshot from the previous replication job to quickly identify the changes that need to be propagated. Those changes can be new files, changed files, metadata changes, or file deletions. SyncIQ pools the aggregate resources from the cluster, splitting the replication job into smaller work items and distributing these amongst multiple workers across all nodes in the cluster. Each worker scans a part of the snapshot differential for changes and transfers those changes to the target cluster. While the cluster resources are managed to maximize replication performance, administrators can decrease the impact on other workflows using configurable SyncIQ resource limits in the policy.
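The snapshot-comparison step can be illustrated with a deliberately simplified model in which a snapshot is a mapping from path to a signature covering both content and metadata. This Python sketch is an illustration under those assumptions, not how OneFS computes snapshot differences internally: `diff_snapshots` classifies each path as new, modified, or deleted, and `split_work` deals the resulting changes out round-robin to a hypothetical pool of workers.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> signature (e.g. content hash plus metadata)
Change = Tuple[str, str]   # (kind, path)


def diff_snapshots(prev: Snapshot, curr: Snapshot) -> List[Change]:
    """Changes needed to bring a replica of `prev` up to date with `curr`."""
    changes = [("new", p) for p in curr if p not in prev]
    changes += [("modified", p) for p in curr if p in prev and prev[p] != curr[p]]
    changes += [("deleted", p) for p in prev if p not in curr]
    return sorted(changes, key=lambda change: change[1])


def split_work(changes: List[Change], num_workers: int) -> List[List[Change]]:
    """Deal the changes out round-robin as work items for parallel workers."""
    buckets: List[List[Change]] = [[] for _ in range(num_workers)]
    for i, change in enumerate(changes):
        buckets[i % num_workers].append(change)
    return buckets
```

Only the changed paths are transferred, which is why subsequent jobs are much cheaper than the initial full replication.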

Replication workers on the source cluster are paired with workers on the target cluster to accrue the benefits of parallel and distributed data transfer. As more jobs run concurrently, SyncIQ employs more workers to utilize more cluster resources. As more nodes are added to the cluster, file system processing on the source cluster and file transfer to the remote cluster are accelerated, a benefit of the PowerScale scale-out NAS architecture.

SyncIQ is configured through the OneFS WebUI, providing a simple, intuitive method to create policies, manage jobs, and view reports. In addition to the web-based interface, all SyncIQ functionality is integrated into the OneFS command line interface. For a full list of commands, run isi sync --help.

SyncIQ processes

In order to understand how SyncIQ implements each policy, it is essential to understand the processes associated with data replication.

Scheduler

Each PowerScale node runs a Scheduler process. It is responsible for creating and launching SyncIQ data replication jobs and for creating the initial job directory. Based on the current SyncIQ configuration, the Scheduler starts new jobs and updates existing jobs when the configuration changes.

Coordinator

The Scheduler launches the Coordinator process. The Coordinator creates and oversees the worker processes as a data replication job runs. It is responsible for snapshot management, report generation, bandwidth throttling, target monitoring management, and work distribution. Snapshot management involves capturing the file system snapshots for SyncIQ; the snapshots are locked while in use and deleted after completion. Report generation acquires job data from each process and combines it into a single report. Bandwidth throttling provides the Coordinator with bandwidth information so that jobs align with the available bandwidth. Target monitoring management involves monitoring the target cluster’s worker processes. Finally, work distribution maximizes job performance by ensuring all worker processes are evenly utilized.
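Two of the Coordinator's duties, even work distribution and bandwidth throttling, can be illustrated with simple heuristics. In this Python sketch, `distribute` uses the greedy longest-processing-time rule (assign the largest remaining work item to the least-loaded worker), and `per_worker_bandwidth` splits a policy's bandwidth limit evenly across active workers. Both functions are hypothetical and are not the Coordinator's actual algorithms.

```python
import heapq
from typing import Dict, List, Tuple


def distribute(items: List[Tuple[str, int]], num_workers: int) -> Dict[int, List[str]]:
    """Assign (name, size) work items, largest first, to the least-loaded worker."""
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {w: [] for w in range(num_workers)}
    for name, size in sorted(items, key=lambda item: item[1], reverse=True):
        load, worker = heapq.heappop(heap)  # least-loaded worker (ties: lowest id)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return assignment


def per_worker_bandwidth(limit_mbps: float, active_workers: int) -> float:
    """Split a bandwidth limit evenly across the workers currently running."""
    return limit_mbps / active_workers if active_workers else 0.0
```

Balancing by size rather than by item count keeps one worker from being stuck with several large files while others sit idle.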

Primary and secondary workers

Primary workers and secondary workers run on the source and target clusters, respectively. Together, they perform the actual data replication during a SyncIQ job.

Target monitor

The target monitor provides critical information about the target cluster and does not participate in the data transfer. It reports back with IP addresses for target nodes including any changes on the target cluster. Additionally, the target monitor takes target snapshots as they are required.