In the first post, we covered the richness of the PowerScale platform. In the second post, we went in depth on our new platform, the F900. In the third post, we covered S3 support, and in the fourth post, we covered the new OneFS 9.2 features.
In the last post (Part 5), we covered the intro to SyncIQ. In this post, we will cover the configuration of SyncIQ jobs.
To recap, SyncIQ software delivers high-performance, asynchronous replication of unstructured data to address a broad range of recovery point objectives (RPO) and recovery time objectives (RTO). SyncIQ does not impose a hard limit on the size of a replicated file system, so it scales linearly with an organization’s data growth up into the multiple-petabyte range.
SyncIQ is easily optimized for either LAN or WAN connectivity and includes policy level bandwidth control and reservation. This allows replication over both short and long distances, while meeting consistent SLAs and providing protection from both site-specific and regional disasters. Additionally, SyncIQ utilizes a highly parallel, policy-based replication architecture designed to leverage the performance and efficiency of clustered storage. As such, aggregate throughput scales with capacity and allows a consistent RPO over expanding data sets.
When SyncIQ replicates data, it uses one of three replication types: initial, incremental, or differential replication.
After a policy is configured, the first time it runs, an Initial Replication is executed. During the policy configuration, a user can configure a synchronization or copy policy.
The synchronization policy ensures the target cluster has a precise duplicate of the source directory. As the source directory is modified through additions and deletions, those updates are propagated to the target cluster when the policy runs next. Under Disaster Recovery use cases, the synchronization policy supports a failover to the target cluster, allowing users to continue with access to the same dataset as the source directory.
In contrast, a copy policy is targeted at archive and backup use cases: it maintains current versions of the files stored on the source cluster, but files deleted on the source are retained on the target.
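As a sketch, the two policy types can be created from the CLI. The policy names, paths, and target host below are hypothetical, and exact flags may vary by OneFS release:

```
# Synchronization policy: the target is kept as an exact mirror of the source
isi sync policies create dr-mirror sync /ifs/data/prod target.example.com /ifs/data/prod

# Copy policy: files deleted on the source are retained on the target (archive/backup)
isi sync policies create archive-copy copy /ifs/data/prod target.example.com /ifs/archive/prod
```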
The first segment of the Initial Replication is the job start. A scheduler process is responsible for starting a data replication job. It determines the start time based on either the scheduled time or a manually started job. Once the time arrives the scheduler updates the policy to a pending status on the source record and creates a directory with information specific to the job.
An Incremental Replication of a SyncIQ policy only transfers the portions of files that have changed since the last run. Therefore, the amount of data replicated, and bandwidth consumption is significantly reduced in comparison to the initial replication.
Like the Initial Replication explained above, at the start of an incremental replication, the scheduler processes create the job directory. Next, the coordinator starts a process of collecting changes to the dataset, by taking a new snapshot and comparing it to the previous snapshot. The changes are compiled into an incremental file with a list of LINs that have been modified, added, or deleted.
Once all the new modifications to the dataset are logged, workers read through the file and start to apply the changes to the target cluster. On the target cluster, the deleted LINs are removed first, followed by updating directories that have changed. Finally, the data and metadata are updated on the target cluster.
As all updates complete, the coordinator creates the job report, and the replication is complete.
Differential replication or target aware sync
In the event that the association between a source and target is lost or broken, incremental replications will not work. At this point, the only remaining option is to run an initial replication on the complete dataset. Running the initial replication again is bandwidth and resource intensive, as it essentially runs again as a new policy. Differential replication offers a far better alternative to running the initial replication again.
A Differential Replication, like an Incremental Replication, only replicates changed data blocks and new data that does not exist on the target cluster. Determining what exists on each cluster is part of the differential replication’s algorithm. The files in the source directory are compared to the target directory to decide whether replication is required. The decision is based on whether the file or directory is new, on the file size, and finally on the short and full hashes of the file.
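If the source/target association has been broken, a differential (target-aware) sync can be enabled before resuming the policy. A hedged sketch, using a hypothetical policy name; flag names may vary by release:

```
# Compare files already present on the target instead of re-sending the full dataset
isi sync policies modify dr-mirror --target-compare-initial-sync on

# Reset the policy state, then start the job; the next run performs the comparison
isi sync policies reset dr-mirror
isi sync jobs start dr-mirror
```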
Configuring a SyncIQ policy
SyncIQ is configured through policies. The policies provide the starting point of OneFS data replication. The policies offer a breadth of options for an administrator to configure data replication specific to a workflow. SyncIQ is disabled by default on Greenfield PowerScale clusters on OneFS 9.1 or newer.
The SyncIQ policies are configurable through the CLI or the web interface. To configure SyncIQ from the CLI, start with the command
isi sync policies --help
To access the SyncIQ policies from the web interface, once logged in, click Data Protection > SyncIQ, then click the “Policies” tab. A new SyncIQ policy is created by clicking “Create a SyncIQ Policy”, which displays the “Create SyncIQ Policy” window.
Naming and enabling a policy
The “Policy Name” field should be descriptive enough for administrators to easily gather the policy workflow, as several policies could be configured on a cluster. A unique name makes it easy to recognize and manage. Additionally, the “Description” field can be used to explain further.
The “Enable this policy” checkbox is a powerful option allowing an administrator to start configuration prior to a target cluster or directory being ready for replication. Temporarily disabling a policy allows for a less intrusive option to deleting a policy when it may not be required. Additionally, after completing the configuration for a policy, it can be reviewed for a final check, prior to enabling.
Synchronization and copy policies
SyncIQ provides two types of replications policies: synchronization and copy. Data replicated with a synchronization policy is maintained on the target cluster precisely as it is on the source – files deleted on the source are deleted next time the policy runs. A copy policy produces essentially an archived version of the data – files deleted on the source cluster will not be deleted from the target cluster. However, there are some specific behaviors in certain cases, explained below.
If a directory is deleted and replaced by an identically named directory, SyncIQ recognizes the re-created directory as a “new” directory, and the “old” directory and its contents will be removed.
If an administrator deletes “/ifs/old/dir” and all of its contents on the source with a copy policy, “/ifs/old/dir” still exists on the target. Subsequently, a new directory is created, named “/ifs/old/dir” in its place, the old “dir” and its contents on the target will be removed, and only the new directory’s contents will be replicated.
SyncIQ keeps track of file moves and maintains hard-link relationships on the target. SyncIQ also removes a link during subsequent replication operations if it no longer points to a file or directory in the current replication pass.
If a single linked file is moved within the replication set, SyncIQ removes the old link and adds a new link. Assume the following:
The SyncIQ policy root directory is set to /ifs/data.
/ifs/data/user1/foo is hard-linked to /ifs/data/user2/bar.
/ifs/data/user2/bar is moved to /ifs/data/user3/bar.
With copy replication, on the target cluster, /ifs/data/user1/foo will remain, and /ifs/data/user2/bar will be moved to /ifs/data/user3/bar.
If a single hard link to a multiply linked file is removed, SyncIQ removes the destination link.
Using the example above, if /ifs/data/user2/bar is deleted from the source, copy replication also removes /ifs/data/user2/bar from the target.
If the last remaining link to a file is removed on the source, SyncIQ does not remove the file on the target unless another source file or directory with the same filename is created in the same directory (or unless a deleted ancestor is replaced with a conflicting file or directory name).
Running a SyncIQ job
A SyncIQ Policy may be configured to run with four different options:
- Manually
- On a schedule
- Whenever the source is modified
- Whenever a snapshot of the source directory is taken
Manually

The manual option allows administrators to have a SyncIQ Policy completely configured and ready to run when a workflow requires data replication. If continuous data replication is not required and replication is only needed on an ‘as needed’ basis, this is the best option. Administrators can simply run the policy when it is required, limiting cluster overhead and saving bandwidth.
On a schedule
Running a SyncIQ Policy on a schedule is one of the more common options. Once this option is selected, another drop-down appears, to specify the frequency of the job.
Options include daily, weekly, monthly, or yearly. Once the frequency is selected further options appear to refine the frequency selection.
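As a sketch, a schedule can also be set from the CLI. The policy name is hypothetical, and the schedule grammar and RPO-alert duration format may vary by OneFS release:

```
# Run the policy every day at 11:00 PM, with a 12-hour RPO alert
isi sync policies modify dr-mirror --schedule "every day at 11:00 PM" --rpo-alert 12H
```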
An option for sending RPO alerts is available when “On a Schedule” is selected for running a job. Administrators can specify an RPO (recovery point objective) for a scheduled SyncIQ policy and trigger an event to be sent if the RPO is exceeded. The RPO is calculated as the interval between the current time and the start of the last successful sync job.
For example, consider a policy scheduled to run every 8 hours with a defined RPO of 12 hours. Suppose the policy runs at 3 pm and completes successfully at 4 pm. Thus, the start time of the last successful sync job is 3 pm. The policy should run next at 11 pm, based on the 8-hour scheduled interval. If this next run completes successfully before 3 am, 12 hours since the last sync start, no alert will be triggered, and the RPO timer is reset to the start time of the replication job. If for any reason the policy has not run to successful completion by 3 am, an alert will be triggered, since more than 12 hours elapsed between the current time (after 3 am) and the start of the last successful sync (3 pm).
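The RPO arithmetic in the example above can be sketched in plain shell. This is an illustration only, not OneFS code; times are expressed as hours on a 24-hour-plus timeline for clarity:

```shell
last_sync_start=15      # last successful sync started at 3 pm (hour 15)
rpo_hours=12            # policy RPO of 12 hours

now=23                  # 11 pm: the next scheduled run begins
elapsed=$((now - last_sync_start))
# 8 hours elapsed, under the 12-hour RPO: no alert
[ "$elapsed" -gt "$rpo_hours" ] && echo "alert" || echo "within RPO"

now=28                  # 4 am next day: still no successful completion
elapsed=$((now - last_sync_start))
# 13 hours elapsed, over the 12-hour RPO: alert fires
[ "$elapsed" -gt "$rpo_hours" ] && echo "alert" || echo "within RPO"
```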
Whenever the source is modified
The “Whenever the Source is Modified” option is also referred to as ‘SyncIQ continuous mode’ or ‘Replicate on Change’. When this policy configuration option is selected (or --schedule when-source-modified on the CLI), SyncIQ continuously monitors the replication data set and automatically replicates changes to the target cluster. Continuous replication mode is applicable when the target cluster dataset must always be consistent with the source, or when data changes at unpredictable intervals.
Events that trigger replication include file additions, modifications and deletions, directory path, and metadata changes. SyncIQ checks the source directories every ten seconds for changes.
As a best practice, if the “Whenever the source is modified” option is selected, configure the “Change-Triggered Sync Job Delay” option with a reasonable delay so that multiple updates are propagated as a single update.
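A hedged CLI sketch of continuous mode with a coalescing delay; the policy name and delay value are hypothetical, and the duration format may vary by release:

```
# Replicate on change, coalescing bursts of updates with a 5-minute job delay
isi sync policies modify dr-mirror --schedule when-source-modified --job-delay 5m
```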
Whenever a snapshot of the source directory is taken
A SyncIQ policy can be configured to trigger when the administrator takes a snapshot of the specified source directory whose name matches a specified pattern.
If this option is specified, the administrator-taken snapshot will be used as the basis of replication, rather than generating a system snapshot. Basing the replication start on a snapshot is useful for replicating data to multiple targets – these can all be simultaneously triggered when a matching snapshot is taken, and only one snapshot is required for all the replications. To enable this behavior, select the “Whenever a snapshot of the source directory is taken” policy configuration option on the GUI. Alternatively, from the CLI, use the flag --schedule when-snapshot-taken.
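As a sketch, the matching pattern can also be set alongside the schedule flag; the policy name and pattern below are hypothetical:

```
# Trigger replication whenever an administrator-taken snapshot matching "backup-*" is created
isi sync policies modify dr-mirror --schedule when-snapshot-taken --snapshot-sync-pattern "backup-*"
```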
Source cluster directory
The Source Cluster section is used to specify where the source data resides that will be replicated to the target cluster.
A SyncIQ policy by default includes all files and folders under the specified root directory. Optionally, directories under the root directory can be explicitly included or excluded.
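A hedged sketch of refining the source dataset from the CLI; the policy name and directory paths are hypothetical:

```
# Replicate only the projects tree, but skip its scratch subdirectory
isi sync policies modify dr-mirror \
    --source-include-directories /ifs/data/prod/projects \
    --source-exclude-directories /ifs/data/prod/projects/scratch
```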
File matching criteria
In addition to refining the source dataset through the included and excluded directories, file matching further refines the selected source dataset for replication.
A SyncIQ policy can have file-criteria statements that explicitly include or exclude files from the policy action. A file-criteria statement can include one or more elements, and each file-criteria element contains a file attribute, a comparison operator, and a comparison value. To combine multiple criteria elements into a criteria statement, use the Boolean ‘AND’ and ‘OR’ operators. Any number of ‘AND’ and ‘OR’ file-criteria definitions may be configured.
Restricting SyncIQ source nodes
SyncIQ utilizes a node’s front-end network ports to send replication data from the source to the target cluster. By default, SyncIQ policies utilize all nodes and interfaces to allow for maximum throughput of a given policy. However, an administrator may want to exclude certain nodes from a SyncIQ policy. Excluding nodes from a SyncIQ policy is beneficial for larger clusters where data replication jobs can be assigned to certain nodes. In other cases, a client workflow may require a higher priority on a performance node over participating in data replication. From the policy configuration window, an option is available to run the policy on all nodes, or to specify a subnet and pool.
By selecting a predefined IP address pool, administrators can restrict replication processing to specific nodes on the source cluster. This option is useful to ensure that replication jobs are not competing with other applications for specific node resources. Specifying the IP address pool allows administrators to define which networks are used for replication data transfer.
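A hedged CLI sketch of restricting a policy to a specific subnet and pool; the policy, subnet, and pool names are hypothetical, and flag names may vary by OneFS release:

```
# Restrict replication traffic to the nodes and interfaces in subnet0:pool0
isi sync policies modify dr-mirror --source-subnet subnet0 --source-pool pool0
```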
Target host and directory
In the “Target Host” field, specify the IP address or fully qualified domain name of the target cluster. It is important to ensure the DNS hosts specified on the source cluster can resolve the FQDN of the target cluster. In the “Target Directory” field, specify the directory where data from the source cluster is replicated. As stated above, it is recommended to follow Access Zone best practices, as the location of the target directory eases failover and failback operations in the future.
When a policy target cluster name or address is specified, a SmartConnect DNS zone name is used instead of an IP address or a DNS name of a specific node. An administrator may choose to restrict the connection to nodes in the SmartConnect zone, ensuring the replication job will only connect with the target cluster nodes assigned to that zone. During the initial part of a replication job, SyncIQ on the source cluster establishes an initial connection with the target cluster using SmartConnect. Once a connection with the target cluster is established, the target cluster replies with a set of target IP addresses assigned to nodes restricted to that SmartConnect zone. SyncIQ on the source cluster will use this list of target cluster IP addresses to connect local replication workers with remote workers on the target cluster.
Depending on the administrator’s requirements, archiving snapshots may be required on the target cluster. Configuring snapshot archival on the target cluster is an optional configuration.
By default, if the “Enable capture of snapshots on the target cluster” is not selected, the target cluster only retains the most recent snapshot, which is used during a failover.
To enable snapshot archiving on the target cluster, a SnapshotIQ license is required. When SyncIQ policies are set with snapshots on the target cluster, on the initial sync a snapshot will be taken at the beginning and the end. For incremental syncs, a snapshot will only be taken at the completion of the job.
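A hedged sketch of enabling target snapshot archiving from the CLI; the policy name, pattern, and expiration are hypothetical, and flag names and placeholder syntax may vary by release:

```
# Archive snapshots on the target (requires a SnapshotIQ license on the target cluster)
isi sync policies modify dr-mirror \
    --target-snapshot-archive on \
    --target-snapshot-pattern "SIQ-%{PolicyName}-%Y-%m-%d" \
    --target-snapshot-expiration 30D
```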
SyncIQ Advanced Settings provide several options to configure a SyncIQ policy.
From the “Priority” drop-down, select a priority level for the SyncIQ policy. PowerScale SyncIQ provides a mechanism to prioritize policies. Policies can optionally have a priority setting – policies with the priority bit set will start before unprioritized policies. If the maximum number of jobs are running, and a prioritized job is queued, the shortest running unprioritized job will be paused by the system to allow the prioritized job to run. The paused job will then be started next.
From the “Log Level” drop-down, specify a level of logging for this SyncIQ policy. The log level may be modified as required during a specific event. SyncIQ logs provide detailed job information. To access the logs, connect to a node and view its /var/log/isi_migrate.log file. The output detail depends on the log level, with the minimal option being “Fatal” and the maximum logging option being “Trace”.
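A hedged sketch of raising the log level for troubleshooting and inspecting the per-node log; the policy name is hypothetical:

```
# Maximize logging detail for this policy, then follow the per-node SyncIQ log
isi sync policies modify dr-mirror --log-level trace
tail -f /var/log/isi_migrate.log
```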
Validate file integrity
The “Validate File Integrity” checkbox provides an option for OneFS to compare checksums on SyncIQ file data packets pertaining to the policy. In the event a checksum value does not match, OneFS attempts to transmit the data packet again.
Prepare policy for accelerated failback performance
PowerScale SyncIQ provides an option for an expedited failback by running a ‘domainmark’ process. The data is prepared for failback the first time a policy runs or retroactively in the future.
Keep reports duration
The “Keep Reports” option defines how long replication reports are retained in OneFS. Once the defined time has been exceeded, reports are deleted.
Record deletions on synchronization
Depending on the IT administration requirements, a record of deleted files or directories on the target cluster may be required. By default, OneFS does not record when files or directories are deleted on the target cluster. However, the “Record Deletions on Synchronization” option can be enabled if required.
Deep copy for CloudPools
PowerScale clusters that use CloudPools to tier data to a cloud provider retain a stub file, known as a SmartLink, on the cluster with the metadata needed to retrieve the file at a later point. Without the SmartLink, a file that is tiered to the cloud cannot be retrieved. If only a SmartLink is replicated to a target cluster, the target cluster must have CloudPools active with the same configuration as the source cluster to be able to retrieve files tiered to the cloud. The deep-copy option instead replicates the full file data, rather than only the SmartLink, to the target. For more information about SyncIQ and CloudPools, refer to Section 13, SyncIQ and CloudPools.
SyncIQ can conduct a trial run of a policy without transferring file data between locations; this is referred to as an “Assess Sync”. Not only does an “Assess Sync” double-check the policy configuration, but it also provides an indication of the time and the level of resources an initial replication policy is likely to consume. This functionality is only available immediately after creating a new policy before it has been run for the first time. To run an “Assess Sync”, from the SyncIQ Policies tab, click “More” for the appropriate policy, and select “Assess Sync”.
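From the CLI, an assess run can be sketched as below; the policy name is hypothetical, and the flag is an assumption that may vary by OneFS release:

```
# Trial run: evaluates the policy without transferring file data (new, never-run policies only)
isi sync jobs start dr-mirror --test

# Review the resulting assessment report for time and resource estimates
isi sync reports list
```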
Impacts of modifying SyncIQ policies
SyncIQ policies may be modified and updated through the CLI or the web interface. The impact of the change is dependent upon how the policy is modified. Rather than modifying or deleting a policy when a suspension is required, the policy may also be disabled, allowing for it to be re-enabled with minimal impact at a later point.
After a policy is configured and the policy has run, SyncIQ will run either the initial replication again or a differential replication if the following variables are modified:
• Source directory
• Included or excluded directories
• File criteria: type, time, and regular expressions
• Target cluster, even if the new target cluster is identical to the old one
– IP and DNS changes will not trigger a full replication. However, if the cluster GUID changes, the job will fail at runtime. Also, unlike the other settings, a manual reset of the affected policy is required to be able to run an associated job.
• Target directory
If a SyncIQ replication policy is deleted, replication jobs will not be created for the policy. Any snapshots and reports associated with the policy are also deleted. The target cluster will break the association to the source cluster, removing the local target entry, and the target directory will allow writes.
SyncIQ performance rules
Performance Rules provide several options for administrators to define limits on resource consumption for SyncIQ policies, either during specific times or continuously. Setting performance limits minimizes the impact on high-priority workflows while still allowing nodes to participate in replication within a defined set of resources.
SyncIQ uses aggregate resources across the cluster to maximize replication performance, thus potentially affecting other cluster operations and client response. The default performance configurations, number of workers, network use, and CPU consumption may not be optimal for certain data sets or the processing needs of the business. CPU and network use are set to ‘unlimited’ by default. However, SyncIQ allows administrators to control how resources are consumed and balance replication performance with other file system operations by implementing a number of cluster-wide controls. Rules are created to define available resources for SyncIQ policies for different time periods.
To view or create SyncIQ Performance Rules from the OneFS web interface, click Data Protection > SyncIQ and select the “Performance Rules” tab. Existing Performance Rules are displayed. Click “Create a SyncIQ Performance Rule” to add a new rule.
From the Rule Type drop-down menu, select one of the following options:
• Bandwidth: This option provides a limit on the maximum amount of network bandwidth a SyncIQ policy can consume. Once “Bandwidth” is selected the “Limit” field changes to kb/s. In the “Limit” field, specify the maximum allowable bandwidth in kb/s.
• File Count: This option allows administrators to define a maximum number of files that replication jobs can send per second. Once “File Count” is selected, the “Limit” field changes to files/sec. In the “Limit” field, specify the maximum allowable files/sec.
• CPU: This option limits the CPU consumption to a percentage of the total available. Once “CPU” is selected, the “Limit” field changes to “%”. In the “Limit” field, specify the maximum allowable “%” for the maximum CPU consumption.
• Workers: This option limits the number of workers available to a percentage of the maximum possible. Once “Workers” is selected, the “Limit” field changes to “%”. In the “Limit” field, specify the maximum percentage of workers.
These performance rules will apply to all policies executing during the specified time interval.
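A hedged CLI sketch of a bandwidth rule; the limit, time window, and day specification are hypothetical, and the argument order may vary by OneFS release:

```
# Cap replication bandwidth at 10,000 kb/s during business hours, Monday through Friday
isi sync rules create bandwidth 09:00-17:00 M-F 10000
```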