It’s all about a genius metadata structure…

Copy Data Management (CDM) is a basic feature of every modern storage array today. It is simply impossible to cope with exponential data growth and the IT industry’s demands for data protection, immediate restore, parallel access, reporting and more without CDM capabilities. The storage vendors’ solutions in this area look like Siamese twins with different labels: Snapshots, Clones, ShadowImage, FlashCopy, etc. The basic functionality of most of these technologies is similar; the important differences are:

  1. Functional Flexibility
  2. Resource Consumption
  3. Metadata Efficiency

Here I’d like to share some XtremIO array internals. Why should you follow this technical explanation? The reason is simple: I believe customers purchase IT/storage equipment to meet their organization’s business goals. Customers evaluate storage systems by comparing CPU, memory and power consumption spec sheets against system cost. However, the resource consumption of “internal features” is hidden (not listed in any spec sheet), yet it becomes very significant once you start sharing resources between YOUR business objectives and the background “hidden” operations. The intellectual property / “secret sauce” implemented in different IT solutions is not visible like CPU frequency, but it defines the solution’s real ROI.

Traditional Way

Let’s briefly discuss the conventional block-array metadata structure. The general idea is always based on block-level pointers between the volume address space and the physical data layer: each LBA offset is linked to a RAID physical address. This is the simplest array metadata structure prior to copy creation.

Implementing Copy Data Management on top of this metadata structure requires sharing the physical layer between the original and the copy volumes. The typical way storage vendors achieve this is by duplicating the entire volume metadata structure, so that both the original and the copy volume can access the same physical data. Your cost: array memory, CPU, power and internal bandwidth consumption.
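
To make that cost concrete, here is a minimal toy sketch in Python (my own illustration, not any vendor’s actual code or data layout): a per-volume table maps each written LBA offset to a physical/RAID address, and a classic copy duplicates the whole table.

```python
from copy import deepcopy

# Toy model of a traditional per-volume metadata table:
# each written LBA offset points directly at a RAID/physical address.
vol_a_metadata = {                      # hypothetical source volume "Vol-A"
    0x0000: "raid1:stripe17:blk04",
    0x0008: "raid1:stripe17:blk05",
    0x0010: "raid2:stripe03:blk11",
}

# Classic copy creation: duplicate the WHOLE mapping so the copy volume can
# reach the same physical blocks. Memory, CPU and internal bandwidth cost
# grows linearly with the amount of metadata, i.e. with the written capacity.
vol_b_metadata = deepcopy(vol_a_metadata)

print(len(vol_b_metadata), "metadata entries duplicated just to create the copy")
```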


XtremIO Approach

The XtremIO internal architecture is based on content-addressed data access. Every data block is represented by a unique fingerprint (aka hash). The metadata is split between two main structures:

Address–to–Fingerprint table (aka A2H = Address to Hash).

The table contains an entry for every written data block, per volume. Unwritten data blocks don’t consume any metadata resources and automatically return “zero” content.

Fingerprint–to–Physical Location table (aka HMD = Hash Meta Data).

The table contains entries linking each fingerprint to its physical location in the XtremIO Data Protection layer, together with a logical-address reference count. The HMD data is global (an entry is shared between all the array’s volumes) and deduplication-aware.

This content-addressed metadata structure provides complete abstraction between the user volume address space, managed in the A2H table, and the physical metadata, managed in the HMD table.
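
A minimal sketch of the two structures and the write path (toy Python of my own; SHA-1 stands in for the fingerprint function and the dictionaries are simplified assumptions, not the real on-array layout):

```python
import hashlib

a2h = {}   # per volume: (volume, lba) -> fingerprint             (Address-to-Hash)
hmd = {}   # global:     fingerprint  -> {"phys": ..., "refs": n} (Hash MetaData)

_free_blocks = (f"xdp:blk{i}" for i in range(10**6))   # fake physical allocator

def write_block(volume, lba, data):
    fp = hashlib.sha1(data).hexdigest()            # content fingerprint
    if fp in hmd:
        hmd[fp]["refs"] += 1                       # dedup hit: content already stored
    else:
        hmd[fp] = {"phys": next(_free_blocks), "refs": 1}   # new content, new physical block
    a2h[(volume, lba)] = fp                        # the volume address only knows the hash

def read_block(volume, lba):
    fp = a2h.get((volume, lba))
    return "zero" if fp is None else hmd[fp]["phys"]   # unwritten blocks read as zero

# Two volumes writing identical content share a single HMD entry.
write_block("Vol-A", 0, b"same payload")
write_block("Vol-B", 7, b"same payload")
print(hmd)                       # one fingerprint, refs == 2
print(read_block("Vol-A", 1))    # "zero": never written, no metadata consumed
```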

Now, let’s see what the benefits of this metadata architecture are for the XtremIO iCDM solution.

The XtremIO Virtual Copies (XVC) logic is implemented on the A2H table ONLY, with no relationship at all to the HMD table or the physical layer.

  1. The XtremIO architecture breaks the paradigm that requires managing LBA-to-physical relationships whenever a copy volume is created, refreshed or recovered. Since no metadata copy is required, copy creation is instant and resource-efficient; no additional memory or CPU resources are needed for duplication.
  2. The performance impact during copy creation is a function of the resources the array “spends” on the operation. Since A2H management during copy creation does not involve a metadata copy, the latency impact of XVC creation is negligible.
  3. The metadata structure is not duplicated in the A2H table when a copy is created. The efficient relationship structure lets XVC addresses “link” to the relevant hash. The result: no array metadata consumption for XVCs that are not written to. When copy data is overwritten, memory consumption equals that of a regular volume write.

Let’s play a bit with the array metadata to show how it really works. Every step below builds on the previous metadata content.

1. XVC / Copy creation

Following XVC creation, the original metadata content stays intact and becomes an internal resource serving both the source (Vol-A) and the XVC (Vol-B). At this point, and regardless of Vol-A’s size, we have spent less than 50 KB of array memory to enable the XVC logic. In the figure, the red frame represents the real metadata, while the green frames represent only the XVC internal access algorithm, with no actual metadata.
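
A hypothetical sketch of why the copy is (almost) free in metadata terms: the existing A2H content becomes a shared internal layer, and the source and the XVC are only thin references to it (toy Python; the structures and the constant-size descriptors are my own simplification of the figure):

```python
# Pre-existing A2H entries, now owned by the shared internal entity "Internal1".
shared_a2h = {("Internal1", 0): "fp_aaaa", ("Internal1", 1): "fp_bbbb"}

# Creating the XVC only adds two tiny descriptors -- no per-LBA metadata copy,
# so the cost is constant regardless of how large Vol-A is.
volumes = {
    "Vol-A": {"parent": "Internal1", "private_a2h": {}},   # the source
    "Vol-B": {"parent": "Internal1", "private_a2h": {}},   # the new XVC
}

def lookup(volume, lba):
    vol = volumes[volume]
    # A private (post-copy) entry wins; otherwise fall back to the shared layer.
    return vol["private_a2h"].get(lba) or shared_a2h.get((vol["parent"], lba), "zero")

print(lookup("Vol-A", 0), lookup("Vol-B", 0))   # both resolve to fp_aaaa; nothing was duplicated
```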

2. XVC and Source volume write or update

When a source or XVC volume address is overwritten by the host, the fingerprint of the new data is recorded as a new A2H entry. Only at this point do we actually consume a new metadata entry, for the updated LBA.
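
Continuing the same toy model (restated so the snippet runs on its own): an overwrite consumes exactly one new A2H entry for the updated LBA and nothing more.

```python
import hashlib

shared_a2h = {("Internal1", 0): "fp_aaaa", ("Internal1", 1): "fp_bbbb"}
volumes = {
    "Vol-A": {"parent": "Internal1", "private_a2h": {}},
    "Vol-B": {"parent": "Internal1", "private_a2h": {}},
}

def lookup(volume, lba):
    vol = volumes[volume]
    return vol["private_a2h"].get(lba) or shared_a2h.get((vol["parent"], lba), "zero")

def overwrite(volume, lba, data):
    fp = hashlib.sha1(data).hexdigest()
    volumes[volume]["private_a2h"][lba] = fp    # the only metadata actually consumed

overwrite("Vol-B", 0, b"new content written to the copy")
print(lookup("Vol-B", 0))   # the new fingerprint, private to Vol-B
print(lookup("Vol-A", 0))   # still fp_aaaa: the source is untouched
```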

3. Copy of a copy

Creating an additional level of copy follows exactly the same rules of in-memory operation efficiency as described above (see the sketch after the next step).

4. Write update on the second XVC level

The “copy-of-copy” levels have the same data services and volume access options as any other array volume. In the example below, Addr-2 of the “Vol-C” volume is updated with new content by the user. Exactly as in the previous example, only user-updated entries consume memory space; all the rest is “free of charge” in terms of memory and CPU allocation.
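
The same toy model, generalized into a small node tree so it can go one level deeper (standalone and purely illustrative; the node names follow the article’s figures, the fingerprints are made up):

```python
import hashlib

# One tree of nodes: internal layers are read-only, user volumes carry a private
# A2H overlay on top of their parent.
nodes = {
    "Internal1": {"parent": None, "a2h": {0: "fp_aaaa", 1: "fp_bbbb", 2: "fp_cccc"}},
    "Vol-A":     {"parent": "Internal1", "a2h": {}},
    "Vol-B":     {"parent": "Internal1", "a2h": {}},
}

def lookup(name, lba):
    node = nodes.get(name)
    while node is not None:
        if lba in node["a2h"]:
            return node["a2h"][lba]
        node = nodes.get(node["parent"])
    return "zero"

def create_xvc(source, copy, internal):
    # Step 3, copy of a copy: the source's current entries are frozen into a new
    # internal layer shared by the source and the new copy -- a constant-cost
    # re-linking, no per-LBA duplication.
    nodes[internal] = {"parent": nodes[source]["parent"], "a2h": nodes[source]["a2h"]}
    nodes[source]   = {"parent": internal, "a2h": {}}
    nodes[copy]     = {"parent": internal, "a2h": {}}

create_xvc("Vol-B", "Vol-C", "Internal3")

# Step 4: the user overwrites Addr-2 on Vol-C; only this one entry consumes memory.
nodes["Vol-C"]["a2h"][2] = hashlib.sha1(b"Vol-C private data").hexdigest()

print(lookup("Vol-C", 0))   # fp_aaaa, inherited via Internal3 -> Internal1
print(lookup("Vol-C", 2))   # the new private fingerprint
print(lookup("Vol-B", 2))   # still fp_cccc: the parent chain is unaffected
```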

5. Refresh XVC from original volume

Any XVC volume can be refreshed from ANY other volume belonging to the same tree (VSG = Volume Snapshot Group) without limitations. In the example below, the “Vol-B” content is refreshed from “Vol-A”:

Like the copy creation operation, the refresh operation is managed in the A2H metadata table as an in-memory operation. The physical data-structure layer (XtremIO Data Protection / RAID, which manages data placement on SSD) is not involved. An A2H table optimization (aka merge) algorithm was developed to simplify the XVC tree after multiple operations.
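
A toy sketch of a refresh in the same node-tree model (restated so it runs standalone; “Internal2” is a name I made up for the new shared layer): Vol-B’s own A2H entries are discarded and Vol-B is re-pointed at Vol-A’s current content, entirely in metadata.

```python
nodes = {
    "Internal1": {"parent": None, "a2h": {0: "fp_aaaa", 1: "fp_bbbb"}},
    "Vol-A":     {"parent": "Internal1", "a2h": {0: "fp_new_on_A"}},   # diverged after the copy
    "Vol-B":     {"parent": "Internal1", "a2h": {1: "fp_old_on_B"}},   # diverged after the copy
}

def lookup(name, lba):
    node = nodes.get(name)
    while node is not None:
        if lba in node["a2h"]:
            return node["a2h"][lba]
        node = nodes.get(node["parent"])
    return "zero"

def refresh(target, source, internal):
    # The source's current entries are frozen into a shared internal layer and the
    # target re-attaches to it, dropping its own divergent entries. No physical I/O.
    nodes[internal] = {"parent": nodes[source]["parent"], "a2h": nodes[source]["a2h"]}
    nodes[source]   = {"parent": internal, "a2h": {}}
    nodes[target]   = {"parent": internal, "a2h": {}}

refresh("Vol-B", "Vol-A", "Internal2")
print(lookup("Vol-B", 0))   # fp_new_on_A: Vol-B now reflects Vol-A's content
print(lookup("Vol-B", 1))   # fp_bbbb: Vol-B's old divergent entry is gone
```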

6. XVC / Copy deletion

The iCDM metadata structures support volume or XVC deletion at any level. Let’s see the impact of deleting “Vol-C”:

The deleted volume’s address-to-fingerprint relationships are instantly destroyed, as is its volume representation on the SCSI target. The metadata structure is then optimized as a batch process, in which the “Internal3” metadata content, no longer relevant as a separate layer, is merged with “Internal1”. The HMD reference count for the related hash entries is decremented as well, and the physical-layer offset is marked as free once no references remain.
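
One plausible way to read the deletion and merge steps in the same toy model (standalone; the reference-count handling is my simplified assumption of how a dedup-aware cleanup behaves): deleting “Vol-C” drops its A2H entries immediately, the refcounts are released in the background, and a now-redundant internal layer is folded away.

```python
hmd = {                                              # fingerprint -> physical block + refcount
    "fp_aaaa":   {"phys": "xdp:blk1", "refs": 1},
    "fp_bbbb":   {"phys": "xdp:blk2", "refs": 1},
    "fp_c_only": {"phys": "xdp:blk3", "refs": 1},    # written only by Vol-C
}
nodes = {
    "Internal1": {"parent": None,        "a2h": {0: "fp_aaaa"}},
    "Vol-A":     {"parent": "Internal1", "a2h": {}},
    "Internal3": {"parent": "Internal1", "a2h": {1: "fp_bbbb"}},
    "Vol-B":     {"parent": "Internal3", "a2h": {}},
    "Vol-C":     {"parent": "Internal3", "a2h": {2: "fp_c_only"}},
}

def delete_volume(name):
    victim = nodes.pop(name)                 # the volume/SCSI representation disappears instantly
    for fp in victim["a2h"].values():        # background: release its hash references
        hmd[fp]["refs"] -= 1
        if hmd[fp]["refs"] == 0:
            print("freeing physical block", hmd.pop(fp)["phys"])

def merge_redundant_internals():
    # Background batch optimization: an internal layer left with a single child is
    # folded into that child, shortening the lookup chain.
    for name in [n for n in list(nodes) if n.startswith("Internal")]:
        children = [c for c in nodes.values() if c["parent"] == name]
        if len(children) == 1:
            child = children[0]
            child["a2h"] = {**nodes[name]["a2h"], **child["a2h"]}   # child's own entries win
            child["parent"] = nodes[name]["parent"]
            nodes.pop(name)

delete_volume("Vol-C")          # frees xdp:blk3, the only block unique to Vol-C
merge_redundant_internals()     # Internal3 is folded away; Vol-B now sits under Internal1
print(nodes)
```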

All the metadata management processes triggered by XVC deletion are background tasks running alongside user IO. However, these background processes are managed with a DIFFERENT priority and resource allocation to prevent performance impact. The basic XtremIO iCDM design assumption is to serve user activity first, and only then allocate the remaining resources to internal operations.

One of the cool new features in the XtremIO X2 generation is the “Management Lock” flag. Once applied at the volume level, this property blocks iCDM Refresh, Restore or volume deletion on the specific volume, protecting mission-critical data from operations performed by mistake.

To summarize:

  1. All iCDM operations are based on metadata structure updates only, so they are highly efficient. The array’s internal memory and CPU consumption for XVC creation or refresh is close to that of defining a new volume.
  2. There is NO data or metadata copy or movement, and no impact on user performance or efficiency.
  3. IO performance is identical for source volumes and XVC copies, and is not affected by copy creation.
  4. Restore and refresh operations are based on the same metadata-level concept: they are fast and efficient.
  5. Nested copy volumes can be created without any depth limitation.
  6. There are no limitations on source or copy volume deletion.
  7. No physical-layer operations or preparations are required.
  8. No dedicated pools for snapshot data, no reserved capacity.
  9. All data services are allowed without limitations for both source and copy volumes.

And here are some bonus questions for those who will say “so what? Every storage array has a copy management solution today”:

  • What is the cost of using snapshots in terms of physical capacity and memory?
  • Are snapshots inherently writeable?  Are extra steps needed to use them as writeable volumes?
  • What is the performance penalty when snapshots are used?
  • What is the impact of taking a snapshot?
  • What is the impact on the source (production) volume?
  • How was the snapshot implementation designed and optimized to use flash as a media?
  • Are all data services fully enabled and fully performing on snapshots?
  • Do I need to configure pools and reserve capacity for snapshots?
  • Can I consolidate dev & test copies with production workloads?  And will the dev/test copies have identical performance in all metrics?
  • How agile are your snapshots?
  • Does your snapshot implementation support nested snaps (snapshots of snapshots)?
  • Can I create snaps of snaps without impacting performance?
  • Can I get the same performance on all snaps regardless of their location in the hierarchy?
  • Can I delete a particular nested snapshot without losing any child snapshots or the source entity?
  • Can I delete the source volume without affecting the rest of the snapshot hierarchy?

