Wide Distribution of Data for Massive Performance
Flex widely distributes data across all storage resources in the cluster, which eliminates the architectural problems of other IP-based storage systems. With VxFlex OS, ALL of the IOPS and bandwidth of the underlying infrastructure are realized by a perfectly balanced system with NO hot spots.
Massive Availability and Resiliency
Flex has a self-healing architecture that employs many-to-many, fine-grained rebuilds, which is very different from the serial rebuilds seen in most storage products. When hardware fails, data is automatically rebuilt using all other resources in the cluster. This enables a 6×9's availability profile while using x86 commodity hardware. Flex can rebuild an entire node with 24 drives in mere minutes – a fraction of the time it takes to rebuild a single drive on a traditional array.
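To get an intuition for why many-to-many rebuilds finish so quickly, here is a rough back-of-the-envelope model in Python. The capacity and throughput numbers are hypothetical examples for illustration only, not measured VxFlex OS figures.

```python
# Rough, illustrative model of serial vs. many-to-many rebuild times.
# All numbers are hypothetical examples, not measured VxFlex OS figures.

def serial_rebuild_hours(failed_tb: float, single_drive_mbps: float) -> float:
    """One spare drive absorbs the whole rebuild (traditional RAID style)."""
    seconds = failed_tb * 1e6 / single_drive_mbps   # TB -> MB, divided by MB/s
    return seconds / 3600

def distributed_rebuild_hours(failed_tb: float, per_node_mbps: float, surviving_nodes: int) -> float:
    """Every surviving node rebuilds a small slice in parallel."""
    aggregate_mbps = per_node_mbps * surviving_nodes
    seconds = failed_tb * 1e6 / aggregate_mbps
    return seconds / 3600

if __name__ == "__main__":
    failed_capacity_tb = 40          # e.g. a node with 24 drives, partially full
    print(f"serial:      {serial_rebuild_hours(failed_capacity_tb, 200):.1f} h")
    print(f"distributed: {distributed_rebuild_hours(failed_capacity_tb, 2000, 63):.2f} h")
```

With these example numbers the serial rebuild takes days while the distributed rebuild finishes in minutes, which is the effect the many-to-many design is after.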
Built-In Multipathing
Flex automatically distributes traffic across all available resources. Every server can be a target as well as an initiator, so as you add or remove nodes in the cluster, multipathing is dynamically updated on the fly – inherent, dynamic, built-in multipathing.
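As a rough illustration of the idea (not the actual SDC implementation), the sketch below models a path selector whose target set is simply rebuilt whenever SDS nodes join or leave; the node names are made up.

```python
import itertools

class PathSelector:
    """Toy round-robin selector over the currently available SDS targets.

    Illustrative only: the real SDC balances I/O internally; this just shows
    how a path set can be refreshed as nodes join or leave the cluster.
    """

    def __init__(self, targets):
        self._cycle = itertools.cycle(list(targets))

    def update_targets(self, targets):
        # Called when nodes are added/removed: the path set is rebuilt on the fly.
        self._cycle = itertools.cycle(list(targets))

    def next_path(self):
        return next(self._cycle)

if __name__ == "__main__":
    selector = PathSelector(["sds-01", "sds-02", "sds-03"])
    print([selector.next_path() for _ in range(4)])   # round-robins across 3 nodes
    selector.update_targets(["sds-01", "sds-02", "sds-03", "sds-04"])  # scale out
    print([selector.next_path() for _ in range(4)])   # now spreads across 4 nodes
```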
Dell EMC VxFlex Ready Nodes converge storage and compute resources into a single-layer architecture, aggregating capacity and performance with simplified management capable of scaling to over a thousand nodes. VxFlex OS provides maximum flexibility and choice. It supports high-performance databases and applications at extreme scale (again, from as little as 3 nodes to over 1000 per cluster) and supports multiple operating systems, hypervisors and media types. You can build the infrastructure that best supports your applications: choose your hardware vendors, or use what you already have in house.
Two-Layer
Similar structure to a traditional SAN
Supports organizations that prefer separation between the storage and application teams
Allows storage to be scaled separately from the application servers
New 100Gb switch for the aggregation layer
Or
HCI
Servers host both applications and storage
A modern approach to managing the IT data center
Provides maximum flexibility and is easier to administer
Maintenance of servers impacts both storage and compute
In a storage-only architecture:
The VxFlex OS data client (SDC) is a block device driver
The SDC exposes VxFlex OS shared block volumes to the application
Access to the OS partition may still be done "regularly"
The VxFlex OS data server (SDS) is a daemon / service
The SDS owns local storage that contributes to the VxFlex OS storage pool
In a two-layer deployment, the SDC and SDS run on different nodes and can grow independently of each other
We have just released VxFlex OS 3.0, which includes many frequently requested features. Here's what's new.
Fine Granularity (FG) Layout
Fine Granularity (FG) layout – a new, additional storage pool layout that uses a much finer storage allocation unit of 4KB. This is in addition to the existing Medium Granularity (MG) storage pools, which use a 1MB allocation unit.
The 4KB allocation unit allows better efficiency in thin-provisioned volumes and snapshots. For customers that use snapshots frequently, this layout will deliver significant capacity savings.
Note: FG storage pools require nodes with NVDIMMs and SSD/NVMe media type
Inline compression – the Fine Granularity layout enables data compression, which can reduce the total amount of physical data that needs to be written to SSD media. Compression saves storage capacity by storing data blocks in the most efficient manner; combined with the VxFlex OS snapshot capabilities, it can easily support petabytes of functional application data.
Persistent checksum – in addition to the existing 'in-flight checksum', persistent checksum adds data-integrity protection for the data and metadata of FG storage pools. Background scanners monitor the integrity of the data and metadata over time.
VxFlex OS 3.0 introduces an ADDITIONAL, more space efficient storage layout
Existing – Medium Granularity (MG) Layout
• Supports either thick or thin-provisioned volumes
• Space allocation occurs at 1MB units
• No attempt is made to reduce the size of user data written to disk (except for all-zero data)
Newly Added – Fine Granularity (FG) Layout
• Supports only thin-provisioned, "zero-padded" volumes
• Space allocation occurs at finer 4KB units
• When possible, reduces the actual size of user data stored on disk
• Includes persistent check-summing for data integrity
A Storage Pool (SP) can be either an FG or MG type
• FG storage pools can live alongside MG pools in a given SDS
• Volumes can be migrated across the two layouts (MG volumes zero-padded)
• FG pools require SSD/NVMe media and NVDIMM for acceleration
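To make the difference between the 1MB and 4KB allocation units concrete, here is a small illustrative calculation; the write pattern and sizes are hypothetical, and this is not how VxFlex OS accounts for space internally.

```python
# Illustrative comparison of space allocated for small scattered writes under
# a 1MB (MG) vs. 4KB (FG) allocation unit. Numbers are hypothetical.

MG_UNIT = 1 * 1024 * 1024   # 1MB allocation unit
FG_UNIT = 4 * 1024          # 4KB allocation unit

def allocated_bytes(write_size: int, n_writes: int, unit: int) -> int:
    """Each write to a previously unallocated region consumes at least one unit."""
    units_per_write = -(-write_size // unit)   # ceiling division
    return n_writes * units_per_write * unit

if __name__ == "__main__":
    # 10,000 scattered 8KB writes to a fresh thin volume (or a snapshot).
    for name, unit in (("MG (1MB units)", MG_UNIT), ("FG (4KB units)", FG_UNIT)):
        gib = allocated_bytes(8 * 1024, 10_000, unit) / 2**30
        print(f"{name}: ~{gib:.2f} GiB allocated for ~0.08 GiB of user data")
```

Under this example pattern the MG pool allocates close to 10 GiB while the FG pool allocates roughly what was actually written, which is why thin volumes and frequent snapshots benefit so much from the finer unit.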
Inline Compression
• Compression algorithms in general: what's desirable is something off-the-shelf, standard and field-proven
• Lempel-Ziv (LZ) based compression (recurring patterns within preset windows), with or without Huffman coding
• Very good compression ratios for text (>80%) and databases (~70%, ranging from 60% – 80%)
• The algorithm used in VxFlex OS 3.0 is C-EDRS, a Dell EMC proprietary algorithm (similar to LZ4) – the same algorithm that XtremIO uses
• Good balance of compression ratio and performance (light on CPU)
• Compressibility is tested inline (on the fly)
• Some data is not a good candidate for compression (e.g. videos, images, compressed DB rows)
• CPU cycles are invested up front: if a block cannot be reduced by more than 20%, it is considered incompressible and stored uncompressed
• This avoids wasting CPU cycles later decompressing read IOs
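The sketch below illustrates the inline "is it worth compressing?" decision using the 20% threshold mentioned above. C-EDRS is proprietary, so zlib is used here purely as a stand-in compressor; everything outside the text (function names, sample data) is an assumption.

```python
import zlib

# The actual VxFlex OS algorithm (C-EDRS) is proprietary; zlib is used here
# purely as a stand-in to illustrate the inline compressibility test.

MIN_SAVINGS = 0.20   # if we can't shave off at least 20%, store uncompressed

def maybe_compress(block):
    """Return (payload, compressed_flag) for a data block."""
    candidate = zlib.compress(block, level=1)   # cheap, LZ-style pass
    savings = 1 - len(candidate) / len(block)
    if savings >= MIN_SAVINGS:
        return candidate, True                  # worth the CPU to decompress later
    return block, False                         # treat as incompressible

if __name__ == "__main__":
    text_like = b"SELECT id, name FROM users WHERE active = 1; " * 100
    already_packed = zlib.compress(b"video-frame-like noise" * 500)
    for label, block in (("text/DB-like", text_like), ("pre-compressed", already_packed)):
        payload, flag = maybe_compress(block)
        print(f"{label}: compressed={flag}, {len(block)} -> {len(payload)} bytes")
```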
Persistent Checksum
• Logical checksum (protects the uncompressed data)
  • All data written to FG pools, with or without compression, has a logical checksum that is always calculated by default (this cannot be changed)
  • If the data is compressed: the checksum of the original (uncompressed) data is calculated before it is compressed and written to the disk, and is stored on disk with the data *
  • If the data is not compressed (by user selection or because of incompressibility), the checksum is calculated and stored elsewhere **
• Physical checksum (protects the compressed data)
  • Protects the integrity of the Log itself
  • Computed for the Log, after placing entries into the Log
  • Computed over the compressed data and the embedded metadata, and thus protects the integrity of the compressed data
• Metadata checksum
  • Maintaining the integrity of the metadata itself is crucial – metadata cannot be reconstructed from the (compressed) user data
  • Disk-level metadata: there is a checksum for each physical row in the metadata that lives on each disk
  • If an error is detected in the metadata, nothing on that disk is trusted and a rebuild is triggered
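A minimal sketch of the logical-checksum idea: the checksum is always computed over the original (uncompressed) data, so a read can be verified end to end after decompression. The LogEntry structure, the CRC32 choice and the field names are illustrative assumptions, not the actual on-disk format.

```python
import zlib
from dataclasses import dataclass

@dataclass
class LogEntry:
    payload: bytes          # compressed or raw user data (hypothetical structure)
    compressed: bool
    logical_crc: int        # checksum of the uncompressed data, always present

def write_block(data: bytes) -> LogEntry:
    logical_crc = zlib.crc32(data)          # computed before compression
    candidate = zlib.compress(data)
    if len(candidate) < len(data):
        return LogEntry(candidate, True, logical_crc)
    return LogEntry(data, False, logical_crc)

def read_block(entry: LogEntry) -> bytes:
    data = zlib.decompress(entry.payload) if entry.compressed else entry.payload
    if zlib.crc32(data) != entry.logical_crc:
        raise IOError("logical checksum mismatch - data integrity error")
    return data

if __name__ == "__main__":
    entry = write_block(b"application data " * 64)
    assert read_block(entry) == b"application data " * 64
    print("read verified against logical checksum")
```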
Background Device Scanner
Scans devices in the system for errors
• You can enable/disable the Background Device Scanner and reset its counters
• MG storage pools
  • Disabled by default
  • No changes, same as 2.x
• FG storage pools
  • Enabled by default
  • Mode: device_only – report and rebuild on error
  • Cycles through each SSD and compares the physical checksums against the data in the Logs and Metadata
• GUI controls/limits the disk IO – the default is 1024 KB/s per device
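A toy model of what a background scanner does conceptually: walk a device's blocks, recompute checksums, compare against what was stored, and throttle itself to a bandwidth budget. The in-memory "device" is obviously not a real disk; only the 1024 KB/s default comes from the text, everything else is an assumption.

```python
import time
import zlib

SCAN_RATE_KBPS = 1024                       # per-device throttle (default in the text)

def scan_device(blocks, stored_checksums, rate_kbps=SCAN_RATE_KBPS):
    """Yield indices of blocks whose recomputed checksum does not match."""
    budget_bytes_per_sec = rate_kbps * 1024
    for i, block in enumerate(blocks):
        if zlib.crc32(block) != stored_checksums[i]:
            yield i                          # would trigger report/rebuild on error
        time.sleep(len(block) / budget_bytes_per_sec)   # crude rate limiting

if __name__ == "__main__":
    blocks = [bytes([i % 256]) * 4096 for i in range(8)]
    checksums = [zlib.crc32(b) for b in blocks]
    blocks[3] = b"\xff" * 4096               # simulate silent corruption
    print("corrupted blocks:", list(scan_device(blocks, checksums, rate_kbps=65536)))
```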
Choose the Best Layout for Each Workload
• You can choose, for each workload, the layout that works best for you
• MG
  • Workloads with high performance requirements and sensitivity
  • All of our usual use cases still apply
• FG compressed
  • A great choice for most cases where data is compressible, and where space efficiency is more valuable than raw IO
  • Especially when there are snapshot usage requirements
  • DevOps and Test/Dev environments
• FG non-compressed
  • Data isn't compressible (e.g. OS or application-level encryption), but you use lots of snapshots and want the space savings
  • Read-intensive workloads with >4K IOs
  • You need persistent checksums
• And if you change your mind, you can migrate "live" to another layout (see Non-Disruptive Volume Migration below)
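The guidance above can be boiled down to a tiny helper; the attribute names are hypothetical and this is only a rule-of-thumb sketch, not an official sizing tool.

```python
# A small, illustrative helper that encodes the layout guidance above.
# The attribute names are hypothetical; they are not VxFlex OS settings.

def recommend_layout(compressible: bool, latency_sensitive: bool,
                     heavy_snapshot_use: bool, needs_persistent_checksum: bool) -> str:
    if latency_sensitive and not (heavy_snapshot_use or needs_persistent_checksum):
        return "MG"                    # raw performance first, classic use cases
    if compressible:
        return "FG (compressed)"       # space efficiency, great with snapshots
    return "FG (non-compressed)"       # e.g. encrypted data, but you still want
                                       # snapshot savings and persistent checksums

if __name__ == "__main__":
    print(recommend_layout(compressible=True,  latency_sensitive=False,
                           heavy_snapshot_use=True,  needs_persistent_checksum=False))
    print(recommend_layout(compressible=False, latency_sensitive=False,
                           heavy_snapshot_use=True,  needs_persistent_checksum=True))
```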
Non-Disruptive Volume Migration
Prior to v3.0, a volume was bound to a Storage Pool on creation and this binding could not be changed later. There are various use cases for changing it:
• Migrating volumes between different performance tiers
• Migrating volumes to a different Storage Pool or Protection Domain, driven by multi-tenancy needs
• Extracting volumes from a deprecated Storage Pool or Protection Domain to shrink a system
• Changing a volume personality
  • Thin -> Thick
  • FG -> MG
Migrating volumes from one Storage Pool to another:
• V-Tree granularity – a volume and all of its snapshots are migrated together
• Non-disruptive to ongoing IO, hiccups are minimized
• Migration is supported across
  • Storage Pools within the same Protection Domain
  • Storage Pools across Protection Domains
• Supports older v2.x SDCs
Snapshots and Snapshot Policy Management
Volume Snapshots
• Prior to v3.0, there was a limit of 32 items in a volume tree (V-Tree) – 31 snapshots + the root volume
• In v3.0, this is increased to 128 (for both FG and MG layouts) – 127 snapshots + the root volume
• Snapshots in FG are more space efficient and have better performance than MG snapshots – with 4KB block management there is 256x less to manage with each subsequent write
• Remove Ancestor snapshot – the ability to remove the parent of a snapshot and keep the snapshot in the system, in essence merging the parent into a child snapshot
• Policy-managed snapshots – up to 60 policy-managed snapshots per root volume (taken from the 128 total available)
Snapshot Policy
• The policy is hierarchical. For example, we would like to keep:
  • An hourly backup for the most recent day
  • A daily backup for a week
  • A weekly backup for 4 weeks
• Implementation is simplified – set the basic cadence and the number of snapshots to keep at each level
• The number of snapshots to keep at a level is the same as the rate of elevating a snapshot to the next level
• Max retention levels = 6
• Max snapshots retained in a policy = 60
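A small sketch of the retention arithmetic, using the hourly/daily/weekly example above and the 6-level / 60-snapshot limits from the text; the function and parameter names are made up for illustration.

```python
# Illustrative model of a hierarchical snapshot policy: a base cadence plus a
# retention count per level, where the count at each level is also the rate at
# which a snapshot is promoted to the next level.

MAX_LEVELS = 6
MAX_RETAINED = 60

def validate_policy(base_cadence_minutes, keep_per_level):
    if len(keep_per_level) > MAX_LEVELS:
        raise ValueError("too many retention levels")
    total = sum(keep_per_level)
    if total > MAX_RETAINED:
        raise ValueError(f"policy retains {total} snapshots, limit is {MAX_RETAINED}")
    # Cadence at each level = base cadence * product of the counts below it.
    cadence = base_cadence_minutes
    for level, keep in enumerate(keep_per_level):
        print(f"level {level}: one snapshot every {cadence} min, keep {keep}")
        cadence *= keep
    return total

if __name__ == "__main__":
    # hourly for a day, daily for a week, weekly for four weeks -> 35 snapshots kept
    validate_policy(60, [24, 7, 4])
```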
Auto Snapshot Group
• An auto snapshot is a snapshot that was created by a policy
• The snapshots of an auto snapshot group are:
  • Consistent (unless mapped)
  • Sharing the same expiration, and should be deleted at the same time (unless locked)
• An auto snapshot group is NOT a snapshot consistency group
  • A single snapshot CG may contain several auto snapshot groups
  • Snapshot CGs are not aware of locked snapshots; therefore deleting snapshot CGs that contain auto snapshots is blocked
• The auto snapshot group is an internal object that is not exposed to the user – it is hinted at when snapshots are grouped by date/time in several views
Updated System Limits
Maximum SDS capacity has increased from 96TB to 128TB
Maximum SDSs per Protection Domain has increased from 128 to 256
Maximum snapshot count per source volume is now 128 (FG/MG)
Fine Granularity (FG)
▪ Maximum allowed compression ratio: 10x
▪ Maximum allowed overprovisioning: 10x (compared to 5x for MG thin-provisioned)
SDC limitation in vSphere
▪ 6.5 & 6.7 – up to 512 mapped volumes
▪ 6.0 – up to 256 mapped volumes
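As a quick illustration of what the overprovisioning caps mean in practice (the physical capacity figure is just an example, not a recommendation):

```python
# Illustrative math only: how much logical capacity the overprovisioning caps
# allow on top of a hypothetical amount of physical capacity.

def max_provisionable_tb(physical_tb: float, layout: str) -> float:
    overprovision_cap = {"FG": 10.0, "MG-thin": 5.0}[layout]   # caps from the text
    return physical_tb * overprovision_cap

if __name__ == "__main__":
    physical = 128.0                       # e.g. one SDS at the new 128TB maximum
    for layout in ("FG", "MG-thin"):
        print(f"{layout}: up to {max_provisionable_tb(physical, layout):,.0f} TB provisionable")
```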
Updates and Changes
VxFlex OS 3.0
• Added support for native 4Kn sector drives – logical sector size & physical sector size fields
• Removed:
  • Windows backend (SDS, MDM) support – no support for Windows HCI, only compute nodes (SDC)
  • AMS Compute feature support
• Security updates:
  • Java: enable newer versions of Java 8 builds
  • CentOS 7.5 SVM passed NESSUS security scanning and STIG
• New mapping required to define the disk type in a storage pool (SSD/HDD)
  • Attempts to add disks that are not of the correct type will be blocked
  • Existing SPs will need to be assigned a media type post upgrade
• Transition to CentOS 7.5 Storage VM
  • New 3.0 SVM installations only
  • Replacing the SVM from SLES 11.3/12.2 with CentOS 7.5 will be available after 3.0 (3.0.x)
OS Patching
• New ability in the IM/GW to run a user-provided script on a VxFlex OS system as part of an orchestrated, non-disruptive process (like an NDU), usually intended for OS patching
• Supported on RHEL and SLES
• Using this feature involves two main steps:
1. The user manually copies the script files to each VxFlex OS host, with these prerequisites:
  • The main script name must be patch_script (its return code is checked to be 0 at the end of execution)
  • The verification script name must be verification_script (its return code is checked to be 0 at the end of execution)
  • The scripts must be copied to the ~/lia/bin folder and given execution permissions
  • RC codes are saved in the LIA log and, if needed, an error is returned to the GW
2. The user executes the scripts from the IM/GW UI
• It is the customer's responsibility to test the patch_script and verification_script prior to running the process via the GW
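A hedged, local sketch of those prerequisite checks and the RC=0 convention; in reality the validation and execution are driven by LIA and the Gateway, not by a script like this. Only the paths and script names come from the text.

```python
import os
import stat
import subprocess

LIA_BIN = os.path.expanduser("~/lia/bin")
REQUIRED = ["patch_script", "verification_script"]   # verification_script only if selected

def check_prerequisites() -> None:
    """Confirm the scripts exist under ~/lia/bin and are executable."""
    for name in REQUIRED:
        path = os.path.join(LIA_BIN, name)
        if not os.path.isfile(path):
            raise FileNotFoundError(f"{path} is missing")
        if not os.stat(path).st_mode & stat.S_IXUSR:
            raise PermissionError(f"{path} is not executable")

def run_and_check(name: str) -> None:
    """Run a script and fail loudly unless it exits with return code 0."""
    result = subprocess.run([os.path.join(LIA_BIN, name)])
    if result.returncode != 0:
        raise RuntimeError(f"{name} returned RC={result.returncode}")

if __name__ == "__main__":
    check_prerequisites()
    run_and_check("patch_script")
```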
• Execution steps:
1. Log in to the IM/Gateway web view
2. Select the "Maintain" tab
3. Enter the MDM IP & credentials
4. Under "System Logs & Analysis", select "Run Script On Host"
OS Patching
1. The "Run Script on Host" window opens
2. Select the scope for running the script(s):
  • Entire System – all VxFlex OS nodes
    • "In parallel on different Protection Domains" – by default the script runs on the first host's PD, then moves to the second, and so on. By selecting this option, the patch_script will run in parallel on all PDs.
  • Protection Domain – a specific PD
  • Fault Set – a specific fault set
  • SDS – a single node
  Note: PDs that don't have MDMs will be handled first, and cluster nodes will be last.
3. Define the "Running configuration" parameters:
  • Stop process on script failure
  • Script Timeout: how much time to wait for the script to finish
  • Verification Script: whether to run verification_script after patch_script has run
  • Post script action: whether to reboot the host after patch_script has executed
    • If reboot is selected – patch_script will run → reboot → verification_script will run
• Press "Run script on Hosts"; the Validate phase will start
  • This phase sends a request to each host's LIA to verify the existence of the patch_script and verification_script (if selected) files under ~/lia/bin
• Press the "Start execution phase" button
  • The IM performs several verifications: check that there is no failed capacity, check the spare capacity, check that the cluster is in a valid state and that no other SDS is in maintenance mode
  • Enter the SDS into maintenance mode, run the patch_script
  • Reboot the host (if required)
  • Run the verification script (if required)
  • Exit from maintenance mode
  • Operation completed
• After a successful run, the patch_script file is deleted and a backup of it is created in the same folder with the name backup_patch_script
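Putting the execution phase together, here is a high-level, illustrative sketch of the per-node order of operations; the Node class and its methods are hypothetical stand-ins for actions that the IM/GW actually drives through LIA on each host.

```python
class Node:
    """Hypothetical stand-in for a VxFlex OS host managed via LIA."""

    def __init__(self, name):
        self.name = name

    # Each method just logs; the real actions are performed by VxFlex OS / the OS.
    def enter_maintenance_mode(self): print(f"{self.name}: enter maintenance mode")
    def exit_maintenance_mode(self):  print(f"{self.name}: exit maintenance mode")
    def reboot(self):                 print(f"{self.name}: reboot")
    def run_script(self, script):     print(f"{self.name}: run {script} (expect RC=0)")

def patch_node(node, reboot_after=False, run_verification=True):
    node.enter_maintenance_mode()            # data stays available during the patch
    node.run_script("patch_script")
    if reboot_after:
        node.reboot()                        # patch_script -> reboot -> verification
    if run_verification:
        node.run_script("verification_script")
    node.exit_maintenance_mode()

def patch_system(protection_domains, **opts):
    """One node at a time per PD; non-MDM PDs go first, MDM cluster nodes last."""
    for pd in protection_domains:            # or in parallel across PDs, if selected
        for node in pd:
            patch_node(node, **opts)

if __name__ == "__main__":
    patch_system([[Node("sds-01"), Node("sds-02")], [Node("mdm-01")]], reboot_after=True)
```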
Multi LDAP Servers Support
• Deploy the GW as usual; post-deploy, use the FOSGWTool to add LDAP servers to the GW login authority
• Supports up to 8 LDAP servers
• The same method as configuring a single LDAP server is used to support multiple servers (with a change in command syntax)
• New capability in FOSGWTool to add multiple LDAP servers
• Details will be available in the LDAP TN
• The log files are the same as the Gateway logs (operations.log, scaleio.log and scaleio-trace.log)
• The usual errors are related to command syntax or networking misconfiguration
GW Support in LDAP for LIA
• New ability to deploy a system with the LIA user already configured to use LDAP
• During deployment, you can configure the first LDAP server
• In the query phase, communication to LDAP is performed to validate the info, so the install will not proceed until the LDAP check has passed
• Post-upgrade ability to switch LIA from a local user to an LDAP user
• Ability to add / remove up to 8 LDAP servers
• A check is done during add or remove; any error will fail the operation
You can download VxFlex OS v3.0 from the link below
And you can download the documentation from here
You can also watch the video below, showing the new compression and snapshot functionality