PowerFlex is a software-defined storage platform designed to significantly reduce operational and infrastructure complexity, empowering organizations to move faster by delivering flexibility, elasticity, and simplicity with predictable performance and resiliency at scale.

It is uniquely positioned to meet and exceed modern application requirements for storage, including performance, scalability, and fluidity.

PowerFlex’s unrestricted I/O throughput and linear scale-out to thousands of nodes make it a strong fit for modern applications with high I/O demands, such as machine learning, artificial intelligence, and content delivery networks (CDNs).

Rancher is the enterprise computing platform to run Kubernetes on-premises, in the cloud and at the edge. It addresses the operational and security challenges of managing multiple Kubernetes clusters everywhere. Rancher also provides IT operators and development teams with integrated tools for building, deploying, and running cloud-native workloads.

Rancher not only deploys production-grade Kubernetes clusters from the data center to the cloud to the edge, it also unites them with centralized authentication, access control, and observability. Rancher lets you streamline cluster deployment on bare metal, edge devices, private clouds, public clouds, or vSphere and secure those clusters using global security policies. Use Helm or the Rancher App Catalog to deploy and manage applications across any or all of these environments, ensuring multi-cluster consistency with a single deployment.

Creating a Kubernetes cluster with Rancher Kubernetes Engine (RKE) and managing it with Rancher as the container orchestration layer on the Dell EMC PowerFlex family platform allows customers to meet the performance, scalability, resiliency, and availability requirements of new cloud-native application workloads. The Dell EMC PowerFlex CSI driver dynamically provisions persistent volumes on the Rancher-managed Kubernetes cluster, and the Dell EMC CSM modules add enterprise-grade storage capabilities on top.

Helm is the package management tool of choice for Kubernetes. In Rancher v2.5, the apps and marketplace feature is used to manage Helm charts, replacing the catalog system.

The PowerFlex CSI installer can easily be added to the apps and marketplace tab so that the installation can be done directly from the Rancher UI; built-in support will be added in the future.

Once the driver is installed, customers can use each of the CSI driver features described here to manage the full lifecycle of the volumes used by workloads running on Rancher.

In addition to the basic functionality covered by the CSI specification, we’ve developed an enterprise-grade suite on top of CSI called CSM.

Dell EMC Container Storage Modules (CSM) aim to improve the observability, usability, and data mobility of stateful applications running on the Dell Technologies storage portfolio. Together with the CSI plugins and the pioneering app-aware, app-consistent backup and recovery solutions, CSM forms the most comprehensive enterprise-grade storage and data protection solution for Kubernetes from Dell Technologies.

CSM contains several modules:

  • Snapshots – Operational recovery with instant & efficient storage array-based snapshots (through CSI)
  • *Volume Placement – Intelligent volume placement for optimal performance
  • Resiliency – Node failure detection and recovery mechanism
  • Observability – Access storage metrics in tools like Grafana and Prometheus.
  • *Data Replication – Industry leading array-based replication to extend Kubernetes clusters across datacenters
  • Authorization – Access control and RBAC to storage infrastructure with user group support

At the moment, the Container Storage Modules are supported with PowerFlex; support for additional storage systems will be added in the future.

*Still under development

The CSM Observability module gives Kubernetes administrators insight into the topology, usage, and performance of persistent storage provisioned by a CSI (Container Storage Interface) driver. Metrics data is collected and pushed to the OpenTelemetry Collector, where it can be processed and exported in a format that Prometheus can consume. Topology data for containerized volumes provisioned by the CSI driver is also captured. The metrics and topology data are visualized through Grafana dashboards.
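To make "a format consumable by Prometheus" concrete, here is a minimal Python sketch of the Prometheus text exposition format, which is what a scrape endpoint such as the collector's Prometheus exporter serves. The metric names (`powerflex_volume_read_iops`, `powerflex_volume_write_iops`), the `VolumeSample` type, and the labels are hypothetical placeholders for illustration; they are not the metric names CSM Observability actually emits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VolumeSample:
    """One hypothetical reading for a PowerFlex volume mounted on a node."""
    volume_id: str
    node: str
    read_iops: float
    write_iops: float


# Hypothetical metric names mapped to the VolumeSample attribute they report.
METRICS = {
    "powerflex_volume_read_iops": "read_iops",
    "powerflex_volume_write_iops": "write_iops",
}


def escape_label(value: str) -> str:
    # The text exposition format requires escaping backslash, double quote,
    # and newline inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(samples: list[VolumeSample]) -> str:
    """Render samples as Prometheus text lines, one gauge family per metric."""
    lines = []
    for metric, attr in METRICS.items():
        lines.append(f"# HELP {metric} Hypothetical per-volume {attr.replace('_', ' ')}.")
        lines.append(f"# TYPE {metric} gauge")
        for s in samples:
            labels = f'volume_id="{escape_label(s.volume_id)}",node="{escape_label(s.node)}"'
            lines.append(f"{metric}{{{labels}}} {getattr(s, attr)}")
    # The exposition format is newline-terminated.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_exposition([VolumeSample("vol-1", "worker-1", 120.0, 30.0)]), end="")
```

Each metric family gets its `# HELP` and `# TYPE` lines once, followed by one sample line per labeled series, which is the shape Prometheus expects when it scrapes an endpoint.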

SSL certificates for TLS between nodes are handled by cert-manager.

CSM Observability can be natively integrated with Rancher’s built-in monitoring system, which consists of Prometheus and Grafana.

It uses the OpenTelemetry Collector to push storage metrics so they can be consumed by Prometheus.

Once Grafana and Prometheus are properly configured, customers can import the pre-built observability dashboards. Each dashboard can be edited, and customers can also build their own custom dashboards.

After installation, navigate to the Monitoring tab and click the service monitor to find the OTEL collector, which carries the metrics collected from the PowerFlex system. In the Grafana monitoring system, in addition to the built-in Rancher dashboards, there are four PowerFlex dashboards:

  • I/O Performance by Node: provides visibility into the I/O performance metrics (IOPS, bandwidth, latency) by Kubernetes node.
  • I/O Performance by Volume: provides visibility into the I/O performance metrics (IOPS, bandwidth, latency) by volume.
  • Storage Consumption: provides visibility into the total, used, and available capacity for a storage class and the associated underlying storage construct.
  • Topology: provides visibility into the characteristics of Dell EMC CSI (Container Storage Interface) driver-provisioned volumes in Kubernetes, correlated with the volumes on the storage system.
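The by-node and by-volume views are two roll-ups of the same per-volume samples. The following is a minimal sketch of one way to aggregate volume readings to the node level, assuming hypothetical `(volume_id, node, iops, latency_ms)` tuples. Latency is weighted by IOPS so that a busy volume counts more than an idle one; that weighting is a design choice for this sketch, not a documented property of the PowerFlex dashboards. The `capacity_summary` helper mirrors the total/used/available breakdown of the storage consumption view.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical reading shape: (volume_id, node, iops, latency_ms)
Reading = Tuple[str, str, float, float]


def rollup_by_node(readings: Iterable[Reading]) -> Dict[str, Tuple[float, float]]:
    """Sum IOPS per node and compute an IOPS-weighted mean latency per node."""
    total_iops: Dict[str, float] = defaultdict(float)
    weighted_latency: Dict[str, float] = defaultdict(float)
    for _volume, node, iops, latency_ms in readings:
        total_iops[node] += iops
        weighted_latency[node] += iops * latency_ms
    # A node whose volumes did no I/O reports zero latency instead of dividing by zero.
    return {
        node: (iops, weighted_latency[node] / iops if iops else 0.0)
        for node, iops in total_iops.items()
    }


def capacity_summary(total_gib: float, used_gib: float) -> Dict[str, float]:
    """Total / used / available capacity, plus percent used."""
    if total_gib <= 0:
        raise ValueError("total capacity must be positive")
    return {
        "total_gib": total_gib,
        "used_gib": used_gib,
        "available_gib": max(total_gib - used_gib, 0.0),
        "used_pct": 100.0 * used_gib / total_gib,
    }
```

A simple average of per-volume latencies would let one idle, slow volume dominate a node’s figure; weighting by IOPS reflects the latency that the node’s actual I/O experienced.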

One of the most common questions storage administrators ask me is: how can I stop my Kubernetes administrators from consuming all the capacity of my storage array?

Well, the simple answer is CSM Authorization.

Since Kubernetes has no built-in capability for this, we’ve developed our own mechanism that allows storage administrators to apply quota and RBAC rules that instantly and automatically restrict cluster tenants’ usage of storage resources. Users of storage through CSM Authorization do not need storage admin root credentials to access the storage system.

Kubernetes administrators have an interface to create, delete, and manage the roles/groups to which storage rules can be applied. Administrators and/or users can then generate authentication tokens that tenants use to consume storage, with the proper access policies enforced automatically.
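A toy model of the quota idea, assuming a per-tenant capacity limit in GiB: each provisioning request is checked against the tenant’s remaining quota before it would reach the array. `TenantQuota` and `QuotaExceeded` are hypothetical names; this is not CSM Authorization’s code or API, only an illustration of the kind of check described above.

```python
class QuotaExceeded(Exception):
    """Raised when a request would push a tenant past its capacity limit."""


class TenantQuota:
    """Hypothetical per-tenant capacity accounting, in GiB."""

    def __init__(self, tenant: str, limit_gib: int) -> None:
        self.tenant = tenant
        self.limit_gib = limit_gib
        self.used_gib = 0

    def request(self, size_gib: int) -> None:
        """Reserve capacity for a new volume, or raise without changing state."""
        if size_gib <= 0:
            raise ValueError("size must be positive")
        if self.used_gib + size_gib > self.limit_gib:
            raise QuotaExceeded(
                f"{self.tenant}: {self.used_gib} + {size_gib} GiB exceeds the {self.limit_gib} GiB quota"
            )
        self.used_gib += size_gib

    def release(self, size_gib: int) -> None:
        """Return capacity when a volume is deleted."""
        self.used_gib = max(0, self.used_gib - size_gib)


# Each tenant gets its own ledger, so one tenant's usage never counts against another's.
quotas = {"team-a": TenantQuota("team-a", 100), "team-b": TenantQuota("team-b", 50)}
```

Rejecting the request before any state changes keeps the ledger consistent: a failed request leaves `used_gib` exactly where it was.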

With CSM Authorization, customers can:

  • Segregate array usage between multiple tenants
  • Control storage consumption with quotas
  • Ensure that a tenant cannot access other tenants’ storage
  • Create, update, or revoke storage access at any point in time
  • Audit storage access by tenants
  • Hide the array management credentials from the CSI driver and replace them with a JSON Web Token (JWT)

Making application Pods resilient to node failure can be difficult, especially for those deployed with StatefulSets that use PersistentVolumeClaims. Kubernetes guarantees that there will never be two copies of the same StatefulSet Pod running at the same time and accessing storage. Therefore, it does not clean up StatefulSet Pods if the node executing them fails.

CSM Resiliency is a project designed to make Kubernetes applications, including those that use persistent storage, more resilient to various failures. The first component of CSM Resiliency is a pod monitor that is specifically designed to protect stateful applications from various failures. It is not a standalone application; rather, it is deployed as a sidecar to CSI (Container Storage Interface) drivers, in both the driver’s controller pods and the driver’s node pods. Deploying CSM Resiliency as a sidecar allows it to make direct requests to the driver through the Unix domain socket that Kubernetes sidecars use to make CSI requests.

CSM Resiliency is primarily designed to detect pod failures caused by a node failure or a node communication failure. The diagram below illustrates the hardware environment assumed in the design.

CSM Resiliency’s design is focused on detecting the following types of hardware failures, and when they occur, moving protected pods to hardware that is functioning correctly:

  • Node failure. Node failure is defined as something like a power failure to the node, which causes it to cease operation.
  • K8S Control Plane Network Failure. A Control Plane Network Failure often has the same K8S failure signature as a node failure (the node is tainted with NoSchedule or NoExecute); however, if there is a separate Array I/O interface, CSM Resiliency can often detect that the Array I/O Network may be active even though the Control Plane Network is down.
  • Array I/O Network failure is detected by polling the array to determine if the array has a healthy connection to the node.
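The three failure cases above can be read as a small decision table over two signals: whether the node carries a NoSchedule/NoExecute taint, and whether polling the array shows a healthy connection to the node. This is a simplified sketch of that reading, not CSM Resiliency’s actual logic; the `Diagnosis` names and the `should_move_pods` policy are hypothetical, and the real module may weigh additional signals.

```python
from __future__ import annotations

from enum import Enum

# Taint effects the text associates with a failed or unreachable node.
FAILURE_TAINT_EFFECTS = {"NoSchedule", "NoExecute"}


class Diagnosis(Enum):
    HEALTHY = "healthy"
    NODE_FAILURE = "node failure"
    CONTROL_PLANE_NETWORK_FAILURE = "control plane network failure"
    ARRAY_IO_NETWORK_FAILURE = "array I/O network failure"


def diagnose(taint_effects: set[str], array_connected: bool) -> Diagnosis:
    """Classify a node from its taint effects and an array connectivity poll."""
    tainted = bool(FAILURE_TAINT_EFFECTS & taint_effects)
    if tainted and array_connected:
        # Kubernetes lost contact with the node, but the array still sees it.
        return Diagnosis.CONTROL_PLANE_NETWORK_FAILURE
    if tainted:
        return Diagnosis.NODE_FAILURE
    if not array_connected:
        return Diagnosis.ARRAY_IO_NETWORK_FAILURE
    return Diagnosis.HEALTHY


def should_move_pods(diagnosis: Diagnosis) -> bool:
    """Hypothetical policy: move pods only when the node cannot still be writing to the array."""
    return diagnosis in (Diagnosis.NODE_FAILURE, Diagnosis.ARRAY_IO_NETWORK_FAILURE)
```

Refusing to move pods in the control-plane-only case follows from the StatefulSet guarantee described earlier: if the array still sees the node, the old Pod may still be writing, and starting a second copy elsewhere could put two writers on the same volume.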

The Kubernetes ecosystem continues to grow in huge strides, providing more stability, security, and automatic service discovery. Using Rancher together with the PowerFlex CSI driver and CSM on the PowerFlex family streamlines basic operations, such as Kubernetes cluster setup and dynamic provisioning of persistent storage. This empowers admins to deploy their Kubernetes environment quickly for developers and end users, enabling uninterrupted utilization of infrastructure.

Below you can see a recorded session from SUSECON 2021 showing the integration between Rancher and PowerFlex CSI and CSM:

A guest post by Tomer Nahumi
