Back in 2018, I traveled to a customer advisory board and talked about CSI. Basically ALL the customers then told me, “That’s great, but what about backup?”

Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services.

PowerProtect Data Manager (PPDM) allows you to protect your production workloads in Kubernetes (K8s) environments, ensuring that the data is easy to back up and restore, always available, consistent, and durable in a Kubernetes workload or DR situation.

  1. PPDM has a Kubernetes-native architecture developed for Kubernetes environments.
  2. It is easy for the IT Ops team to use, is separate from the dev ops environment, and allows centralized governance of the dev ops environment.
  3. Users protect to Data Domain, benefiting from secondary storage with unmatched efficiency, deduplication, performance, and scalability, with near-future plans to protect to object storage for added flexibility.

Dell EMC and Velero are working together in an open source community to improve how you protect your data, applications, and workloads for every step of your Kubernetes journey.

Velero is:

  • An open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
  • Aimed at Kubernetes administrators and developers; it handles protection for Kubernetes.
  • A tool that focuses on backup/restore of the K8s configuration.

PPDM builds on top of these capabilities to:

  • Provide a data protection solution that has single management for VMs, applications, and containers.
  • Provide an enterprise-grade solution that allows you to place your production workloads in K8s environments.
  • Focus on crash-consistent backup/restore that is always available and durable in a K8s workload or DR situation.

Kubernetes, also written k8s (k, 8 characters, s) or “kube”, is a popular open source platform for container orchestration that automates container deployment, container scaling and descaling, and container load balancing.

  • Kubernetes automates container operations.
  • Kubernetes eliminates many of the manual processes involved in deploying and scaling containerized applications.
  • You can cluster together groups of hosts running containers, and Kubernetes helps you easily and efficiently manage those clusters.
  • Kubernetes is an ideal platform for hosting cloud-native applications that require rapid scaling.

Kubernetes Features

  • Automated Scheduling: Kubernetes provides an advanced scheduler to launch containers on cluster nodes.
  • Self-Healing Capabilities: Kubernetes reschedules, replaces, and restarts containers that have died.
  • Automated Roll-outs and Rollbacks: Kubernetes supports roll-outs and rollbacks for the desired state of the containerized application.
  • Horizontal Scaling and Load Balancing: Kubernetes can scale the application up and down as requirements change.

K8s Architectural Overview

PowerProtect is our answer to modern challenges

  • It allows customers to take existing production workloads or new workloads and start placing them in Kubernetes production environments, knowing that they will be protected.
  • It allows IT operations and backup admins to manage k8s data protection from a single enterprise-grade management UI, while allowing k8s admins to define protection for their workloads from the k8s APIs.
  • We are building the solution in collaboration with VMware Velero, which focuses on data protection and migration for k8s workloads.

When building the solution we focus on 3 pillars:

  • Central Management
    • When given K8s cluster credentials, Next-Gen SW will discover the namespaces, labels and pods in the environment; you will be able to protect namespaces or specific pods.
    • Protection is defined via the same PLC mechanism that Next-Gen SW has.
    • Logging, monitoring, governance and recovery are done through Next-Gen SW, the same way they are done for other assets.
  • Efficient and Flexible
    • Same Data Protection Platform: our solution is built into DPD’s single Next-Generation software, so as IT ops you only need to manage through one platform: your VMs, your applications and your containers.
    • Protection to deduplicated storage allows great TCO thanks to the superior deduplication of DD/DDVE/Next-Gen Hardware. Protection can also be performed to S3-compatible storage (ECS; S3 in a public cloud in the next release).
    • Next-Generation SW is planned to protect any persistent volume (i.e. crash-consistent images), and will protect quiesced applications such as MySQL, Postgres, Mongo and Cassandra in the future.
    • Protection is planned for any Kubernetes deployment, such as:
      • PKS (Essential/Enterprise/Cloud)
      • Openshift
      • GCP (Anthos/GKE)
      • AWS (EKS)
      • Openstack
      • On-prem bare metal
  • Built for Kubernetes
    • By using the k8s APIs we allow flexibility in which clusters can be protected. It’s possible to use additional applications such as Grafana, Prometheus, Istio and Helm to add capabilities and automation to the solution.
    • Next-Gen SW discovers, shows and monitors k8s resources: namespaces and persistent volumes.
    • No sidecars: there is no need to install a backup client container for each pod (which is time consuming, presents a large attack surface, consumes lots of resources and does not scale).
    • Node affinity: by providing protection controllers per node we avoid cross-node traffic. This is more efficient and more secure.

A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).

While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs, there is the StorageClass resource.

Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.

We used hostPath in our lab. Kubernetes supports hostPath for development and testing on a single-node cluster. A hostPath PersistentVolume uses a file or directory on the Node to emulate network-attached storage.
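To make the PV/PVC relationship concrete, here is a minimal sketch that builds a hostPath PersistentVolume and a matching PersistentVolumeClaim as plain Python dicts and prints them as JSON (which kubectl also accepts). The object names, the path, the size and the "manual" storage class are assumptions made up for illustration; they are not taken from our lab.

```python
import json


def host_path_pv(name, path, size, storage_class):
    """Return a hostPath PersistentVolume manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # hostPath points at a file or directory on the node itself.
            "hostPath": {"path": path},
        },
    }


def volume_claim(name, namespace, size, storage_class):
    """Return a PersistentVolumeClaim manifest requesting `size` of storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical examples.
    pv = host_path_pv("lab-pv", "/mnt/data", "1Gi", "manual")
    claim = volume_claim("lab-pvc", "demo", "1Gi", "manual")
    print(json.dumps(pv, indent=2))
    print(json.dumps(claim, indent=2))
```

The claim lives in a namespace while the volume is cluster-wide, which is why the PV has no namespace in its metadata.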

K8s Components to be backed up

Namespaces

  • Kubernetes namespaces can be seen as a logical entity used to represent cluster resources for usage of a particular set of users. This logical entity can also be termed as a virtual cluster. One physical cluster can be represented as a set of multiple such virtual clusters (namespaces). The namespace provides the scope for names. Names of resources within one namespace need to be unique.

PersistentVolumeClaim (PVC)

  • PVC is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).

storageClassName

  • A claim can request a particular class by specifying the name of a StorageClass using the attribute storageClassName. Only PVs of the requested class, ones with the same storageClassName as the PVC, can be bound to the PVC.
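As a rough illustration of the binding rule, the sketch below checks whether a PV could satisfy a claim: same storageClassName, enough capacity, and all requested access modes offered. It is a simplification for explanation, not the Kubernetes binder itself; the flattened dict layout and the helper names are assumptions, and only integer quantities with binary suffixes are handled.

```python
# Binary suffixes only; real Kubernetes quantities allow more forms.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity):
    """Convert a quantity such as '5Gi' into bytes (integer values only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def can_bind(pv, claim):
    """Return True if the PV matches the claim's class, size and access modes."""
    same_class = pv["storageClassName"] == claim["storageClassName"]
    big_enough = parse_quantity(pv["capacity"]) >= parse_quantity(claim["request"])
    modes_ok = set(claim["accessModes"]) <= set(pv["accessModes"])
    return same_class and big_enough and modes_ok


if __name__ == "__main__":
    # Hypothetical volume and claims.
    pv = {"storageClassName": "fast", "capacity": "10Gi",
          "accessModes": ["ReadWriteOnce"]}
    ok = {"storageClassName": "fast", "request": "5Gi",
          "accessModes": ["ReadWriteOnce"]}
    wrong_class = dict(ok, storageClassName="slow")
    print(can_bind(pv, ok), can_bind(pv, wrong_class))
```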

CSI

  • Container Storage Interface (CSI) defines a standard interface for container orchestration systems (like Kubernetes) to expose arbitrary storage systems to their container workloads.
  • Once a CSI compatible volume driver is deployed on a Kubernetes cluster, users may use the csi volume type to attach, mount, etc. the volumes exposed by the CSI driver.
  • The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object

Note: We used hostPath in our lab. Kubernetes supports hostPath for development and testing on a single-node cluster. A hostPath PersistentVolume uses a file or directory on the Node to emulate network-attached storage.

Protecting K8s workloads using PowerProtect Data Manager

Asset Source

  • The asset source for the Kubernetes cluster is the FQDN or IP address of the cluster’s master node. In the case of an HA cluster, the external IP of the load balancer must be used as the asset source.
  • The default port for a production Kubernetes API server with PPDM is 6443.
  • PowerProtect will use a bearer token or a kubeconfig file to authenticate with the Kubernetes API server.
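For readers who have not talked to the Kubernetes API directly, this is roughly what a bearer-token request against the API server on port 6443 looks like. It is only a sketch of the general mechanism, not PowerProtect’s code: the host name and token are placeholders, and the request is built but deliberately not sent.

```python
import urllib.request


def build_api_request(host, token, path="/api/v1/namespaces", port=6443):
    """Build (but do not send) an authenticated Kubernetes API request."""
    url = f"https://{host}:{port}{path}"
    headers = {
        # The service account token is presented as a bearer token.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder host and token, for illustration only.
    request = build_api_request("k8s-master.example.local", "<service-account-token>")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

In a real cluster you would also need to trust the API server’s CA certificate before sending the request, for example with urllib.request.urlopen and an ssl context.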

Assets

PowerProtect will discover two types of assets for protection in Kubernetes clusters:

  • Namespaces
  • Persistent Volume Claims (PVCs). PVCs are namespace-bound, so they are shown as children of the namespace they belong to in the UI.

PowerProtect will use Velero for protection of namespaces (metadata). PowerProtect will drive PVC snapshot and backup using its own controller.
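The parent/child relationship between the two asset types can be pictured as a simple tree keyed by namespace. The sketch below is only an illustration of that grouping; the function name and the sample data are made up.

```python
def build_asset_tree(namespaces, pvcs):
    """Group PVC names under the namespace they belong to.

    `namespaces` is a list of namespace names; `pvcs` is a list of
    (namespace, pvc_name) pairs.
    """
    tree = {namespace: [] for namespace in namespaces}
    for namespace, pvc_name in pvcs:
        # A PVC always lives in exactly one namespace.
        tree.setdefault(namespace, []).append(pvc_name)
    return {namespace: sorted(children) for namespace, children in tree.items()}


if __name__ == "__main__":
    # Hypothetical discovery result.
    tree = build_asset_tree(
        ["default", "wordpress"],
        [("wordpress", "wp-data"), ("wordpress", "mysql-data")],
    )
    for namespace, children in tree.items():
        print(namespace, children)
```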

PowerProtect components on the Kubernetes Cluster

PowerProtect will install the following components on the Kubernetes cluster when a Kubernetes cluster is added as an asset source.

  • Custom Resource Definitions for BackupJob, RestoreJob, BackupStorageLocation, BackupManagement
  • Service account for the PowerProtect controller
  • Cluster role binding to bind the service account to the cluster admin role
  • Deployment for the PowerProtect controller with a replica set of 1 for R3
  • Velero

PLC Configuration & Asset Configuration

  • When a PLC is created, a new storage unit (SU) is created on the protection storage as part of PLC configuration. In the case of a PLC of type Kubernetes, a BackupStorageLocation containing the SU information will also be created on the cluster.
  • The PowerProtect controller running in the Kubernetes cluster will create a corresponding BackupStorageLocation in the Velero namespace whenever a BackupStorageLocation is created in the PowerProtect namespace.

https://www.youtube.com/watch?v=RgthoFIV_dM&feature=youtu.be

Protection driven from PLC

  1. When a protection action is triggered, CNDM will post a BackupJob custom resource (one for each namespace asset in that PLC) to the Kubernetes API server and monitor the status. The BackupJob custom resource name will be <namespace>-YYYY-MM-DD-SS. The BackupJob will include:
    • The namespace asset that needs to be protected
    • All the PVC assets in that namespace that need to be protected
    • The backup storage location (target)
  2. The PowerProtect controller, which is watching for these custom resources, will be notified.
  3. The PowerProtect controller will create a Velero Backup custom resource with the following information:
    • Namespace
    • Velero setting to not include PVC and PV resource types
    • Velero setting to include cluster resources
    • Velero setting to not take snapshots
    • Velero BackupStorageLocation corresponding to the PLC SU
  4. The PowerProtect controller will monitor and wait for the Velero backup to complete.
  5. Since the provider for the BackupStorageLocation is DataDomain, Velero will invoke the DataDomain object store plugin to write data to the storage unit.
  6. Once the Velero CR status indicates that the backup has completed, the PowerProtect controller will perform steps 7, 8 and 9 for each PVC.
  7. Snapshot the PVC.
  8. Launch a cProxy pod with the snapshot volume mounted to the pod.
  9. The cProxy pod will write the snapshot contents to DD.
  10. Once all the PVCs are backed up, the PowerProtect controller will update the status in the BackupJob custom resource. The manifest will include all files created by PowerProtect and Velero. In order to get the list of files created by Velero, the PowerProtect controller will read the DataDomain folder after the Velero backup is complete.
  11. CNDM will create a protection copy set containing a protection copy for the namespace and each PVC asset.
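The Velero settings listed in the backup workflow map fairly naturally onto fields of the Velero Backup spec. The sketch below shows one plausible shape of such a custom resource as a Python dict; it is an illustration under assumptions, not the manifest the PowerProtect controller actually generates. The "velero" namespace default, the storage location name and the example names are hypothetical, and the job name helper simply follows the <namespace>-YYYY-MM-DD-SS pattern as written above.

```python
from datetime import datetime


def backup_job_name(namespace, when):
    """Name a BackupJob following the <namespace>-YYYY-MM-DD-SS pattern."""
    return f"{namespace}-{when.strftime('%Y-%m-%d-%S')}"


def velero_backup(name, namespace, storage_location, velero_namespace="velero"):
    """Return a Velero Backup custom resource that covers metadata only."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": velero_namespace},
        "spec": {
            "includedNamespaces": [namespace],
            # PVCs and PVs are handled by the controller and cProxy instead.
            "excludedResources": ["persistentvolumeclaims", "persistentvolumes"],
            "includeClusterResources": True,
            # Velero takes no volume snapshots in this workflow.
            "snapshotVolumes": False,
            "storageLocation": storage_location,
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and storage location names.
    name = backup_job_name("wordpress", datetime(2020, 1, 2, 3, 4, 5))
    backup = velero_backup(name, "wordpress", "plc-su-location")
    print(name)
    print(backup["spec"])
```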

Note: The Kubernetes etcd datastore has a limit of 1.5 MiB for document size by default. If the PVCs in a single namespace exceed a couple of hundred, we can run into document size limits.

  • The controller currently supports up to 10 BackupJobs (namespaces) simultaneously. Within each BackupJob, the PVCs are backed up one at a time, so at most 10 cProxy pods can be running for backup at any time.
    • This is an initial implementation. As a reference point, Velero today is completely serial.
  • CSI snapshots today can only take full snapshots.
  • An FS agent runs inside the cProxy. After the CSI snapshot is taken, it is mounted onto a cProxy pod, and the FS agent reads that mount and writes to DD.
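The concurrency model described above (up to 10 namespaces in parallel, PVCs within a namespace one at a time) can be sketched with a bounded thread pool. This is a toy model of the scheduling behaviour only, not the controller’s implementation; `backup_pvc` is a hypothetical callable that stands in for launching a cProxy pod.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_BACKUP_JOBS = 10  # limit stated for the current controller


def backup_namespace(namespace, pvcs, backup_pvc):
    """Back up the PVCs of one namespace strictly one after another."""
    return [backup_pvc(namespace, pvc) for pvc in pvcs]


def run_backup_jobs(jobs, backup_pvc):
    """Run one job per namespace, at most 10 at a time.

    `jobs` maps a namespace name to its list of PVC names.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_BACKUP_JOBS) as pool:
        futures = {
            namespace: pool.submit(backup_namespace, namespace, pvcs, backup_pvc)
            for namespace, pvcs in jobs.items()
        }
        return {namespace: future.result() for namespace, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical namespaces and PVC names.
    jobs = {f"ns{i}": ["data", "logs"] for i in range(12)}
    results = run_backup_jobs(jobs, lambda ns, pvc: f"{ns}/{pvc} done")
    print(results["ns0"])
```

Because each job handles its PVCs serially, the number of simultaneous cProxy-style workers can never exceed the number of running jobs, which is where the limit of 10 comes from.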

The following steps describe the restore workflow when a namespace has been deleted and a restore is triggered from the PPDM UI.

  1. CNDM will post the RestoreJob custom resource to the Kubernetes API server. The RestoreJob custom resource will include:
    • The namespace asset that needs to be restored
    • All the PVC assets in that copy set
    • The backup storage location (target)
  2. The PowerProtect controller, which is watching for these custom resources, will be notified.
  3. The PowerProtect controller will create a Velero Restore custom resource to restore:
    • The namespace resource (note: not the whole namespace, just the namespace resource)
    • All the resources needed in order to start the restore of the PVCs
  4. The PowerProtect controller will monitor and wait for the Velero restore to complete. Since the provider for the BackupStorageLocation is DataDomain, Velero will invoke the DataDomain object store plugin to read data from the storage unit.
  5. Once the Velero CR status indicates that the restore has completed, the PowerProtect controller will restore each PVC in the restore job:
    • A cProxy pod will be launched for each PVC.
    • The cProxy will read the contents from DataDomain and write them to the PVC.
  6. The PowerProtect controller will then create a second Velero Restore custom resource to restore all the remaining resources in the namespace, excluding the namespace resource, PVCs and PVs.
  7. The PowerProtect controller will monitor and wait for the Velero restore to complete.
  8. The PowerProtect controller will update the final RestoreJob status.
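The ordering in the restore workflow (a first Velero Restore, then the PVC data, then a second Velero Restore for everything else) can be sketched as a small plan builder. This is an illustration under assumptions, not the controller’s real logic: the restore names and the "velero" namespace default are made up, and the first restore lists only the namespace resource because the text does not say which other resources are needed before the PVC restore.

```python
def velero_restore(name, backup_name, namespace, included=None, excluded=None,
                   velero_namespace="velero"):
    """Return a Velero Restore custom resource as a plain dict."""
    spec = {"backupName": backup_name, "includedNamespaces": [namespace]}
    if included is not None:
        spec["includedResources"] = list(included)
    if excluded is not None:
        spec["excludedResources"] = list(excluded)
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": velero_namespace},
        "spec": spec,
    }


def restore_plan(namespace, backup_name, pvcs):
    """Return the restore steps in the order they must run."""
    first = velero_restore(f"{namespace}-ns", backup_name, namespace,
                           included=["namespaces"])
    second = velero_restore(
        f"{namespace}-rest", backup_name, namespace,
        excluded=["namespaces", "persistentvolumeclaims", "persistentvolumes"],
    )
    steps = [("velero-restore", first)]
    # PVC data is written back by cProxy pods between the two Velero restores.
    steps += [("cproxy-restore", pvc) for pvc in pvcs]
    steps.append(("velero-restore", second))
    return steps


if __name__ == "__main__":
    # Hypothetical backup and PVC names.
    for kind, detail in restore_plan("wordpress", "wordpress-backup", ["wp-data"]):
        print(kind, detail if isinstance(detail, str) else detail["metadata"]["name"])
```

Restoring the PVC contents before the remaining resources means the pods that mount those claims start only once their data is back in place.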

Below you can see a demo about how it all looks

Containers Restore:

Training Resources

The following training assets were created for this release. Additional assets also exist that describe the PowerProtect product capabilities and concepts. These training sessions are not specific to this release but do provide an introduction for those that are new to PowerProtect. For more education assets, search for “PowerProtect” on the education portal.

PowerProtect Data Manager 19.3 Recorded Knowledge Transfer

Take this 2 hour and 30 minute On-Demand Class to get an overview of the new enhancements, features, and functionality in the PowerProtect Data Manager 19.3 release.

Registration Link: https://education.dellemc.com/content/emc/en-us/csw.html?id=933228438
