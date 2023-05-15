A post by Kailas Goliwadekar

Recently, I’ve been working on AI/ML with PowerFlex as the SDS.



My objective was to build a cloud-native artificial intelligence platform made up of a Red Hat OpenShift cluster, NVIDIA GPUs, and PowerFlex. I first built a PowerFlex 4.0 platform with 4 SDS nodes. On separate PowerEdge nodes, I built an OpenShift bare-metal cluster with 3 master nodes and 4 worker nodes. The entire OpenShift deployment was carried out with the Assisted Installer.



I then deployed the PowerFlex CSI driver on the OpenShift worker nodes, which enables the pods to connect to PowerFlex storage.



The figure below shows the logical architecture of OpenShift on PowerFlex.







To run NVIDIA's speech recognition services, I first had to install the NVIDIA GPU Operator from the OpenShift console. The NVIDIA GPU Operator makes the underlying GPUs of a compute node available to containerized workloads.



A prerequisite for running the GPU Operator is the Node Feature Discovery (NFD) Operator, which detects hardware features and system configuration at the node level. After installing the NFD Operator and creating a NodeFeatureDiscovery instance, we can install the NVIDIA GPU Operator and create an instance of ClusterPolicy.
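
Before installing the GPU Operator, it can help to confirm that NFD has actually labelled the GPU nodes. The sketch below assumes NFD's usual PCI vendor label for NVIDIA devices (feature.node.kubernetes.io/pci-10de.present=true); the function name list_gpu_nodes is only illustrative and is not part of any tool.

```shell
# List nodes that NFD has labelled as carrying an NVIDIA PCI device.
# Assumption: NFD applies feature.node.kubernetes.io/pci-10de.present=true
# (10de is NVIDIA's PCI vendor ID). Adjust the selector if your labels differ.
list_gpu_nodes() {
    oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true -o name
}
```

Running list_gpu_nodes prints one node/<name> line for each labelled node. An empty result suggests that NFD has not finished labelling yet or that the cluster uses different labels.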



To deploy the Riva API, I performed the following steps on my OpenShift cluster.



export NGC_CLI_API_KEY=<your NGC API key>
export VERSION_TAG="2.11.0"
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-${VERSION_TAG}.tgz --username='$oauthtoken' --password=$NGC_CLI_API_KEY
tar -xvzf riva-api-${VERSION_TAG}.tgz



In the values.yaml file in the riva-api folder, I set asr, nlp, and tts to true or false as needed. I also changed service.type from LoadBalancer to ClusterIP, which exposes the service only to other services within the cluster.
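
The same choices can also be passed to Helm as --set overrides rather than edited into the file. The helper below is a minimal sketch: riva_set_flags is a hypothetical name, and the keys (riva.speechServices.asr, riva.speechServices.nlp, riva.speechServices.tts, and service.type) are the ones named in this post, so check them against the values.yaml of your chart version.

```shell
# Print Helm --set overrides for the Riva speech services.
# Services passed as arguments are enabled; the rest are disabled.
# Written for plain POSIX sh, so the loop variables are global.
riva_set_flags() {
    flags=""
    for svc in asr nlp tts; do
        enabled=false
        for want in "$@"; do
            if [ "$want" = "$svc" ]; then enabled=true; fi
        done
        flags="${flags}--set riva.speechServices.${svc}=${enabled} "
    done
    # Keep the service cluster-internal, as described above.
    printf '%s\n' "${flags}--set service.type=ClusterIP"
}
```

For example, riva_set_flags asr tts prints overrides that enable ASR and TTS, disable NLP, and set service.type to ClusterIP. The output can be appended to a helm install command with $(riva_set_flags asr tts).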



Enable the cluster to run containers that need NVIDIA GPUs by installing the NVIDIA device plugin.



helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install --generate-name --set failOnInitError=false nvdp/nvidia-device-plugin



[root@ocp411-admin Samples]# oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-1683609115-2rh8w   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-9x9tg   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-gm642   1/1     Running   0          3d6h
nvidia-device-plugin-1683609115-gpmtm   1/1     Running   0          3d6h



Install the Riva Helm Chart. You can explicitly override variables from the values.yaml file, such as the riva.speechServices.[asr,nlp,tts] settings.



helm install riva-api riva-api/ \
    --set ngcCredentials.password=`echo -n $NGC_CLI_API_KEY | base64 -w0` \
    --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0` \
    --set riva.speechServices.asr=true \
    --set riva.speechServices.nlp=true \
    --set riva.speechServices.tts=true
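
The two base64 expressions in the command above are easy to get subtly wrong: echo -n keeps a trailing newline out of the encoded value, and -w0 stops GNU base64 from wrapping long output. As a small illustration, the helper below (b64 is a hypothetical name) does the same job with printf and tr, which avoids relying on the -w0 flag.

```shell
# Base64-encode a string with no trailing newline in the input
# and no line wrapping in the output.
b64() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

b64 tlt_encode        # prints dGx0X2VuY29kZQ==
echo
# For contrast, encoding with the newline included changes the result:
echo tlt_encode | base64   # prints dGx0X2VuY29kZQo=
```

If the encoded key ends in "o=" or "Cg==" where you did not expect it, a stray newline was probably encoded along with the value.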



The Helm chart runs two containers in order: a riva-model-init container that downloads and deploys the models, followed by a riva-speech-api container to start the speech service API. Depending on the number of models, the initial model deployment could take an hour or more. To monitor the deployment, use kubectl to describe the riva-api pod and to watch the container logs.



export pod=`kubectl get pods | cut -d " " -f 1 | grep riva-api`
kubectl describe pod $pod
kubectl logs -f $pod -c riva-model-init
kubectl logs -f $pod -c riva-speech-api
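
Because the model deployment can run for a long time, it may be more convenient to block until the deployment reports as available than to watch the logs by hand. This is a minimal sketch using kubectl wait; wait_for_riva is a hypothetical helper, the default deployment name riva-riva-api is taken from the oc get all output later in this post, and the 90m timeout is an arbitrary assumption.

```shell
# Block until the Riva deployment reports the Available condition.
# $1: deployment name (assumed default: riva-riva-api)
# $2: timeout (assumed default: 90m; size it to your model set)
wait_for_riva() {
    deploy="${1:-riva-riva-api}"
    timeout="${2:-90m}"
    kubectl wait --for=condition=Available "deployment/${deploy}" --timeout="${timeout}"
}
```

Calling wait_for_riva with no arguments uses the defaults. kubectl wait exits with a non-zero status if the timeout expires first, so the helper can gate later steps in a script.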



Now that the Riva service is running, the cluster needs a mechanism to route requests into Riva, so deploy the open-source Traefik edge router.



helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm fetch traefik/traefik
tar -zxvf traefik-*.tgz



Modify the traefik/values.yaml file: change service.type from LoadBalancer to ClusterIP, which exposes the service on a cluster-internal IP. Then deploy the modified traefik Helm chart.



helm install traefik traefik/



An IngressRoute enables the Traefik load balancer to recognize incoming requests and distribute them across multiple riva-api services. When you deployed the traefik Helm chart above, Kubernetes automatically created a local DNS entry for that service: traefik.default.svc.cluster.local. The IngressRoute definition below matches this DNS entry and directs requests to the riva-api service. Create the following riva-ingress.yaml file:



apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: riva-ingressroute
spec:
  entryPoints:
    - web
  routes:
    - match: "Host(`traefik.default.svc.cluster.local`)"
      kind: Rule
      services:
        - name: riva-api
          port: 50051
          scheme: h2c



Deploy the IngressRoute.



kubectl apply -f riva-ingress.yaml



The Riva service is now able to serve gRPC requests from within the cluster at the address traefik.default.svc.cluster.local.



Riva provides a container with a set of pre-built sample clients to test the Riva services.



Create the client-deployment.yaml file that defines the deployment and contains the following:



apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-client
  labels:
    app: "rivaasrclient"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "rivaasrclient"
  template:
    metadata:
      labels:
        app: "rivaasrclient"
    spec:
      # EKS node-group label; on a non-EKS cluster, the pod is scheduled
      # only if a node carries this label.
      nodeSelector:
        eks.amazonaws.com/nodegroup: cpu-linux-clients
      imagePullSecrets:
        - name: imagepullsecret
      containers:
        - name: riva-client
          image: "nvcr.io/nvidia/riva/riva-speech-client:2.11.0"
          command: ["/bin/bash"]
          args: ["-c", "while true; do sleep 5; done"]



Deploy the client service.



kubectl apply -f client-deployment.yaml
export cpod=`kubectl get pods | cut -d " " -f 1 | grep riva-client`
kubectl exec --stdin --tty $cpod /bin/bash
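
Picking the pod by grepping its name works, but it returns several names if more than one pod matches. A label selector is a common alternative. The sketch below relies on the app: "rivaasrclient" label from the deployment above; pod_by_label is a hypothetical helper name.

```shell
# Print the name of the first pod that matches a label selector.
# $1: label selector, for example app=rivaasrclient
pod_by_label() {
    kubectl get pods -l "$1" -o jsonpath='{.items[0].metadata.name}'
}
```

With this in place, export cpod=$(pod_by_label app=rivaasrclient) sets the same variable as the grep pipeline.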



[root@ocp411-admin Riva]# oc get all
NAME                                        READY   STATUS    RESTARTS   AGE
pod/nvidia-device-plugin-1683609115-2rh8w   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-9x9tg   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-gm642   1/1     Running   0          3d7h
pod/nvidia-device-plugin-1683609115-gpmtm   1/1     Running   0          3d7h
pod/riva-client-668dd7594b-cr68q            1/1     Running   0          2d7h
pod/riva-riva-api-7d5b75687b-4t6kn          1/1     Running   0          2d6h
pod/traefik-6fbf57555d-xw82v                1/1     Running   0          2d8h

NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                            PORT(S)                                AGE
service/kubernetes      ClusterIP      172.30.0.1       <none>                                 443/TCP                                4d6h
service/openshift       ExternalName   <none>           kubernetes.default.svc.cluster.local   <none>                                 4d6h
service/riva-riva-api   ClusterIP      172.30.6.226     <none>                                 8000/TCP,8001/TCP,8002/TCP,50051/TCP   2d6h
service/traefik         ClusterIP      172.30.246.217   <none>                                 80/TCP,443/TCP                         2d8h

NAME                                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/nvidia-device-plugin-1683609115   4         4         4       4            4           <none>          3d7h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/riva-client     1/1     1            1           2d7h
deployment.apps/riva-riva-api   1/1     1            1           2d6h
deployment.apps/traefik         1/1     1            1           2d8h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/riva-client-668dd7594b     1         1         1       2d7h
replicaset.apps/riva-riva-api-7d5b75687b   1         1         1       2d6h
replicaset.apps/traefik-6fbf57555d         1         1         1       2d8h
[root@ocp411-admin Riva]#



Let’s move on to the demo. First, log in to the riva-client pod and run the Riva ASR and Riva TTS tests.



[root@ocp411-admin Riva]# kubectl exec --stdin --tty $cpod /bin/bash
root@riva-client-668dd7594b-cr68q:/opt/riva# riva_streaming_asr_client \
> --audio_file=wav/en-US_sample.wav \
> --automatic_punctuation=true \
> --riva_uri=traefik.default.svc.cluster.local:80



I0512 13:00:14.664886 47228 riva_streaming_asr_client.cc:150] Using Insecure Server Credentials
Loading eval dataset…
filename: /opt/riva/wav/en-US_sample.wav
Done loading 1 files
what
what
what is
what is
what is
what is now
what is natural
what is natural
what is natural language
what is natural language
what is natural language
what is natural language
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is natural language Processing
what is language Processing
what is language Processing
What is Natural Language Processing?
-----------------------------------------------------------
File: /opt/riva/wav/en-US_sample.wav
Final transcripts:
0 : What is Natural Language Processing?
Timestamps:
Word           Start (ms)   End (ms)
What           840          880
is             1160         1200
Natural        1800         2080
Language       2200         2520
Processing?    2720         3200
Audio processed: 4 sec.
-----------------------------------------------------------



Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 0.1486 sec.
Total audio processed: 4.152 sec.
Throughput: 27.9407 RTFX
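
The throughput figure reads as a real-time factor: seconds of audio processed divided by wall-clock run time. That interpretation is an assumption rather than something the client output states, but it can be checked against the numbers printed above. The rtfx helper below is a hypothetical name used only for this check.

```shell
# Real-time factor from the client output:
# total audio processed (s) divided by run time (s).
rtfx() {
    awk -v audio="$1" -v runtime="$2" 'BEGIN { printf "%.2f\n", audio / runtime }'
}

rtfx 4.152 0.1486   # prints 27.94
```

The result agrees with the reported 27.9407 to two decimal places. In other words, the service transcribed this sample roughly 28 times faster than real time.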



I have published a short video of the other Riva service demos. Check it out!







