TL;DR — I built a tiny Docker image that contains a Helm client and a small run script, then invoked that image from VMware Aria Automation 8.18 blueprints to deploy Helm charts (tested with Aria 8.18.2, VCF 5.2.1 and TKG). Code and Dockerfile are on GitHub; links below.

RTFA…

I’ve been working with Intel on various AI projects, including benchmark testing and distributed training using Intel’s AMX accelerator in their 4th Gen and later Xeon CPUs, which gave me the opportunity to really dig deep into automating workflows. As we moved into deploying AI chatbots, I wanted to automate Intel’s Open Platform for Enterprise AI (OPEA), but it’s deployed using Helm charts.

While VMware Aria Automation is deployed with Helm internally, it doesn’t actually support deploying Helm charts as a client. A quick search will tell you to use “ABX”, which is now simply ‘Actions’, but that’s not exactly straightforward either. The general consensus was to have a separate Helm client that could run the helm commands. I tried using pip to get that working in Aria Automation’s Actions runtime environment, but that failed … miserably (for me, especially; I felt like I was the failure). So I set out to find a better way…

Build my own custom Docker container

If a Helm client is needed, why not build my own Docker container with the Helm client installed, then pass the charts and other needed information through as environment variables? My initial phase of simply running the docker command with -e or --env and the required variables took some trial and error, as most things do, but it didn’t take long to build out a working Docker container. What does the container do, you might ask. At a high level, it runs a script at startup, and that script is the core function of the container. The Dockerfile itself includes the script and necessary files, along with the needed packages. Links to both files on GitHub are below, but at a high level, here’s what the script does:

  • Validates environment variables
  • Logs into Tanzu
  • Downloads & runs the Helm charts
  • Starts ssh (for troubleshooting)

As an example, I started with my Habana/Gaudi docker testing and used it to build out a YAML file for a Kubernetes deployment via kubectl; that k8s YAML ends up in the manifest section of the Aria Automation design template.
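Here’s roughly the shape of that manifest. It’s a trimmed-down sketch: the names, namespace, image tag, and environment variable names are illustrative rather than copied from my templates (the real files are linked below under Key Components).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-helm
  namespace: opea
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-helm
  template:
    metadata:
      labels:
        app: docker-helm
    spec:
      containers:
        - name: docker-helm
          image: thephuck/docker-helm:latest    # tag is illustrative
          env:                                  # the run script validates these at startup
            - name: TANZU_SERVER                # illustrative variable names
              value: supervisor.mylab.local
            - name: TANZU_USERNAME
              value: administrator@vsphere.local
            - name: HELM_NAMESPACE
              value: opea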

Aria Automation integration

Now that I have a working Docker container to run the helm commands, the next step was integrating it into Aria Automation. There are some prereqs:

  • VCF 5.x (or vSphere 8.0)
  • Tanzu
    • TKG embedded, deployed through SDDC Manager
  • Aria Automation 8.18 Patch 2 recommended (Patch 1 required at a minimum; it addresses a bug that impacts deployment of TKG cluster-level resources using the CCI.TKG template)
  • Cloud Consumption Interface configured

From there I was able to build an Aria Automation Design Template (aka Blueprint) that accepts user inputs for the required variables. The input section of the YAML looks roughly like this:
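The input names, types, and defaults below are illustrative, not copied verbatim from my template; the real files are linked under Key Components.

inputs:
  cluster_name:
    type: string
    title: Kubernetes cluster name
  worker_count:
    type: integer
    title: Number of worker nodes
    default: 1
  helm_namespace:
    type: string
    title: Namespace for the Helm release
    default: opea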

Those inputs are then passed through to TKG in the next section of the YAML. In this case, the docker-helm object is the last item that gets deployed, and it pulls all the environment variables either from the inputs or from other parts of the deployment template:
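Here’s a simplified sketch of that resource; the resource type and property names may differ slightly from the linked templates, so treat this as illustrative rather than exact.

resources:
  docker-helm:
    type: CCI.Supervisor.Resource   # illustrative; check the linked templates for the exact type
    dependsOn:
      - tkg-cluster                 # deploy last, once the cluster exists
    properties:
      manifest:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: docker-helm
          namespace: ${input.helm_namespace}
        spec:
          # ...same container spec as the earlier manifest, with the env values
          # bound to ${input.*} or to properties of other resources in the template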

While building a container and posting it publicly may not be your desired outcome, I also deployed Harbor and integrated it into Aria Automation so I didn’t have to leverage Docker Hub. To do so, I simply created an FQDN for my Harbor deployment, then provided the URL that included the container image and tag.
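In the manifest that’s just a one-line change; the hostname and project below are made up:

# pull the image from a private Harbor registry instead of Docker Hub
image: harbor.mylab.local/library/docker-helm:latest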

Key Components

Starting off, here’s the Dockerfile: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm/Dockerfile

That’s pretty self-explanatory, but it’s worth pointing out that you’ll need these files available when building the container if you’re using docker build to create your own:

COPY vsphere-plugin/bin/ /bin/
COPY run.sh /root/run.sh

You can also pull the Docker image directly from Docker Hub: https://hub.docker.com/repository/docker/thephuck/docker-helm/general

Which brings me to the next file, the run script: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm/run.sh

Now, this file is specific to Intel’s OPEA, so you would need to adjust it for your needs. As an example, here’s a customized one I used for Habana that leverages Intel’s Gaudi accelerator: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm-gaudi/run-gaudi.sh

I like the Gaudi variation of my script because it allows you to provide the specific helm repo, helm chart name, and version, so it’s a little more universal than the OPEA version, which actually has two different repos depending on how you want to deploy it.

I specifically left #set -e commented out in both scripts to allow for better troubleshooting. If you uncomment that line, the script will bomb out on any error and won’t start ssh. On that note, you’ll want to remove the ssh line in your own custom Docker image; you don’t want ssh running in the container in production, and once it’s gone you can also remove the loadbalancer object.

Lastly, and most importantly, are the Aria Automation design templates, which can all be found here: https://github.com/ThepHuck/VCFAutomation/tree/main/8.18

I broke the design templates into a few types to help with my deployment & troubleshooting, but having one that deploys just the Kubernetes cluster on top of Tanzu is extremely helpful because it allows you to:

  1. Deploy one k8s cluster to host multiple k8s deployments
  2. Rip & replace k8s deployments easily
  3. Separate deployments into their own Supervisor Namespace and subsequent k8s namespace(s)

Here are the files, and what they do:

  • Deploy AMX VKS Cluster.yaml
    • This deploys the actual k8s cluster on top of your Tanzu deployment. It’s named VKS for the new “vSphere Kubernetes Service” branding, but it is purely for Tanzu.
    • I’ve used this to create a single control plane node and single worker node for testing, as well as a monster k8s cluster with three control plane nodes and four worker nodes to place workers 1:1 with physical hosts.
    • A few key points here:
      • run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
      • tkr > reference > name: v1.29.4---vmware.3-fips.1-tkg.1
      • Those are required to take advantage of Intel’s AMX accelerator (I’ve contributed to multiple documents on benchmarking and distributed training using AMX). If your installed version of Tanzu/TKG supports newer TKRs, that’s fine, but you must use the Ubuntu OS flavor. See the sketch just after this list for where these settings land in the cluster spec.
      • BUG ALERT: Check the VM hardware version if you want to leverage AMX. Even though your vSphere version can support hardware version 21, having multiple PVCs (Persistent Volume Claims) causes the VM to be created with HW v17, which does not expose the AMX instructions.
      • The Fix: Edit the VM profiles: go through the wizard, make sure you select HW v21, and save the profile. Even though the existing profile may already show HW v21, it’s not enforced; editing and saving enforces it. And of course you can create your own custom profiles and use them in your Design Template.
  • Deploy OPEA to existing cluster.yaml
    • This deploys the actual OPEA helm chart to an existing k8s cluster, essentially following the above script.
    • It’s a little messy in that there are some if/then statements, because Intel’s OPEA can be deployed as a full stack with an integrated chatbot -OR- as an API backend, where you have AMX accelerators (or Gaudi, etc.) and can distribute the chatbot frontends externally while leveraging the accelerated backend.
  • Deploy OPEA with new VKS cluster.yaml
    • This is exactly as it sounds, it’s a combination of the above two design templates/blueprints.
  • Deploy Gaudi TKC.yaml
    • Exactly as it sounds: effectively the same as the Deploy AMX VKS Cluster yaml, but with a newer TKR.
  • Deploy Gaudi Operator.yaml
    • This one deploys the Habana operator helm chart that leverages the Gaudi accelerator. It’s pretty universal and could be used with other helm charts, since you can supply the helm repo, chart name, and version.
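To show where the two AMX-related settings from the Deploy AMX VKS Cluster bullets actually land, here’s a stripped-down TanzuKubernetesCluster spec. The cluster name, VM classes, and storage class are illustrative, and the exact wrapping inside the blueprint’s CCI.TKG resource is in the linked templates.

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: amx-cluster                                          # illustrative name
  annotations:
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu    # required for AMX
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-large                             # illustrative VM class
      storageClass: vsan-default                             # illustrative storage class
      tkr:
        reference:
          name: v1.29.4---vmware.3-fips.1-tkg.1              # Ubuntu TKR noted above
    nodePools:
      - name: workers
        replicas: 1
        vmClass: best-effort-xlarge
        storageClass: vsan-default
        tkr:
          reference:
            name: v1.29.4---vmware.3-fips.1-tkg.1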

Conclusion

So there you have it, you’re now able to use Aria Automation to deploy helm charts without having to install the helm client locally.

How does this help you? Simple, let’s break it down:

  • Give users the ability to deploy specific helm charts to consume resources without needing DevOps-type experience
  • Provide lease times for test deployments
  • Provide guardrails on deployable cluster sizes
  • Repeatable
  • Easy to consume

Now, a note about VCF 9. Unfortunately, this will not work in VCF 9.0. VCF Automation 9.0 simply cannot deploy containers, nor can it create Supervisor Namespaces through a deployment blueprint. There is a GUI workflow to create the Supervisor Namespace, and you can then use a blueprint to deploy the Tanzu Kubernetes Cluster (or VKC? how about simply k8s cluster?), but it ends there.