TL;DR — I built a tiny Docker image that contains a Helm client and a small run script, then invoked that image from VMware Aria Automation 8.18 blueprints to deploy Helm charts (tested with Aria 8.18.2, VCF 5.2.1 and TKG). Code and Dockerfile are on GitHub; links below.

RTFA…

I’ve been working with Intel on various AI projects, including benchmark testing and distributed training using Intel’s AMX accelerator in their 4th Gen and later Xeon CPUs, which gave me the opportunity to really dig deep into automating workflows. As we moved into deploying AI chatbots, I wanted to automate Intel’s Open Platform for Enterprise AI (OPEA), but it’s deployed using Helm charts.

While VMware Aria Automation is deployed with Helm internally, it doesn’t actually support deploying Helm charts as a client. A quick search will tell you to use “ABX”, which is now simply ‘Actions’, but that’s not exactly straightforward either. The general consensus was to have a separate Helm client that could run the helm commands. I tried using pip to get that working in Aria Automation’s Actions runtime environment, but that failed … miserably (for me, especially; I felt like I was the failure). So I set out to find a better way…

Build my own custom Docker container

If a Helm client is needed, why not build my own Docker container with the Helm client installed, then pass the charts and other needed information through as environment variables? My initial phase of simply running the docker command with -e or --env and the required variables took some trial and error, as most things do, but it didn’t take long to build out a working Docker container. What does the container do, you might ask. At a high level, it runs a script at startup, and that script is the core function of the container. The Dockerfile itself includes the script and necessary files, along with the needed packages. Links to both files on GitHub are below, but at a high level, here’s what the script does:

  • Validates environment variables
  • Logs into Tanzu
  • Downloads & runs the Helm charts
  • Starts ssh (for troubleshooting)

As an example, I started with my Habana/Gaudi docker testing and used it to build out a YAML file for a Kubernetes deployment via kubectl; that k8s YAML ends up in the manifest section of the Aria Automation design template.
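Here’s roughly the shape of that manifest. It’s a trimmed-down sketch: the names, namespace, image tag, and environment variable names are illustrative rather than copied from my templates (the real files are linked below under Key Components).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-helm
  namespace: opea
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docker-helm
  template:
    metadata:
      labels:
        app: docker-helm
    spec:
      containers:
        - name: docker-helm
          image: thephuck/docker-helm:latest    # tag is illustrative
          env:                                  # the run script validates these at startup
            - name: TANZU_SERVER                # illustrative variable names
              value: supervisor.mylab.local
            - name: TANZU_USERNAME
              value: administrator@vsphere.local
            - name: HELM_NAMESPACE
              value: opea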

Aria Automation integration

Now that I have a working Docker container to run the helm commands, the next step was integrating it into Aria Automation. There are some prereqs:

  • VCF 5.x (or vSphere 8.0)
  • Tanzu
    • TKG embedded, deployed through SDDC Manager
  • Aria Automation 8.18 Patch 2 recommended (Patch 1 required at a minimum; it addresses a bug that impacts deployment of TKG cluster-level resources using the CCI.TKG template)
  • Cloud Consumption Interface configured

From there I was able to build an Aria Automation Design Template (aka Blueprint) that accepts user inputs for the required variables. The input section of the YAML looks roughly like this:
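The input names, types, and defaults below are illustrative, not copied verbatim from my template; the real files are linked under Key Components.

inputs:
  cluster_name:
    type: string
    title: Kubernetes cluster name
  worker_count:
    type: integer
    title: Number of worker nodes
    default: 1
  helm_namespace:
    type: string
    title: Namespace for the Helm release
    default: opea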

Those inputs are then passed through to TKG in the next section of the YAML. In this case, the docker-helm object is the last item that gets deployed, and it pulls all the environment variables either from the inputs or from other parts of the deployment template:
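Here’s a simplified sketch of that resource; the resource type and property names may differ slightly from the linked templates, so treat this as illustrative rather than exact.

resources:
  docker-helm:
    type: CCI.Supervisor.Resource   # illustrative; check the linked templates for the exact type
    dependsOn:
      - tkg-cluster                 # deploy last, once the cluster exists
    properties:
      manifest:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: docker-helm
          namespace: ${input.helm_namespace}
        spec:
          # ...same container spec as the earlier manifest, with the env values
          # bound to ${input.*} or to properties of other resources in the template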

While building a container and posting it publicly may not be your desired outcome, I also deployed Harbor and integrated it into Aria Automation so I didn’t have to leverage Docker Hub. To do so, I simply created an FQDN for my Harbor deployment, then provided the URL that included the container image and tag.
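In the manifest that’s just a one-line change; the hostname and project below are made up:

# pull the image from a private Harbor registry instead of Docker Hub
image: harbor.mylab.local/library/docker-helm:latest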

Key Components

Starting off, here’s the Dockerfile: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm/Dockerfile

That’s pretty self-explanatory, but it’s worth pointing out that you’ll need these files available when building the container if you’re using docker build to create your own:

COPY vsphere-plugin/bin/ /bin/
COPY run.sh /root/run.sh

You can also pull the Docker image directly from Docker Hub: https://hub.docker.com/repository/docker/thephuck/docker-helm/general

Which brings me to the next file, the run script: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm/run.sh

Now, this file is specific to Intel’s OPEA, so you would need to adjust it for your needs. As an example, here’s a customized one I used for Habana that leverages Intel’s Gaudi accelerator: https://github.com/ThepHuck/VCFAutomation/blob/main/docker/docker-helm-gaudi/run-gaudi.sh

I like the Gaudi variation of my script because it allows you to provide the specific helm repo, helm chart name, and version, so it’s a little more universal than the OPEA version, which actually has two different repos depending on how you want to deploy it.

I specifically left #set -e commented out in both scripts to allow for better troubleshooting. If you uncomment that line, the script will bomb out on any error and won’t start ssh. On that note, you’ll want to remove the ssh line in your own custom Docker image; you don’t want ssh running in the container in production, and once it’s gone you can also remove the loadbalancer object.

Lastly, and most importantly, are the Aria Automation design templates, which can all be found here: https://github.com/ThepHuck/VCFAutomation/tree/main/8.18

I broke the design templates into a few types to help with my deployment & troubleshooting, but having one that deploys just the Kubernetes cluster on top of Tanzu is extremely helpful because it allows you to:

  1. Deploy one k8s cluster to host multiple k8s deployments
  2. Rip & replace k8s deployments easily
  3. Separate deployments into their own Supervisor Namespace and subsequent k8s namespace(s)

Here are the files, and what they do:

  • Deploy AMX VKS Cluster.yaml
    • This deploys the actual k8s cluster on top of your Tanzu deployment. It’s named VKS for the new “vSphere Kubernetes Service” branding, but it is purely for Tanzu.
    • I’ve used this to create a single control plane node and single worker node for testing, as well as a monster k8s cluster with three control plane nodes and four worker nodes to place workers 1:1 with physical hosts.
    • A few key points here:
      • run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
      • tkr > reference > name: v1.29.4---vmware.3-fips.1-tkg.1
      • Those are required to take advantage of Intel’s AMX accelerator (I’ve contributed to multiple documents on benchmarking and distributed training using AMX). If your installed version of Tanzu/TKG supports newer TKRs, that’s fine, but you must use the Ubuntu OS flavor. See the sketch just after this list for where these settings land in the cluster spec.
      • BUG ALERT: Check the VM hardware version if you want to leverage AMX. Even though your vSphere version can support hardware version 21, having multiple PVCs (Persistent Volume Claims) causes the VM to be created with HW v17, which does not expose the AMX instructions.
      • The Fix: Edit the VM profiles: go through the wizard, make sure you select HW v21, and save the profile. Even though the existing profile may already show HW v21, it’s not enforced; editing and saving enforces it. And of course you can create your own custom profiles and use them in your Design Template.
  • Deploy OPEA to existing cluster.yaml
    • This deploys the actual OPEA helm chart to an existing k8s cluster, essentially following the above script.
    • It’s a little messy in that there are some if/then statements, because Intel’s OPEA can be deployed as a full stack with an integrated chatbot -OR- as an API backend, where you have AMX accelerators (or Gaudi, etc.) and can distribute the chatbot frontends externally while leveraging the accelerated backend.
  • Deploy OPEA with new VKS cluster.yaml
    • This is exactly as it sounds, it’s a combination of the above two design templates/blueprints.
  • Deploy Gaudi TKC.yaml
    • Exactly as it sounds: effectively the same as the Deploy AMX VKS Cluster yaml, but with a newer TKR.
  • Deploy Gaudi Operator.yaml
    • This one deploys the Habana operator helm chart that leverages the Gaudi accelerator. It’s pretty universal and could be used with other helm charts, since you can supply the helm repo, chart name, and version.
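To show where the two AMX-related settings from the Deploy AMX VKS Cluster bullets actually land, here’s a stripped-down TanzuKubernetesCluster spec. The cluster name, VM classes, and storage class are illustrative, and the exact wrapping inside the blueprint’s CCI.TKG resource is in the linked templates.

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: amx-cluster                                          # illustrative name
  annotations:
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu    # required for AMX
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-large                             # illustrative VM class
      storageClass: vsan-default                             # illustrative storage class
      tkr:
        reference:
          name: v1.29.4---vmware.3-fips.1-tkg.1              # Ubuntu TKR noted above
    nodePools:
      - name: workers
        replicas: 1
        vmClass: best-effort-xlarge
        storageClass: vsan-default
        tkr:
          reference:
            name: v1.29.4---vmware.3-fips.1-tkg.1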

Conclusion

So there you have it, you’re now able to use Aria Automation to deploy helm charts without having to install the helm client locally.

How does this help you? Simple, let’s break it down:

  • Give users the ability to deploy specific helm charts to consume resources without needing DevOps-type experience
  • Provide lease times for test deployments
  • Provide guardrails on deployable cluster sizes
  • Repeatable
  • Easy to consume

Now, a note about VCF 9. Unfortunately, this will not work in VCF 9.0. VCF Automation 9.0 simply cannot deploy containers, nor can it create Supervisor Namespaces through a deployment blueprint. There is a GUI workflow to create the Supervisor Namespace, and you can then use a blueprint to deploy the Tanzu Kubernetes Cluster (or VKC? how about simply k8s cluster?), but it ends there.