Manage Thousands of Clusters with GitOps and the Cluster API


By Richard Case (@fruit_case) and Charles Sibbald (@casibbald)

At Weaveworks we have been living and breathing GitOps for a few years now. This past year the Cloud Native community really embraced and adopted GitOps for managing application workloads and services on Kubernetes.

But what about managing the actual Kubernetes clusters themselves using GitOps? We asked ourselves that exact question and in this post we introduce using GitOps with Cluster API (CAPI) to manage your actual Kubernetes clusters (ClusterOps anyone?).

The result is that you can declaratively define your clusters and perform operations on them via Git pull requests, just as you would for the workloads running on those clusters. Some of the benefits include:

  • Clusters can be created on-demand by engineering teams by pull request
  • Operations against clusters are fully audited and attributable (a core GitOps benefit)
  • Multi-tenancy becomes easy to adopt, with each team or group of applications getting its own cluster

At Weaveworks we feel this is an evolutionary step in GitOps. We have been working with a number of customers to implement this in production and at a large scale (think thousands of clusters each with dozens or even hundreds of nodes).

To accomplish this we present a pattern for combining complementary technologies to achieve GitOps for cluster management:

Figure 1: Cluster API and GitOps Pattern

What is the Cluster API?

In short, the Cluster API (a.k.a. CAPI) allows you to create and manage your actual clusters declaratively, just as you are used to doing with your application workloads.

It's implemented as a number of CRDs and controllers, which are grouped into the Cluster API core manager and different types of “providers”:

  • Cluster API Core Manager - this controller manager is responsible for managing the lifecycle of the cluster as a whole. It understands the Cluster, Machine, MachineDeployment and MachineSet resources, which are used to declare a cluster without any specific infrastructure details. The infrastructure-specific parts of the cluster are declared using other resource types that the bootstrap and infrastructure providers understand. These resource instances are referenced from the core CRDs.
  • Bootstrap Provider - the purpose of this provider is to generate a cloud-init script that can be used by the infrastructure providers when creating the machines for the clusters. There can be multiple implementations of this provider and each implementation will have its own CRD. As it stands today, there is only the default provider (CABPK), which uses kubeadm for bootstrapping Kubernetes and understands the KubeadmConfig CRD.
  • Infrastructure Provider - the purpose of these providers is to provision infrastructure in the target operating environment for the Kubernetes clusters, using the bootstrap configuration from the bootstrap provider. The actual infrastructure provisioned will depend on which provider you use. Each provider will have its own CRDs (infrastructure-specific versions of Cluster, Machine, MachineTemplate). For example, the AWS provider (CAPA) will provision things like a VPC, NAT Gateway and EC2 instances, while the vSphere provider (CAPV) will provision things like virtual machines on vSphere. There are providers for AWS, Azure, GCP, Packet, vSphere, Metal3, OpenStack and others.
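To make the split between core and provider resources a little more concrete, here is a minimal sketch in Python that builds a core Cluster object pointing at a provider-specific resource through infrastructureRef. It is only an illustration: the cluster name "team-a" is hypothetical, and the v1alpha3 API versions are an assumption that depends on the CAPI and provider releases you are running.

```python
import json


def cluster_manifest(name, infra_kind, infra_api_version):
    """Build a core Cluster object that references a provider-specific resource."""
    return {
        # Assumption: the API version depends on your CAPI release.
        "apiVersion": "cluster.x-k8s.io/v1alpha3",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # The core resource carries no infrastructure details itself;
            # it only points at the resource the infrastructure provider owns.
            "infrastructureRef": {
                "apiVersion": infra_api_version,
                "kind": infra_kind,
                "name": name,
            },
        },
    }


# Hypothetical cluster name; AWSCluster is the AWS provider's cluster resource.
manifest = cluster_manifest(
    "team-a",
    "AWSCluster",
    "infrastructure.cluster.x-k8s.io/v1alpha3",  # assumption, varies by provider release
)
print(json.dumps(manifest, indent=2))
```

Swapping the kind and API version for another provider's resource is all that changes in the core object, which is what keeps the core CRDs infrastructure-agnostic.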

Weaveworks have also implemented their own Cluster API provider as part of wksctl, called the baremetalprovider. It allows you to create a cluster in any public/private cloud or on-premise, provided that you supply it with a set of IP addresses and an SSH key.

When using the CAPI, in general there are two different types of clusters:

  1. Management Cluster (a.k.a. control plane cluster) - as the name implies, this cluster is used to manage other clusters using CAPI. What this means in reality is that the Cluster API CRDs and controllers are applied to this cluster and then you create instances of the CRDs for each of the workload clusters you want to create. This is also the place from which the kubeconfig for each of the managed clusters can be retrieved.
  2. Workload Clusters (a.k.a. tenant clusters) - these are the resulting clusters that are created using Cluster API based on the specification held in the management cluster. The workload clusters are generally used for the application workloads for your company and can provide a good model for tenant separation.

We also have an advanced implementation pattern targeted at multi-cloud (public and/or private) which we call “master-of-masters” (MoM) that introduces a third cluster type. This pattern will be introduced in a later post.

GitOps using Flux

Flux is an operator for Kubernetes that provides a GitOps implementation. It runs in-cluster and implements a GitOps control loop which periodically syncs the running cluster with the desired state kept in Git. If there are differences in the state then Flux will take action to reconcile the state so it matches. For example, if a Deployment instance doesn’t exist in the cluster then Flux will apply the YAML file from Git.
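The control loop described above can be sketched in a few lines of Python. This is not how Flux is implemented; it is a simplified illustration in which both the desired state (from Git) and the actual state (from the cluster) are plain dictionaries, and the resource keys and function name are hypothetical.

```python
def plan_sync(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource key (e.g. "Deployment/web") to its manifest.
    """
    actions = []
    for key, manifest in sorted(desired.items()):
        if key not in actual:
            # Declared in Git but missing from the cluster: apply it.
            actions.append(("create", key))
        elif actual[key] != manifest:
            # Present but drifted from what Git says: re-apply it.
            actions.append(("update", key))
    return actions


# Hypothetical state: Git declares two resources, the cluster has one, drifted.
desired = {"Deployment/web": {"replicas": 3}, "Service/web": {"port": 80}}
actual = {"Deployment/web": {"replicas": 2}}

for action, key in plan_sync(desired, actual):
    print(action, key)
```

Running the plan on every sync interval, rather than only on a change event, is what makes the loop self-healing: drift introduced outside Git is picked up on the next pass.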

Flux doesn’t care what type of resource it applies. It can be used for the core resources (e.g. Deployment, Service) as well as custom resources (e.g. MeshPolicy, SealedSecret, Cluster, Machine). This makes it very powerful in allowing you to define your application’s entire state.

Additionally it can monitor image repositories. When it detects a new version of an image it automatically deploys the new version by making a change to the desired state of the application in Git. This automatic deployment is optional and configurable based on a set of rules. It’s especially useful for clusters that represent development environments.

Flux is an open source project that was originally developed by Weaveworks. It joined the CNCF as a sandbox project in 2019.

Weaveworks have used Flux as a core part of our Weave Kubernetes Platform (WKP) which brings GitOps to the heart of how Weaveworks believes production ready Kubernetes clusters should be provisioned and operated.

For a full introduction see Stefan Prodan and Alexis Richardson’s session from KubeCon.

Manage CRDs with Helm

Managing Custom Resource Definitions in a very large scale system, with hundreds of tenants requiring fine-grained configuration of standard CRDs, goes beyond the limits of what tools like Kustomize are able to provide.

This is where Helm steps in: its ability to manage templates and inject values at deployment time is second to none. Helm has a structured release process, with versioning and chart hosting, that solves the problems you encounter at very large scale.

Weaveworks has implemented a Helm Operator for Flux allowing for the easy management and provisioning of Helm charts in a Kubernetes cluster. Helm allows organizations of all sizes to manage the components of a deployment in a holistic manner with chart dependencies, versioning and values per environment all catered for and easily driven by Flux.

When working with customers on CAPI-based projects, we tend to wrap the resources that make up a cluster for a specific target environment (e.g. AWS, vSphere) into a Helm chart:

Figure 2: Cluster API CRDs & Helm

Once this is done, provisioning a cluster only requires that the values of the chart be specified in the HelmRelease, without having to explicitly define all the resource definitions.
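As a rough sketch of what this looks like, the Python snippet below builds a HelmRelease for the Helm Operator in which only the chart location and a handful of values are supplied. The Git URL, chart path, release name and value keys are all hypothetical; the actual values depend entirely on how your cluster chart is written.

```python
import json


def helm_release(cluster_name, chart_repo, chart_path, values):
    """Build a Helm Operator HelmRelease that provisions one cluster from a chart."""
    return {
        "apiVersion": "helm.fluxcd.io/v1",
        "kind": "HelmRelease",
        "metadata": {"name": cluster_name},
        "spec": {
            "releaseName": cluster_name,
            # The chart holds the CAPI resource templates for one target environment.
            "chart": {"git": chart_repo, "path": chart_path, "ref": "master"},
            # Only the per-cluster differences are declared here.
            "values": values,
        },
    }


# Hypothetical repo, path and value names, for illustration only.
release = helm_release(
    "team-a",
    "git@github.com:example-org/cluster-charts",
    "charts/aws-cluster",
    {"region": "eu-west-1", "controlPlaneCount": 3, "workerCount": 3},
)
print(json.dumps(release, indent=2))
```

Because the chart carries all of the resource definitions, a new cluster is a small file in Git whose diff is easy to review in a pull request.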

We also live and breathe continuous integration and testing. Automating the build and testing of our Helm charts and CRDs allows us to deliver quality configuration at scale and ensures that we always have a working system.

Flux + Cluster API = GitOps for Cluster Management

Using Cluster API, you can declare a cluster with resources just like any other declarative resource in Kubernetes. Flux can be used for GitOps to ensure that the actual state of resources in a Kubernetes cluster matches the desired state specified in a Git repo (i.e. the GitOps control loop). We can combine the two, so that Flux and Cluster API come together to allow you to apply GitOps to cluster creation and management.

Figure: GitOps and CAPI Flow

There are a number of ways the process could work. It is dependent on your technology choices and specific business requirements, but the following is the basic workflow:

  1. Create a management cluster. This can be an existing cluster or it can be a cluster that you create using something like kind.
  2. Install the Cluster API, Bootstrap Provider and any other Infrastructure Providers.
  3. Create a ‘clusters repo’. This is the repository that will hold the desired state of the clusters. You will perform operations such as cluster creation, manual scaling and cluster deletion via pull requests to this repo.
  4. Install Flux in the management cluster and then point it to the clusters repo.
  5. Install the Helm Operator in the management cluster.
  6. For each workload/tenant cluster you want to create:
    1. Create an instance of a HelmRelease
    2. Make sure the instance of the HelmRelease points to the chart for your target environment (e.g. AWS, Azure, vSphere…)
    3. Commit the HelmRelease to the clusters repo
  7. Flux will then apply the HelmRelease to the management cluster
  8. The Helm Operator processes the HelmRelease which results in instances of the CAPI CRDs being applied to the management cluster
  9. The CAPI controllers provision the infrastructure and bootstrap a new Kubernetes cluster.

Ideally, as part of the bootstrapping of the workload/tenant cluster, you would install Flux and the Helm Operator and point them to the repo with the desired state for the application workloads of that cluster. Doing so ensures that the new cluster comes up with the applications that are supposed to run on it.

Conclusions

In this post we introduced a pattern that allows you to combine Flux with the Cluster API to enable GitOps for cluster creation and management. This gives you all of the benefits of GitOps (such as audit, attribution, approval and rollback to name a few) during cluster and platform creation.

It also brings consistency in how clusters are declared, provisioned and operated in a multi-cloud environment. Because Kubernetes is bootstrapped the same way (i.e. via the kubeadm bootstrap provider), we can apply profiles to those clusters using WKP and Flux so that a consistent collection of base services, like Prometheus or Fluent, is installed on the clusters before any application workloads are applied.

For a long time cluster provisioning has been the responsibility of a central DevOps/IT team, and this is often seen by engineering teams as a bottleneck or an area over which they have no control. With this approach, clusters can be created by engineering teams on demand via a pull request, without having to get a dedicated team to provision the cluster for them.

The ease with which the clusters are provisioned (i.e. by approval of a PR) allows for a multi-tenanted approach where every team or application has its own cluster with complete separation. This helps ease the regulatory headaches that some industries, like banking, feel when working with Kubernetes and the “soft tenancy” of namespaces.

CAPI has the MachineDeployment resource, which effectively handles immutability and upgrades of nodes without needing to modify running nodes, a practice heavily frowned upon in a GitOps environment. As the machines/nodes are immutable, we can start to treat our clusters as cattle and not pets, as we have become used to doing with our workloads running in Kubernetes.
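The idea of replacing nodes rather than modifying them can be illustrated with a small Python simulation. This is a deliberately simplified sketch and not how the CAPI controllers are written: in CAPI the rollout is carried out by the controllers themselves, and the machine names, version strings and one-at-a-time strategy used here are assumptions for illustration.

```python
def rolling_replace(machines, new_version):
    """Replace outdated machines one at a time instead of upgrading them in place.

    `machines` is a list of (name, version) tuples. Returns the final machine
    list and a log of the actions taken.
    """
    current = list(machines)
    log = []
    for name, version in machines:
        if version == new_version:
            continue  # already at the desired version, leave it alone
        replacement = (f"{name}-new", new_version)  # hypothetical naming scheme
        current.append(replacement)  # bring up a fresh machine first
        log.append(f"create {replacement[0]} ({new_version})")
        current.remove((name, version))  # then retire the old one, never edited
        log.append(f"delete {name} ({version})")
    return current, log


# Hypothetical worker nodes and Kubernetes versions.
nodes = [("worker-0", "v1.17.3"), ("worker-1", "v1.17.3")]
final, log = rolling_replace(nodes, "v1.18.2")
for line in log:
    print(line)
```

No step in the log changes an existing machine; every upgrade is a create followed by a delete, which is what keeps the declared state in Git and the running nodes in agreement.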

There are a number of variations on the above pattern and areas that can easily trip you up. If you need help or advice, Weaveworks have successfully implemented this pattern in production at a large scale.