Anthos on Bare Metal

By Gokul Chandra

As the cluster above is deployed in hybrid mode, it also comprises the admin cluster machinery.

Cluster API components:

Admin Cluster — CAPI Components

BareMetalCluster CRD:

Admin Cluster — BareMetalCluster CRD

BareMetalMachine CRD (as this is a hybrid cluster, the NUCs are added as objects):

Admin Cluster — BareMetalMachine CRD
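
The same objects can be inspected with kubectl. The plural resource names below are inferred from the CRD kinds above, so verify them against what the cluster actually registers:

```bash
# Confirm the exact resource names registered by the bare metal CRDs:
kubectl api-resources | grep -i baremetal

# List the cluster and machine objects managed by the Cluster API components:
kubectl get baremetalclusters --all-namespaces
kubectl get baremetalmachines --all-namespaces
```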

Connecting Anthos Clusters on Bare Metal to Google Cloud

Anthos clusters on bare metal are connected to Google Cloud through Connect. This connection lets users manage and observe the isolated clusters from the Cloud Console. As part of cluster deployment, a Connect agent is deployed in the cluster, and the ‘bmctl’ utility automatically creates the service accounts required to establish the connection (users who need more control can manually create the connect-agent, connect-register and logging-monitoring service accounts) and registers the cluster with Google Cloud’s GKE Hub.
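
For readers who prefer to create the service accounts manually, the flow looks roughly like the sketch below. The account names, key file and project ID are placeholders, and the IAM roles should be double-checked against the Anthos on bare metal documentation for the release in use:

```bash
# Sketch: manually creating the service accounts that bmctl would otherwise
# generate. Names are hypothetical; verify the roles for your release.
PROJECT_ID=my-project        # placeholder project ID

# Account used to register the cluster with GKE Hub:
gcloud iam service-accounts create connect-register-sa --project=${PROJECT_ID}
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:connect-register-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/gkehub.admin"

# Account used by the in-cluster Connect agent:
gcloud iam service-accounts create connect-agent-sa --project=${PROJECT_ID}
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:connect-agent-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/gkehub.connect"

# Account used by the logging and monitoring agents:
gcloud iam service-accounts create logging-monitoring-sa --project=${PROJECT_ID}
for role in roles/logging.logWriter roles/monitoring.metricWriter \
            roles/stackdriver.resourceMetadata.writer; do
  gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:logging-monitoring-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="${role}"
done

# Download a key that the cluster configuration can reference:
gcloud iam service-accounts keys create connect-agent.json \
  --iam-account=connect-agent-sa@${PROJECT_ID}.iam.gserviceaccount.com
```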

Connect is an Anthos feature, not one specific to Anthos on bare metal, that allows users to connect any Kubernetes cluster to Google Cloud regardless of where it is deployed. This gives access to cluster and workload management features, including a unified user interface, the Cloud Console, for interacting with the cluster. Connect allows Anthos Config Management to install or update the in-cluster Connect agent and observe its sync status, and it lets the metering agent observe the number of vCPUs in a connected cluster.
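
As a rough sketch of what registration looks like for an arbitrary cluster (membership name, kubeconfig context and key file below are placeholders):

```bash
# Register an existing Kubernetes cluster with GKE Hub via Connect.
gcloud container hub memberships register my-baremetal-cluster \
  --context=my-cluster-context \
  --kubeconfig=${HOME}/.kube/config \
  --service-account-key-file=connect-agent.json
```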

This is pivotal for topologies where applications on far-edge sites in isolated network spaces are managed from a central location (HQ, regional IT operations center, etc.).

The Connect Agent can be configured to traverse NATs, egress proxies, and firewalls to establish a long-lived, encrypted connection between the isolated bare metal cluster’s Kubernetes API server and the Google Cloud project.

connect-agent deployed on the bare metal cluster:

GKE Connect Agent on Anthos Cluster on Bare Metal

The anthos-creds namespace holds the secrets required to connect to GCP services.
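
A quick way to verify both, assuming the default namespaces (the Connect agent is deployed into gke-connect):

```bash
# Check that the Connect agent is running and that the GCP credentials exist:
kubectl get pods -n gke-connect
kubectl get secrets -n anthos-creds
```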

Secrets for GCP Services Connectivity

Anthos portal showing the registered cluster (before sign-in); the cluster information in the right panel is not shown until the user logs in:

Anthos Portal Showing Registered Cluster

Kubernetes Engine portal showing the registered cluster (before sign-in):

Kubernetes Engine Showing Registered Cluster

Users can use the Google Cloud Console to sign in to registered clusters in one of three ways: basic authentication, a bearer token, or an OIDC provider. In the scenario below, an admin service account bound to the cluster-admin role is created and its token is used to log in (users can instead bind a read-only cluster role depending on the level of access required).
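
A minimal sketch of that flow with kubectl; the service account name is arbitrary:

```bash
# Create a service account and bind it to cluster-admin (bind a read-only
# ClusterRole instead if full admin access is not required).
kubectl create serviceaccount console-admin -n kube-system
kubectl create clusterrolebinding console-admin-binding \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:console-admin

# Retrieve the bearer token to paste into the Cloud Console login dialog
# (clusters prior to Kubernetes 1.24 create a token secret automatically;
# newer clusters can use `kubectl create token console-admin -n kube-system`).
SECRET_NAME=$(kubectl -n kube-system get serviceaccount console-admin \
  -o jsonpath='{.secrets[0].name}')
kubectl -n kube-system get secret ${SECRET_NAME} \
  -o jsonpath='{.data.token}' | base64 --decode
```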

Logging-in to Registered Cluster using Service Account

Logging in to the cluster using the token:

Logging-in to Registered Cluster using Service Account Token

As shown below, once the log-in succeeds all management options are enabled and the cluster information is displayed, including the GKE Hub membership ID. Users can run ‘gcloud container hub memberships list’ to list all connected clusters.

Kubernetes Engine Showing Registered Cluster

Anthos portal showing cluster information and the list of cluster features:

Anthos Portal Showing Registered Cluster

Managing Workloads on Anthos Clusters on Bare Metal

Once the Connect agent connects the cluster, users can use the Kubernetes Engine portal as a dashboard to access all Kubernetes components and manage workloads on the private Anthos on bare metal cluster as if it were a GKE cluster.

The Anthos dashboard lets users deploy Anthos-related components (Features) such as ACM and Service Mesh, and provides a catalog of features and instructions; beyond that, its functionality is limited to verifying cluster connectivity and showing high-level details such as cluster size and nodes. The Kubernetes Engine portal is the main cluster management surface: it lets users create, edit, update and delete objects, access configuration objects (Secrets/ConfigMaps), deploy applications from the Marketplace, browse objects, and manage or list storage elements such as PVs/PVCs.

Workloads — Kubernetes Engine

Users can edit any Kubernetes object from the portal, and the change is reflected on the bare metal cluster.

Edit/Update Workloads — Kubernetes Engine

Accessing cluster Services and Ingress:

Services and Ingress — Kubernetes Engine

Anthos clusters on bare metal use the local volume provisioner (LVP) to manage local persistent volumes. Three storage classes are created for local PVs in an Anthos cluster on bare metal: LVP share (creates a local PV backed by subdirectories created during cluster creation in a local, shared file system), LVP node mounts (creates a local PV for each mounted disk in the configured directory), and Anthos system (creates pre-configured local PVs used by Anthos system pods during cluster creation).
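
A minimal PVC sketch against the LVP share class; the class name "local-shared" is an assumption, so confirm the actual names with kubectl first:

```bash
# List the storage classes created by the local volume provisioner:
kubectl get storageclass

# Request a volume from the LVP share class (storageClassName assumed to be
# "local-shared"; adjust to what the previous command reports).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-local-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-shared
  resources:
    requests:
      storage: 1Gi
EOF
```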

Accessing PVCs and available Storage Classes:

Storage — Kubernetes Engine

The Anthos portal lets users enable supported features on the connected clusters:

Enabling Features on Registered Cluster — Anthos

Monitoring and Logging

Logging and metrics agents are installed and activated in each cluster when a new admin or user cluster is created. The Stackdriver operator manages the lifecycle of all other Stackdriver-related components (log-aggregator, log-forwarder, metadata-agent and prometheus) deployed onto the cluster. Users can set up a Cloud Monitoring Workspace in the Cloud project and use the monitoring portal to access logs of all system and Kubernetes components of the remote bare metal clusters from the console.
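
To confirm which of these components are running, a simple check (assuming they are deployed in kube-system, as on the clusters shown here):

```bash
# List the Stackdriver-related pods named above:
kubectl get pods -n kube-system | \
  grep -E 'stackdriver|log-aggregator|log-forwarder|metadata-agent'
```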

Monitoring and Logging — Stackdriver

The Stackdriver collector for Prometheus constructs a Cloud Monitoring MonitoredResource for Kubernetes objects from well-known Prometheus labels. A separate component, ‘stackdriver-prometheus-app’, is deployed with the stack and can be configured to monitor applications running in the cluster.
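
Illustrative only: the scrape configuration lives with the stackdriver-prometheus-app component, and the ConfigMap name below is an assumption based on that component name; the scrape_configs entry itself is standard Prometheus syntax:

```bash
# Inspect the application-monitoring Prometheus configuration (verify the
# ConfigMap name with `kubectl get configmaps -n kube-system` first):
kubectl -n kube-system get configmap stackdriver-prometheus-app -o yaml

# A typical Prometheus scrape_configs entry that keeps only pods labelled
# app=my-app (merge into the existing configuration):
#
#   scrape_configs:
#   - job_name: my-app
#     kubernetes_sd_configs:
#     - role: pod
#     relabel_configs:
#     - source_labels: [__meta_kubernetes_pod_label_app]
#       regex: my-app
#       action: keep
```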

Monitoring and Logging — Stackdriver Prometheus

Google Cloud’s operations suite is the built-in observability solution for Google Cloud. It offers a fully managed logging solution, metrics collection, monitoring, dashboards, and alerting. Cloud Monitoring uses Workspaces to organize and manage its information.

Monitoring and Logging — Workspaces

Integrated logging:

Monitoring and Logging — Workspaces

Akri — A Kubernetes Resource Interface for the Edge

Akri discovers heterogeneous leaf devices, exposes them as resources in Kubernetes clusters, and creates a service for each device. This enables applications running on Kubernetes to consume input from these devices. Akri handles the automatic inclusion and removal of devices, as well as the allocation and deallocation of resources, to better optimize the clusters.

Akri is a standard Kubernetes extension implemented using two custom resource definitions (Configuration and Instance), an agent, and a controller. Once Akri is installed, users write an Akri Configuration for the device; the agent acts as a device plugin, finding hardware that matches the Configuration using the supported discovery protocols, while the Akri controller manages the device through a broker running in a Kubernetes pod that exposes the device and its APIs to the application code.

Akri Architecture

Akri is based on the Kubernetes device plugin framework, which gives vendors a mechanism to advertise devices, monitor them (e.g. health checks), hook them into the runtime to execute device-specific instructions (e.g. clean GPU memory, capture video) and make them available in containers. This enables vendors to advertise their resources and monitor them without writing additional code.

Device Plugin Framework

Akri currently supports the following discovery protocols: udev (to discover anything in the Linux device file system), ONVIF (to discover IP cameras) and OPC UA (to discover OPC UA servers). Protocols such as Bluetooth, simple IP/MAC address scanning, LoRaWAN and Zeroconf are on the roadmap.

In this post, the udev (userspace /dev) protocol is used to discover USB cameras connected to two nodes of the Anthos on bare metal Kubernetes cluster.

USB Camera connected to NUC

Udev manages device nodes in the /dev directory, such as microphones, security chips, USB cameras, and so on. Udev can be used to find devices that are attached to or embedded in nodes. Akri’s udev discovery handler parses the udev rules listed in a Configuration, searches for matching devices using udev, and returns a list of discovered device nodes (e.g. /dev/video0). Users tell Akri which device(s) to find by passing udev rules in the Configuration spec, as shown in the sketch below.
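
A sketch of enabling the udev discovery handler with a rule that matches USB video devices. The Helm value names approximate the current Akri chart and have changed across releases (older releases shipped from the deislabs repository), so check the Akri documentation for the version in use:

```bash
# Install Akri with the udev discovery handler enabled and a rule matching
# USB video devices (value names may differ in older chart versions).
helm repo add akri-helm-charts https://project-akri.github.io/akri
helm install akri akri-helm-charts/akri \
  --set udev.discovery.enabled=true \
  --set udev.configuration.enabled=true \
  --set udev.configuration.name=akri-udev-video \
  --set udev.configuration.discoveryDetails.udevRules[0]='KERNEL=="video[0-9]*"'
```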

A user can also allow multiple nodes to utilize a leaf device, thereby providing high availability in the case where a node goes offline. Furthermore, Akri will automatically create a Kubernetes service for each type of leaf device (or Akri Configuration), removing the need for an application to track the state of pods or nodes.
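
Once the cameras are discovered, the Configuration, the per-device Instances and the generated services can be listed; akric and akrii are the short names Akri registers for its CRDs:

```bash
# One Instance (akrii) should appear per discovered camera:
kubectl get akric
kubectl get akrii

# Akri also creates services for the discovered devices:
kubectl get services | grep akri
```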

For example, the following is the information for the attached USB camera, captured with ‘udevadm’ on one of the nodes: