Kubernetes has many objects in its core APIs. One of the most overlooked object types is the Event, which reports what is happening to other objects in the API server. Events such as Pod started, Pod scheduled, or ReplicaSet scaled are informational, whereas unhealthy nodes and Pod sandbox errors are reported as Warning events, which you may want to be alerted about when they occur.
Events have a high volume of activity compared to other objects in Kubernetes. By default, events have a 1-hour life span, and a separate etcd cluster is recommended to ensure scalability. Combined with the inability to search or aggregate them, events on their own might not be very useful unless they are exported to external systems. Therefore, we created an open-source project, Kubernetes Event Exporter, which watches events and exports them to many systems such as Opsgenie, Elasticsearch, Slack, or plain webhooks. We also presented this tool at KubeCon North America 2019, where it got a warm response and great feedback from the community.
When you use kubectl describe to get formatted information on an object, the events regarding that object are also shown. More specifically, when you run kubectl describe pod my-pod, you will probably see event messages such as Created Container, Pulled the image, or Assigned Pod to Node X. You can also see warning events when your Pod has been killed, its sandbox has changed, or it has unbound volumes. Most published events are frequent and informational, but there are also infrequent warning events that you might want to be notified about.
Events in Kubernetes are structured objects with the following fields:
- Message: A human-readable description of the status of this operation
- Involved Object: The object that this event is about, like Pod, Deployment, Node, etc.
- Reason: A short, machine-readable string, effectively an enum value
- Source: The component reporting this event, as a short, machine-readable string, e.g. kube-scheduler
- Type: Currently only Normal and Warning are used, but custom types can be given if desired.
- Count: The number of times the event has occurred
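To make the fields above more concrete, here is a minimal Python sketch that models an event and uses the Type, Reason, and Count fields to pick out warnings and tally occurrences. The Event class, the helper functions, and the sample data are all hypothetical and simplified for illustration; they are not the actual Kubernetes API types or part of the exporter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """Simplified, hypothetical model of the event fields listed above."""
    message: str          # human-readable description
    involved_object: str  # e.g. "Pod/my-pod" (flattened here for brevity)
    reason: str           # short, machine-readable string
    source: str           # component reporting the event
    type: str             # "Normal" or "Warning"
    count: int = 1        # number of times the event has occurred


def warnings_only(events):
    """Return only the events whose type is Warning."""
    return [e for e in events if e.type == "Warning"]


def occurrences_by_reason(events):
    """Sum the Count field of the events, grouped by Reason."""
    totals = Counter()
    for e in events:
        totals[e.reason] += e.count
    return totals


# Invented sample data, for illustration only.
sample = [
    Event("Successfully assigned my-pod to node-x", "Pod/my-pod",
          "Scheduled", "kube-scheduler", "Normal"),
    Event("Pulled the image", "Pod/my-pod", "Pulled", "kubelet", "Normal"),
    Event("Back-off restarting failed container", "Pod/my-pod",
          "BackOff", "kubelet", "Warning", count=5),
    Event("Readiness probe failed", "Pod/other-pod",
          "Unhealthy", "kubelet", "Warning", count=3),
]

if __name__ == "__main__":
    for event in warnings_only(sample):
        print(event.reason, event.involved_object, event.count)
    print(occurrences_by_reason(sample).most_common())
```

Summing Count rather than counting records is a design choice in this sketch: a single event object may stand for several occurrences, so the sum reflects how often something actually happened.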
Our open-source tool has a configuration language to filter events and route them to various outputs. As a starting point, you can export all events to Elasticsearch or another analytics tool to see what kinds of events are emitted. Counting events by their Reason string shows how they are distributed. There is a lot of hidden data in those events that you can turn into actionable observability. For instance, in one of our development clusters the workloads are not stable and generate mostly warning events. In another cluster, where we run more stable workloads, the events are mostly regular Pod creation and image pull events, with a few minor warnings. The following is a visualization of the events that occurred in those two clusters over a 7-day period, grouped by the Reason field: