Kubernetes Podcast from Google

By Kubernetes Podcast from Google

CRAIG BOX: No, I don't even know if they have Wendy's here. They have a version of it in New Zealand, but I've not seen it in the UK-- not where I live, at least. But tell me about the new Wendy's role-playing game.

ADAM GLICK: It's "Feast of Legends". And at first, I thought this was a joke. But I went and looked at it. We've got a link in the show notes. And if you're a tabletop game player, it is in the spirit of Dungeons and Dragons. So you have an entire quest, and you have manuals. You have characters that you build up. And it's all, of course, in this Wendy's built land. The manual actually has some really nice art in it. Someone has done some really nice job on that. And the manual is almost 100 pages long. So they put a tremendous amount of work into this.

I was like, wow. It always impresses me when I take a look at what fast food chains have done with games over the years to try and turn them into things to help drive awareness and sales. Some of you may have seen that KFC came out with a dating simulator a month or two ago.

CRAIG BOX: Really?

ADAM GLICK: Yeah, so if you ever wanted to try and date the Colonel, I guess that's a way you can try that out. For those that remember the Xbox 360, Burger King had a series of 360 games that they released. I think there were four of those. You had "MC Kids" for Nintendo, "Donald Land," which was also a McDonald's one-- I think that one was only available in Japan-- Taco Bell's "Tasty Temple Challenge." You can even go all the way back to the McDonald's board game, which is old enough to date from the days before points inflation, when you would win by getting 25 points.


ADAM GLICK: Which these days isn't even picking up your first item in most games.

CRAIG BOX: How much research went into this episode?

ADAM GLICK: [LAUGHS] Years and years of video game playing. This is like the culmination of my teenage years. I must say that I did stumble across something that I have not seen yet. I saw a video of it. It's not publicly released, but apparently KFC has a virtual escape room challenge that they use to train people on how to run the fryer and various things there. And I'd be really curious to see what it's like because the video of it makes it look very, very ominous, but really interesting.

CRAIG BOX: I'm not sure that Kubernetes and KFC have much more in common than the letter K, but if any of our listeners are corporate headquartered members of the KFC ecosystem, please get in touch. Adam would love to know more.

ADAM GLICK: Oh, I'd love to try that out. Absolutely.

CRAIG BOX: I remember playing a DOS game in the '90s called "Cool Spot." And it was a "Sonic the Hedgehog" like platformer. But yes, absolutely, it was sponsored by the dot in the middle of the 7 Up.

ADAM GLICK: Oh, if you get into the ones that soda companies have done, Pepsi has done several of them. They did "Pepsi Invaders," which was a "Space Invaders" knockoff. And then they later had another Pepsi challenge one. Once you expand out to other large consumer brands, it is surprising how big it is. Although it doesn't always work out. There are very few of them that you can think of when people look back, and they're like, wow, that was incredible. But that day may come.

CRAIG BOX: Just "Cool Spot."

ADAM GLICK: Except for the "Stranger Things" one. I will still stand behind the "Stranger Things" RPG. That's a free game you can go download in the app stores. That was surprisingly good. I thought they did a great job with that.

CRAIG BOX: Well, you can find a lot of links in the show notes, but until then, let's get to the news.


CRAIG BOX: Rancher Labs, the topic of Episode 57, has released version 2.3 of their namesake Rancher Kubernetes platform. It offers three headline features-- cluster templates for secure and consistent deployment, management of Istio, which they highlight as the leading service mesh solution, and support for Windows containers. Rancher mentioned that they worked with Microsoft to build this capability and were the first vendor to bring Windows nodes to general availability.

ADAM GLICK: That unique selling point was not to last long, though, as within four hours, Amazon had announced that Windows containers were now GA on EKS. Support is available in all regions that EKS operates in. If you want to learn more about Windows containers, please check out Episode 70 with Patrick Lang.

CRAIG BOX: DigitalOcean has announced that their Kubernetes service has now been updated to provide horizontal cluster autoscaling to bring new nodes on and offline as needed based on CPU utilization. Kubernetes 1.15 is also now available, as well as security access via API tokens and a marketplace with one-click apps.

ADAM GLICK: Elastic has announced a version 1.0.0-beta1, with two ones, two zeros, and one beta, of their Elastic Cloud on Kubernetes. Elastic has opted to build the offering on StatefulSets, a wise choice for a data tier provider, as it helps them provide for faster upgrades and configuration changes. Elastic notes-- it's far easier to reuse a persistent volume than it is to replicate all of your data into a new cluster for each upgrade.

Elastic has also introduced a new API, v1beta1, with two ones and one beta, which will require you to recreate your cluster to use the new features. Additional improvements include the ability to turn off Elastic's Java-based node-to-node encryption if you already have mutual TLS enabled via service mesh.

CRAIG BOX: API management company MuleSoft has released the AnyPoint Service Mesh. The new product is built on Istio and provides discoverability, resilience, optimization, and security of microservices through the single control plane of MuleSoft's AnyPoint platform. In an interview with "Container Journal," MuleSoft's David Chao said that the company decided to embrace Istio because it is becoming a de facto standard. Chao further said that he believes the need to govern and integrate microservices may very well push organizations to adopt Kubernetes in order to run Istio.

ADAM GLICK: Linkerd has announced version 2.6 with support for distributed tracing as its headline feature. Commercial sponsor Buoyant does call out that it's a fairly complex feature and has posted a separate article on how to configure it.

CRAIG BOX: Cloud 66 has introduced Trackman, an open source tool and Go library for running multiple commands in a workflow. Trackman is a way to run commands, check they ran fine, and then run a next step. It can also run steps in parallel, define dependencies between steps, and handle pre-flight checks before running a step. You'll get to 10,000 steps per day in no time. Trackman powers the installation of Cloud 66's Skycap delivery platform, including things like database migration jobs.

ADAM GLICK: Puppet has announced the public beta of their Project Nebula Workflow Tool Set for cloud native and serverless applications. It comes with built-in templates for 20 common cloud native projects, including Helm, kubectl, Terraform, and more. The tools also provide visualization of your workflow and the ability to check your configurations into source control. Although it's a public beta, Puppet's website still requires a sign-up form for access to the project.

CRAIG BOX: The schedule for the Kubernetes Contributor Summit at KubeCon San Diego has been announced. The topics cover a wide range, from the vision for the architecture of Kubernetes to how to help write docs. There is something for everyone who contributes. If you're interested, registration for the Active Contributor Track is still open, though there is a waiting list for the New Contributor Workshop.

ADAM GLICK: The CNCF have posted a blog this week by Mohamed Ahmed talking about capacity planning for your Kubernetes clusters. The article covers persistent storage, host port, configuration, and resource dependencies, as well as pod priority settings and preemption. Additionally, this week, the CNCF published a new case study about how Booz Allen Hamilton works with the US government to modernize their infrastructure using Kubernetes. In particular, the case study talks about the relaunch of recreation.gov on Kubernetes.

CRAIG BOX: If you don't want to use your cloud provider's load balancer, you have many options for installing your own Ingress. How many? Flant.com came up with 11, ranging from the default Kubernetes NGINX installation to four different Ingress controllers powered by Envoy. With a nod to the comparison site kubedex, they published a spreadsheet showing the feeds and speeds of each, as well as their own selection and how they came to make it.

ADAM GLICK: If you're curious how to manage Kubernetes clusters at scale, Henning Jacobs has posted a blog talking about how Zalando manages over 140 of them. The article talks through some of their best practices, like avoiding pet clusters and having a pre-production testing environment that mirrors production by deploying cluster updates to both at the same time. They also avoid manual interactions and leverage auto scaling. Henning talks about cluster lifecycle management, how they avoid configuration drift, and how they monitor their clusters. Finally, he espouses the benefit of end-to-end testing and the use of frequent updates.

CRAIG BOX: Another method of managing clusters is the cluster API, and VMware is all over it. They are using the cluster API to power their new project, Pacific, and have written two blog posts on the topic this week. The first talks about new features in the recent v1alpha2 and how the community is evolving. The second talks about how the pattern of pod and replica sets maps to machines and machine sets and how VMware is implementing this into vSphere. Meanwhile, SIG cluster lifecycle is hard at work on a v1alpha3, with commits for that starting to land in the last week.

ADAM GLICK: Red Hat senior director of OpenShift and creator of the OpenShift Kubernetes "Wild West" video game Grant Shipley left Red Hat/IBM this week for a job at VMware. Grant was a longtime Red Hat employee, having spent almost 15 years at the company, and was an important leader on the OpenShift project. His new role looks to be working on Kubernetes and Tanzu strategy for VMware.

CRAIG BOX: SUSE has announced they are sunsetting their OpenStack business in favor of application delivery, and specifically their Kubernetes platform. This change of direction comes 12 weeks after hiring a new CEO and four months after its last OpenStack release. Customers will be supported until the end of their contract, but the product is no longer available for sale and will not be updated any further.

ADAM GLICK: Finally, at SAP Tech Ed in Barcelona this week, SAP demoed their in-memory HANA database running in Kubernetes. This is positioned as part of how SAP will bring their software to multiple clouds and is also a focus for how HANA will be part of a platform for both SAP cloud applications, as well as third party application data layers.

CRAIG BOX: And that's the news.


CRAIG BOX: James Munnelly is a product architect at Jetstack and the founder of the cert-manager open source project. Welcome to the show, James.

JAMES MUNNELLY: Thanks for having me on.

CRAIG BOX: We met just before you joined Jetstack, when your company was working on a Kubernetes project with Google Cloud. What was the project you were working on?

JAMES MUNNELLY: We were running a large shopping comparison website and/or a series of websites. And we were looking to move from another cloud over to GKE. This was around the time of Kubernetes, I think, it was 0.19, 0.20, when I first started touching that.

CRAIG BOX: One of my favorite releases.

JAMES MUNNELLY: Yeah. I think it was just before 1.0, so it was getting there, definitely. And we worked with yourselves and a few of the solutions architects to get us moved over and up and running. I think in the process, we saved a fair ton of money.

CRAIG BOX: Brilliant, and that led you to working for a consulting company in the Kubernetes space. Who are Jetstack?

JAMES MUNNELLY: I started working for Jetstack. We're a UK-based consultancy, focused entirely on Kubernetes. So Jetstack was started about four years ago. The two Matts [Matt Barker and Matt Bates], who some may have heard of, they kind of took a bet on Kubernetes becoming the orchestrator that grew into what it is today. So we offer subscription support services, general purpose consultancy, training, and open source software engineering of all descriptions. And we work with all sorts of companies in the space with the move to Kubernetes.

CRAIG BOX: As you mentioned, Jetstack is primarily a consulting company. But your role as a product architect, you actually work full time on open source. Starting out as a small consulting company, how did you get to the point where that was possible?

JAMES MUNNELLY: It's taken time, definitely. Especially when I joined the company, we were only three or four people. So obviously, at that point, it wasn't possible to have anyone full time, just working on open source outward facing projects. So I think the cert-manager project itself came about after some other similar projects in the space kind of grew, ones that my colleague Christian worked on-- kube-lego and the like.

I think just over time as the project grew and the number of people getting involved with it grew, it became more and more like we needed to do something about this thing. And we're seeing more and more opportunities in what we can do with it as a company and I think initially start to deliver value just through the number of candidates who were coming to interview with us and saying, oh, we know you because of the cert-manager project. And that really gave it a leg up, I think, and started to make the two Matts at least realize that this is actually something out there significant and worthwhile.

CRAIG BOX: So the cert-manager project is your full-time job at this point?

JAMES MUNNELLY: I work full time on cert-manager. I do a few bits internally managing some other products as well that we've got going on and incubating. But for the past about year or so, it's been pretty much me full time on there. And my colleague Josh and a number of other people over there are also working on the project. So Josh works pretty much full time, three or four days a week with me, too.

CRAIG BOX: Great, so what exactly is cert-manager?

JAMES MUNNELLY: Two years ago, we kind of recognized that Kubernetes didn't have a particularly strong certificate management story. There's all sorts of certificate management tools out there. Large and small companies alike use them, because it's a hard problem managing x509 TLS certificates. So we kind of wanted to make that simpler to do in Kubernetes.

We're seeing all of these new resources making it possible to manage all sorts-- like, StatefulSets came out, making it easier to manage stateful services and more finicky things in Kubernetes. We saw the opportunity to represent CAs and so on in Kubernetes as well. So that's where it was born out of. At the same time, we had LetsEncrypt come about a few years ago.

CRAIG BOX: There's a lot of terminology relating to security and a lot of things you have to make sure you get right. For someone who might be newer to the space, how does TLS encryption work at a broad level?

JAMES MUNNELLY: It's obviously a hot topic at the minute, what with all sorts of breaches and everything else you're hearing online. So TLS really relies on, first of all, a root of trust - the idea of root CAs. So you may have heard of this before, but your laptop, your operating system, is going to come with a set of root CAs.


JAMES MUNNELLY: A Certificate Authority. And these basically are trusted to sign other certificates. So they're basically trusted with the permission to say this other person is who they say they are. So everyone's got root CAs installed on their laptops. When it comes to requesting a certificate, though, you go along and you request a certificate, and you say I own the domain name Google.com, or something like that. It's then up to that Certificate Authority to go and somehow verify that you are the owner of Google.com.

And there are all sorts of ways that happens. And historically, there's been people faxing business documents and so on with letterheaded paper. Obviously, nowadays, it's a little bit more advanced. But you then get your certificate that says that you are Google.com. And that certificate has a corresponding private key. So this is where the asymmetric-key cryptography stuff comes in.

And you have to keep that safe because that private key basically allows you to use that certificate and publish that to users so that when they come to your website, you send them the certificate. You also encrypt some data using your private key and send that along with it, and that validates that you are, in fact, the holder of that certificate. And so your browser can give you that green padlock to say you are Google.com or so on.
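[Editor's sketch] What James describes as "encrypting some data using your private key" corresponds to a digital signature: the key holder signs, and anyone with the public half can check it. The Python below is a toy illustration using textbook RSA with tiny primes. The numbers and function names are assumptions chosen for readability, it is trivially breakable, and it is not how a real TLS library does this.

```python
import hashlib

# Textbook RSA with tiny primes: for illustration only, never for real use.
P, Q = 61, 53
N = P * Q   # public modulus (3233), published in the "certificate"
E = 17      # public exponent, published in the "certificate"
D = 2753    # private exponent, kept secret: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest(message):
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message):
    """Only the private-key holder can compute this value."""
    return pow(digest(message), D, N)


def verify(message, signature):
    """Anyone holding the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


if __name__ == "__main__":
    handshake_data = b"data exchanged during the handshake"
    signature = sign(handshake_data)
    print("signature accepted:", verify(handshake_data, signature))
```

A browser performs the equivalent check with the public key taken from the server's certificate, after first confirming that the certificate chains up to a root CA it already trusts.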

CRAIG BOX: So we've talked about this in the context of certain web pages, and that is probably the most common case where people have come across SSL or TLS. Like, say if I need to prove who I am, I would go to thawte.com and give Mark Shuttleworth $20, and then he would look me up and then prove I was who I said I was and so on, and eventually end up in space.

But now we're also talking about the components of Kubernetes need to have encryption between them. And presumably, it's not possible for things that run in this private environment to go off and get certificates, and especially if some human intervention is required. How has certificate issuance moved from being a manual to an automated process?

JAMES MUNNELLY: What you're saying there about needing to manage your own internal certificates and so on, that is one of the big features that we're trying to make easier with cert-manager. So I mentioned the root CAs, the root Certificate Authorities that you trust on your laptop. Obviously, if you're creating an internal service and trying to secure that, be that Kubernetes nodes or service meshes and whatever else, you're going to define your own root CA.

So that root isn't then installed on everybody's laptops. It'll be your private infrastructure or maybe your developer's laptops that will trust your particular internal root. And that typically starts with kind of self-signing a root certificate there. And then off of there, issuing certificates just in the same way. Someone ultimately, in some form, needs to validate that you are the person who is meant to get this certificate.

So it might be that you're asking for store or accounts.example.com. Your security team is likely to then go and double check that you are on the accounts team or something like that. So that's kind of the way that the internal CAs work. And these are used, like you mentioned, for Kubernetes, for all sorts of different areas in the space.

CRAIG BOX: Every Kubernetes cluster has its own Certificate Authority for encryption between the nodes and the master, for example.

JAMES MUNNELLY: Yeah, exactly. I think, in fact, there's at least three different Certificate Authorities involved in that process.

CRAIG BOX: Cert-manager helps with the process of getting public certificates from an external Certificate Authority. And LetsEncrypt was one of the first systems that made it possible to do this automatically and also for free, which means that we can do this without having to go and put our credit card detail into a form. What exactly is LetsEncrypt?

JAMES MUNNELLY: LetsEncrypt is an organization created in about 2014, I think. It involved a few developers from Mozilla, as well as the ISRG, the Internet Security Research Group. It's now got a number of others involved there, too. They basically were formed because, as we said, more and more websites are springing up. Security is becoming-- it's obviously essential. And up until recently, or up until LetsEncrypt, it's been required that you do go and hand some money to someone to go and get a certificate.

And because of the nature of that market, because there's a monetary element, you don't want to be spending that every month or so. You tend to get longer periods. So you buy a certificate each year and so on. And that process and having to do-- like, involving money in this has actually made it so that people are less likely to use TLS certificates. Because they need to pay for them. They then get a year-long certificate, and then they have to renew that. And everything becomes quite cumbersome.

And it's always been quite manual up until LetsEncrypt came about. So it would involve clicking on CA portals and things and clicking through to go and request the certificates you need. And given how essential proper handling of these certificates is, that's a real problem. And I think the folks over at LetsEncrypt recognized this and first set up, obviously, the CA. So they got their own root CA, root Certificate Authority trusted by end users.

And then they kind of took that a step further and looked at that manual process where someone has to go and log into the dashboard, and hit Request Certificate, and said, well, a lot of this can be automated. We've got systems to do this sort of thing now. And that's where the ACME protocol came about. So ACME comes from LetsEncrypt, originally.

CRAIG BOX: LetsEncrypt uses a protocol called ACME, which was either named by Wile E. Coyote, or is the Automated Certificate Management Environment protocol. What exactly is ACME?

JAMES MUNNELLY: That's kind of defining a standardized way. It's now an RFC. So it's now an actual documented standard. But it's a way of obtaining certificates from a server, and it defines the ways that, first of all, people validate themselves, so how you register an account with the server, but then also how you say I want to request a certificate for example.com.

And then beyond that, it goes a step further and it also defines ways to have that proving step where I said in the past, they may have sent off some kind of letterheaded paper to show, oh, look, we're Google.com LLC, or whatever it is. It also defines a mechanism to validate those. So that's where you hear about HTTP-01 and DNS-01 validation, and I think a few others.

And that's basically through-- with HTTP-01, you'll go and present a file on your web server so that when LetsEncrypt come along and, say, hit a particular endpoint on your website, you return with a particular secret key. And that process in and of itself validates that you own the domain and that you can have that certificate.

It's a similar thing with DNS-01 where you go and present a particular record. LetsEncrypt will come along and check that, and if it's there, they'll give you a certificate. And that process itself is then being audited and checked by various people involved in maintaining the security and reliability of the internet to make sure that that is, in fact, a secure process. So it's gone through that audit and it's gone through that check. And this is now a protocol that anyone, other CAs, can also implement. So they've developed a standard there.
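[Editor's sketch] The "secret key" returned for HTTP-01 and the record presented for DNS-01 are both derived from a token issued by the CA plus a thumbprint of the ACME account key. The Python below follows my reading of RFC 8555 and RFC 7638. The function names, token, and account key values are made up for illustration, and a real client also has to sign its requests to the ACME server, which is omitted here.

```python
import base64
import hashlib
import json

# Members that make up the thumbprint, per key type.
REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}


def b64url(data):
    """Base64url without padding, the encoding ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk):
    """SHA-256 over the required JWK members, sorted keys, no whitespace."""
    required = {name: jwk[name] for name in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token, account_jwk):
    return token + "." + jwk_thumbprint(account_jwk)


def http01_challenge(token, account_jwk):
    """Return (URL path to serve, body to serve at that path)."""
    path = "/.well-known/acme-challenge/" + token
    return path, key_authorization(token, account_jwk)


def dns01_challenge(domain, token, account_jwk):
    """Return (TXT record name, TXT record value)."""
    key_auth = key_authorization(token, account_jwk).encode("ascii")
    return "_acme-challenge." + domain, b64url(hashlib.sha256(key_auth).digest())


if __name__ == "__main__":
    # Placeholder account key and token, not real values.
    account = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
    print(http01_challenge("example-token", account))
    print(dns01_challenge("example.com", "example-token", account))
```

Because the value depends on the account key, simply copying someone else's token is not enough to pass validation for a domain.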

CRAIG BOX: Once you have proven that you control the domain by being able to post a file or create a DNS entry, the Certificate Authority will sign the certificate for you, and that will allow you to publish content securely on that site. How does cert-manager represent that in terms of Kubernetes concepts?

JAMES MUNNELLY: First of all, we introduce what we call an Issuer. So that is something that issues certificates. So it's a Certificate Authority of some description. So that's kind of like the first thing that we see there, and that's us introducing the concept of CAs into Kubernetes. We then have corresponding certificate resources. So similar to how you'd create your pod or your replica set, we introduce the certificate as a custom resource in Kubernetes.

So this is an extension to Kubernetes, so you can just go ahead and create that like any other resource. Certificates have what we call like an issuer reference. So you say I want this certificate to come from LetsEncrypt, or I want this certificate to come from my internal CA. And that certificate resource will kind of detail the common name on the certificate, the DNS names, like the domains that it should be valid for.

And there's all sorts of other fields on there. We try and mirror as much as possible of the x509 specification, within reason, because it's quite an unwieldy specification. But yeah, so the certificate resource, it looks a lot like what you'd see on any other x509 certificate generator that you may have used before.
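[Editor's sketch] To make the shape of a Certificate resource concrete, the Python below builds one as a plain dict and prints it as JSON, which kubectl accepts alongside YAML. The field names follow the cert-manager.io/v1 API as I understand it and the API group and version have changed between releases, so treat them as assumptions. The resource names and the helper function are hypothetical.

```python
import json


def certificate_manifest(name, namespace, dns_names, issuer_name, issuer_kind="Issuer"):
    """Build a Certificate custom resource as a plain dict (sketch only)."""
    return {
        "apiVersion": "cert-manager.io/v1",  # assumption: differs in older releases
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Secret where the signed certificate and private key are stored.
            "secretName": name + "-tls",
            "commonName": dns_names[0],
            "dnsNames": list(dns_names),
            # The issuer reference: which CA this certificate should come from.
            "issuerRef": {"name": issuer_name, "kind": issuer_kind},
        },
    }


if __name__ == "__main__":
    manifest = certificate_manifest(
        "shop", "default", ["shop.example.com", "www.shop.example.com"], "internal-ca"
    )
    print(json.dumps(manifest, indent=2))
```

Pointing the same resource at a different CA only means changing the issuer reference, which is the standardization James goes on to describe.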

CRAIG BOX: And you mentioned the ability to also create self-signed certificates. If you suggest that the issuer should self-sign a certificate, what tooling does it use to issue that?

JAMES MUNNELLY: I'm glad you asked because I think quite often cert-manager gets seen as the LetsEncrypt controller, and it absolutely is. It does do LetsEncrypt very well, and it's kind of where the project was born. But no, we also support maintaining and building your own self-signed root CA. We don't shell out to OpenSSL or CFSSL. We rely solely on Go's crypto library to generate and manage those root CAs and certificates.

But yeah, you can then go and build your own certificate hierarchy using a self-signed root that cert-manager generates and manages and renews for you and then issue your certificates off of there. And that's in exactly the same way as you would with, say, a LetsEncrypt ACME certificate. We're trying to standardize that API so that a certificate means one thing in Kubernetes.
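[Editor's sketch] A private hierarchy of the kind James describes can be pictured as three resources: a self-signed issuer, a root certificate issued by it and marked as a CA, and a second issuer that signs with that root. The Python below builds them as dicts. The field names (`selfSigned`, `isCA`, `ca.secretName`) follow the cert-manager.io/v1 API as I understand it, and every resource name is hypothetical.

```python
import json

API_VERSION = "cert-manager.io/v1"  # assumption: differs in older releases


def self_signed_issuer(name):
    """Issuer that signs certificates with their own private key."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Issuer",
        "metadata": {"name": name},
        "spec": {"selfSigned": {}},
    }


def root_ca_certificate(name, bootstrap_issuer):
    """Root certificate, flagged as a CA, signed by the self-signed issuer."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Certificate",
        "metadata": {"name": name},
        "spec": {
            "isCA": True,
            "commonName": name,
            "secretName": name + "-keypair",
            "issuerRef": {"name": bootstrap_issuer, "kind": "Issuer"},
        },
    }


def ca_issuer(name, root_secret):
    """Issuer that signs leaf certificates using the root's stored key pair."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Issuer",
        "metadata": {"name": name},
        "spec": {"ca": {"secretName": root_secret}},
    }


if __name__ == "__main__":
    bootstrap = self_signed_issuer("bootstrap")
    root = root_ca_certificate("internal-root", "bootstrap")
    internal = ca_issuer("internal-ca", root["spec"]["secretName"])
    for resource in (bootstrap, root, internal):
        print(json.dumps(resource, indent=2))
```

Leaf certificates then reference `internal-ca` in their issuer reference, exactly as they would reference an ACME issuer.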

CRAIG BOX: If Kubernetes already has internal Certificate Authorities, could you integrate cert-manager with those?

JAMES MUNNELLY: Theoretically, it is possible to use something like cert-manager to manage those CAs. Practically, though, you'd get a bit of a chicken and egg problem, like a bootstrap problem, where you're going to need cert-manager to be running to run Kubernetes, but you're going to need Kubernetes running to run cert-manager. So that can definitely be a bit of a problem.

It depends on exactly what part of Kubernetes you're trying to secure. So if anyone out there is kind of extending Kubernetes with validating webhooks, mutating webhooks, API services, or any of these sorts of resources, that's a really great case for cert-manager to come in and help out. And in fact, the Kubebuilder project, when it stands up and generates all these validating webhooks for you to build your own controller with, it also injects a few extra resources that you can optionally use to secure those using cert-manager, because these require TLS to run, too. Because they expose some kind of an HTTP service.

So it's not something that you can today go and secure your entire TLS stack for your Kubernetes cluster itself with cert-manager. But there's a number of layers that you can definitely secure, and you can also configure it, at the very least, to monitor those certificates for you. So whilst it may not be able to issue them, if you configure things correctly, you can have it watching and exposing Prometheus metrics and alerts and so on about those certificates.

CRAIG BOX: Now another situation where you have the chicken and egg problem is when you're installing cert-manager. So you mentioned validating webhooks. There is a webhook provided by cert-manager which checks that the certificate resource you are asking to create is correct. But the certificate issued for that needs to be issued by cert-manager, which isn't running yet. How do you get around that problem?

JAMES MUNNELLY: That is a particularly ugly problem. We provide a really good solution for other people to secure their validating webhooks, but then we run into a bit of an issue with our own one.


JAMES MUNNELLY: So first of all, when we first introduced this in 0.8, we realized that we could use a label selector on our validating webhook to exclude the cert-manager namespace from resource validation so that we could go ahead and create our certificates and our issuers in the cert-manager namespace specifically and skip that webhook. So that was our first way, and it kind of sidestepped the chicken and egg problem.

For anyone that's seen the new changes in Kubernetes 1.15 with conversion webhooks for CRDs, well, we want to implement one of these, and that just made the chicken and egg problem quite a bit harder. Because with a conversion webhook, it's not possible to just use a label selector to exclude certain namespaces or so on. So our previous fix for that kind of went out the water at the time that we started wanting to implement the conversion webhook.

So actually, now, we've got some special code in cert-manager that specifically handles and manages and deals with TLS for just the webhook. And this is actually also an out-of-band operation, so that happens separately from the standard way cert-manager's issuing certificates.

After that, we're actually looking at reflecting some kind of fake resources back into the Kubernetes API server after we've bootstrapped the TLS, once it is possible to do, but that will be further down the line. And that's sort of like mirror pods or static pods, if anyone's really familiar with the kubelet in Kubernetes. It's kind of a similar concept or idea.
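[Editor's sketch] The label-selector workaround James describes for the 0.8 release relies on the namespace selector that a validating webhook configuration can carry: namespaces that the selector does not match are never sent to the webhook. The Python below mimics how such a selector's match expressions are evaluated. The label key is hypothetical and the function names are mine, so this illustrates the idea rather than cert-manager's actual configuration.

```python
def matches_expression(labels, expression):
    """Evaluate one matchExpressions entry against a namespace's labels."""
    key = expression["key"]
    operator = expression["operator"]
    values = expression.get("values", [])
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        # A namespace without the label at all also satisfies NotIn.
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError("unknown operator: " + operator)


def webhook_applies(namespace_labels, selector):
    """True if resources in this namespace would be sent to the webhook."""
    expressions = selector.get("matchExpressions", [])
    return all(matches_expression(namespace_labels, e) for e in expressions)


# Hypothetical label used to opt a namespace out of validation.
SELECTOR = {
    "matchExpressions": [
        {"key": "example.com/disable-validation", "operator": "NotIn", "values": ["true"]}
    ]
}

if __name__ == "__main__":
    print(webhook_applies({}, SELECTOR))  # ordinary namespace
    print(webhook_applies({"example.com/disable-validation": "true"}, SELECTOR))
```

Labelling only the cert-manager namespace this way lets the bootstrap Issuer and Certificate be created before the webhook has a serving certificate. As James notes, conversion webhooks offer no equivalent selector, which is why this approach stopped being sufficient.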

CRAIG BOX: From deep in the weeds of the Kubernetes infrastructure, let's step back up to the most common Hello World use case for cert-manager. You have a pod that's serving some sort of HTTP traffic. And then you want to configure an ingress which publishes that pod, but you want to be able to serve that securely using TLS. And so you need to get a certificate into whatever your ingress is. It might be a provider ingress configuring a load balancer, or in the general case, it might be the default NGINX controller. What is the process that a new user comes through to configure that so that they can serve their simple website securely?

JAMES MUNNELLY: Yeah, so sort of end-to-end on that-- and we do actually have a tutorial donated by some folks from SIG Docs for this, too-- but typically, you will deploy your ingress controller. You'll deploy cert-manager using the regular installation guide. From there, the first step will be to create the issuer resource. You need to create that representation of LetsEncrypt in your cluster.

So you typically create LetsEncrypt-staging or LetsEncrypt-production issuer resource. Provision that into your cluster, and that represents your account with the LetsEncrypt server. So it includes your email address, details like that. It also includes some configuration, what we call the solver configuration. I mentioned before that you have to do these validations with LetsEncrypt.

So you configure on there to say, I want you to solve these domains using HTTP-01, so I want you to go through that process. And that's typically the way most users use it because it's the simplest out of the box configuration, to create one resource and off they go. From there, you go ahead and deploy your application just like normal. So you create your deployment resources, whatever pipeline you've got for that.

Now you've kind of got two choices here. The easiest one is when you deploy your ingress resource for your application to expose it to the world, you can add on an annotation, which is cert-manager.io/issuer. And then you just say LetsEncrypt-staging.

In the background, the component called ingress-shim in cert-manager will go away. It will notice the Ingress resource with this annotation, and it'll go and manage a cert-manager Certificate resource for you. So it'll go and create it. So it's almost another operator for ingresses to create certificates so that cert-manager itself can kick off its normal flow. So when it comes down to it, once you've got everything set up, all you ever need to do is add a single annotation to your ingress. And cert-manager will kick off in the background and go and fetch a certificate for you and your ingress controller will pick that up and serve it.
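[Editor's sketch] The two pieces in this walkthrough, an ACME issuer with an HTTP-01 solver and the ingress-shim behaviour, can be approximated in a few lines of Python. The manifest fields follow the cert-manager.io/v1 API as I understand it, and the annotation key `cert-manager.io/issuer` is the one I believe current releases document, so both are assumptions to check against the docs for your version. The resource names are hypothetical, and the real ingress-shim is a controller that watches the API server rather than a pure function.

```python
LETSENCRYPT_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"
ISSUER_ANNOTATION = "cert-manager.io/issuer"  # assumed annotation key


def acme_issuer(name, email, server=LETSENCRYPT_STAGING):
    """Issuer representing an ACME account, solving challenges via HTTP-01."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Issuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": server,
                "email": email,
                # Secret that stores the ACME account's private key.
                "privateKeySecretRef": {"name": name + "-account-key"},
                # Solver configuration: validate domains through the ingress.
                "solvers": [{"http01": {"ingress": {"class": "nginx"}}}],
            }
        },
    }


def certificates_for_ingress(ingress):
    """Derive one Certificate per TLS block of an annotated Ingress."""
    metadata = ingress.get("metadata", {})
    issuer = metadata.get("annotations", {}).get(ISSUER_ANNOTATION)
    if issuer is None:
        return []  # not annotated, so nothing to manage
    certificates = []
    for tls in ingress.get("spec", {}).get("tls", []):
        certificates.append({
            "apiVersion": "cert-manager.io/v1",
            "kind": "Certificate",
            "metadata": {"name": tls["secretName"],
                         "namespace": metadata.get("namespace", "default")},
            "spec": {
                "secretName": tls["secretName"],
                "dnsNames": list(tls["hosts"]),
                "issuerRef": {"name": issuer, "kind": "Issuer"},
            },
        })
    return certificates


if __name__ == "__main__":
    ingress = {
        "metadata": {"name": "web", "namespace": "default",
                     "annotations": {ISSUER_ANNOTATION: "letsencrypt-staging"}},
        "spec": {"tls": [{"hosts": ["www.example.com"], "secretName": "web-tls"}]},
    }
    print(acme_issuer("letsencrypt-staging", "admin@example.com"))
    print(certificates_for_ingress(ingress))
```

The ingress controller never talks to cert-manager directly. It only reads the TLS Secret named in the Ingress, which is where the issued certificate ends up.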

CRAIG BOX: The first time you do this, the protocol requires you to publish some sort of secret, which is provided by LetsEncrypt. So they will say to you, here's the secret code. You need to publish that somewhere. How do you map that human process that's required?

JAMES MUNNELLY: As you may know, ingress is kind of-- I always describe it as the HTTP router for Kubernetes. So you define on there, this hostname at this path needs to go to this pod. We take advantage of the fact that Kubernetes makes it so configurable and so easy to configure those paths and routes.

Cert-manager goes ahead and it deploys a pod for you, which is what we call the ACME solver. So that is the simplest of web servers, and it just responds with that secret key that LetsEncrypt has given you. So we deploy that. We deploy a service resource into the cluster. And then we configure the ingress accordingly to make it so that traffic for the path that LetsEncrypt requires gets routed to the ACME solver we just deployed.

So we actually use Kubernetes to configure the solving process. Once it's solved, we tear that all down, and then none of that exists until 60 days' time or 90 days' time when it comes to needing to renew your certificate for you.
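
To make "the simplest of web servers" concrete, here is a minimal sketch of an HTTP-01 responder using only the Python standard library. It is not the cert-manager solver itself. It assumes the well-known path defined by the ACME specification (`/.well-known/acme-challenge/<token>`), and the function and class names are made up for illustration.

```python
from http.server import BaseHTTPRequestHandler

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def challenge_response(path, challenges):
    """Map a request path to (status, body) for an HTTP-01 challenge.

    `challenges` maps each token to the key authorization to serve for it.
    """
    if not path.startswith(CHALLENGE_PREFIX):
        return 404, b"not found"
    token = path[len(CHALLENGE_PREFIX):]
    key_authorization = challenges.get(token)
    if key_authorization is None:
        return 404, b"unknown token"
    return 200, key_authorization.encode("ascii")


def make_handler(challenges):
    """Create a request handler class bound to a set of challenges."""

    class SolverHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = challenge_response(self.path, challenges)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return SolverHandler
```

To try it locally, pass `make_handler({...})` to `http.server.HTTPServer` with an address of your choosing and call `serve_forever()`. In the cluster, the Service and the temporary Ingress path are what send the validation request to a pod like this.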

CRAIG BOX: We've talked a lot about the free service of LetsEncrypt and the self-signed certificates. There are commercial certificate services, and people who are running Kubernetes in an enterprise environment may want to integrate with Vault from HashiCorp, or Active Directory. And then there's also human processes still where you might need to go and send the request by fax and validate it, and so on. How does cert-manager work with those other enterprise systems?

JAMES MUNNELLY: A lot of the work we've been doing over the recent releases has been to sort of solidify that story. Because I think previously, it wasn't so easy. You'd have to modify cert-manager directly to support your own CA. But like I said before, we want to be defining standardized ways to represent CAs and certificates in Kubernetes, which means we need to be more agnostic and API driven.

So we now have an additional resource that users don't typically see, called a CertificateRequest. Now that pretty much just represents you going off to that CA and saying, look, I need a certificate. So with that, we've implemented a few different certificate request implementations, and now they might go off to-- like you've already noticed, we've got one for Vault, we've got an internal CA one, the self-signed. Even ACME uses the certificate request stuff. We've got Venafi integration as well, which is typically more enterprise focused.

But it allows you to basically write a controller for these certificate requests that just watches and sees a CSR and then goes off to go and fetch a certificate from somewhere. Once you set that back on the certificate request, you say it's been issued. The rest of the system picks that up and makes sure it goes into the secret resource, and it makes sure the renewal is scheduled at the right time. Then we allow anyone to plug in at that point.
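
The plug-in point can be sketched as a single reconcile step over a CertificateRequest held as a plain dictionary. This is an illustration under assumptions, not a working controller: it assumes the v1alpha2 layout (`spec.csr` and `status.certificate` as base64-encoded PEM, plus a `Ready` condition), and `sign` is a hypothetical callable standing in for whatever CA you integrate with. Watching the API server and writing the status back are left out.

```python
import base64


def reconcile(request, issuer_name, sign):
    """Handle one CertificateRequest; return True if it was issued now.

    `sign` is a hypothetical callable that takes the CSR as PEM bytes and
    returns the signed certificate as PEM bytes from your own CA.
    """
    if request["spec"]["issuerRef"]["name"] != issuer_name:
        return False  # Addressed to a different issuer; leave it alone.
    status = request.setdefault("status", {})
    if status.get("certificate"):
        return False  # Already issued; nothing to do.

    csr_pem = base64.b64decode(request["spec"]["csr"])
    certificate_pem = sign(csr_pem)

    # Setting the certificate and a Ready condition is the hand-off point:
    # the rest of the system stores it and schedules the renewal.
    status["certificate"] = base64.b64encode(certificate_pem).decode("ascii")
    status["conditions"] = [
        {"type": "Ready", "status": "True", "reason": "Issued"}
    ]
    return True


# Example with a stand-in signer that only echoes the CSR back.
request = {
    "metadata": {"name": "my-app-tls-1"},
    "spec": {
        "issuerRef": {"name": "my-ca"},
        "csr": base64.b64encode(b"example CSR").decode("ascii"),
    },
}
issued = reconcile(request, "my-ca", lambda csr: b"signed:" + csr)
print(issued, request["status"]["conditions"][0]["reason"])
```

Because the function returns early when a certificate is already present, running it again on the same request is harmless, which is the property a watch-and-retry loop needs.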

So on the human signing point-- actually, it's funny you should ask, because we've had a few people reach out about this. Who knew all these people want to build up this big nice automation on Kubernetes to retrieve certificates, and at the end of it, they want to be paged or emailed? So that's kind of quite neatly easy to do today. You can set up something like a Prometheus or Alertmanager alert just to tell you when a new certificate request exists. And then it can go off and ping whoever it is. And they can go address it.

To make that easier, though, because right now, it's not the slickest experience, even if not that many people are doing human signing, we're looking at a small CLI tool that will allow you to, first of all, view a list of pending requests and also to fulfill a request, to say, I approve this, or here's the certificate that has been minted for that. So yeah, it's via the certificate request resource, and then just watching for those that are pending and then filling those in with a signed certificate is how you'd go the human signing route.
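
The two operations such a tool would need can be sketched over plain dictionaries. The CLI James mentions was not released at the time of this conversation, so everything here is hypothetical: the function names are invented, and "pending" is assumed to mean that `status.certificate` has not been set yet.

```python
def is_pending(request):
    """A request is pending while no certificate has been set on it."""
    return not request.get("status", {}).get("certificate")


def pending_requests(requests):
    """Return the names of requests still waiting for a certificate."""
    return [r["metadata"]["name"] for r in requests if is_pending(r)]


def fulfil(request, certificate_b64):
    """Record a certificate that a person obtained out of band."""
    if not is_pending(request):
        raise ValueError("request already has a certificate")
    request.setdefault("status", {})["certificate"] = certificate_b64
    return request


requests = [
    {"metadata": {"name": "web-tls-1"}, "status": {"certificate": "Li4u"}},
    {"metadata": {"name": "api-tls-1"}},
]
print(pending_requests(requests))
fulfil(requests[1], "Li4u")
print(pending_requests(requests))
```

The same `pending_requests` check is what an alert would key off: a non-empty list means someone needs to be pinged.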

CRAIG BOX: There was a notification recently from the LetsEncrypt project that said we are getting an awful lot of traffic from misconfigured or older versions of cert-manager and that they're now only going to support two recent releases. What was the situation there?

JAMES MUNNELLY: In Kubernetes, the actual controller pattern that you may have heard of-- if you've heard any other shows about building custom controllers and so on. The whole controller pattern relies on watching resources and then reconciling and going and applying those. And then if it fails, retrying. And that retrying word was key.

In earlier releases of cert-manager, we were iterating quickly on things, and I was also inexperienced in the area: building controllers, there wasn't so much for me to draw on at the time of writing it. But we were retrying basically too aggressively, and in certain edge cases, if we were unable to, say, persist data, like to say that the last request failed, we were then retrying again. And sometimes, especially in the earlier releases, we had a few cases where we could enter quite a tight loop where we'd be retrying every second and hitting an API.
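
A common way to avoid that kind of tight loop is capped exponential backoff. The sketch below is a generic illustration of the technique, not how cert-manager implements its retries; the one-second base and five-minute cap are arbitrary values chosen for the example.

```python
def backoff_delay(failures, base=1.0, cap=300.0):
    """Seconds to wait after `failures` earlier consecutive failures."""
    return min(cap, base * (2 ** failures))


class RetryTracker:
    """Counts consecutive failures per resource, in memory."""

    def __init__(self):
        self._failures = {}

    def record_failure(self, key):
        """Note a failed attempt and return how long to wait before retrying."""
        count = self._failures.get(key, 0)
        self._failures[key] = count + 1
        return backoff_delay(count)

    def record_success(self, key):
        """Reset the count so the next failure starts from the base delay."""
        self._failures.pop(key, None)


tracker = RetryTracker()
for _ in range(4):
    print(tracker.record_failure("default/my-app-tls"))
```

Keeping the counter in memory means the delay still grows when the failure cannot be written back to the resource, which is the edge case described above.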

From about 0.8 onwards - or 0.6 onwards, initially - we've been working quite closely with LetsEncrypt to reduce this. We've included things that allow us to see how each version of cert-manager is performing in terms of API usage based on their API logs and so on. And we've seen pretty big improvements in recent versions. I think there's one where it was a few hundred times better API performance.

So one other thing we found is that people tend to, with certificates, they tend to get it working once and then leave it, which I can see why they do, why anyone would do that. Obviously, you don't want to be sitting babysitting this thing the whole time. But with a security-focused product, you do need to make sure you're keeping it up to date.

So we have still got some users who are using two-year-old software. And that has started to become a really big issue because whilst we've made big improvements in later versions, the older versions, we can't do anything about it. There's nothing we can do to mitigate it. So after chatting with LetsEncrypt, it was decided that we would start to deprecate and block older versions in an effort to encourage people to keep on upgrading.

In 0.10, 0.11, we've now got better ways to measure how we're doing with the efforts of reducing it. And I think the API usage now is substantially better. As we move towards 1.0, I think we're going to be reconsidering and re-evaluating some of these deprecation periods. But I think at this point, we just need to make sure that everyone is upgrading and keeping up to date. And that upgrade process will get easier in time as well.

CRAIG BOX: You've just released version 0.11 of cert-manager and you mentioned a path to 1.0. This release feels like it's the first step on that path. There are a bunch of changes that are being made to help support a stable release over time. What are those changes, and why is now the right time to make them?

JAMES MUNNELLY: Yeah, we did just release 0.11. So if anyone wants to be on the cutting edge, then do go ahead and upgrade now. So really, it's a stepping stone towards us going to 1.0 and stable, v1, whatever you want to call it. So some of the changes included there-- first of all, we've had to rename our entire API group. So simply, if you think about Kubernetes, you have deployments in apps/v1.

Well, CRDs also have groups. Our one has previously been certmanager.k8s.io. I'll put my hand up and admit that was chosen by me a couple of years ago. And I shouldn't have chosen something ending with .k8s.io. So recent versions of Kubernetes will actually put restrictions on who can use that particular group. And so we've had to move, first of all, over to our own domain, which is cert-manager.io.

That's the first bit, but then the bigger one here is moving from v1alpha1 to v1alpha2. So back in 0.8, we had realized that there were a few issues with the way we'd structured our API. So I mentioned the solving configuration earlier in the show, how it says how to validate a particular domain. So you're saying use HTTP-01 for this one or go and present some DNS records for the other one.

So we had actually stored that configuration on the certificate resource, instead of on the issuer originally, which, at the time, seemed like a half decent idea. But if you think about it, the certificate resource is meant to represent an X.509 certificate. So the things you see on there should roughly, give or take, map to X.509 features and fields.

So solving configuration is a property of ACME, of LetsEncrypt and ACME certificates specifically. So as we started to expand out the project to support more and more issuers, you kind of had this odd one out on the certificate. And all we could really say to users was if you're using LetsEncrypt, make sure to set that. And if you're not using it, then just forget it exists, which isn't particularly good API design.

So back in 0.8, we deprecated those fields, and we moved those over to the issuer resource. Now because we have quite a few users who are running this in production environments, we didn't just rush to remove those old fields and just get rid of them altogether. We maintained support for both the new and the old format. So in 0.11 and v1alpha2, we are now actually dropping support for that old format. We've removed the certificate.spec.acme field altogether in v1alpha2, and you are now required to use the new one.
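
The shape of that move can be sketched as a small transformation over manifests held as dictionaries. Treat the field names as assumptions made for illustration: the old format is taken to be `spec.acme.config` entries on the Certificate with `domains` and `http01.ingressClass`, and the new format `spec.acme.solvers` entries on the Issuer with `selector.dnsNames` and `http01.ingress.class`. Only HTTP-01 entries are handled, and this is not the project's migration tooling.

```python
import copy


def move_solver_config(certificate, issuer):
    """Return (certificate, issuer) copies using the issuer-side format."""
    certificate = copy.deepcopy(certificate)
    issuer = copy.deepcopy(issuer)

    # Remove the deprecated field from the Certificate, if present.
    old = certificate["spec"].pop("acme", None)
    if old is None:
        return certificate, issuer

    solvers = issuer["spec"]["acme"].setdefault("solvers", [])
    for entry in old.get("config", []):
        if "http01" not in entry:
            raise ValueError("only HTTP-01 entries are handled in this sketch")
        ingress = {}
        if "ingressClass" in entry["http01"]:
            ingress["class"] = entry["http01"]["ingressClass"]
        solvers.append({
            # Limit this solver to the domains the old entry covered.
            "selector": {"dnsNames": list(entry.get("domains", []))},
            "http01": {"ingress": ingress},
        })
    return certificate, issuer


certificate = {
    "spec": {
        "secretName": "my-app-tls",
        "dnsNames": ["app.example.com"],
        "acme": {"config": [{
            "http01": {"ingressClass": "nginx"},
            "domains": ["app.example.com"],
        }]},
    }
}
issuer = {"spec": {"acme": {"email": "admin@example.com"}}}

new_certificate, new_issuer = move_solver_config(certificate, issuer)
print(sorted(new_certificate["spec"]))
print(new_issuer["spec"]["acme"]["solvers"])
```

After the move, the Certificate carries only fields that describe the certificate, and everything specific to ACME lives on the Issuer.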

CRAIG BOX: This is a breaking change, that people are going to have to make changes to the resources that they had deployed to do that upgrade?

JAMES MUNNELLY: Yes, exactly. And if you're already currently running version 0.8, 0.9, or 0.10, you can go ahead and make those changes now before you upgrade, make sure everything's working, and then upgrade. So you have the option there to do that. And it's not too bad. But because of the API group changes, regardless of anything, you'll have to uninstall cert-manager before upgrading to 0.11, run a migration tool that we've built, which will change up the API groups for you and the annotations, and then it will go and reapply those, as well as installing cert-manager.

So we have provided some tooling for this, but it's a bit of a painful one. It's not too bad. I went and upgraded my own cluster the other day, and it did take me a good 20, 30 minutes, just to make sure we've got everything right, but yeah.
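
The kind of rewrite such a migration performs can be sketched as below. This is an illustration only, not the tool the project ships: it assumes the old group was spelled `certmanager.k8s.io`, that annotations used the same prefix, and it does not touch the fields that changed between v1alpha1 and v1alpha2.

```python
OLD_GROUP = "certmanager.k8s.io"  # assumed spelling of the old group
NEW_GROUP = "cert-manager.io"


def rename_key(key):
    """Move an annotation key from the old group prefix to the new one."""
    if key.startswith(OLD_GROUP + "/"):
        return NEW_GROUP + key[len(OLD_GROUP):]
    return key


def migrate(resource):
    """Return a copy of `resource` moved to the new API group."""
    migrated = dict(resource)

    # cert-manager's own resources change group and version.
    group, _, _version = resource.get("apiVersion", "").partition("/")
    if group == OLD_GROUP:
        migrated["apiVersion"] = NEW_GROUP + "/v1alpha2"

    # Other resources, such as Ingresses, only need their annotations renamed.
    if "metadata" in resource:
        metadata = dict(resource["metadata"])
        if "annotations" in metadata:
            metadata["annotations"] = {
                rename_key(key): value
                for key, value in metadata["annotations"].items()
            }
        migrated["metadata"] = metadata
    return migrated


old_ingress = {
    "apiVersion": "extensions/v1beta1",
    "kind": "Ingress",
    "metadata": {
        "name": "my-app",
        "annotations": {OLD_GROUP + "/issuer": "letsencrypt-staging"},
    },
}
print(migrate(old_ingress)["metadata"]["annotations"])
```

Because the function returns copies, the originals can be kept as a backup until the upgraded installation has been checked.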

CRAIG BOX: When do you think you'll get to 1.0?

JAMES MUNNELLY: Yeah, it's the golden question. I think sometime by the end of kind of Q1 2020, I'd expect that, so somewhere between three, four, five months. I don't want to rush it. I think, for the most part, the API surface in v1alpha2, we're pretty settled on. There aren't any breaking changes that I can think of that we need to make, or anyone else on the team at this point, so it's looking good as it stands. Provided all goes well and we don't get massive complaints or any good suggestions coming up in the next few months, I would imagine three or four months.

CRAIG BOX: Getting certificates into Kubernetes is a problem that many, if not all, clusters will have. What has the adoption of cert-manager been like?

JAMES MUNNELLY: Yeah, it's actually a very difficult one to measure accurately. We don't drop tracking beacons into your software or anything like that. So whilst I would love to know exactly how many instances are running it, I couldn't tell you exactly. I've recently taken a look at some of the quay.io, which is our Docker image registry-- some of the logs there. And it seems like for the last month or so, we've been pushing about 1 million Docker image pulls per day.


JAMES MUNNELLY: And I think that's a misleading number because honestly, I don't know quite how many people are using Kubernetes. But--

CRAIG BOX: Perhaps they're all failing and retrying.

JAMES MUNNELLY: Yeah. Perhaps so. I get the feeling some people might be mirroring our Docker image registry into other clouds and things, which is inflating that number. But either way, it's an astronomical amount, and I'm quite astounded.

CRAIG BOX: GitHub lists over 170 contributors who have worked on cert-manager to date. If someone wants to get involved, how should they go about doing that?

JAMES MUNNELLY: First of all, I'd just like to say how astounded I am at the number of people who do get involved. It is brilliant, and it's really heartwarming to actually hear that so many people want to get involved and help. So we've got the cert-manager dev channel, first of all, on the Kubernetes Slack. That's slack.k8s.io. We all hang out there. I've got my notifications set up, so I'll get pinged straight away if someone drops a message in there. So that's a good place to just kind of say hello, introduce yourself.

We also run a bi-weekly community call, where anyone can come along, join, ask questions, and get an update on what we've been up to and where we're going. It's during those calls that we'll do planning for the next milestone, the next release, and also discuss how this current milestone's going and keep track of that issue board and moving things through.

CRAIG BOX: OK, James, thank you very much for joining us today.

JAMES MUNNELLY: Thank you for having me.

CRAIG BOX: You can find James on Twitter @JamesMunnelly. And you can find cert-manager at github.com/jetstack/cert-manager.


ADAM GLICK: Thanks for listening. As always, if you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on Twitter @KubernetesPod, or reach us by email at KubernetesPodcast@google.com.

CRAIG BOX: You can also check out our website at KubernetesPodcast.com, where you will find transcripts and show notes. Until next time, take care.

ADAM GLICK: Catch you next week.