A checklist to verify that a Key Result is at least OK.
Run yours through it and think about how to improve them until they tick all the boxes.
A couple of words more
Some time ago I wrote an internal blog post at Yelp about a thought experiment I did. The thought experiment was about improving the OKRs my team had drafted for the quarter. I cannot publish the original post here, but I wanted to at least share the output of that mental exercise: a checklist to test the quality of key results (KRs).
I really like the idea of running tests against some of the documents we write, just as it is standard to do for the code we write. I know that frameworks exist to do such a thing automatically, but I find checklists easier to share and simple enough for other people to pick up without committing to learning and deploying some new technology. Checklists are also widely used in the aviation industry, where they have been helping strengthen the safety of flights all around the world for many years, so yay checklists.
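The "tests for documents" idea can be sketched in code. Here is a minimal, hypothetical Python example: the checklist items below are illustrative stand-ins (the real checklist has its own criteria), and each check is just a yes/no answer recorded by hand by the KR's author, like ticking boxes on paper.

```python
# A minimal sketch of running a key result through a checklist,
# in the spirit of unit tests for code.
# NOTE: these items are illustrative stand-ins, not the actual checklist.
CHECKLIST = [
    "Is it measurable, with a clear target number?",
    "Does it describe an outcome rather than an activity?",
    "Does achieving it deliver value toward the objective?",
]

def run_checklist(key_result: str, answers: list) -> list:
    """Pair each checklist item with the author's yes/no answer
    and return the items that failed."""
    assert len(answers) == len(CHECKLIST)
    return [item for item, ok in zip(CHECKLIST, answers) if not ok]

# Answered by hand, like ticking boxes:
failed = run_checklist(
    "Deploy memcache in front of the emoji service",
    [False, False, True],
)
for item in failed:
    print("FAILED:", item)
```

The point is not automation; the value is the same as a paper checklist, just phrased in a form programmers already trust.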
The items in this one come from the many mistakes I have made as a writer of OKRs, but since I cannot share the key results I wrote at Yelp, I'll demonstrate how to use the checklist with some fictional ones. Let's start.
Let’s take the happy path first and see how the checklist behaves with a simple and allegedly well-written key result.
OK, let’s try something else, but still pretty alright.
KR #2: 100% of our services are running on AWS
This one looks pretty OK, too, but let’s see for ourselves.
So far so good. Time to see what happens with a pretty bad key result.
KR #3: Deploy memcache in front of the emoji service
Assuming that every company has at least one objective like “Deliver better and faster emojis to our customers”, let’s have a look at how the checklist behaves in front of a supposedly bad key result:
As we can see, with three failed checks, the checklist quite clearly identified a bad KR.
Alright, let’s move to something only slightly broken and therefore more interesting.
KR #4: GET emoji calls will complete in less than 300 ms
With the same objective as before, let’s see how a key result which is just a bit off gets caught by the checklist:
At this point, we should have caught that the metric we chose is a bit off and that we should move to something closer to an SLO, like “99% of GET emoji calls averaged over 1 minute will complete in less than 300 ms”.
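To make the SLO-style phrasing concrete, here is a small Python sketch of one possible reading of that metric (the data, window size, and interpretation are all assumptions for illustration): group latency samples into 1-minute windows and check that in each window at least 99% of calls completed under 300 ms.

```python
from collections import defaultdict

# Hypothetical latency samples: (unix_timestamp_seconds, latency_ms)
samples = [
    (0, 120), (10, 250), (30, 280), (59, 500),   # minute 0
    (65, 100), (80, 150), (110, 200),            # minute 1
]

def slo_met(samples, window_s=60, threshold_ms=300, target=0.99):
    """Group samples into fixed windows and report, per window,
    whether the fraction of calls under threshold_ms meets target."""
    windows = defaultdict(list)
    for ts, latency in samples:
        windows[ts // window_s].append(latency)
    return {
        w: sum(l < threshold_ms for l in lats) / len(lats) >= target
        for w, lats in windows.items()
    }

print(slo_met(samples))  # {0: False, 1: True} — minute 0 fails (1 of 4 calls over 300 ms)
```

A KR phrased this way has the nice property that it is mechanically checkable from data you most likely already collect.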
Having seen first-hand how the checklist works, the next thing I want to show you is how we can make a key result better by iterating through the checklist until we tick all the boxes.
From bad to OK KRs: platform release
When it comes to OKRs for platform or infrastructure teams, this kind of key result is pretty common. Imagining an objective like “Powering the next-generation event processing at Company”, let’s have a look at the checklist:
This one turned out to be pretty problematic: it fails three of the criteria and barely passes another. Let’s try to change it a bit so that it checks at least one more box, starting with delivering value.
KR #2: The event processing platform powers at least one beta use case
This should have improved things quite a bit; let’s give it a try:
Only one check is failing; let’s try to do even better by focusing on something more finely measurable.
KR #3: Implement 100% of the features required for one beta use case of the event processing platform
This is a bit of a mouthful, but bear with me if that’s a problem for you. Let’s go through the checklist for now.
If you can settle for this KR, I would advise you to do so. If you cannot, one thing I tried with success at Yelp is to simplify complex KRs by adding scoring criteria in which to hide part of the complexity. Let’s have a look at them in action.
KR #4: The event processing platform powers at least one beta use case
Scoring: % of the implemented features required for a beta use case
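As a sketch of how such a scoring criterion could be computed at the end of the quarter (the function name and the feature list are hypothetical, for illustration only):

```python
def score_kr(required_features, implemented_features):
    """Score the KR as the fraction of required features implemented."""
    done = set(required_features) & set(implemented_features)
    return len(done) / len(required_features)

# Hypothetical feature list for one beta use case:
required = ["ingest API", "at-least-once delivery", "consumer SDK"]
implemented = ["ingest API", "consumer SDK"]
print(round(score_kr(required, implemented), 2))  # 0.67
```

The headline KR stays short and readable, while the messy "what counts as done" detail lives in the scoring rule.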
Usually people within the team or second-level managers look at the scoring criteria for KRs, while executives pay attention only to the actual key result. Let’s run it through the checklist.
If you can make scoring criteria part of your organization’s habits, they can really help you out when writing an OK key result proves hard. However, shifting behaviors in a big group of people takes a good deal of effort, so I would recommend sticking to standard KRs if you can get by with just them.