The success of every engineering team depends on their ability to rally together and all “swim in the same direction.” How do different teams write software systems that can speak to each other? What happens in case of an emergency? How do we keep two teams from building conflicting products?
The list of possible what-ifs is endless, and without a common set of core directives, the team will go off on a thousand tangents. The lack of cohesion will either slow everything down or create mountains of mura and muri as multiple teams within the organization are forced to waste their time making heterogeneous systems work with each other.
Here at Noom, the Engineering Team has a Software Engineering Principles codex we use to align our work. It’s a living document that we iterate on every time we learn something new about our strengths and weaknesses. As our size and complexity grow over time, so do our unification requirements. This approach has created cohesiveness in our teams and streamlined our development process.
When faced with the challenge of coordinating the work of a team, the obvious approach is to create a number of rules. With a detailed list of dos and don’ts in place, everyone should know what is expected of them and how they should behave in every conceivable scenario.
We find this approach suboptimal for two reasons. First, it’s essentially impossible to, well, conceive of every possible scenario. For every rule you put into place, you create an infinite set of new situations in which the members of your team are left without instructions. This gives them a choice between two equally unpleasant decisions: wing it and risk getting reprimanded for coloring outside the lines, or don’t do anything and risk getting reprimanded for not acting to solve the problem.
Say that your team puts a rule in place requiring that all code be reviewed before being checked in. What happens when someone finds a critical bug in the middle of the night and no one else is around to review a patch?
So you add more rules — perhaps an on-call rotation that specifies which unfortunate soul should be awakened to review the code. Summoning that person is going to take time, however, and time is money in systems that are designed for 24/7 operation. Plus, how good will a code review at that hour be? What do you do if the person doesn’t respond? What if it’s a holiday and that person has had too much to drink? Sooner or later, you’re going to hit a corner case, and because that corner case is, by definition, something that you didn’t anticipate, Something Very Bad™ will happen.
The second shortcoming of a rules-based approach is that rules inspire the kind of malicious compliance that teams must avoid like the proverbial swarm of angry bees. When someone finds a bug in the middle of the night, the worst thing that can happen isn’t that an engineer makes things worse by trying to commit a hastily-written patch to the codebase. It’s that they simply decide not to do anything — because, after all, they are not supposed to push code to the repository unless it’s been reviewed by another engineer.
Why is it worse? Because the engineer isn’t acting with the best interests of the company in mind. Being told what to do absolves people of the responsibility to think and makes them behave like automata instead of expecting them to behave like owners.
Instead of rules, we have organized our team around principles. These are purposefully vague because they revolve around two simple concepts: personal responsibility and task-relevant maturity.
We spend a lot of time and effort hiring team members that we can trust and mentor. When we invite someone to join us, we expect them to use their best judgment when making decisions, knowing that they will never be blamed for doing so. We also rely on them to know what they don’t know. If you think you’ve found a bug in the middle of the night and it’s your first week at work, common sense says you really need to have someone check your work before pushing it live.
Does this lead to mistakes? You bet, but it also leads to something wonderful. If your team truly believes in the principles you have set out for them, they will spontaneously create the structures needed to make sure that they can work safely and successfully, without the need for an endless list of rules to tell them what to do.
For us, technology is a means to an end. At least at the current stage of our existence as a team, everything that we do must be for the benefit of our users, our colleagues, or ourselves.
This means that, in order to be successful, we need to put ourselves in the shoes of our users, understand their needs, and work on creating tools that satisfy them (which are often not the same as the tools they actually ask for).
Thus, our job isn’t to write code, but to solve problems. Code often comes into the picture at some point in the process, but only after we thoroughly understand the underlying challenge and have come up with a solution that fits the user’s model of it. We cannot code that which we do not understand.
This role also makes us ultimately responsible for the ethical considerations behind every single piece of functionality that the company places in production. We must always do our best to ensure that, through either action or inaction, we do not allow our users to be harmed.
As engineers, we should prefer to work on problems rather than tasks. This means we should expect not to be micromanaged and much of our work should be self-directed based on the needs of the team.
It’s also important that we own our problems. Don’t just wait for answers — ask questions. Find something that needs improving and make it your own. Help create knowledge that you can share with your fellow engineers.
The kind of freedom we enjoy only works well when it is accompanied by a deep sense of respect and responsibility toward our teammates, including:
Rather than jumping in to implement the first solution that comes to mind, it’s always better to start new projects with good design and technical documentation. This ensures that we think through all the nuances of the problem we’re trying to solve and gives everyone an opportunity to review and comment on each other’s ideas before we go through the effort of writing the code.
We should be particularly careful to document those projects that do not originate in the Product team, ensuring that we write proper design specifications where we explain:
Designing technical solutions is a bit like planning orbital paths: You must think about where you are going to be rather than only where you are, and you always need to remember that things are going to move under your feet while you’re getting there.
When designing new functionality, we must think about how well it will work over the next 12 months. This is our time horizon — it’s as far into the future as we, as a team, want to peek. Keeping this in mind will help us avoid premature optimization. We can keep our planning realistic and allow new technologies to evolve under our feet so that we can take advantage of them.
Of course, it is often appropriate to plan for a shorter time horizon. For example, if we’re trying something on an experimental basis, we should try to validate it as quickly as possible, even if this means it is only a temporary solution.
Before introducing new complexity, we must make sure that we have explored at least the most obvious alternatives: Are we using the tools at our disposal for everything they have to offer? Are there ready-made solutions that we can take advantage of to defray the implementation cost of the chosen solution? Is there some wisdom of the crowds or external expertise that we can use to avoid reinventing the wheel?
We must also project our solution into the future. How will it evolve alongside Noom’s business in a year’s time? Will we need to introduce more complexity to deal with its consequences?
Because so many of us are remote, communication is unlikely to happen as spontaneously as if we were all in the same office. Therefore, we must be very deliberate and overcommunicate. Don’t assume that you have been heard until your message has been acknowledged by others.
Likewise, use meetings well. Unproductive meetings are our worst enemy, so always try to make sure that a meeting is the right way to solve a problem. Set an agenda and invite the smallest subset of people needed.
We accept that the highly experimental nature of our work means that mistakes are not merely likely, but are certain to happen. We will screw up, and when we do, we will all work together to fix each mistake and learn how we can avoid repeating it in the future.
Because every mistake exists at the tail end of a complex sequence of events, Noom is a blameless environment. Never stop at the proximate cause of a fault. Instead, dig until you find the root cause, and then build institutional memory that can help prevent the same kind of problem in the future.
Finally, knowing that mistakes are going to happen gives us the opportunity to proactively be ready for them. We must work to build a strong safety net of code reviews, continuous integration, testing, monitoring, and automation around our code to help us react quickly when things go sideways — and, ideally, to prevent them from taking that turn in the first place.
As you can see, the list of principles under which our Engineering team operates is not very long, and that’s on purpose. We only need to remember 8 concepts to understand how to do our job well — even fewer if you consider that many of them really fall under the umbrella of plain-old common sense.
This list is not cast in stone. We update it every time we learn something new about what makes us a better engineering team (or, occasionally, a worse one). More importantly, I go over our principles with every single new engineer we hire, regardless of seniority, and impress upon them the importance of understanding each item on the list, as well as the overarching idea of governing our behaviour through it.
Of course, these are our principles. They are Noom-specific and fit into the larger picture of the company-wide principles that affect our organization as a whole. Your team may well land on a completely different list, based on variables like its size, maturity, and the kind of organization you work for.
The key concept that I hope you take away from this post, therefore, isn’t the list itself, or even that you should use a list. Instead, realize the importance of having an explicit set of principles on which your entire team can agree. Without one, each person will dance to a slightly different tune and never fully be part of a community that is more than just the sum of its parts.