How Big Technical Changes Happen at Slack

By Slack Engineering

Note that these phases are a descriptive model, not a prescriptive one. We’re not forcing adoption to follow this sigmoid curve; it simply must, no matter how we wish things were. There is no way for early exploration to proceed as quickly as midlife adoption, and there is no way for the final push to full adoption to go as quickly as the middle phase went. The three phases are not consequences of any milestones, processes, tools, or people at Slack. They are part of the fabric of technical change, and they would be there whether we noticed them or not.

But now we’ve noticed them, and we can use them to make our efforts more successful. The tactics and strategy for each phase are different.

Phase 1: Exploration

Phase 1 is frictionless to enter. When an engineer first starts messing around with a technology they’re excited about, no permission-granting process or ceremony is needed. It probably happens dozens of times a day at Slack: someone reads about or invents something new, and commences fiddling around with it. Perhaps they have read a blog post about Elixir, or Cassandra, or WebAssembly, or TCR. They download some software, build it, poke around a little, work through some introductory material, and maybe take a stab at applying it to their day job.

Most exploration efforts sputter out here. This is good! Giving up here is one of the important ways we resist spending too much energy on fads. However, some things do make it out into our real workflows and codebases. Sometimes, an engineer can just apply this solution in place, because it solves a problem local to their team’s work. Sometimes, though, the situation is even more exciting: this new widget is useful for an entire class of problems that other teams face. Our intrepid engineer now believes they know something consequential that the rest of us in Slack Engineering do not: that there is a better way for us to do things. Once the work starts to affect other teams’ work, you’ve entered Phase 2.

Phase 2: Expansion

Let’s take a moment to pity the poor engineer entering Phase 2! For they are now trying to modify other engineers’ behavior. This is going to involve communication, persuasion, and — if it is going at all right — substantial technical work. For most projects, Phase 2 is the most difficult, time-consuming, and discouraging phase. It is the “product-market fit” phase of the technology cycle, and many of the projects that enter it will not successfully complete it.

At Slack, client teams are free to choose not to depend on your system, with few exceptions. This may surprise you if you have a lot of experience at an “infrastructure-driven” engineering company. At some companies, leaders pick winners and losers before the product-market fit negotiation in Phase 2 has reached its conclusion. The goal of selecting a winner before it has been widely deployed is to provide clarity (“What does the future hold? Which system should I build on?”) and to economize on the expensive period in Phase 2 when more than one way of doing things needs to be supported.

While those are reasonable goals, this is not how Slack chooses to approach the adoption of new systems. We prioritize fad-resilience over speed of adoption. And so, we (intentionally) place the burden of getting other teams to adopt new technology mostly on the change agent. While this can be frustrating for the advocate of a new system, we know of no better approach. Clearing this hurdle forces selection of Stuff that Works. If the new thing really is as wonderful as we hope it is, it should help the teams that depend on it get things done; that success can move them to adopt it and advocate for it.

Some of the work of Phase 2 is fundamentally more like product work than like what-you-might-think-is-engineering. You need to do user research to figure out what problems matter. You need to communicate the value of your solution relative to previous practices, in ways your users are prepared to hear. You need to build things that close the gap between current practice and the change you’re making, to grease the skids for your busy and distracted clients.

Successful execution in Phase 2 eventually leads to some self-propelled adoption, where people you did not explicitly sell on the new tech are freely choosing to use it. The end of Phase 2 is close at hand when the new system is a de facto standard, the default practice for new projects. It is unusual to accidentally achieve this kind of adoption. It’s really hard, and draws on skills that are not part of every engineer’s professional experience.

Phase 3: Migration

The self-propelled adoption phase eventually starts to taper off. We are left with a residue of holdouts: use cases that seem especially resistant to the new way of doing things. Some systems that have been quietly working in the background are especially unmotivated to change, simply because nobody is actively developing them. In some cases, we discover late in the game that the previous system really did work better in some ways. Finally, there are always a few stubborn users who are overly invested in their muscle memory of the old way.

While we’ve been talking about “the” technology adoption curve, there is actually a fork in the road at Phase 3. Even very successful projects might not migrate every last use case to the new way of doing things. For instance, at Slack we have very widely adopted gRPC as an internal API technology. It is solidly in late Phase 3. However, we are unlikely to build a new version of memcached that uses gRPC; memcached’s custom protocol works well, and is well-supported in the clients we care about. The existence of exceptions like this doesn’t make gRPC adoption a failure.

In other cases, the costs of having More Than One Way (cognitive burden on engineers; operational burden from running the Olde Systeme) are high enough that we will migrate everything to the new way. For such projects, we need a plan to tackle the holdouts. Different tactics are appropriate for different obstacles. The systems that just haven’t changed in a long time might need the change agent to adopt them and start moving them into the future. If the holdouts are motivated by real functional capabilities that the new system lacks, you may need to enhance the new system, or wrap it in code that emulates the old system’s capabilities.

In the occasional case of emotional attachment to the old system, person-to-person outreach is usually a lot more effective than public, high-stakes debate. And please be gentle; your beautiful, new system will be the Old Way some day, too — if it is successful enough to live that long.