How we take (good) decisions

Nicholas Suter
Published in YounitedTech
Apr 8, 2021


Taking decisions in small organizations is easy(-ish). But as the organization grows, things get more complicated. When there were 10 people in our Tech department, we just had to ask a question to the other people on the bench. At 30, we could still yell the question across the open space, and a bunch of people would go and talk about it. At 80, it required a bit more effort. But we could still gather the right people to dig into the problem to solve and then go and check with every team if the solution we’d come up with was OK for them. At 100, we had to come up with a new way of taking decisions. So it was time to take a decision on how to take decisions.

Or you can just try being lucky

What is a good decision?

We’ve always liked group-based decisions and definitely don’t want to start building ivory towers. We want decisions to be taken as close as possible to where they are put into action. However, we have blown way past the point where we could involve every member of the Tech team and hope for a consensus to emerge naturally. So what is a good decision?

Taking decisions at the right moment

As is often the case, it is a question of balance. An easy decision shouldn’t be slowed down by bureaucracy. Who wants to fill in forms and get managers to sign them to order a new pen? But some decisions do need reflection and time to mature. We wanted our process to be flexible enough to run within a few days, but also to guarantee that we had given enough thought to the question being asked.

Getting the right people involved

We use the RACI matrix. Nothing revolutionary here, but we’ve found it to be really useful. If you’ve never come across the concept: for each task, you assign the people involved to one of four roles:

  • Responsible: who executes the task?
  • Accountable: who gets told off if things go wrong?
  • Consulted: whose input do you seek to execute the task?
  • Informed: who needs to be informed that the task has been completed?

For each kind of decision we take, whether it be architectural, organizational or technical, we have established who has which role. Anyone can be responsible for pushing change or asking for a decision to be taken but most times, our communities push their own topics. The Tech Committee is responsible for organizational decisions, the Architecture Committee for… architectural decisions, the DevSecOps for infrastructure decisions… You get it.
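To make the idea concrete, here is a minimal sketch of a RACI matrix kept as data, so that anyone can look up who to involve for a given kind of decision. The entries are illustrative only: the committee names come from this article, but the `Raci` class, the `roles_for` helper and the exact role assignments are assumptions, not a copy of our real matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raci:
    """Roles for one kind of decision."""
    responsible: str
    accountable: str
    consulted: tuple = ()
    informed: tuple = ()


# Illustrative entries only (assumption): adapt them to your own organization.
RACI_MATRIX = {
    "architectural": Raci(
        responsible="RFC initiator",
        accountable="Architecture Committee",
        consulted=("Lead Dev Community",),
        informed=("Tech department",),
    ),
    "organizational": Raci(
        responsible="RFC initiator",
        accountable="Tech Committee",
        consulted=(),  # placeholder: fill in for your own context
        informed=("Tech department",),
    ),
}


def roles_for(kind: str) -> Raci:
    """Return the RACI roles for a kind of decision, or fail loudly."""
    try:
        return RACI_MATRIX[kind]
    except KeyError:
        raise ValueError(f"no RACI entry for decision kind {kind!r}") from None


if __name__ == "__main__":
    roles = roles_for("architectural")
    print(f"Accountable: {roles.accountable}")
    print(f"Consulted: {', '.join(roles.consulted)}")
```

Keeping the matrix as plain data means it can live next to the wiki pages and be reviewed like any other document.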

We like accountables to be group leaders. That doesn’t mean they take every punch in case something goes wrong. But they are responsible for reacting to events, adapting plans and giving long-term targets.

We consult large populations to test hypotheses and gather feedback.

We inform everyone we can. This is done through wiki pages and Slack posts. We like meeting minutes to be made public and searchable.

Here is the RACI matrix we use:

Applicable decisions

A decision you can’t apply is a bad decision. If you haven’t made sure the decision gets implemented quickly, all you’ve made is a wish. We like testing ideas before throwing ourselves into battle. A proof of concept (PoC) is a powerful tool we rely on heavily: you test an idea in an isolated, risk-free environment. If the PoC is successful, we move to a beta-phase, where we apply our hypothesis to a real-world example (or a small group of them). At this point, we can see how our idea works in the real world. This gives us a very useful feedback loop.

In our opinion, critical decisions must not be taken before the beta-phase is successful.

Good decisions are traceable

Making sure your decisions are applicable isn’t enough. You also need to track their actual application. This can be done using todo lists or regular meetings, but automation is also an interesting option. For instance, when we decided to migrate all our public APIs and front-end apps behind Azure Front Door, we found a way to automatically identify the public resources and which ones had been migrated. Every day, a script runs and generates a wiki page. This keeps the todo list up to date and requires less human work. Modern dev tools provide easy ways to do this kind of thing. Cloud providers all have APIs and CLIs. Our wiki lets us create pages through an API, but generating Excel or CSV files in a shared folder is just as good.
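As a rough sketch of what such a daily script can look like, the snippet below compares a list of public resources with the set of resources already migrated and renders the result as a wiki-style table or a CSV file. It is not our actual script: the function names and the sample resource names are hypothetical, and fetching the two lists from your cloud provider's API or CLI is deliberately left out.

```python
import csv
import io


def build_rows(public_resources, migrated):
    """One row per public resource, flagged as migrated or not."""
    return [
        {"resource": name, "migrated": name in migrated}
        for name in sorted(public_resources)
    ]


def to_markdown(rows):
    """Render the rows as a table that can be pushed to a wiki page."""
    lines = ["| Resource | Migrated |", "| --- | --- |"]
    for row in rows:
        status = "yes" if row["migrated"] else "no"
        lines.append(f"| {row['resource']} | {status} |")
    done = sum(1 for row in rows if row["migrated"])
    lines.append("")
    lines.append(f"{done}/{len(rows)} migrated")
    return "\n".join(lines)


def to_csv(rows):
    """Render the same rows as CSV, for a shared folder instead of a wiki."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["resource", "migrated"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical data: in practice both collections come from your
    # cloud provider's API or CLI.
    public = ["orders-api", "customer-front", "partner-api"]
    already_migrated = {"customer-front"}
    print(to_markdown(build_rows(public, already_migrated)))
```

The point is less the code than the habit: the report is regenerated from the real state of the platform, so nobody has to remember to tick a box.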

RFCs

We used Architecture Decision Records (ADRs) for a few months to trace high-level architectural decisions. They are an excellent tool that I strongly recommend, and so do the brainy guys at ThoughtWorks. We actually still use them at a lower level in our git repositories. New team members will love you for documenting your choices. The drawback we found was that ADRs focus on what your decision is, and not enough on how you took it. That is why we moved to Requests for Comments (RFCs).

An RFC is a templated document used to open a discussion on a topic and move towards a decision. RFCs have been used for over 50 years now to decide how the Internet works. Really. RFC 1 was published in 1969. You can imagine these are not simple topics, and that they involve a large number of distributed contributors. This gave us confidence that the pattern was future-proof and scalable.

Our template is very simple:

Our very own RFC template

Each RFC has an ID (RFC-001, RFC-002, etc.) for easy reference. We track who initiates the RFC and who reviews it. The Context paragraph should be a short description of the problem we want to solve. The Rule proposition paragraph is a list of short sentences, each containing a modal verb: CAN, SHOULD or MUST. The Pros and Drawbacks paragraphs give context on what we expect as an outcome of this RFC. And last but definitely not least, the Exceptions paragraph must list all exceptions to the RFC. This is extremely important, as we consider a validated RFC to be a law. But laws do not fit 100% of cases. In our opinion, knowing when to bend the rule is better than complying for the sake of complying. The number of exceptions is also a good KPI to monitor the quality of the law.
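The template described above is simple enough to be checked mechanically. The sketch below models it as a small data class and flags the most common gaps in a draft: an ID that does not follow the RFC-001 pattern, an empty context, or a rule without a modal verb. The `Rfc` class, its field names and the `problems` method are assumptions made for illustration; our real template is a wiki page, not code.

```python
import re
from dataclasses import dataclass, field

# A rule must contain one of the modal verbs, written in upper case.
MODAL_VERB = re.compile(r"\b(CAN|SHOULD|MUST)\b")
RFC_ID = re.compile(r"^RFC-\d{3}$")


@dataclass
class Rfc:
    rfc_id: str
    initiator: str
    reviewers: list
    context: str
    rules: list
    pros: list = field(default_factory=list)
    drawbacks: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)

    def problems(self):
        """Return a list of human-readable issues; empty means the draft passes."""
        found = []
        if not RFC_ID.match(self.rfc_id):
            found.append(f"id {self.rfc_id!r} does not look like RFC-001")
        if not self.context.strip():
            found.append("context is empty")
        if not self.rules:
            found.append("no rule proposed")
        for rule in self.rules:
            if not MODAL_VERB.search(rule):
                found.append(f"rule without CAN/SHOULD/MUST: {rule!r}")
        return found


if __name__ == "__main__":
    # Made-up draft, for illustration only.
    draft = Rfc(
        rfc_id="RFC-042",
        initiator="alice",
        reviewers=["bob"],
        context="Front-end apps call many internal APIs directly.",
        rules=[
            "Each front-end app SHOULD call a dedicated backend.",
            "We like dedicated backends.",
        ],
    )
    for problem in draft.problems():
        print(problem)
```

A check like this does not replace the review meeting, but it catches the rules that are opinions rather than rules before anyone has to read them.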

We store our RFCs in a dedicated section of our wiki and use folders to track their state:

A simple folder structure for a simple workflow

The Decision Workflow

Now that I’ve described the different pieces of our decision process, let’s see how they fit together.

The Decision Workflow

The first thing you can see here is that documentation and implementation are parallel streams that influence each other.

Let’s say you believe that we should adopt the Backend for Frontend (BFF) pattern.

You can start with a PoC or with documentation, as you like. The PoC is a first feedback loop that usually influences your RFC. When your RFC draft is mature enough and the PoC is successful, you can summon the accountable team leader and their team: the Architecture Committee, in this case. Each community has a weekly meeting, so you can just ask to be invited to one of the next meetings. This meeting will raise questions and remarks from the accountable community. You need to find a good balance between encouraging people who want to make things change and efficient gate-keeping. A simple RFC will lead to a quick meeting, and an immature RFC will probably generate rework and another meeting a few days or weeks later.

Once the Architecture Committee has given you a green light, you can start the beta-phase. You can also summon the consulted community: the Lead Dev Community. This is a larger community, but this time you have momentum: you’ve already convinced the Architecture Committee, and they can help you convince the lead devs. Discussions are usually simpler, but the green light from this community is important: each team has a lead dev, so you might identify a corner case. Another feedback loop!

Once the beta-phase is successful and the Lead Dev Community has given its green light, the RFC is officially published. We inform everyone in the Tech department and move to the deployment phase. That’s when traceability becomes important.
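The gates of this workflow can be summed up in a few lines. The sketch below is only a reading aid: the `RfcProgress` class and the `next_step` function are hypothetical names, and the real process runs through meetings and wiki pages rather than code.

```python
from dataclasses import dataclass


@dataclass
class RfcProgress:
    """Where an RFC stands on each gate of the workflow."""
    draft_mature: bool = False
    poc_successful: bool = False
    accountable_green_light: bool = False
    beta_successful: bool = False
    consulted_green_light: bool = False


def next_step(progress: RfcProgress) -> str:
    """Return the next action, following the order of the gates."""
    if not (progress.draft_mature and progress.poc_successful):
        return "iterate on the PoC and the RFC draft"
    if not progress.accountable_green_light:
        return "present the RFC to the accountable community"
    if not (progress.beta_successful and progress.consulted_green_light):
        return "run the beta-phase and meet the consulted community"
    return "publish the RFC, inform everyone and start deployment"


if __name__ == "__main__":
    progress = RfcProgress(draft_mature=True, poc_successful=True)
    print(next_step(progress))
```

Note that the beta-phase and the discussion with the consulted community sit behind the same gate: they run in parallel and both have to succeed before publication.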

But is this useful?

We’ve found this process to be both efficient and versatile. The several phases you need to go through ensure that complex decisions get the right amount of brain power. It doesn’t give us an absolute guarantee that we won’t go wrong, but it definitely minimizes risk. And simple decisions can be taken within a few days, or even a few minutes, if you get the right people involved and available.

The decisions we’ve taken since adopting this pattern are of better quality. We make sure people are informed and can refer to them. RFCs are public documents and are searchable. If anyone finds a violation, they can point to the RFC. This can lead to work to become compliant with the RFC, but it can also lead to documenting an exception. Of course, too many exceptions imply that the rule is bad. So the rule can evolve. For this, we use amendments (e.g. RFC-001-1).

I can only encourage you to try this out. It’s cheap, since you only need a wiki or a shared folder people can access. And tell us how it went. All thoughts and feedback are more than welcome!
