Fear and Loathing and Sticky Notes: Salvaging A Disastrous Engineering Project

Hi! I’m Adam, from Reaktor. We’re a bunch of consultants who try to build the right thing and build that thing right. I’m not going to tell you about a great project or even a normal project. Instead, I’m going to tell you how we made the best of a pretty bad project. Everyone’s found themselves inside the morass of a messy assignment, and to help you get your bearings and start putting your ducks in a row when this happens, I thought I’d share how the Reaktor team turned a disaster into a workable engineering project management process.

We were a few months in. By that time, we normally hope to have shipped something to production, received feedback from users, and maintained a great relationship with the client.

Our situation was not that.

We weren’t iterating toward anything, because despite hiding behind the scrum rituals, the client was enforcing a strict waterfall process. The backend team was physically, procedurally, and politically isolated from us, so production was unimaginable—in fact, we were pretty nervous the frontend widgets we were building wouldn’t integrate with the backend team’s system at all. And as a result of trying to convince the client’s nontechnical managers to address these issues, our relationship with them had gotten pretty dismal.

A Turning Point

Another Reaktorian was watching from the sidelines. On a particularly bad day, he interrupted our whining and complaining to ask two simple questions: “do you really think that nothing will launch?”, and then “is there any situation in which the client is happy?”

We hadn’t considered it that way before. We thought for a moment, and we agreed on the answer to both: nope. Ultimately, we knew that they would manage to launch something, even if we didn’t fix things ourselves. And try as we might, we couldn’t imagine the client being happy with us.

That sounds awful, but it was liberating. We had been trying hard to fix the project and please the client, without considering whether it was even possible. That realization helped us see what to do next.

Since we couldn’t make the client happy, we decided to make ourselves happy. We drew three big concentric circles on a whiteboard, and began writing elements of the project on sticky notes.

In the small inner circle, we stuck the things we could control directly, like our own attitude and internal team processes. The next circle was for the things we could influence but not directly affect, such as how sprint planning worked and how tasks were divided amongst the team. The biggest circle was for everything else: the things that bothered us but we could do absolutely nothing about. That outer circle—the “we can’t possibly address this” zone—held a lot for us. There was stuff I’ve already mentioned: the client’s happiness, the team structure, and the overall waterfall process.

Based on those circles, we redirected our energy. The outer circle contained all the things we had spent our energy on so far, but it was also a list of the things that we had to just let go. No matter how bad they felt, there was nothing we could do. We could have some impact on the stuff in the middle—our circle of influence—but we could effect the most change by focusing on the center.

A Process Bubble

With our priorities and goals realigned, we got in the weeds and had a look at process. As someone on our team said in a retro, a huge struggle for us was that all of our work was tracked in a “David Bowie–grade JIRA labyrinth”. While everything was tracked in JIRA, searching for individual items was difficult, leading team members to instead create duplicate tickets for the same issue.

The thing that took it from inconvenient to disastrous was a JIRA plugin. Some managers added it hoping to understand how much work remained, but the reports it produced were misleading.

This resulted in totally incorrect timeline projections for completion, and, worse, led some higher-ups to (incorrectly) believe we would need more headcount to get the job done.

One by one, we went through each ticket in JIRA. We wrote each new task on a sticky note, and we put that on the table. We consolidated any new details from duplicate tickets, and wound up with a manageable, clean pile of tasks.

We did this exercise with a manager, and I’ll never forget the relieved sigh she let out when she realized that not only was the situation not disastrous, it was in good shape. Completion was going to take a matter of weeks, not months as projected. It was all there in the stickies: you could see them, touch them, and understand them.

Putting it all literally on the table gave us a deep sense of calm and control. I remember feeling proud of that exercise, but what really changed the game was what we did next: we put those stickies on the wall.

The sticky notes grew over the wall like ivy in the sunshine. At a glance, we now knew what work there was and what state it was all in. We could reprioritize by just picking tasks up and moving them. And once we got that level of understanding, the dynamics changed.

In sprint planning, we had often been blindsided by tickets that had been hiding in a dark corner of JIRA but had suddenly become very important. Now, our view of the situation was more relevant, up-to-date, and comprehensive than JIRA. We came in already knowing the plan.

The board helped in other ways as well. Although it was largely superfluous, the client required everything to go through their QA process. In JIRA, it wasn’t clear which tasks needed the QA team’s attention. On our physical board, we saw stickies piling up in the QA phase, and so we invited the QA people to our daily standups. Not only did they see which tasks were ready for them, they also had an opportunity for quick discussion with the team if needed. The QA column emptied and stayed empty.

Eventually, the visualization changed how the whole system worked. In the dark days of JIRA, the managers would define tasks, then at some later point they would prioritize them, then at some much later point they would throw a bunch of them into a sprint plan for us. And then at some point still later, we would work on them.

By the time work started on a ticket, it was a relic of an ancient time: its definition, its priority, and maybe even its existence had become irrelevant. We spent a lot of time dealing with that instead of doing work.

Once we saw all the work on our wall, we started to prioritize and plan for ourselves. When we needed another task to work on, we quickly evaluated which sticky had the next highest priority, we made sure it was well defined, and then we started work on it immediately. We redefined our workflow within our bubble, and it helped us a ton.

A Technical Bubble

By this point, we had built ourselves a technical bubble, too. Although our code was mostly under our control, there were still some dangers from the outside world that could slip through. One major source of danger was our interface with the backend system.

We built frontend widgets and defined the data interface for them. Although we knew defining that interface in isolation from the backend team would result in disaster, we failed to convince anyone else—even the backend team—of that, and so we had to just do it.

Eventually, we needed to change the format of the data for a widget, and we discovered after doing so that the backend team had already integrated that widget, and we had just broken things for them. Worse, they were too tight on time to change their code to support our update. Worst, they were in fact too tight on time to change any interfaces. Suddenly, they declared an API freeze.

That was unpleasant news for us, but there we were: we could make the change and break everything, or we could not make the change and be stuck in our awful past forever. Or, maybe gnarlier, we could try to make every change in such a way that it stayed backward-compatible with whatever arbitrary versions of our code’s interfaces the backend conformed to.

We realized there was a better way, one that was elegant in how neatly it contained all the inelegance that threatened us: we built a backward-compatibility layer.

It was pretty simple. For each maybe-incompatible change we made, we also wrote one function. That function took in a data object, potentially in the format that was used before the change, and returned a data object in the format the change introduced.

The backward compatibility layer exported a function that just composed all of these simple transformations in sequence. To future-proof (or would it be “past-proof”?) your widget, you just imported that compatibility function and passed your data through it. Data in the current format would go through unchanged; ancient data would undergo the full evolution process and come out in the current format.

The clever thing is that it isolated all of the volatility into a single file and preserved the existing contracts for both the frontend and the backend. On the frontend, we could continue to make the right changes to our widgets’ interfaces, without worrying about compatibility. On the backend, they didn’t have to worry about unexpected interface changes; instead, they could feel secure that their code wouldn’t break with newer versions of our work.

A less obviously clever thing was that the compatibility code made no pretense of being code that should stick around. Nothing was well named in a traditional sense: if one of the functions renamed a name field to title in the Header component, for example, and it was added in pull request #272, it might be called pr272HeaderNameToTitle. It’s easy to look at that and cringe, but it’s actually a great name, given the context. A name like that tied the function to a specific context: a particular set of changes made at a particular point in time and merged in a particular PR. With perfect clarity, that subtly enforced the append-only nature of the transformation process and of the compatibility file itself.
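To make the idea concrete, here is a minimal sketch of what such a layer could look like in TypeScript. It is an illustration under assumptions, not the project’s actual code: the WidgetData and Migration types, the toCurrentFormat name, and the second migration (pr305HeaderLinkToLinks, with its invented PR number and fields) are all hypothetical. Only the pr272HeaderNameToTitle example comes from the description above.

```typescript
// Old payloads may not match the current types, so this sketch works on
// loosely typed objects. (Hypothetical types, for illustration only.)
type WidgetData = Record<string, unknown>;
type Migration = (data: WidgetData) => WidgetData;

// PR #272 (the example from the text): Header's `name` field became `title`.
const pr272HeaderNameToTitle: Migration = (data) => {
  // Data that is already in the new format passes through untouched.
  if (!("name" in data) || "title" in data) return data;
  const { name, ...rest } = data;
  return { ...rest, title: name };
};

// Hypothetical PR #305: a single `link` value became a `links` array.
const pr305HeaderLinkToLinks: Migration = (data) => {
  if (!("link" in data) || "links" in data) return data;
  const { link, ...rest } = data;
  return { ...rest, links: [link] };
};

// Append-only: new migrations go at the end; existing entries are never
// edited or reordered.
const migrations: readonly Migration[] = [
  pr272HeaderNameToTitle,
  pr305HeaderLinkToLinks,
];

// The one function a widget would import: it applies every migration in order.
const toCurrentFormat: Migration = (data) =>
  migrations.reduce((current, migrate) => migrate(current), data);

// Ancient data goes through the full evolution; current data is unchanged.
const upgraded = toCurrentFormat({ name: "Welcome", link: "/home" });
const untouched = toCurrentFormat({ title: "Welcome", links: ["/home"] });
```

Each migration checks whether its change has already been applied, which is what lets current data pass through unchanged while older data picks up every step it is missing.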

The most clever thing about it, though, was that it brought real confidence: we used it to ensure backward-compatibility in our testing process.

As an artifact of the project’s odd division of responsibilities, we had created our own little server with some example data to show each of the widgets in action. Our test suite already ran that data through the type system to make sure that our components and the ideal data were in sync. Testing the compatibility layer was an easy extension of that process: after checking the current example data, we also retrieved, from Git, the example data from our agreed-upon compatibility point, ran that through the transformation, and then checked the result of that against the latest types. Now, in addition to failing if the current example data were out of sync, our build would also fail if the compatibility layer didn’t sufficiently transform old data to work with the current interface.
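The check described above could be sketched roughly like this. Everything here is an assumption made for illustration: the isHeaderData guard stands in for whatever type check the real test suite ran, toCurrentFormat is a one-migration stand-in for the compatibility function, and the frozen example data is inlined. In a real setup the old data would be read from the repository at the agreed-upon compatibility point, for instance with git show <ref>:<path>.

```typescript
type Raw = Record<string, unknown>;

// Hypothetical "current" shape of the Header widget's data.
interface HeaderData {
  title: string;
}

// Runtime guard standing in for the type check the test suite performed.
function isHeaderData(value: Raw): value is Raw & HeaderData {
  return typeof value["title"] === "string" && !("name" in value);
}

// Stand-in for the compatibility function (a single migration, for brevity).
function toCurrentFormat(data: Raw): Raw {
  if (!("name" in data) || "title" in data) return data;
  const { name, ...rest } = data;
  return { ...rest, title: name };
}

// Returns the indices of examples that still fail the current check
// after being passed through `migrate`.
function findIncompatible(
  examples: Raw[],
  migrate: (data: Raw) => Raw,
  isCurrent: (data: Raw) => boolean,
): number[] {
  return examples
    .map((example, index) => (isCurrent(migrate(example)) ? -1 : index))
    .filter((index) => index >= 0);
}

// Today's example data, checked as-is.
const currentExamples: Raw[] = [{ title: "Welcome" }];
// Example data as it looked at the compatibility point (inlined here).
const frozenExamples: Raw[] = [{ name: "Welcome" }];

const failures = [
  ...findIncompatible(currentExamples, (data) => data, isHeaderData),
  ...findIncompatible(frozenExamples, toCurrentFormat, isHeaderData),
];

// Fail the build if either the current or the migrated old data is out of sync.
if (failures.length > 0) {
  throw new Error(`Example data out of sync at indices: ${failures.join(", ")}`);
}
```

The point of running both checks in the same build is that a change to the types cannot land unless the example data is updated and a matching migration is added for the old data.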

Bubbling Up

The project looks pretty different now. A few months after we did that first exercise with the circles of control, we did it again. We were surprised to discover how much the circles had grown. Whereas before, most of the items fell outside of our control, they were now mostly in our circle of influence. There were even a few things that had moved into our direct control.

We’re doing end-to-end work now. We’re talking to users. We prioritize and do work with the client. We’re even sending the occasional probe to planet production.

That’s not all just because of our procedural and technical bubbles, but I think our bubble made it all possible. However, if you take inspiration from this and set out to build your own bubble, I have one final piece of advice: respect your interfaces. We obviously would have made life awful for the backend team if we had changed our interface with them; instead, we found a way to preserve the important parts of the contract for both sides. But in a less obvious way, we respected JIRA as an interface, too. Our physical workflow board was the thing we relied on, but we still also kept JIRA up to date. The managers found value in JIRA and relied on it; we would only have worsened our relationship if we’d let it die.

Respecting these interfaces with the things outside of our control was extra work for us, but it allowed us to change and improve the things inside our control. As a result, we felt more effective, we produced more work, our morale increased, and we gained trust. With trust and momentum, we slowly widened our circle of control and became able to make even more meaningful change.

In short, by directing our energy where we could make a real difference, we turned our vicious cycle of misery into a virtuous cycle of improvement.

If you find yourself in a vicious cycle of misery—and you will, as we all do—I encourage you to spend your energy on changing the things you can truly control. Building a meaningful and reliable engineering project management process takes time, but with dedication and the right focus, you can make it happen!

by Adam Lloyd

September 14, 2017

Engineering
