Tradeoffs in Automation

Releasing software, to this day, is often a manual task. Scripts may be involved, but we still depend on human orchestration. Knowledge often remains locked away in individual minds. This is disastrous for many reasons. Even if we remember everything, each time we repeat a task there will be slight variations. After a few days, let alone months or years, we’ll forget most of what’s necessary and be left to re-invent what we need to do.

Fortunately, we have fantastic practices and tools to move from highly manual to entirely automated releases. Unfortunately, many people see adopting them as an insurmountable barrier. Even when we talk about specific tools, like Puppet for managing the configuration of a server, I often hear: “If you have two or three servers, it’s not worth it.” I usually hear this from people who manage many servers, so perhaps the challenges they typically face bias them toward that conclusion. But take a boutique consulting firm with small-scale deployments. It likely has numerous applications in a diverse set of environments. In this case it’s not the number of servers that matters.

Additionally, many steer clear of automation because they mostly hear about these tools in the context of large-scale application deployments, where manually configuring servers just isn’t feasible. For example, a site like Facebook may need rolling updates with no downtime. A boutique consulting firm, by contrast, likely has many apps that can be taken offline for updates. Instead of being scared away by the baggage of zero-downtime rolling updates, we should step back and look at the challenges that spur the use of these tools, and then decide when they are appropriate. If we did this, I think we’d see many more organizations adopting automation and reaping its benefits.

The real question for many is how to mitigate the problems with manual orchestration, and whether those problems are significant enough to justify automation. Typically, when we get burned by manual orchestration, we turn to documentation, or to better documentation. We’ve all seen the Word document or README, often housed in some obscure location. If we’re lucky, it was accurate when it was first written. Many implicit assumptions go into these documents, depending on who the author had in mind as the reader. If the author wrote it for themselves, much is likely left unsaid. Who enjoys writing down the things they think everyone should know? Furthermore, there’s no way to know if the instructions are accurate, short of reading through them and trying them out by hand. Naturally, these documents become outdated quickly. They usually only get updated after we get burned and someone cracks the whip, which is always too late.

So when we fear new practices and tools, this is about all we’re left with. What benefits can we look forward to if we adopt automation instead? We can align incentives by only allowing automated releases. Don’t allow people to release software manually; enforce this culturally, or restrict access if necessary. Then the automation has to work, and it serves as documentation that’s always up to date.

Assumptions become explicit. Because everything necessary to execute the release is automated, we avoid forgetting steps as months and years pass. Anyone can learn what’s necessary by reading and executing the automation. There are even tools that allow us to spin up virtual test environments. We can use these to test the automation and confidently make changes. We can orchestrate end-to-end testing of the result. These scripts can be versioned, like our software, so we always know why a change was made and we can see the evolution of changes.
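To make this a little more concrete, here is a minimal sketch in Python of what "automation as documentation" might look like. It is not tied to any particular tool: the names Step, require_tool and run_release are hypothetical, and the build and deploy steps are placeholders that a real release would replace with actual work. The point is only that each step carries its own description, assumptions are checked in code instead of being left unsaid, and a dry run lets anyone see what a release involves without changing anything.

```python
"""Hypothetical release script; every name and step here is illustrative."""
import shutil
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str             # doubles as the documentation for this step
    action: Callable[[], None]   # the work the step performs


def require_tool(name: str) -> None:
    """Make an assumption explicit: fail early if a needed tool is missing."""
    if shutil.which(name) is None:
        raise RuntimeError(f"required tool not found on PATH: {name}")


def run_release(steps: List[Step], dry_run: bool = False) -> List[str]:
    """Run the steps in order; with dry_run=True, only list what would run."""
    log = []
    for number, step in enumerate(steps, start=1):
        line = f"{number}. {step.description}"
        print(line)
        log.append(line)
        if not dry_run:
            step.action()
    return log


if __name__ == "__main__":
    release = [
        Step("Check that git is installed", lambda: require_tool("git")),
        Step("Build the package", lambda: None),         # placeholder
        Step("Deploy to the test server", lambda: None),  # placeholder
    ]
    # A dry run prints the numbered steps without executing any of them.
    run_release(release, dry_run=True)
```

Because a script like this is plain text, it can live in version control next to the application, and the dry run doubles as a README that cannot drift from what the release actually does.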

What are the drawbacks? Well, with automation we have to learn new practices, the principles behind them, and the tools that make them possible. But how much of a learning curve is this? We already have to execute all these steps manually. How challenging is it for developers, ops and other roles to develop some software for themselves? Given the alternative, manual documentation that almost always fails, which approach seems more reasonable? Even if you only manage two or three servers, how much would you save by automating the management of those servers?

Perhaps most frightening, if we stay in the world of manual releases, we rarely leverage automated testing at any level. We find out about problems very late in the process. We don’t have much confidence in what we do. We don’t have a very deep understanding of the decisions we make in developing software, how those decisions impact the environment we host in, or when to make tradeoffs. We don’t know what it’s like to develop in an environment with minimal differences from the environment our users rely on. We can’t break free of the shackles of traditional development patterns, such as manual database migrations. There’s so much we give up. We’re at a point where we can’t justify doing much manually. The learning curve of automation is minute in comparison.