Ops: The Other DevOps

Source – devops.com

Much of the conversation around DevOps, including right here on DevOps.com, is about helping developers get easier access to production systems, increasing agility by shortening release cycles. Most of the time, this focus on developers leads to a concentration on so-called “Day One” tasks, the Dev side of the equation, to the exclusion of “Day Two” operations—the Ops side of things.

Recent moves in the DevOps market underscore the fact that the weight is still very much on the Dev side, and not so much on Ops. Microsoft buying GitHub is huge news, don’t get me wrong, not least because many people are still stuck in a very Nineties conception of Microsoft, and this acquisition should help clarify that this is not your parents’ Microsoft (full disclosure: my employer, Moogsoft, has a strategic partnership with Microsoft). News on the Ops side of the house, meanwhile, rarely achieves the same prominence.

The reason for this imbalance is to be found in DevOps’ origin myth. The story is all about developer frustration with risk-averse operations procedures which prevented them from deploying as rapidly and frequently as they would have liked to. As with most good stories, there is a kernel of truth here, but the truth is more complicated. Ops people are risk-averse for good reason: When they mess up, they don’t have a safety net, and they get a lot of very public blame.

That said, it is certainly true that a reasonable aversion to risk can be taken too far. Couple that with a very human tendency to define oneself by one’s job, and you wind up with people putting hands to keyboards for even basic, routine tasks, becoming bottlenecks for huge processes.

What Does DevOps Mean To Ops?

The biggest benefit of DevOps from an Ops perspective is that it has forced the hard conversation about moving from artisanal, handcrafted IT infrastructure to industrial, scalable models that don’t require a human in the loop for routine operation. This cultural shift began with the famous pets-versus-cattle analogy, but the transition is far from complete.

A big sign that there is still work to be done is the relative lack of sophistication around tooling in IT Operations. There is a lot of talk about software-defined infrastructure and desired-state automation, but relatively little practical uptake, especially when it comes to operating such frameworks day-to-day. These new models will require a wrenching transition from traditional Ops models, with pervasive automation working to augment human sysadmins’ capabilities.

This is not the old style of automation, with cron jobs and shell scripts, which any BOfH worth their etherkiller has been doing for decades. The new serverless frameworks do away with the last vestiges of persistent coupling of infrastructure components to individual business services. What vMotion promised in 2003 is now coming to pass, with workloads moving, growing and shrinking to support changes in user demand, and infrastructure reconfiguring itself top to bottom in response.

Key Questions To Ask About Operational Automation

In this context, operational automation can no longer work on the old model of “set it and forget it.” Every process set in motion needs real-time contextual awareness of what is going on around it and how that is constantly changing. Without that visibility, backup scripts may go on faithfully backing up a server that used to be high-priority but is now a backwater of the application infrastructure. Meanwhile, the cloud services where the bulk of the application’s logic runs are not meaningfully integrated into the disaster recovery (DR) plan.

All of these facets need to be reviewed regularly, in terms of need, function and ownership.

Who Needs This Automation?

Anything that runs should be linked to some kind of business requirement, however indirectly. “This ensures that our infrastructure can scale and flex seamlessly in response to user demand”: Good. “This keeps our operating systems patched”: A valid goal, but do we need to do it in production if the average lifetime of provisioned operating systems is shorter than the patch cycle? Just patch the templates and you’re done. “We’ve always done it this way, and I’m not sure what might happen if we stop”: Time to set some chaos monkeys loose, and see what happens.
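The patching question above boils down to a simple comparison, which can be sketched in a few lines of Python. This is only an illustration: the function name `patch_strategy` and the two strategy labels are hypothetical, and a real decision would also weigh things like emergency security fixes.

```python
from datetime import timedelta


def patch_strategy(avg_instance_lifetime: timedelta, patch_cycle: timedelta) -> str:
    """Decide where patching effort should go (hypothetical helper).

    If provisioned operating systems are typically replaced before the next
    patch cycle comes around, patching the templates is enough. Otherwise,
    running instances need patching as well.
    """
    if avg_instance_lifetime < patch_cycle:
        return "templates-only"
    return "templates-and-running-instances"


# Illustrative numbers only: instances live about 3 days, patches ship every 30.
strategy = patch_strategy(timedelta(days=3), timedelta(days=30))
```

With a three-day average lifetime and a 30-day patch cycle, this returns "templates-only".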

What is the Function of This Automation?

We are in the middle of huge technological transitions, and the waves of innovation keep coming closer and closer together. As an industry, we had just about come to terms with virtualization by the time cloud computing came along, but that was far from settled when containerization got underway, and it looks like containerization in turn may be undermined by serverless architectures before it even comes to full fruition. Watch out for hidden assumptions in the way you go about a task, and check back in regularly. Ops types should take a leaf from Dev’s book, and regularly consider refactoring their automation to remove cruft and take advantage of new developments.

Who is the Owner of This Automation?

Someone has to own every piece of automation, so that they can maintain it over time according to the previous two points, and so that everyone else knows who to call if (or realistically, when) it breaks. Again, developers have been talking about technical debt for a long time: the impact of older, unmaintained code. Meanwhile, sysadmins used to pride themselves on the uptime of individual servers. That attachment needs to move from the individual server to the overall infrastructure and the automation and management that develop over time to ensure its availability and performance. They’re cattle, not pets, remember?
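One way to keep the three questions of need, function and ownership in view is a small inventory of automation that gets checked on a schedule. The Python sketch below is a minimal illustration under stated assumptions: the `AutomationRecord` fields, the `review_findings` function and the 180-day review interval are all hypothetical choices, not an established tool or standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class AutomationRecord:
    """One piece of automation and the three facets to review (hypothetical schema)."""
    name: str
    business_need: Optional[str]   # who needs this automation, and why
    owner: Optional[str]           # who to call when it breaks
    last_reviewed: Optional[date]  # when its function was last re-examined


def review_findings(
    records: List[AutomationRecord],
    today: date,
    max_age: timedelta = timedelta(days=180),  # assumed review interval
) -> Dict[str, List[str]]:
    """Return the problems found per automation; clean records are omitted."""
    findings: Dict[str, List[str]] = {}
    for record in records:
        problems = []
        if not record.business_need:
            problems.append("no business need recorded")
        if not record.owner:
            problems.append("no owner")
        if record.last_reviewed is None or today - record.last_reviewed > max_age:
            problems.append("review overdue")
        if problems:
            findings[record.name] = problems
    return findings


# Illustrative inventory with made-up entries.
inventory = [
    AutomationRecord("nightly-backup", None, "alice", date(2018, 1, 1)),
    AutomationRecord("autoscaler", "scale with user demand", "bob", date(2018, 5, 1)),
    AutomationRecord("legacy-cron", "unknown", None, None),
]
report = review_findings(inventory, today=date(2018, 6, 1))
```

Here `report` flags the backup job for lacking a recorded business need and the legacy cron job for having no owner and no review on file, while the autoscaler passes untouched.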

Ops is the Other DevOps

As Ops becomes more and more about managing the automation, the distinction between Dev and Ops starts to blur. While Dev roles are “picking up the pager” and taking responsibility for production operation of their software, Ops roles are developing more and more code to fulfill the promises of those applications. With traditional Ops tasks evolving or even disappearing, new roles will appear. The job of automation architect will become much more relevant than any job that requires someone to ssh into an individual server or router. In this transition, old-school Ops people can learn a lot from what our Dev colleagues have been up to over the last few years.

One day, the term DevOps will no longer describe developers and non-developers, but two different types of developer, one focused on business logic and one on architectural underpinnings.