The Pipeline Driven Organization – Enabling True Continuous Delivery
What does it take to work at the speed and agility of companies like Netflix, Amazon, and Google, deploying changes to production code many times a day?
Why do so many medium and large companies fail at it?
What does it take to enable true continuous delivery?
The difference is, of course, as with all software problems, people; more specifically, the way people work with the pipeline. The expectations we set for the pipeline will determine how much it enables true continuous delivery.
A pipeline driven culture requires both pipelines and culture, so let’s start with the pipeline.
A pipeline is a set of one or more automated tasks or “jobs” that run in a certain order to produce a result. This result is usually made up of two things:
A judgement result: the pipeline passed, failed or gave an inconclusive result (always use “failed”/”passed”, never “inconclusive”. I’ll cover why later in this book).
Any related artifacts: deployed application, binaries, log files or anything else we care about.
At the heart of it, a task is a command line (shell) script or program that runs automatically on some machine (locally or in the cloud), executes some work (sometimes many related tasks, like compiling and running tests), and returns a special command line “exit code”.
By convention, an exit code of zero signifies success, and any non-zero exit code signifies failure. The program or script decides what the exit code will be.
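As a minimal sketch of this convention (the step command here is just an illustrative placeholder, not any specific tool), a pipeline step can be judged purely by the exit code of the process it runs:

```python
import subprocess
import sys

def run_step(command: list[str]) -> bool:
    """Run one pipeline step and judge it by its exit code alone."""
    result = subprocess.run(command)
    return result.returncode == 0  # zero means the step passed

if __name__ == "__main__":
    # A trivial "step": a Python one-liner that exits with code 0.
    # A real step would compile code, run tests, and so on.
    ok = run_step([sys.executable, "-c", "raise SystemExit(0)"])
    print("passed" if ok else "failed")
    sys.exit(0 if ok else 1)
```

Notice that the judgment logic never inspects what the step did; the exit code is the whole contract between the step and whatever runs it.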
Tools like Jenkins (colloquially called “Continuous Integration servers” or “CI/CD servers”, with “CD” standing for “Continuous Delivery”) act as an abstraction layer on top of these tasks. CI servers allow us to easily run these tasks (based on triggers such as code commits or schedules) and listen to each task’s exit code as well as its output (logs).
They then parse the logs and give us a nice UI showing the results, the error logs, and any artifacts.
CI/CD Servers allow us to manage our tasks in a single location and monitor their running progress, run tasks in parallel and run them on different machines (sometimes called ‘build agents’).
“Pipeline driven” means we want to rely more and more on pipelines to make technical decisions (judgements) related to the code and its associated artifacts, and then have the pipeline immediately act based on those decisions as autonomously as possible.
In software development, I can recognize two different types of “decisions” people usually have to make:
Strategic judgments, such as:
“Should we build this product?”
“Should we focus on this feature?”
Tactical judgments, such as:
“Should we merge this branch?”
“Should we deploy to this environment?”
“Should we spin up/destroy an environment?”
“Is the feature working?”
“Is the app secure?”
“Did we create new bugs?”
Pipelines are Tactical
I believe pipelines are great for tactical day-to-day judgements and actions, and people are better at the strategic ones. If someone found a way to let a pipeline decide what their next product should be without a human signing off on it – I’d love to see that. But I believe that isn’t something you can code a decision algorithm into.
Tactical stuff is usually much better suited for a machine to check: doing things in a repetitive manner without skipping steps, verifying a long series of checklists, and collecting and analyzing data to get specific numbers.
These decisions are related to things like:
The health and functionality of our code
…and other matters related to the software development lifecycle.
By letting pipelines make these judgements instead of people (as we usually do in traditional agile and waterfall organizations), we can gain several benefits:
Much faster decisions/judgements
More reliable, repetitive judgements
True continuous delivery
There are two main reasons why having a pipeline make a decision instead of a human is faster:
Computers compute faster than humans based on specific facts
Computers do not feel stress, anxiety, or a sense of impending doom when they are about to decide whether to merge a branch, spin up an environment, or deploy an application
More Reliable, Repetitive Judgements
There are two main reasons why having a pipeline make decisions is more reliable for repetitive things:
Computers are great at going over a long list of things and verifying each one, without ever forgetting a single item, or checking things differently every time
Computers don’t feel boredom, annoyance, or the urge to choke someone because a specific check has been failing for the last ten times, or wonder why no one is doing anything about it, or think, “Screw this, let’s go spend some time with the kids instead of going through this again.”
Reducing Stress & Anxiety
It’s really unnerving to visit a company that has a long and arduous manual testing process and watch that one special meeting take place before a release, in which the leadership team looks at the head of the test team and asks, “Is the release good to go?”
Watching the test lead is always a lesson in human psychology. How would you feel if you had to sign off on pushing a major release into production – when you were the last line of defence between the world and everyone’s code?
Enabling True Continuous Delivery
Finally, we get to the heart of it for me.
Continuous Delivery is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time.
I believe that the key to succeeding with continuous delivery is to remove human bottlenecks from the chain of tactical decision making, and enable pipelines to work almost autonomously in deciding and pushing code around, all the way to production, without human fear and doubt getting in the way of receiving fast feedback about the way our code behaves.
Teaching a Pipeline to Drive
In order to trust a pipeline enough to rely on its decisions, we need to start teaching our software pipelines to make those tactical judgements without needing humans in the process. Every tactical decision we can teach our pipelines to make is one less human bottleneck on the road to true continuous delivery.
By teaching, I mean we add automated steps, scripts and quality gates into the pipeline that make the various steps fail or pass based on results or metrics we care about. These can come in several flavors:
Fail/pass based on the result of automated tests
Fail/pass based on a specific metric being measured (such as code coverage drastically declining)
Fail/pass based on the results of sub or parallel pipelines
Fail/pass based on log files
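To make the second flavor concrete, here is a hedged sketch of a metric-based quality gate. The threshold value and the shape of the coverage report are assumptions for illustration, not any specific tool’s output:

```python
import sys

# A hypothetical quality gate: fail the pipeline step when a measured
# metric (code coverage here) drops below a threshold we care about.
COVERAGE_THRESHOLD = 80.0  # assumed policy, not a standard value

def coverage_gate(report: dict, threshold: float = COVERAGE_THRESHOLD) -> bool:
    """Return True (pass) only if coverage meets the threshold.

    A missing metric counts as a failure: the gate never passes
    on incomplete data.
    """
    return report.get("coverage_percent", 0.0) >= threshold

if __name__ == "__main__":
    # In a real pipeline this report would come from a coverage tool.
    ok = coverage_gate({"coverage_percent": 91.4})
    print("passed" if ok else "failed")
    sys.exit(0 if ok else 1)
```

The important design choice is that the gate translates its judgement back into an exit code, so the CI server can treat it like any other step.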
(BETA readers – if you have more ideas – email [roy at osherove.com])
Places like Netflix, Google and Amazon can be described as being pipeline driven; they let the pipelines in their teams make the day-to-day, tactical judgments, and execute their related actions automatically, instead of people making judgements (see Large-Scale Continuous Delivery at Netflix and Waze Using Spinnaker (Cloud Next ’18)).
They let go of the notion that a person holds the reins on the tactical minute-by-minute, hour-by-hour judgment of the work performed, and let the pipeline drive the progress. The people then become the ones who teach the pipelines how to judge and make tactical decisions automatically, thus removing the human factor as a bottleneck from many of the tactical processes.
How Pipeline Driven Flows Change the Dynamic of Day-to-Day Work
Traditionally (in a non-pipeline-driven flow), the pipeline “asks” or “notifies” people if everything is OK (tests passed, compilation passed). Then people have to hit a button to trigger a separate pipeline or job based on their judgement of whether that next step is OK to run.
Image: Traditional agile pipelines are separated by knowledge silos and manual human judgment to decide whether to trigger the next pipeline in the SDLC process.
In a pipeline driven flow, people trust the pipeline enough so that if everything is OK, the next step is triggered automatically. The assumption is that the pipeline knows how to properly judge that something is OK or not, to the point of simply continuing to the next action without waiting for a human to trigger the next pipeline or job.
In the figure above, you can see that the various knowledge silos in the organization are responsible for embedding decision logic inside the automated pipeline. The decisions are executed automatically as part of the pipeline execution, and there is no waiting for humans on the “outside” to approve or continue on to the next action.
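The difference between the two flows can be sketched in a few lines. In this toy model (the step names are illustrative, not from any real system), each step is a judgement that passes or fails, and a passing judgement triggers the next step automatically, with no human pressing a button in between:

```python
import sys
from typing import Callable

# Illustrative stand-ins for real pipeline steps.
def compile_step() -> bool:
    return True  # stand-in for a real compile

def test_step() -> bool:
    return True  # stand-in for running the test suite

def deploy_step() -> bool:
    return True  # stand-in for deploying to an environment

def run_pipeline(steps: list[Callable[[], bool]]) -> bool:
    """Run steps in order; each pass auto-triggers the next step."""
    for step in steps:
        if not step():
            return False  # a failing judgement stops the pipeline
    return True  # every judgement passed, all the way to the end

if __name__ == "__main__":
    ok = run_pipeline([compile_step, test_step, deploy_step])
    sys.exit(0 if ok else 1)
```

In the traditional flow, a human sits between each pair of steps; in the pipeline-driven flow, the loop above runs to completion on its own.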
New Skills for a New World
If we take a look at the four main silos we usually encounter when trying to adopt an agile process, we can see in this image that each one has a valuable skill that is relevant for the other silos to learn.
Dev skills: pipelines are pieces of code that run automatically, so a main pipeline-related skill is learning basic coding techniques. If you can’t code anything, be it simple scripts or full automation infrastructure, you can’t contribute to the pipeline or understand how it does what it does and why it fails.
Testing skills: a pipeline (all pipelines, really) is just one big test. It’s either red or green. It’s made up of “steps”, each one a form of test in and of itself. If a step fails, the pipeline fails (or it usually should). Contributing to a pipeline means adding steps or tests into it, and those tests/steps are written in code. So not only do you need to know how to code, you need to know how to write tests that can run in a pipeline. You need to have a testing mentality, where instead of manually checking something, you are able to envision and help create an automated test for it.
Ops skills: pipelines in medium and large companies are traditionally in the realm of the operations and dev folks. But if we expect people to contribute tests into pipelines, they need to know how pipelines work at a basic level, how to understand their outputs, how to configure them (adding and removing steps to them), how to access them, and how to run them.
Security skills: Security is a set of skills and a mindset that touches all aspects of the code, infrastructure, configuration and deployment. This means security folks who traditionally live in their own silo will have to share their security knowledge, because, in this brave new world, everyone has more power, and thus more responsibility. We allow everyone to touch the pipeline and write code that runs in it. This means teaching them how to think about security with everything they touch.
We can add more skills to this list, but for the sake of keeping things at a reasonable length, I’ll stop here. I will write more extensively on each of these on my blog.
Image: for each knowledge silo, I’ve highlighted (in colored dots) what set of related skills from other silos they will need to improve on going forward.
Summary: What’s in it for You?
Like it or not, pipelines are going to become more and more common, and organizations will increasingly need to become pipeline driven to stay competitive.
That means that knowledge sharing, the ability to automate, the ability to test and the ability to write code will increase your chances of being successful in your next job, and lack of those skills will reduce your likelihood of playing well in such a world.
Many organizations try to implement continuous integration or continuous delivery, but they get stuck in the process; there are too many human bottlenecks standing between the pipelines, deciding whether we can move to the next step in the software process (deployment, testing, etc.).
By teaching pipelines to make better decisions and offloading human judgments onto the pipelines, we can have the pipelines make decisions all the way up to production, thus allowing them (if they run continuously) to create a true continuous delivery mechanism.
If a pipeline fails – no deployment. If it passes – we deploy. No humans involved.
That is the vision that today, only a handful of companies fully embrace; places like Amazon, Netflix, Google and a few others. And I’d argue that’s part of the magic that got them into that scale and speed; they removed humans from the tactical decision process and instead taught them how to teach the pipelines to do their job.