Continuous Delivery with Kubernetes the Hard Way
Source – infoq.com
This post will discuss how to build up an architecture for continuous delivery, from simple to complex. At each step we’ll look at the limitations of the simpler approach and justify adding complexity and/or refactoring the architecture.
The ultimate approach described here is the approach that the Weaveworks team has found works best for them.
Continuous delivery is the practice of producing software in a way that allows it to be released little and often, rather than in big chunks.
Why is continuous delivery so important? Releasing an application continuously eliminates the “release day” mindset. The theory is that the more frequently releases happen, the less risky each release is. Developers can also ship changes (whether new code or a rollback to an older version) as soon as the code is ready. This means the business can change more quickly, and therefore be more competitive.
Since microservices use APIs to speak to one another, a certain level of backwards compatibility needs to be guaranteed between releases to avoid having to synchronize releases between teams.
According to Conway’s law, software takes on the structure of the organization it belongs to. Thus, microservices and containers are just as much about organizational changes as they are about technological changes. When splitting a monolithic application into microservices, each microservice can be delivered by a separate team.
In this article, continuous delivery will be implemented with Kubernetes.
A brief overview of Kubernetes
Kubernetes is a container orchestrator that manages containerized applications. As stated earlier, there is no single right way to implement continuous delivery with Kubernetes. Though crucial for releases and automation, Kubernetes does not prescribe one solution to this problem.
Kubernetes uses pods, which are the smallest units that can be created and managed in the platform. As visible from this diagram, there is a Docker container in a pod. A Docker container image contains the application code in an isolated environment.
Pods are a set of containers that are co-scheduled on the same machine and share a network namespace: one container can talk to localhost and reach another container in the same pod on whatever port it’s bound to.
Pods are mortal, as Google’s Tim Hockin puts it: if the machine in the cloud disappears, for example because of underlying hardware failure, then the pods on it disappear too. To cope with this, don’t put an important service in a pod and then leave it there, hoping the machine survives. Machines disappear all the time, especially in the cloud.
Instead, a pod is wrapped in a deployment. A deployment dictates how many replicas of a pod there should be. For example, a deployment can declare that three instances of a pod should be kept running. If a machine goes down, the deployment starts replacement pods on another machine to keep three running.
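As a sketch, a minimal Deployment manifest asking Kubernetes to keep three replicas of a pod running might look like this (all names and the image reference here are illustrative placeholders, not from the original article):

```yaml
# Hypothetical manifest; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-end
spec:
  replicas: 3            # Kubernetes keeps three pods running,
                         # rescheduling them if a machine dies
  selector:
    matchLabels:
      app: front-end
  template:
    metadata:
      labels:
        app: front-end
    spec:
      containers:
      - name: front-end
        image: registry.example.com/front-end:latest
```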
The Kubernetes cluster is where the application will actually run.
Version 1 Architecture
In the first (and simplest) architecture, the version-controlled code feeds the CI system, which builds a Docker image and pushes it to the Docker registry. The developer then manually deploys the latest image to the Kubernetes cluster:
```shell
kubectl apply -f service.yaml
```
While it’s fine to perform manual deployments initially, future updates to a deployment should be automated in the CI system. The CI system updates Kubernetes by updating the image tag and pushing this change to the Kubernetes API. Kubernetes then pulls the Docker image referred to by that tag from the Docker registry and deploys it.
Committing a change
In this architecture, when a developer pushes a change (git push) to the version-controlled code, the CI system automatically runs a Docker build. The CI system tags that Docker image with the SHA-1 hash of the commit that was pushed, giving it a unique name, and then pushes the image to the Docker registry.
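A minimal sketch of that tagging step (the repo and registry name are illustrative placeholders; the docker commands are shown as comments since the interesting part is deriving the unique tag from the commit):

```shell
# Sketch: derive a unique image tag from the pushed commit's SHA-1.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "initial"
TAG=$(git rev-parse HEAD)      # full 40-character commit hash
echo "image tag: $TAG"
# In a real CI job the next steps would be something like:
#   docker build -t registry.example.com/front-end:$TAG .
#   docker push registry.example.com/front-end:$TAG
```

Because the tag is the commit hash, every build gets a unique, traceable name, and the same commit always produces the same tag.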
Kubectl set image
To roll this change out, the CI system runs kubectl set image. This command takes an already-running API object and tells Kubernetes to update its container image to a specific new tag. For example, a user might run kubectl set image to change the current image of a front-end service.
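For example, assuming a deployment and container both named front-end (placeholder names, not from the original article), updating it to a freshly pushed tag might look like:

```shell
# Hypothetical names; <sha> is the commit hash used as the image tag.
kubectl set image deployment/front-end \
    front-end=registry.example.com/front-end:<sha>
```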
Kubernetes will then automatically pull down the new Docker image and replace it. Additionally, users can do rolling upgrades, a feature that’s built into Kubernetes.
Any time a new change is pushed to the master branch, the change in turn is pushed to the production environment.
To roll back a change, another code change needs to be made: from master, the developer reverts the latest commit. If there are merge commits, this gets messier; ideally, users reset to the version before the latest merge and force-push the change.
After the rollback is pushed, the CI system churns away rebuilding the old version, and pushes that image, a fresh copy of the old one, to the Docker registry.
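A self-contained sketch of that rollback flow, assuming a clean history with no merge commits (the repo contents here are illustrative):

```shell
set -e
# Set up a throwaway repo with a good release and then a bad one.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev
echo "v1 (good)" > app.txt
git add app.txt && git commit -qm "good release"
echo "v2 (bad)" > app.txt
git add app.txt && git commit -qm "bad release"

# Roll back: revert the latest commit, producing a NEW commit that
# restores the old content. CI then rebuilds and redeploys it.
git revert --no-edit HEAD >/dev/null
cat app.txt    # prints: v1 (good)
```

Note that the history only ever moves forward: the rollback is itself a new commit, which is why the CI system has to rebuild and re-push an image it already built once.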
There are a few weaknesses to this approach. First, building and pushing containers can be slow. How slow depends on what is inside the container, but it always consumes disk I/O and network bandwidth. This is a problem for rollbacks, which need to be fast so that issues can be fixed immediately.
This first architecture has tight coupling and does not allow for different environments (e.g. dev, staging, prod) to be on different versions, which is clearly a problem for most users.
So let’s try and improve on V1!
Version 2 Architecture
Building on the initial architecture in version one, this second version introduces version-controlled configuration kept separate from the rest of the application repos. This allows the version-controlled config to act as the single source of truth (SSOT) for the entire app, meaning all of the microservices making up the application.
Instead of having a users service and an orders service each keep its Kubernetes YAML next to its code, it’s better to pull all of those Kubernetes YAMLs into a centralized repo, called a “config repo”.
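As a hypothetical layout (file names are illustrative), such a config repo simply collects each service’s manifests in one place:

```
config-repo/
├── users/
│   └── deployment.yaml
└── orders/
    └── deployment.yaml
```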
This is important since it allows users to reconstruct their entire cluster from the version control if it gets destroyed. Now, this version controlled config repository is the only thing necessary to bring back an application if someone accidentally deletes the production cluster.
With the introduction of this new repo, everything is the same as in the first architecture, except that the CI system is now doing a bit more work.
Committing a change
So what happens with this architecture when a code change is committed?
The CI system builds the new container image and, in response to the push to the code repo, pushes the image to the Docker registry. The CI system then clones the latest version of the config repo, makes the change to the Kubernetes YAML, and deploys the change to the Kubernetes cluster. Finally, the Kubernetes cluster pulls down the images from the Docker registry.
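The “make the change to the Kubernetes YAML” step usually amounts to rewriting the image tag in the manifest. A self-contained sketch of that rewrite (file contents, names, and tags are placeholders; the commit and kubectl steps are shown as comments):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
# Stand-in for the manifest cloned from the config repo.
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-end
spec:
  template:
    spec:
      containers:
      - name: front-end
        image: registry.example.com/front-end:oldsha
EOF

NEW_TAG=newsha
# Rewrite the image tag in place (GNU sed; on macOS use sed -i '').
sed -i "s|\(image: registry.example.com/front-end:\).*|\1${NEW_TAG}|" deployment.yaml
grep 'image:' deployment.yaml
# A real pipeline would then commit the change back to the config
# repo and run: kubectl apply -f deployment.yaml
```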
As mentioned before, the CI system is now doing a lot of work. It would be better for each piece of the architecture to do one thing well, as opposed to relying on a single element to take on most of the burden.
Secondly, the CI system can still only be triggered by pushing code; what would be preferable is rollbacks without pushing code. Rolling back out of band (directly with kubectl) means the developer also has to update the central configuration repo manually, which undermines that repo’s role as the SSOT.
After reaching that level of complexity in the second architecture, it is valuable to add a release manager to the architecture. The release manager that the Weaveworks team uses is Weave Flux, a completely open source release manager that is also part of Weave Cloud. But there are other release managers out there, such as Spinnaker. Weave Flux is designed to be simpler than Spinnaker and built specifically for containers.
Adding a release manager makes the architecture simpler again, as each element now has just one responsibility. The job of the release manager is to observe when new containers appear in the registry. It then clones the version controlled config, modifies it, and pushes it back to record that the release is happening. It then also pushes that new config to the Kubernetes cluster.
Now, the CI system is automatically building the version controlled code into a container image and pushing that container image to the Docker registry.
The release manager pulls the Kubernetes YAMLs out of the config repo, modifies them, and pushes the modified versions into the cluster. Then Kubernetes pulls down the latest images from the Docker registry.
In the image above, the release manager has a “scroll” icon representing different policies for different environments. The policy for staging could be to release all the time, while the policy for production could be to promote releases manually using the buttons in the release manager’s GUI. There is no longer tight coupling between individual microservice repos and what’s being released.
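As an illustration of how such a per-environment policy can be expressed, Flux v1 read annotations on workloads to decide whether to release automatically; the exact annotation names have varied across Flux versions, so treat this fragment as illustrative rather than authoritative:

```yaml
# Illustrative only; annotation names vary by Flux version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front-end
  annotations:
    fluxcd.io/automated: "true"   # e.g. release continuously in staging
```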
In this version, rollbacks are simple. The user tells the release manager to roll back to an earlier version, without any involvement from the CI system; the release manager makes the change to the config itself.
Implementing continuous delivery with Kubernetes can be simple, or it can be hard. The more sophisticated a microservices application is, the more likely it is that a complex architecture is required. While there is no single wrong way to implement continuous delivery, it is important that automation is in place, an SSOT is established, and rollbacks can be performed efficiently without requiring new code changes to be pushed.