Containers 102: Kubernetes and the Basics of Container Orchestration
To know containers is to know Kubernetes, the gold standard for container orchestration and deployment. Here’s an explainer for how Kubernetes works and what makes it tick.
In the first installment in this series, you learned about why and how containers became a new paradigm in IT infrastructure.
When containers first emerged, there was a quick race to develop a container management system that would provide functionality like networking, high availability and (important for database workloads) persistent storage. Early entrants into this space included Docker Swarm and Apache Mesos, but it was Google’s Kubernetes that became the clear leader in terms of adoption.
Kubernetes grew out of Borg, a system used internally at Google for large-scale cluster management. Kubernetes has since become an open source project, with a large number of major technology vendors contributing code to it.
How does Kubernetes work? I am always hesitant to compare Kubernetes to VMware’s ESX product, but at a very high level, container orchestration is similar to virtual machine (VM) management. Much like a VMware environment, we have host servers, which can be either physical machines or VMs. There are also what’s known as a master node and worker nodes.
The master node contains an etcd cluster (a distributed key-value store) that stores data about the cluster itself, API objects and service discovery details. This node also contains the API server, which receives the REST calls for modifications and new deployments to the cluster. For security purposes, the API server is the only component that communicates with the etcd cluster. The master node also hosts a scheduler that tells each pod (i.e., a set of containers) where to run based on current host activity. Finally, there are a couple of controller processes that maintain the desired state of the workloads on the cluster, based on the deployment manifests that exist on the cluster.
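The scheduler and controller roles described above can be illustrated with a toy model. This is a sketch only, not Kubernetes code: the names `pick_node` and `reconcile` and the worker data are hypothetical, and the real scheduler weighs far more than a single pod count per node.

```python
from typing import Dict, List


def pick_node(node_load: Dict[str, int]) -> str:
    """Toy scheduler: choose the node currently running the fewest pods."""
    # Sorting the names first makes ties break alphabetically.
    return min(sorted(node_load), key=lambda name: node_load[name])


def reconcile(desired_replicas: int, node_load: Dict[str, int], running: List[str]) -> List[str]:
    """Toy controller: start or stop pods until the running count matches the desired count."""
    running = list(running)  # each entry is the node a pod runs on
    while len(running) < desired_replicas:
        node = pick_node(node_load)
        node_load[node] += 1
        running.append(node)
    while len(running) > desired_replicas:
        node = running.pop()
        node_load[node] -= 1
    return running


# Hypothetical cluster: pods already running on each worker node.
nodes = {"worker-a": 2, "worker-b": 0, "worker-c": 1}
placements = reconcile(desired_replicas=3, node_load=nodes, running=[])
print(placements)
```

The point of the sketch is the shape of the loop: the manifest states what should exist, and the controller keeps comparing that with what is actually running and closes the gap.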
Each worker node in the cluster runs two main services. The kubelet regularly takes in new or modified pod specifications and monitors the health of the pods on its node. The kube-proxy service manages subnetting on the individual host and allows services to be exposed to the outside world.
By now, you have seen me use the term “pod” several times. Pods are the basic unit of deployment in Kubernetes and are composed of one or more containers. You would typically deploy containers of the same tier (think Web container) in a single pod. It is not recommended to mix workloads, for example by putting a database container and a Web container in the same pod. Pods are volatile and can be killed (if they have runaway processes, for instance), so clients cannot rely on reaching any one pod directly. This is why there are also services. A service represents a logical set of pods and enables connectivity to those pods by acting as a gateway and providing a stable IP address.
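One way to picture how a service tracks a changing set of pods is label matching: Kubernetes services typically pick their pods with a label selector. The sketch below models only that matching step. The function `select_pods` and the pod names are hypothetical, and no networking is involved.

```python
from typing import Dict, List


def select_pods(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the names of pods whose labels contain every key/value in the selector."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Hypothetical pods: two Web pods and one database pod.
pods = [
    {"name": "web-1", "labels": {"app": "nginx", "tier": "web"}},
    {"name": "web-2", "labels": {"app": "nginx", "tier": "web"}},
    {"name": "db-1", "labels": {"app": "mssql", "tier": "data"}},
]
web_selector = {"app": "nginx"}
before = select_pods(web_selector, pods)

# web-1 is killed and replaced by web-3; the selector itself does not change.
pods = [pod for pod in pods if pod["name"] != "web-1"]
pods.append({"name": "web-3", "labels": {"app": "nginx", "tier": "web"}})
after = select_pods(web_selector, pods)

print(before, after)
```

Because the service is defined by the selector rather than by a list of pod names, a replacement pod is picked up as soon as it carries the right labels.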
While Web and app workloads may be able to function without persisted storage, database workloads (among others) need their storage to be persisted. Because containers are ephemeral, Kubernetes lets volumes be defined on their own and connected to container definitions, so the data outlives any individual container. Think about an application like SQL Server: these volumes function similarly to the shared disk in a SQL Server Failover Cluster Instance.
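To make the link between a volume and a container definition concrete, here is a minimal sketch that builds a pod manifest as a Python dictionary and prints it as JSON. The helper `pod_with_volume`, the image name, the claim name and the mount path are all placeholders chosen for illustration, and the persistent volume claim it refers to is assumed to exist already.

```python
import json
from typing import Any, Dict


def pod_with_volume(pod_name: str, image: str, claim_name: str, mount_path: str) -> Dict[str, Any]:
    """Build a pod manifest whose container mounts storage from a persistent volume claim."""
    volume_name = "data"  # ties the container's mount to the pod-level volume entry
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": pod_name,
                    "image": image,
                    # Where the volume appears inside the container.
                    "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
                }
            ],
            # Where the storage comes from; it is defined outside the container.
            "volumes": [
                {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# All four arguments are hypothetical values, not real images or claims.
pod = pod_with_volume("db", "example/database:1.0", "db-data-claim", "/var/lib/data")
print(json.dumps(pod, indent=2))
```

The container can be replaced at any time; what persists is the claim it points to.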
Kubernetes has a concept called “namespaces,” which is very similar to the resource groups in Microsoft Azure. The namespace acts as a virtual cluster and can also act as a security boundary. The difference between this and a resource group is that Kubernetes resources in a namespace can be kept from talking to resources in another namespace, although this isolation is not automatic and depends on access controls and network policies being configured. A namespace can also be given a quota, so it can be used to limit resource consumption.
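The quota idea can be sketched as a simple admission check. This is a toy model, not the Kubernetes implementation: the class `NamespaceQuota` and the CPU figures are hypothetical, and a real quota can cover memory, storage and object counts as well.

```python
from dataclasses import dataclass


@dataclass
class NamespaceQuota:
    """Toy CPU quota for one namespace, measured in millicores."""

    cpu_limit_millicores: int
    used_millicores: int = 0

    def admit(self, requested_millicores: int) -> bool:
        """Accept the request only if it fits within what is left of the quota."""
        if self.used_millicores + requested_millicores > self.cpu_limit_millicores:
            return False  # rejected; usage is left unchanged
        self.used_millicores += requested_millicores
        return True


# Hypothetical namespace limited to two CPU cores (2000 millicores).
quota = NamespaceQuota(cpu_limit_millicores=2000)
results = [quota.admit(request) for request in (500, 1000, 750, 500)]
print(results, quota.used_millicores)
```

The third request is turned away because it would push the namespace over its limit, while the smaller fourth request still fits.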
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
This is a simple manifest that deploys an Nginx container. The metadata section allows for labeling of the deployment and can be extended with more information to better describe the workloads that are running. In the spec section of this manifest, three replicas are defined. This means Kubernetes keeps three copies of the pod running, each with a single Nginx container. The container image is defined as Nginx version 1.7.9, and the container listens on port 80.
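Because a manifest is just structured data, the same deployment can also be generated from code. The sketch below rebuilds the manifest above as a Python dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The helper `deployment_manifest` is a hypothetical name introduced here, not part of any Kubernetes library.

```python
import json
from typing import Any, Dict


def deployment_manifest(app: str, image: str, replicas: int, port: int) -> Dict[str, Any]:
    """Build a Deployment manifest with the same fields as the Nginx example."""
    labels = {"app": app}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{app}-deployment", "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": app,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


manifest = deployment_manifest("nginx", "nginx:1.7.9", replicas=3, port=80)
print(json.dumps(manifest, indent=2))
```

Generating manifests this way is one option for keeping the selector and the template labels in step, since both come from the same variable.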
This notion of storing infrastructure as code will frighten and excite many experienced system administrators. While getting used to Git repositories can be challenging, the benefit is that infrastructure changes are easily deployed into the cluster, and with a consistent set of deployment mechanisms.
The other major benefit is that the Kubernetes deployment is highly available at both the infrastructure and application level. Concepts like StatefulSets and ReplicaSets ensure that the desired number of pods is always running. The master node also supports high availability by using multinode replication for the etcd database.
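A little arithmetic shows why the size of the etcd cluster matters. The sketch below assumes majority-based consensus, which is how etcd replicates data: a write needs agreement from more than half of the members. The function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for members in (1, 3, 4, 5):
    print(f"{members} members: quorum {quorum(members)}, "
          f"tolerates {tolerated_failures(members)} failure(s)")
```

Under this assumption, a fourth member adds no extra fault tolerance over three, which is why odd cluster sizes are the usual choice.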
Getting started with Kubernetes is quick and easy. If you are running Docker Desktop, you can use the Kubernetes option there. Alternatively, and if you want to better understand high availability, you can use the Kubernetes as a Service offerings from either Azure or Amazon Web Services (AWS), which give you a built-out cluster within minutes. Finally, if you have a Linux machine or VM, you can use Minikube, which simulates a Kubernetes cluster on a single node.
Kubernetes is a broad topic. There’s a lot to learn, but it is easy to get started learning the fundamental concepts of the technology, even on your own machine. Kubernetes has become a standard for container orchestration and deployment across both enterprises and startups, making it a key career skill going forward. As more vendor products like SQL Server adopt it as a deployment framework, that adoption will only grow. So go get started with your cluster!