Mark Pollmann's blog

Understanding the Kubernetes architecture

Mark Pollmann

Introduction

One cool thing about Kubernetes is that most of its own infrastructure runs as Kubernetes objects. If you want to see how everything works together, you can simply inspect the pods, services and other objects.

Disclaimer: This describes Kubernetes in Docker Desktop on a Mac. In a managed environment, or in your own cluster, the setup might differ, and components are not necessarily deployed as simple pods.

Let’s have a first look. I’m using the built-in Kubernetes of Docker Desktop on a Mac here (with k as an alias for kubectl):

$ k get all
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   39d

Well, not a lot going on here. Is that all there is?

Of course not, the real meat is in other namespaces. Anyway, let’s first inspect this service we found:

$ k describe service/kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.65.3:6443
Session Affinity:  None
Events:            <none>

What is it doing? It’s actually just forwarding requests to the Kubernetes API server (which we will meet soon).

Okay, but where is the rest?

Let’s check out the other namespaces.

$ k get namespaces
NAME              STATUS   AGE
default           Active   39d
kube-node-lease   Active   39d
kube-public       Active   39d
kube-system       Active   39d

The default namespace we already know.

kube-node-lease is relatively new (starting from version 1.14) and helps in determining the availability of a node via heartbeats. We’ll ignore it for now.

kube-public is a namespace that exists if the cluster was created with kubeadm. It contains a single, lonely configmap called cluster-info, which helps clients discover the cluster. Read more about the discovery API if you are interested.

kube-system is the most interesting namespace for us. Let’s dive in:

The kube-system namespace

Let’s do a quick lookup:

$ k get all -n kube-system
NAME                                         READY   STATUS    RESTARTS   AGE
pod/coredns-f9fd979d6-dmkxf                  1/1     Running   0          4m57s
pod/coredns-f9fd979d6-vrd8t                  1/1     Running   0          4m57s
pod/etcd-docker-desktop                      1/1     Running   0          3m43s
pod/kube-apiserver-docker-desktop            1/1     Running   0          4m7s
pod/kube-controller-manager-docker-desktop   1/1     Running   0          3m45s
pod/kube-proxy-s4jfp                         1/1     Running   0          4m57s
pod/kube-scheduler-docker-desktop            1/1     Running   0          3m51s
pod/storage-provisioner                      1/1     Running   0          3m43s
pod/vpnkit-controller                        1/1     Running   0          3m42s

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   5m4s

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kube-proxy   1         1         1       1            1           kubernetes.io/os=linux   5m4s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   2/2     2            2           5m4s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-f9fd979d6   2         2         2       4m57s

By the way: kubectl get all does not really get all resource types, only the most important. Check out ketall if you want to get everything.

A lot more going on here! We can see a deployment (plus its replicaset and pods) of coredns and a service kube-dns, which, unsurprisingly, handle DNS in the cluster. I will skip this for now and write a separate post about DNS in Kubernetes.

So, what is left? I count 7 pods, let’s go through them one by one.

VPN Controller

pod/vpnkit-controller is there to interact with my host’s VPN configuration. Not very interesting for us right now.

etcd

etcd is a key-value store that holds the cluster's state: pods, nodes, roles, configs, secrets. Think of it as the cluster’s database. Every piece of information you get via kubectl get comes from etcd, and every update you make goes into etcd. As Kubernetes is a distributed system, its database needs to be distributed too, and etcd is a good fit for that.
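To make the "database" idea a bit more concrete, here is a minimal in-memory sketch of a key-value store with prefix lookups. This is not etcd's API: the class name TinyStore is made up, and the /registry/... style keys are an assumption for illustration (the real key layout is an implementation detail of the API server).

```python
class TinyStore:
    """Hypothetical in-memory stand-in for a key-value store such as etcd."""

    def __init__(self):
        self._data = {}
        self._revision = 0  # bumped on every write

    def put(self, key, value):
        self._revision += 1
        self._data[key] = (value, self._revision)
        return self._revision

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def list_prefix(self, prefix):
        # Return every key under a prefix, sorted, e.g. all pods in a namespace.
        return sorted(k for k in self._data if k.startswith(prefix))


store = TinyStore()
store.put("/registry/pods/kube-system/etcd-docker-desktop", {"phase": "Running"})
store.put("/registry/pods/kube-system/kube-proxy-s4jfp", {"phase": "Running"})
store.put("/registry/services/default/kubernetes", {"clusterIP": "10.96.0.1"})

pods = store.list_prefix("/registry/pods/kube-system/")
```

Listing by prefix is roughly what a "get all pods in this namespace" request boils down to in this toy model.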

API server

kube-apiserver is quite self-explanatory. It is the endpoint that other components talk to, via REST operations, when they want to look up or change state. If you run kubectl, you are interacting with the apiserver.

It’s also the only component that interacts with etcd.

Controller Manager

kube-controller-manager runs the controllers. Controllers (e.g. the node controller, deployment controller, namespace controller) supervise the state of the components they are in charge of. The node controller, for example, checks the health of each node every five seconds to see if it is still reachable. If a node doesn’t respond for 40 seconds, it is marked unreachable. If it doesn’t come back within 5 minutes, the controller evicts the pods scheduled on it so they can be rescheduled on other nodes.

All these controllers run in the process of the controller-manager.
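The node controller's timing can be sketched as a tiny control loop. This is only an illustration, not the real controller: the function names are hypothetical, and for simplicity I measure both thresholds from the last heartbeat.

```python
MONITOR_GRACE_SECONDS = 40      # from the text: no response for 40 s -> unreachable
EVICTION_TIMEOUT_SECONDS = 300  # from the text: still gone after 5 min -> evict pods


def node_status(seconds_since_heartbeat):
    """Classify a node by how long ago it last reported in (hypothetical helper)."""
    if seconds_since_heartbeat < MONITOR_GRACE_SECONDS:
        return "ready"
    if seconds_since_heartbeat < EVICTION_TIMEOUT_SECONDS:
        return "unreachable"
    return "evict-pods"


def reconcile(last_heartbeat_by_node, now):
    """One pass of the loop: observe every node and decide what to do with it."""
    return {
        node: node_status(now - last_seen)
        for node, last_seen in last_heartbeat_by_node.items()
    }


# Made-up timestamps in seconds; node names are placeholders.
heartbeats = {"node-a": 995.0, "node-b": 940.0, "node-c": 600.0}
decisions = reconcile(heartbeats, now=1000.0)
```

A real controller would run such a pass over and over (every five seconds in the node controller's case) and act on the result instead of just returning it.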

Kube Proxy

kube-proxy runs on every node and handles service networking. A service is not a running process but a virtual IP address with nothing listening behind it, so somebody needs to route the traffic sent to it, and kube-proxy does just this. One way it does this is with iptables rules: on every node it creates entries for every service and forwards traffic to the IP of one of the pods the service selects.
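Conceptually, those rules are a lookup table from a service address to the pods behind it. Here is a rough sketch of that idea in Python. It is a model, not how kube-proxy is implemented: the service IPs and the API server endpoint come from the output earlier in this post, while the two DNS pod IPs and the names RULES and route are made up.

```python
import random

# Hypothetical routing table: (service IP, port) -> backing endpoints.
RULES = {
    ("10.96.0.10", 53): [("10.1.0.4", 53), ("10.1.0.5", 53)],  # pod IPs invented
    ("10.96.0.1", 443): [("192.168.65.3", 6443)],
}


def route(dest_ip, dest_port, rng=random):
    """Rewrite a service destination to one backing endpoint, like a DNAT rule."""
    backends = RULES.get((dest_ip, dest_port))
    if not backends:
        return None  # no rule matches: not a service address, or no endpoints
    return rng.choice(backends)


api_target = route("10.96.0.1", 443)
dns_target = route("10.96.0.10", 53, rng=random.Random(0))
```

With only one endpoint, the kubernetes service always resolves to the API server; for kube-dns, each new connection lands on one of the two coredns pods.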

Kube Scheduler

kube-scheduler decides which pod gets scheduled on which node. Any pod might have specific resource requirements to run.

In the first phase the scheduler filters out the nodes that can’t satisfy the resource requirements or that carry taints the pod doesn’t tolerate.

Then it ranks the remaining nodes with a priority function. If the pod is CPU-intensive, the node with more free CPU capacity gets a higher rank, and so on. The node with the best rank wins the pod.
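The two phases, filter and then rank, fit into a few lines of Python. This is a toy sketch under my own assumptions: the Node and Pod classes, the taint names and the single "most free CPU" scoring rule are all invented for illustration, and the real scheduler combines many more checks and scoring rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_mib: int
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_millicores: int
    memory_mib: int
    tolerations: set = field(default_factory=set)


def feasible(pod, node):
    """Filter phase: enough free resources, and every taint is tolerated."""
    fits = (node.free_cpu_millicores >= pod.cpu_millicores
            and node.free_memory_mib >= pod.memory_mib)
    return fits and node.taints <= pod.tolerations


def score(pod, node):
    """Ranking phase (toy rule): prefer the node with the most CPU left over."""
    return node.free_cpu_millicores - pod.cpu_millicores


def schedule(pod, nodes):
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # nothing fits, so the pod would stay Pending
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("small", free_cpu_millicores=500, free_memory_mib=1024),
    Node("big", free_cpu_millicores=4000, free_memory_mib=8192),
    Node("gpu", free_cpu_millicores=8000, free_memory_mib=16384, taints={"gpu-only"}),
]
chosen = schedule(Pod("web", cpu_millicores=1000, memory_mib=512), nodes)
```

Here "small" is filtered out for lack of CPU and "gpu" because of its taint, so "big" wins. A pod that tolerates gpu-only would land on "gpu" instead, since it has the most CPU to spare.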

Storage Provisioner

storage-provisioner handles storage specific to the platform being used. In my case this pod is using the docker/desktop-storage-provisioner image specific for Mac’s Docker Desktop.

By the way, you can see more of the configuration for some of these by looking at the config maps in the kube-system namespace:

$ k get cm -n kube-system
NAME                                 DATA   AGE
coredns                              1      6m5s
extension-apiserver-authentication   6      6m8s
kube-proxy                           2      6m5s
kubeadm-config                       2      6m6s
kubelet-config-1.19                  1      6m6s

Conclusion

That’s it, a quick primer on what is going on under the hood of a Kubernetes setup. While Kubernetes does a ton of things and in the end is quite complex, the infrastructure is very modular and understandable. It’s possible to do a deep dive into any component you are interested in, if you so desire.