Introduction
Kubernetes, and specifically the kubelet, lets you load pod specifications from a directory on disk. Misusing this functionality can lead to some strange behaviour. This post explores some of those edge cases.
In a typical Kubernetes cluster deployment, the kubelet on each node reaches out to the API Server to find out which pods should be running on its node. This is core to maintaining the correct cluster state in the event of a node outage. However, some deployments of Kubernetes run the API Server and other control plane components as pods themselves. To allow this bootstrapping when the kubelet has no API Server to communicate with, the kubelet can be configured with a location from which to read static pod configurations.
This location can be configured as either a web location (a URL) or a local directory, with the latter being the most common. In a kubeadm setup, the default directory is /etc/kubernetes/manifests. Inspecting this directory on a clean KinD cluster shows the manifests generated so that the essential control plane components can start: etcd.yaml, kube-apiserver.yaml, kube-controller-manager.yaml and kube-scheduler.yaml.
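You can reproduce this inspection on a node with a few lines of Python. This is a minimal sketch, assuming the kubelet reads manifests from the `staticPodPath` set in its KubeletConfiguration (the web alternative is `staticPodURL`). The function name is hypothetical, and it only approximates the kubelet's directory scan: it skips dotfiles, which the static pod documentation says are ignored, and it assumes subdirectories are not descended into.

```python
from pathlib import Path


def list_static_manifests(static_pod_path: str) -> list[str]:
    """Return the file names in static_pod_path that the kubelet would consider.

    Approximation of the kubelet's directory scan: hidden files (names
    starting with ".") are skipped, and subdirectories are assumed not to
    be descended into.
    """
    names = []
    for entry in sorted(Path(static_pod_path).iterdir()):
        if entry.name.startswith("."):
            continue  # the kubelet ignores dotfiles in the manifest directory
        if entry.is_dir():
            continue  # assumption: no recursion into subdirectories
        names.append(entry.name)
    return names
```

On a clean KinD node, `list_static_manifests("/etc/kubernetes/manifests")` should return the four control plane manifests.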
These static pods correspond to objects we can observe through the Kubernetes API Server. These are mirror pods: pod objects that the kubelet creates in the API Server to track the pods it started from static manifests. Each mirror pod is named with the value specified in the YAML file, suffixed with a hyphen and the name of the node the pod was created on. Listing the pods in the cluster shows four whose names end with kind-control-plane, corresponding to the manifests above.
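The naming rule, and the way mirror pods are marked, can be captured in a short sketch. It assumes mirror pods carry the `kubernetes.io/config.mirror` annotation that the kubelet sets on them, and that the input is the JSON produced by `kubectl get pods -A -o json`. Both helper names are hypothetical.

```python
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"


def mirror_pod_name(manifest_name: str, node_name: str) -> str:
    """Name the API Server shows for a static pod: <metadata.name>-<node name>."""
    return f"{manifest_name}-{node_name}"


def mirror_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for every pod in a PodList that is a mirror pod."""
    found = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        annotations = meta.get("annotations") or {}
        if MIRROR_ANNOTATION in annotations:
            found.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return found
```

For example, the etcd manifest on the KinD control plane node maps to `mirror_pod_name("etcd", "kind-control-plane")`, i.e. etcd-kind-control-plane.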
The Kubernetes documentation on static pods describes much of their behaviour and the constraints placed on them. Some of these constraints have significant security benefits, such as not allowing static pods to use ConfigMaps, Secrets, or ServiceAccounts. However, static pods can still mount host volumes into the container, potentially allowing a user with write access to the manifest directory to escalate privileges on the node¹.
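To make those constraints concrete, here is a rough lint for a static pod manifest that has already been parsed into a dict. It is a hypothetical helper, not the check Kubernetes itself performs: it only inspects a few fields (serviceAccountName, ConfigMap and Secret volumes, envFrom references) and additionally flags hostPath volumes, since those are what widen access to the node.

```python
def lint_static_pod(pod: dict) -> list[str]:
    """Flag fields that static pods may not use, plus hostPath mounts."""
    findings = []
    spec = pod.get("spec") or {}
    if spec.get("serviceAccountName"):
        findings.append("references service account " + spec["serviceAccountName"])
    for vol in spec.get("volumes") or []:
        name = vol.get("name", "?")
        if "configMap" in vol:
            findings.append(f"volume {name} uses a ConfigMap")
        if "secret" in vol:
            findings.append(f"volume {name} uses a Secret")
        if "hostPath" in vol:
            path = (vol["hostPath"] or {}).get("path")
            findings.append(f"volume {name} mounts host path {path}")
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        for source in container.get("envFrom") or []:
            if "configMapRef" in source or "secretRef" in source:
                findings.append(
                    f"container {container.get('name')} loads env from a ConfigMap or Secret"
                )
    return findings
```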
This seems like some fairly complex functionality, and as with anything complicated, there are edge cases which lead to odd behaviours. We’ll dive into one of these below.
Admission Control
In a modern cluster, the kubelet is always permitted to create a pod through Node Authorization. Created pods aren’t restricted to specific namespaces and, until recently, I’d never tried to create a pod in a namespace restricted by Pod Security Admission.
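For reference, Pod Security Admission decides what to enforce from the `pod-security.kubernetes.io/enforce` label on the namespace, and both the baseline and restricted levels forbid containers that set `securityContext.privileged: true`. The sketch below models only that single control. The helper names are hypothetical, and treating an unlabelled namespace as privileged assumes the cluster-wide default has not been changed.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"


def enforced_level(namespace: dict) -> str:
    """Pod Security level enforced on a namespace object."""
    labels = (namespace.get("metadata") or {}).get("labels") or {}
    # Assumption: the cluster-wide default is left at "privileged".
    return labels.get(ENFORCE_LABEL, "privileged")


def privileged_violations(pod: dict) -> list[str]:
    """Names of containers (of any kind) that request privileged mode."""
    spec = pod.get("spec") or {}
    names = []
    for field in ("containers", "initContainers", "ephemeralContainers"):
        for container in spec.get(field) or []:
            if (container.get("securityContext") or {}).get("privileged") is True:
                names.append(container.get("name"))
    return names


def would_be_rejected(namespace: dict, pod: dict) -> bool:
    """True if the privileged-containers control alone would reject the pod."""
    return enforced_level(namespace) in ("baseline", "restricted") and bool(
        privileged_violations(pod)
    )
```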
If you’re reading this, take a second to think about what will happen. I posed the question “Are the kubelet’s static pods affected by admission control?” to some colleagues, and the responses were heavily weighted towards “No”. This makes sense, and was my initial suspicion too. The whole point of a static manifest is that it allows the kubelet to start workloads without a control plane to communicate with, so why would admission control, which itself runs in the control plane, get involved?
So let’s try it out by creating two static pods: one is a standard pod with an empty security context, and the other sets the privileged flag.
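A minimal sketch of the two manifests, written as JSON (the kubelet decodes manifests as YAML, and JSON is valid YAML). The pod names, the `psa-test` namespace and the `nginx` image are hypothetical placeholders, and the namespace is assumed to already carry a Pod Security enforce label; the only intended difference between the two pods is the `privileged` flag.

```python
import json
from pathlib import Path


def static_pod(name: str, namespace: str, privileged: bool) -> dict:
    """Build a bare Pod manifest; only the securityContext differs between variants."""
    security_context = {"privileged": True} if privileged else {}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": name,
                "image": "nginx",  # hypothetical image choice
                "securityContext": security_context,
            }]
        },
    }


def write_manifests(directory: str, namespace: str) -> list[Path]:
    """Write the standard and privileged manifests into directory."""
    paths = []
    for name, privileged in (("standard-pod", False), ("privileged-pod", True)):
        path = Path(directory) / f"{name}.json"
        path.write_text(json.dumps(static_pod(name, namespace, privileged), indent=2))
        paths.append(path)
    return paths
```

Pointing `directory` at the node’s static pod directory (/etc/kubernetes/manifests on KinD) should have the kubelet pick both files up.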
We can then check whether the pods are running.
Only one pod is running, so that answers that question, right? The privileged pod wasn’t created. Let’s check the kubelet’s logs to be sure.
This certainly looks like the container wasn’t created, as no pod startup was logged, but something still doesn’t seem right. Let’s confirm there’s definitely no container running on the node.
Well, that’s interesting. The pod wasn’t able to register with the API Server, but the container itself is running. We can confirm this again by checking the pod ID rather than the container ID.
Yep, definitely running. We can confirm this another way by using the kubelet API (accessed, in this case, using the wonderful Kubeletctl).
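Kubeletctl talks to the kubelet’s HTTPS API, by default on port 10250. A rough Python equivalent of listing pods through that API is sketched below. It assumes a bearer token authorised for the node’s `nodes/proxy` subresource, that the kubelet’s `/pods` endpoint reports the pods the kubelet itself is managing, and that static pods carry a `kubernetes.io/config.source` annotation with the value `file`. The function names are hypothetical, and certificate verification is disabled purely to keep the sketch short.

```python
import json
import ssl
import urllib.request


def fetch_kubelet_pods(node: str, token: str, port: int = 10250) -> dict:
    """GET the kubelet's /pods endpoint and return the decoded PodList."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # sketch only: kubelet serving certs are often self-signed
    req = urllib.request.Request(
        f"https://{node}:{port}/pods",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def file_sourced_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for pods the kubelet loaded from a manifest file."""
    found = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        annotations = meta.get("annotations") or {}
        if annotations.get("kubernetes.io/config.source") == "file":
            found.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return found
```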
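The checks in this section boil down to comparing the node’s view of its pods with the API Server’s. A minimal sketch of that comparison follows. It assumes the node-side listing comes from `crictl pods -o json` (items with `metadata.name`, `metadata.namespace` and a `state` of `SANDBOX_READY` for running sandboxes) and the API-side listing from `kubectl get pods -A --field-selector spec.nodeName=<node> -o json`; the helper names are hypothetical. Anything left over is running on the node without a matching pod object.

```python
def runtime_pods(crictl_pods: dict) -> set[tuple[str, str]]:
    """(namespace, name) of ready sandboxes from `crictl pods -o json` (layout assumed)."""
    found = set()
    for item in crictl_pods.get("items") or []:
        if item.get("state") != "SANDBOX_READY":
            continue  # skip sandboxes that are no longer running
        meta = item.get("metadata") or {}
        found.add((meta.get("namespace", ""), meta.get("name", "")))
    return found


def api_pods(kubectl_pods: dict) -> set[tuple[str, str]]:
    """(namespace, name) of pod objects from `kubectl get pods -o json`."""
    found = set()
    for pod in kubectl_pods.get("items") or []:
        meta = pod.get("metadata") or {}
        found.add((meta.get("namespace", ""), meta.get("name", "")))
    return found


def unregistered_pods(crictl_pods: dict, kubectl_pods: dict) -> list[tuple[str, str]]:
    """Pods running on the node that have no object in the API Server."""
    return sorted(runtime_pods(crictl_pods) - api_pods(kubectl_pods))
```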
Non-existent Namespaces
Okay, we’ve confirmed that a pod that can’t get past admission control still runs as a static pod on the node, but not as a mirror pod in the Kubernetes API Server. What about other ways of creating a local container that the API Server won’t recognise?
When we try to create a pod in a namespace that doesn’t exist, the kubelet logs two errors². One is from kubelet.go and one is from event.go, and both report that the API Server rejected the request to create the pod. However, as with the admission control example, we can see that the pod was successfully started.
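Because the kubelet fills in the `default` namespace when a static manifest omits one, a manifest can be checked against the namespaces that actually exist before it lands in the directory. A hypothetical helper, assuming the manifests are already parsed into dicts and the set of namespace names comes from something like `kubectl get namespaces`:

```python
def pods_in_missing_namespaces(pods: list[dict], existing_namespaces: set[str]) -> list[str]:
    """Return "namespace/name" for static pod manifests whose namespace does not exist."""
    missing = []
    for pod in pods:
        meta = pod.get("metadata") or {}
        # The kubelet defaults an empty namespace to "default".
        namespace = meta.get("namespace") or "default"
        if namespace not in existing_namespaces:
            missing.append(f"{namespace}/{meta.get('name')}")
    return missing
```

Such manifests would still start on the node, but, as described above, no mirror pod would appear for them.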
As with the previous example, we can execute commands inside the running container through Kubeletctl or by talking directly to the kubelet with curl, but not through kubectl exec.
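Talking to the kubelet directly looks roughly like the sketch below. It assumes the kubelet exposes `/run/<namespace>/<pod>/<container>`, accepting a POST with the command in a `cmd` parameter (which the kubelet appears to split on spaces), and that the token is authorised for `nodes/proxy`. For a static pod, the pod name is the one with the node suffix. The function names are hypothetical, and TLS verification is disabled only for brevity.

```python
import ssl
import urllib.request
from urllib.parse import quote, urlencode


def kubelet_run_request(node: str, namespace: str, pod: str, container: str,
                        command: str, token: str, port: int = 10250) -> urllib.request.Request:
    """Build a POST to the kubelet's /run endpoint for a single command."""
    path = "/".join(quote(part, safe="") for part in (namespace, pod, container))
    query = urlencode({"cmd": command})  # command travels as the cmd parameter
    return urllib.request.Request(
        f"https://{node}:{port}/run/{path}?{query}",
        data=b"",  # empty body; the command is in the query string
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def run_in_container(request: urllib.request.Request) -> str:
    """Send the request and return the command output (sketch: TLS verification disabled)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=ctx, timeout=10) as resp:
        return resp.read().decode()
```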
Uses
It’s probably fair to say that creating pods which cluster administrators can’t see through kubectl is not a useful behaviour in the day-to-day running of a cluster. I can’t think of any legitimate uses for it, but I do think it would be useful to someone trying to hide a running workload on a Kubernetes cluster they’ve compromised. Not that I’d talk about that or anything.