Testing a Sandbox on Real VMs in CI

A sandbox that doesn’t get tested against the thing it’s defending against isn’t a sandbox. It’s a wish.

Canister relies on Linux kernel features – user namespaces, seccomp BPF, mount isolation – and on Mandatory Access Control systems like SELinux and AppArmor. These are not things you can meaningfully test in a container. Containers share the host kernel. They don’t run their own SELinux policy store. They don’t boot with AppArmor enforcement. They’re the wrong abstraction for verifying that your security tool actually works on the systems where it will run.

So we boot real virtual machines in CI. Every push spins up a Fedora VM with SELinux enforcing and an Ubuntu VM with AppArmor enabled, installs Canister, and runs the full integration suite. This post explains why we made that choice and how it works.

The problem with testing security features in containers

Canister’s can setup command installs a MAC policy – an SELinux module on Fedora/RHEL, an AppArmor profile on Ubuntu/Debian. These policies tell the kernel’s mandatory access control system how to confine sandboxed processes: what files they can access, what capabilities they’re allowed, what domain transitions are permitted.

You can compile-test these policies in a container. We do – there’s a Fedora container job that builds the SELinux .pp module with make -f /usr/share/selinux/devel/Makefile, and an Ubuntu job that parses the AppArmor profile with apparmor_parser -QTK. These catch syntax errors and policy language mistakes.

But compile-testing a security policy is like type-checking a program. It tells you the policy is well-formed. It does not tell you the policy works. For that, you need a running kernel that enforces it.

Specific things that only show up on a real system:

SELinux domain transitions. The policy declares that the canister binary should transition sandboxed processes to canister_sandboxed_t. Whether the transition actually happens depends on file contexts, the binary’s label, and the running policy version. A container with a shared host kernel won’t have the right label store.
AppArmor user namespace restrictions. Ubuntu 24.04 enables kernel.apparmor_restrict_unprivileged_userns by default, which blocks unshare --user unless the calling binary has an AppArmor profile that explicitly allows it. This broke Canister on Ubuntu until we added the right profile rules. You cannot reproduce this in a standard CI container because the sysctl is a property of the host kernel, not of the container.
Module installation and removal. semodule -i and semodule -r interact with the system’s policy store. A container has no policy store of its own: at best you’re looking at the host’s, and you probably don’t have permission to modify it. Same story for aa-enforce and aa-disable on AppArmor.
Namespace stacking. Canister creates user namespaces, mount namespaces, PID namespaces, and network namespaces. Some CI environments restrict clone(CLONE_NEWUSER) via sysctl or seccomp. GitHub Actions runners, for example, sometimes block unprivileged user namespaces on the host – which is exactly the scenario Canister needs to handle gracefully.
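
To see which of these restrictions apply on a given machine, a small probe helps. The following is a minimal sketch, not part of Canister: the function name and the optional procfs-root argument are made up for illustration, and it only inspects three sysctls that are commonly involved in blocking unprivileged user namespaces.

```shell
# Report which knobs restrict unprivileged user namespaces on this kernel.
# The optional first argument is the procfs root (default /proc); it exists
# only so the function can be pointed at a fake tree. Hypothetical helper.
userns_restrictions() {
    proc_root="${1:-/proc}"
    found=0

    # Ubuntu's AppArmor switch: 1 means unconfined binaries are restricted.
    f="$proc_root/sys/kernel/apparmor_restrict_unprivileged_userns"
    if [ -r "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo "apparmor_restrict_unprivileged_userns=1"
        found=1
    fi

    # Debian/Ubuntu kernel patch: 0 disables unprivileged user namespaces.
    f="$proc_root/sys/kernel/unprivileged_userns_clone"
    if [ -r "$f" ] && [ "$(cat "$f")" = "0" ]; then
        echo "unprivileged_userns_clone=0"
        found=1
    fi

    # Upstream per-namespace limit: 0 means none can be created at all.
    f="$proc_root/sys/user/max_user_namespaces"
    if [ -r "$f" ] && [ "$(cat "$f")" = "0" ]; then
        echo "max_user_namespaces=0"
        found=1
    fi

    if [ "$found" -eq 0 ]; then
        echo "none"
    fi
    return 0
}
```

Calling userns_restrictions with no arguments on a CI runner and again inside the VM makes the difference between the two environments visible in the job log.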

The approach: QEMU/KVM inside GitHub Actions

GitHub Actions ubuntu-24.04 runners expose /dev/kvm. This means you can run QEMU with hardware-accelerated virtualization – a real VM, booting a real kernel, with its own policy store and its own MAC enforcement.
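
Since everything depends on /dev/kvm being usable, it can be worth checking before launching QEMU. Here is a rough sketch of such a check; the function name is hypothetical, and falling back to software emulation (rather than failing the job outright) is a design choice this post does not prescribe.

```shell
# Print the QEMU acceleration flag to use. The device path is a parameter
# (default /dev/kvm) so the check can be exercised against other nodes.
# Hypothetical helper, not taken from the Canister CI config.
qemu_accel_args() {
    kvm_dev="${1:-/dev/kvm}"
    if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        echo "-enable-kvm"
    else
        # Software emulation (TCG): works anywhere, but is far slower.
        echo "-accel tcg"
    fi
}
```

If slow emulated runs are worse than no runs, replacing the fallback branch with an error and a non-zero exit makes the failure obvious.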

The CI pipeline has three jobs:

Build job. Compiles Canister on the GHA runner, packages the binary along with integration tests and recipe files into a tarball, uploads it as a GitHub Actions artifact.
Fedora VM job. Downloads the Fedora Cloud qcow2 image, boots it with qemu-system-x86_64 -enable-kvm, injects an SSH key via cloud-init, waits for SSH, SCPs the artifact in, installs the SELinux policy, and runs the full integration suite including SELinux-specific tests.
Ubuntu VM job. Same pattern with an Ubuntu Cloud image, AppArmor enforcement, and AppArmor-specific tests.
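
The packaging half of the build job could look roughly like this. It is a sketch under assumptions: the function name and the artifact/ and tests/ directory names are invented for illustration, and only the recipes/ directory name comes from how Canister resolves recipes.

```shell
# Bundle the release binary, the test scripts and the recipes into one tarball.
# Usage: package_artifact <binary> <tests-dir> <recipes-dir> <output.tar.gz>
# Hypothetical helper; the layout inside the tarball is an assumption.
package_artifact() {
    bin="$1"; tests_dir="$2"; recipes_dir="$3"; out="$4"

    stage=$(mktemp -d) || return 1
    mkdir -p "$stage/artifact" || return 1

    # Copy everything into a staging directory so the tarball has one root.
    cp "$bin" "$stage/artifact/" &&
    cp -R "$tests_dir" "$stage/artifact/tests" &&
    cp -R "$recipes_dir" "$stage/artifact/recipes" &&
    tar -czf "$out" -C "$stage" artifact
    status=$?

    rm -rf "$stage"
    return "$status"
}
```

Keeping the recipes next to the tests in a single tarball means the VM only needs one scp and one tar -xzf before the suite can run from the extracted directory.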

The binary is never compiled inside the VM. Rust compilation is expensive; booting a VM and running tests is cheap. The build job takes a few minutes; the VM jobs add about 3-5 minutes each, most of which is image download and VM boot.

Why not Cirrus CI?

We initially considered Cirrus CI, which offers native VM instances. But Cirrus CI’s free tier for public repos requires GCP billing to be enabled, and their compute credits model adds operational friction. GitHub Actions is already there, the runners have KVM, and keeping everything in one CI system is simpler.

VM boot in 30-90 seconds

The trick is cloud images + cloud-init. Fedora and Ubuntu both publish minimal cloud images (qcow2/img) designed to boot fast with injected configuration. We generate a cloud-init seed ISO with an SSH key, attach it as a second drive, and boot. The VM is SSH-ready in 30-90 seconds.
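
For readers who have not built a seed ISO before, here is a minimal sketch of the two files cloud-init's NoCloud datasource expects. The function name, the instance ID and the hostname are placeholders, not values from the Canister repository.

```shell
# Write the two files the cloud-init NoCloud datasource looks for.
# Usage: write_seed_files <output-dir> <ssh-public-key-file>
# Hypothetical helper; "canister-ci" is a placeholder name.
write_seed_files() {
    out_dir="$1"
    pubkey=$(cat "$2") || return 1
    mkdir -p "$out_dir" || return 1

    cat > "$out_dir/meta-data" <<EOF
instance-id: canister-ci
local-hostname: canister-ci
EOF

    # A top-level ssh_authorized_keys entry applies to the image's default user.
    cat > "$out_dir/user-data" <<EOF
#cloud-config
ssh_authorized_keys:
  - $pubkey
EOF
}

# Turning the files into an ISO needs a tool such as genisoimage. The volume
# label must be "cidata" for cloud-init to pick the disk up:
#   genisoimage -output seed.iso -volid cidata -joliet -rock \
#       seed/user-data seed/meta-data
```

The resulting seed.iso is what gets attached as the second drive when the VM boots.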

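- name: Boot Fedora VM
  run: |
    # KVM-accelerated guest with 4 GB RAM and 2 vCPUs. The second drive is the
    # cloud-init seed; host port 2222 is forwarded to the guest's SSH port.
    # QEMU runs in the background with its console output captured to a log.
    qemu-system-x86_64 \
      -enable-kvm \
      -m 4096 \
      -smp 2 \
      -nographic \
      -drive file=fedora-work.qcow2,if=virtio,format=qcow2 \
      -drive file=cloud-init/seed.iso,if=virtio,format=raw \
      -netdev user,id=net0,hostfwd=tcp::2222-:22 \
      -device virtio-net-pci,netdev=net0 \
      &> /tmp/qemu-fedora.log &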

QEMU user-mode networking with port forwarding (hostfwd=tcp::2222-:22) gives SSH access without bridged networking or elevated privileges.
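
Because QEMU is started in the background, the job has to poll until sshd inside the guest answers. A generic retry loop is enough for that. The sketch below is an assumption about how one might write it, with a hypothetical function name and key file; fedora is the default user on Fedora Cloud images.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until <max-attempts> <delay-seconds> <command> [args...]
# Hypothetical helper, not taken from the Canister CI config.
retry_until() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        # Give up once the last allowed attempt has failed.
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: poll the forwarded port for up to roughly three minutes.
#   retry_until 60 3 ssh -p 2222 -i ci_key \
#       -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
#       -o ConnectTimeout=5 fedora@localhost true
```

Disabling host key checking is acceptable here only because the guest is a throwaway VM on the loopback interface whose host key is regenerated on every boot.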

Caching the images

Cloud images are 400-700 MB and don’t change between runs. We cache them with actions/cache, keyed on the download URL. When a distro version changes, the URL changes, the cache key changes, and the new image gets downloaded and cached. On cache hit, the download step is skipped entirely.

- name: Cache Fedora Cloud image
  uses: actions/cache@v4
  with:
    path: ${{ env.FEDORA_IMAGE }}
    key: vm-image-fedora-${{ env.FEDORA_IMAGE_URL }}

This is important because the Fedora mirror redirector can be flaky. Hitting a cache is both faster and more reliable.
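
On a cache miss the download itself still has to survive a flaky mirror. One way to make that step more forgiving is sketched below; the function name is hypothetical and the retry counts are arbitrary starting points, not values from our workflow.

```shell
# Download an image only when it is not already on disk (for example because
# the cache step restored it).
# Usage: fetch_image <url> <destination>
# Hypothetical helper; retry settings are illustrative.
fetch_image() {
    url="$1"; dest="$2"

    if [ -s "$dest" ]; then
        echo "cached: $dest"
        return 0
    fi

    # --fail turns HTTP errors into a non-zero exit, --location follows the
    # mirror redirect, --retry covers transient failures. Downloading to a
    # temporary name keeps a partial file from being mistaken for an image.
    curl --fail --location --retry 5 --retry-delay 10 \
         --output "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

The existence check duplicates what a cache-hit condition on the workflow step would do, so it is mainly useful when the same script is also run locally.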

What the integration tests verify

The integration suite is a collection of bash scripts that exercise the sandbox end-to-end. Each VM job runs the full suite plus MAC-specific tests:

SELinux tests (Fedora):

can check detects SELinux as the active MAC system
can setup --force installs the SELinux policy module
The sandbox runs under SELinux confinement
can setup --remove cleanly removes the module
Re-installation works after removal

AppArmor tests (Ubuntu):

can check detects AppArmor as the active MAC system
can setup --force installs the AppArmor profile
aa-status shows the profile as enforced
The sandbox works with the profile active
Profile removal and re-installation cycle works

General integration tests (both VMs):

Basic sandbox execution (echo, exit codes, signals)
Filesystem isolation (read-only mounts, denied paths)
Network isolation (allowed/denied domains)
Seccomp filtering (normal and strict mode)
Recipe composition and auto-detection
Registry operations (can init, can update)
Port forwarding
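
To give a feel for what such a script looks like, here is a stripped-down harness in the same spirit. It is an illustrative sketch, not a copy of the real suite: run_test, summary and the counters are hypothetical names, and the example invocations use only the can subcommands listed above.

```shell
# Tiny test harness: run a command, record pass/fail, keep going.
# Hypothetical names; the real suite may be organised differently.
PASSED=0
FAILED=0

run_test() {
    name="$1"; shift
    if "$@" >/dev/null 2>&1; then
        PASSED=$((PASSED + 1))
        echo "ok   - $name"
    else
        FAILED=$((FAILED + 1))
        echo "FAIL - $name"
    fi
}

# Print the totals and succeed only if nothing failed.
summary() {
    echo "passed=$PASSED failed=$FAILED"
    [ "$FAILED" -eq 0 ]
}

# Example lifecycle checks (these need an installed `can` binary and may
# need root, so they are left commented out here):
#   run_test "install policy"   can setup --force
#   run_test "remove policy"    can setup --remove
#   run_test "reinstall policy" can setup --force
#   summary || exit 1
```

Running every check before reporting, instead of stopping at the first failure, gives a complete picture from a single VM boot.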

The non-MAC tests also run on the GHA runner directly (without a VM) as a faster feedback loop. But the full picture – MAC enforcement, namespace behavior, policy lifecycle – only comes from the VM jobs.

Constraints and tradeoffs

A few decisions worth explaining:

No third-party GitHub Actions. We only use first-party actions/* (checkout, cache, upload-artifact, download-artifact). Third-party actions are a supply chain risk, and for a security tool, that matters.

No Rust compilation inside VMs. The VMs have 4 GB RAM and 2 vCPUs. Rust compilation would be slow and wasteful. The artifact-based approach means we compile once and test on multiple targets.

20-minute timeout. VM jobs have a hard timeout. If the VM doesn’t boot, SSH doesn’t come up, or tests hang, the job fails rather than burning CI minutes.

Non-interactive setup. can setup detects whether stdout is a TTY. In the VM (accessed via SSH with piped commands), it’s not a TTY, so the interactive confirmation prompt is skipped automatically. No --yes flag needed.
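
Canister does this check in Rust, but the idea is easy to show in shell. The following sketch is only an analogy with hypothetical function names and prompt text: it asks for confirmation when stdout is a terminal and proceeds silently otherwise.

```shell
# Succeeds when the given file descriptor is attached to a terminal.
is_tty() {
    [ -t "$1" ]
}

# Ask for confirmation only when stdout (fd 1) is a terminal.
# Hypothetical helper; the prompt text is made up.
confirm_or_skip() {
    if is_tty 1; then
        printf 'Install policy? [y/N] '
        read -r answer
        [ "$answer" = "y" ]
    else
        # Not a terminal (pipe, CI log, `ssh host cmd`): proceed unprompted.
        return 0
    fi
}
```

Running a command through ssh without -t does not allocate a pseudo-terminal, which is why the VM jobs always take the non-interactive branch.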

What we learned

Some things that only surfaced through real VM testing:

Fedora 41 images disappeared. We initially targeted Fedora 41. Between writing the CI config and running it, Fedora 42 was released and 41’s cloud images were pulled from every mirror. CI broke with 404s. We upgraded to Fedora 42. Pinning to a “current” release URL would avoid this, but Fedora’s convention is versioned paths.
semodule -l needs root on Fedora 42. Listing custom SELinux modules without sudo silently returns an incomplete list. The module is loaded and enforced, but semodule -l as a regular user doesn’t show it.
git is not pre-installed in cloud images. The registry tests shell out to git clone. Fedora Cloud and Ubuntu Cloud images don’t include git by default. The error message was "failed to run git" rather than something about cloning, which made it non-obvious.
Ubuntu 24.04 restricts unprivileged user namespaces via AppArmor. The sysctl kernel.apparmor_restrict_unprivileged_userns defaults to 1. This means unshare --user fails unless the binary has an AppArmor profile that explicitly grants userns create. We relax this sysctl in CI, then install the proper profile.
Recipes must ship with the test artifact. Canister resolves recipe names like --recipe elixir by searching ./recipes/ relative to the working directory. The test artifact initially only included the binary and test scripts. Recipe composition tests failed until we added the recipes/ directory.
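
Several of these surprises can be caught by a preflight step that runs inside the VM before the suite starts. A minimal sketch, with a hypothetical function name and an illustrative tool list:

```shell
# Print every named tool that is not on PATH, one per line.
# Hypothetical helper, not part of the Canister test suite.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Example, run inside the VM before the suite starts:
#   missing=$(missing_tools git tar)
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
#   [ -d ./recipes ] || { echo "recipes/ not in artifact" >&2; exit 1; }
#
# Relaxing the Ubuntu user-namespace restriction for the CI run:
#   sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
```

Failing early with a message that names the missing tool is far easier to debug than a test that fails later with "failed to run git".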

Every one of these issues was invisible in container-based testing. They all showed up the first time we booted a real VM.

Is it worth it?

The VM-based integration jobs add about 5 minutes to CI. In exchange, we know – on every push – that Canister installs its security policy, enforces it, and cleans it up on both major Linux MAC systems. That the sandbox actually creates namespaces on a real kernel. That the full lifecycle works, not just the happy path.

For a security tool, I think that’s the minimum bar. If you’re telling users “this will protect you,” you’d better be testing the protection, not just the code.