code sandboxing techniques

background (dhcpcd)

recently, Roy Marples added "privilege separation" to dhcpcd, his DHCP client daemon.

this basically means that the software runs as multiple processes which communicate with each other. lots of regular operations can be performed in unprivileged subprocesses that aren't allowed to do much, which greatly reduces the impact of exploits.
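a toy sketch of the idea, not how dhcpcd actually does it: the risky parsing of untrusted bytes happens in a forked child, and only the result comes back over a pipe. it's written in Python because the os module is a thin wrapper over the same POSIX calls. parse_length_prefixed and parse_in_worker are made-up names for illustration.

```python
import os
import struct


def parse_length_prefixed(packet):
    """Stand-in for the risky work: parse a 2-byte length, return the payload."""
    (length,) = struct.unpack("!H", packet[:2])
    return packet[2:2 + length]


def parse_in_worker(packet):
    """Parse `packet` in a forked child and return the result sent over a pipe."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a real daemon would drop its privileges here, before
        # touching the untrusted input.
        status = 1
        try:
            os.close(read_end)
            # Fine for small results; large ones would need a write loop.
            os.write(write_end, parse_length_prefixed(packet))
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path

    # Parent: only ever sees the bytes the child chose to send back.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    _, wait_status = os.waitpid(pid, 0)
    if not os.WIFEXITED(wait_status) or os.WEXITSTATUS(wait_status) != 0:
        raise RuntimeError("worker failed")
    return b"".join(chunks)
```

if the parser blows up on malformed input, the damage stays in the child and the parent just sees a failed worker.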

he wrote a couple of blog posts comparing OS-specific techniques for restricting processes:

... and then followed up with a third after I poked him on IRC and pointed out that setrlimit is a thing:

why sandbox?

perhaps you write super awesome code in languages that give you a nice sense of safety and a remote execution bug is totally out of the question.

maybe you're a super devops and containerize everything so you're totally safe, at least until the latest container escape bug comes out.

even then, it's nice to clearly state what your program is allowed to do: it gives you constraints to work with and makes bugs really obvious.


user account restrictions

perhaps the easiest way to do sandboxing is with separate user accounts. if your process changes from root to an unprivileged user shortly after starting, it greatly reduces the opportunity for abuse.

most sensitive data on most systems is stored in files with fine-grained permissions, so not being root really helps. in most cases root is also allowed to do nasty system-wide things that ordinary users aren't.
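a minimal sketch of the drop itself, assuming the process starts as root and that you've picked a dedicated account to switch to (the uid and gid are yours to supply, and drop_privileges is an illustrative name). Python's os module calls straight through to setgroups, setgid and setuid.

```python
import os


def drop_privileges(uid, gid):
    """Permanently switch the calling process to an unprivileged uid/gid.

    Must be called while still root. The order matters: groups first, uid
    last, because after setuid() the process is no longer allowed to
    change its groups.
    """
    os.setgroups([])  # forget root's supplementary groups
    os.setgid(gid)
    os.setuid(uid)

    # Sanity check: getting root back must now be refused.
    try:
        os.setuid(0)
    except PermissionError:
        return
    raise RuntimeError("privileges were not dropped")
```

anything that genuinely needs root (binding a low port, opening a protected file) has to happen before this call.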

filesystem restrictions

POSIX has a setrlimit function. with resource limits like RLIMIT_NOFILE you can prevent your process from opening any more files, and with RLIMIT_NPROC (not in POSIX, but available on Linux and the BSDs) you can stop it spawning new subprocesses. this is really nice until you run into system-specific quirks, as Roy discovered in his last blog post.
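here's a small sketch of the RLIMIT_NOFILE trick using Python's resource module, which wraps setrlimit directly. the limit is applied in a forked child so the caller keeps its own limits. errno_from_open_after_limit is a made-up helper name, and /dev/null is just a file assumed to exist.

```python
import errno
import os
import resource


def errno_from_open_after_limit(path="/dev/null"):
    """Fork a child, forbid it new descriptors, and report what open() does.

    Returns the errno the child got from open(), or 0 if the open was not
    blocked (255 means something unexpected went wrong in the child).
    """
    pid = os.fork()
    if pid == 0:
        status = 255
        try:
            # Soft and hard limit of 0: no new descriptor can be allocated.
            # Descriptors that are already open keep working, and the hard
            # limit cannot be raised again without privileges.
            resource.setrlimit(resource.RLIMIT_NOFILE, (0, 0))
            try:
                os.close(os.open(path, os.O_RDONLY))
                status = 0
            except OSError as exc:
                status = exc.errno
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)


if __name__ == "__main__":
    print(errno.errorcode.get(errno_from_open_after_limit(), "not blocked"))
```

in a real program you'd open everything you need first (sockets, log files, pipes to other processes) and only then lower the limit.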

chroot is very useful for application authors for one particular purpose: changing the root filesystem for your process to an empty directory is a surefire promise that your process will not open any new files from this point forwards.

it is often claimed that chroot is not a security mechanism.

indeed, many UNIX vendors have refused to patch potential chroot escapes.

however, it has clearly been used as one for a long time.

chroot also requires root, so you have to do it before dropping to your unprivileged user.
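the chroot step can be sketched like this, assuming the process is still root at this point (on Linux, that it holds CAP_SYS_CHROOT). enter_empty_chroot is an illustrative name and the jail is just a fresh temporary directory.

```python
import os
import tempfile


def enter_empty_chroot():
    """Confine the calling process to a newly created, empty directory."""
    jail = tempfile.mkdtemp(prefix="jail-")
    try:
        os.chroot(jail)
    except OSError:
        os.rmdir(jail)  # not allowed to chroot: don't leave the directory behind
        raise
    # chroot() does not move the current directory. Without this chdir the
    # process would still be standing outside its new root.
    os.chdir("/")
```

the ordering is: open whatever you need, call this, then drop to the unprivileged user. descriptors opened beforehand stay usable inside the jail, so close any that point at directories outside it.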

at this point i can point to extremesandbox.c, a classic example of these techniques.

system call allowlists

essentially you build a list of low level system calls you expect your process to use into the binary, then pass this to the kernel in some way, and then it enforces this usage.

this seems to be very commonly deployed these days, thanks to Linux's seccomp-bpf. the BSDs previously had something similar in systrace, but it very much went out of fashion as bugs were found.

system call allowlists are system-specific by definition. if you use libraries that abstract the OS away, you can probably make a reasonable guess at what they currently do, but not necessarily what they might do in the future.

problem: i don't care what system calls i'm using

system call restrictions have to face a fundamental problem with how software development works: most of the time we do not use system calls, we use nice friendly libraries that wrap those system calls. the precise system calls the library uses are what we in the trade call an "implementation detail".

the standard model to talk to a UNIX system is to do it through libc. if you're writing a programming language, it's probably safer to bind to libc than to make system calls directly, since raw system calls have not traditionally been seen as a stable interface. (note: in NetBSD even using libc involves abstractions: functions are versioned to avoid ABI breakage, and this is hidden from the programmer.)

solution: abstract the system calls

this is the approach OpenBSD took with their pledge sandboxing mechanism.

my primary problem with this is that the categories they chose to let you allow seem both too broad and too tied to the C programming language: do i really want to allow stdio?

solution: think about resources, not system calls

as a programmer, you're probably far more aware of what resources your program requires than what system calls it might happen to use. this is why i like setrlimit - it's much easier to understand how many files a program might open.

it also happens that most of the harm you can do as a naughty exploiter, if you happen to take over a process, involves using resources: reading some private data from a file and sending it over a network socket means opening several new resources.

Solaris privileges

i'd already spent a while thinking about this before i learned about Solaris privileges.

the setppriv model provides a nice abstraction where you have to think about the resources your code is using, but not necessarily the system calls (or indeed areas of the C library) it wants to use.

i think it's very interesting, and i think it's a shame that, like many innovative features in OS development, it's been slightly forgotten.