recently, Roy Marples added "privilege separation" to his DHCP client daemon software.
this basically means that the software runs as multiple processes which communicate with each other. lots of regular operations can be performed in underprivileged subprocesses that aren't allowed to do much, which greatly reduces the impact of exploits.
he wrote a couple of blog posts comparing OS-specific techniques for restricting processes:
... and then followed up with a third after I poked him on IRC and pointed out that setrlimit is a thing:
perhaps you write super awesome code in languages that give you a nice sense of safety and a remote execution bug is totally out of the question.
maybe you're a super devops and containerize everything so you're totally safe, at least until the latest container escape bug comes out.
even then, it's nice to clearly state what your program is allowed to do: it gives you constraints to work with, and makes bugs really obvious.
perhaps the easiest way to do sandboxing is with separate user accounts. if your process changes from root to an unprivileged user shortly after starting, it greatly reduces opportunity for abuse.
most sensitive data on most systems is stored in files with fine-grained permissions, so not being root really helps. in most cases root is also allowed to do nasty system-wide things that ordinary users aren't.
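a minimal sketch of what dropping from root to an unprivileged user might look like in C. the helper name drop_privileges is made up for illustration, and the account you pass in is whatever unprivileged user your system provides. the order (supplementary groups, then gid, then uid) matters, because once the uid is gone you can no longer change the others.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes setgroups() on glibc */
#endif
#include <sys/types.h>
#include <grp.h>
#include <pwd.h>
#include <unistd.h>

/*
 * hypothetical helper: switch the calling process to `username`.
 * returns 0 on success, -1 on failure. only works while still root.
 */
int drop_privileges(const char *username)
{
    struct passwd *pw = getpwnam(username);
    if (pw == NULL)
        return -1; /* no such user (or lookup failed) */

    gid_t gid = pw->pw_gid;
    uid_t uid = pw->pw_uid;

    /* shed root's supplementary groups first */
    if (setgroups(1, &gid) == -1)
        return -1;
    /* then the group id, while we are still allowed to change it */
    if (setgid(gid) == -1)
        return -1;
    /* finally the user id; after this there is no way back */
    if (setuid(uid) == -1)
        return -1;

    /* sanity check: regaining root must now fail */
    if (uid != 0 && setuid(0) != -1)
        return -1;

    return 0;
}
```

you would call this once, early, right after doing whatever genuinely needs root (binding a low port, say), and treat any failure as fatal.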
POSIX has a setrlimit function. combined with resource names like RLIMIT_NOFILE, you can do things like prevent your process from opening any more files, or spawning any new subprocesses. this is really nice until you run into system-specific quirks, as Roy discovered in his last blog post.
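as a rough sketch, assuming you have already opened everything you need: setting both the soft and hard RLIMIT_NOFILE limits to zero means any later attempt to get a new file descriptor fails. the function names here are invented for the example, and can_open_files exists only to show the effect.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * hypothetical helper: forbid the process from obtaining new file
 * descriptors. descriptors that are already open keep working.
 * lowering the hard limit cannot be undone by an unprivileged process.
 */
int forbid_new_files(void)
{
    struct rlimit rl;
    rl.rlim_cur = 0; /* soft limit */
    rl.rlim_max = 0; /* hard limit */
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* demo only: returns 1 if we can still open something, 0 if not */
int can_open_files(void)
{
    int fd = open("/", O_RDONLY);
    if (fd == -1)
        return 0;
    close(fd);
    return 1;
}
```

note that this also blocks things that quietly need a descriptor (sockets, pipes, some library calls), so it has to come after all setup is done.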
chroot is very useful for application authors for one particular purpose: changing the root filesystem for your process to an empty directory is a surefire promise that your process will not open any new files from this point forwards.
it is often claimed that chroot is not a security mechanism. indeed, many UNIX vendors have refused to patch potential chroot escapes. however, it has clearly been used as one for a long time.
chroot also requires root, so you have to do it before dropping to your underprivileged user.
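a small sketch of the chroot step, assuming `dir` is an empty directory that your unprivileged user cannot write to. the name enter_jail is made up. the chdir("/") afterwards matters: without it the working directory can still point outside the new root.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes chroot() on glibc */
#endif
#include <unistd.h>

/*
 * hypothetical helper: confine the process to `dir`.
 * returns 0 on success, -1 on failure. needs root, so call it
 * before switching to the unprivileged user.
 */
int enter_jail(const char *dir)
{
    if (chroot(dir) == -1)
        return -1;
    /* make sure the working directory is inside the new root */
    if (chdir("/") == -1)
        return -1;
    return 0;
}
```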
at this point i can point to extremesandbox.c, a classic example of these techniques.
essentially you build a list of low level system calls you expect your process to use into the binary, then pass this to the kernel in some way, and then it enforces this usage.
this seems to be very commonly deployed these days, thanks to Linux's seccomp-bpf. the BSDs previously had something similar in systrace, but it very much went out of fashion as bugs were found.
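to make the allow list idea concrete, here is a hedged, Linux-only sketch using raw seccomp-bpf through prctl. the function names and the tiny list (write, exit, exit_group) are picked purely for illustration; a real program would need a much longer list, and many people use a library to build the filter instead. it assumes x86_64 or aarch64.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() */
#endif
#include <signal.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only knows x86_64 and aarch64"
#endif

/* hypothetical helper: allow write, exit and exit_group; kill on anything else */
int install_allow_list(void)
{
    struct sock_filter filter[] = {
        /* refuse calls made through a foreign syscall ABI */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* load the syscall number and compare it against the list */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit, 1, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* required so an unprivileged process may install a filter */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/*
 * demo only: fork a child that installs the filter and then makes a
 * call that is not on the list. returns the signal that killed the
 * child, 0 if it exited normally, or -1 on error.
 */
int signal_after_forbidden_call(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (install_allow_list() == -1)
            _exit(2);
        syscall(SYS_getpid); /* not allowed: the kernel kills us here */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

the child dies with SIGSYS the moment it strays from the list, which is exactly the "makes bugs really obvious" property, and also exactly why a libc update that starts using a different syscall can break you.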
system call allow lists are system-specific by definition. if you use libraries that abstract the OS away, you can probably make a reasonable guess at what it currently does, but not necessarily what it might do in the future.
system call restrictions have to face a fundamental problem with how software development works: most of the time we do not use system calls, we use nice friendly libraries that wrap those system calls. the precise system calls the library uses are what we in the trade call an "implementation detail".
the standard model to talk to a UNIX system is to do it through libc. if you're writing a programming language, it's probably safer to bind to libc than to use syscalls directly, since raw syscalls have not traditionally been seen as a stable interface. (note: in NetBSD even using libc involves abstractions: functions are versioned to avoid ABI breakage, and this is hidden from the programmer.)
this is the approach OpenBSD took with their pledge sandboxing mechanism.
my primary problem with this is that the categories they chose to let you allow seem both too broad and too tied to the C programming language: do i really want to allow stdio?
as a programmer, you're probably far more aware of what resources your program requires than what system calls it might happen to use. this is why i like setrlimit - it's much easier to understand how many files a program might open.
it also happens that most of the harm you can do as a naughty exploiter, if you happen to take over a process, involves using resources: maybe you want to read some private data from a file and send it over a network socket, which involves opening several new resources.
i'd already spent a while thinking about this before i learned about Solaris privileges. the setppriv model provides a nice abstraction where you have to think about the resources your code is using, but not necessarily the system calls (or indeed areas of the C library) it wants to use.
i think it's very interesting, and i think it's a shame that like many innovative features in OS development it's been slightly forgotten.