I think Jess is on the money here. The extra complexity of Linux containers compared to zones shows up in two ways:
1) The Linux kernel container primitives are implemented in more complicated ways. For example, in zones pid separation is implemented by just checking the zone_id: if the zone_id is different, the processes can't access each other. This also means that pids in zones are unique across the system, so you can't have two processes from two different zones with the same pid [with one exception: I believe they may have hacked something in to handle pid 1 on linux].
Similarly, in zones there is no user mapping. If you are root inside the zone, you are also root outside of the zone: the files you create inside a zone are uid 0, and they are also uid 0 when seen from outside the zone.
If you look at how device permissions are handled, in Linux we have cgroups that control which devices can be accessed and created, while Solaris zones reuse the existing Role Based Access Control plus device visibility. So inside a zone you either have permission to create all devices (very bad for security) or you can create no devices. Access to devices in a zone is mediated by whatever device nodes the administrator has created in your zone.
In zones there is no mount namespace. Instead there is something very similar to chroot: it is just a vnode in your proc struct that you are restricted from going above. Zones have mostly been implemented by adding a zone_id to the process struct and leveraging features that already existed in Solaris [I guess the big exception would be the network virtualization in Solaris], while in Linux there are all these complicated namespace things.
This complexity means there are probably going to be more bugs in the Linux kernel implementation. However, because zones don't give you as much fine-grained control, that simplicity can also create security bugs in your zone deployment. For example, I found an issue in Joyent's version of docker where you could trick the global zone into creating device files in your zone, and these could be used to compromise the system. Under a default lxc container this would not be possible, because cgroups would prevent you from accessing the device even if you could trick someone else into creating it.
You also have to be careful in zones about child zones getting access to files inside the parent zone. If you ever leak a filesystem fd or a hard link from the parent zone into the child zone, all bets are off, because the child is then able to write into the parent zone as root. (I believe this situation was covered in a zone paper where they describe the risk of a non-privileged user in the global zone collaborating with a root user in a child zone to escalate privileges on the system.)
2) Because all the pieces are separate in Linux, something has to put them together and make sure they are assembled correctly. I wouldn't trust sysadmins to do this on their own, and luckily there are projects like lxc/lxd/docker etc. that assemble these pieces in a secure way.
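To make the comparison in 1) a bit more concrete, here is a toy model in Python. It is only a sketch and not how either kernel is actually written: the names Proc, zone_can_see, ZoneDevices and CgroupContainer are made up for illustration, treating zone id 0 as the global zone (and letting it see everything) is an assumption of the sketch, and the device names are placeholders. It shows the single zone_id comparison for process visibility, and why a device node that someone is tricked into creating is usable in the zones-style model but still blocked in the cgroup-style model.

```python
from dataclasses import dataclass, field

GLOBAL_ZONE = 0  # assumption for this sketch: id 0 stands in for the global zone


@dataclass(frozen=True)
class Proc:
    """Hypothetical stand-in for a process struct that carries a zone_id."""
    pid: int
    zone_id: int


def zone_can_see(viewer: Proc, target: Proc) -> bool:
    """Zones-style check: a plain zone_id comparison.

    Letting the global zone see everything is an assumption of this sketch.
    """
    return viewer.zone_id == GLOBAL_ZONE or viewer.zone_id == target.zone_id


@dataclass
class ZoneDevices:
    """Zones-style model: a device is usable if its node exists in the zone."""
    nodes: set = field(default_factory=set)

    def can_open(self, dev: str) -> bool:
        return dev in self.nodes


@dataclass
class CgroupContainer:
    """Linux-style model: the node must exist AND be on the device allowlist."""
    nodes: set = field(default_factory=set)
    allowlist: set = field(default_factory=set)

    def can_open(self, dev: str) -> bool:
        return dev in self.nodes and dev in self.allowlist


if __name__ == "__main__":
    a = Proc(pid=100, zone_id=1)
    b = Proc(pid=200, zone_id=2)
    print(zone_can_see(a, b))  # False: different zone_id

    zone = ZoneDevices()
    container = CgroupContainer(allowlist={"null", "zero"})

    # Something outside is tricked into creating a disk node in both.
    zone.nodes.add("sda")
    container.nodes.add("sda")

    print(zone.can_open("sda"))       # True: visibility is the only gate
    print(container.can_open("sda"))  # False: not on the allowlist
```

The point is just that the zones model has one gate (does the node exist in your zone) while the Linux model has two, which is more machinery but also a second line of defense.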
Yep, exactly. Since zones were inspired by jails, FreeBSD works in pretty much the same way. For device permissions though, we have rules support directly in devfs. https://github.com/freebsd/freebsd/blob/master/etc/defaults/... — there's a default ruleset for jails that makes sense (allows log, null, zero, crypto, random, stdin/out/err, fd and tty stuff).
By the way, I couldn't find this anywhere on the internet — is there a simple way to just run something in a zone on illumos, without the installer stuff? Like on FreeBSD you can just do this:
jail -c path=/my/chroot/path command=/bin/sh
and you have a jailed shell. What's the Solaris/illumos equivalent of this?