syscall: add bind/mount operations to SysProcAttr on Linux #12125

sargun · 2015-08-13T01:07:43Z

It would be nice to be able to pass a list of bind mounts to ForkExec via SysProcAttr (https://golang.org/pkg/syscall/#SysProcAttr), which Go would perform after forking and before execing. Alternatively, it would be nice to be able to pass it a function (a lambda) that makes some syscalls (or raw syscalls) before the exec.

ianlancetaylor · 2015-08-13T01:38:31Z

Our basic guideline for SysProcAttr is that we only add operations that must take place between the fork and execve calls. As far as I know bind mounts are not such a call: you could instead exec a small helper program that did the bind mounts, and then execed the real program. Running that small helper program is inconvenient, but better than adding new obscure features to the syscall package.
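For illustration, the helper-program approach described above might look roughly like the sketch below. Nothing in it comes from the thread: the name bindhelper, the SRC:DST argument format, and the bindSpec, parseBind, splitArgs and run names are all assumptions made for this example. It is Linux only and needs enough privilege to call mount(2).

```go
// bindhelper: a hypothetical helper program (Linux only).
// Usage: bindhelper SRC:DST [SRC:DST ...] -- COMMAND [ARGS...]
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
)

// bindSpec is one bind mount request (a name invented for this sketch).
type bindSpec struct{ source, target string }

// parseBind splits a "source:target" argument into a bindSpec.
func parseBind(s string) (bindSpec, error) {
	i := strings.Index(s, ":")
	if i <= 0 || i == len(s)-1 {
		return bindSpec{}, fmt.Errorf("invalid bind spec %q, want SRC:DST", s)
	}
	return bindSpec{source: s[:i], target: s[i+1:]}, nil
}

// splitArgs separates the bind specs from the command that follows "--".
func splitArgs(args []string) ([]bindSpec, []string, error) {
	for i, a := range args {
		if a != "--" {
			continue
		}
		if i+1 >= len(args) {
			return nil, nil, errors.New("no command after --")
		}
		var binds []bindSpec
		for _, s := range args[:i] {
			b, err := parseBind(s)
			if err != nil {
				return nil, nil, err
			}
			binds = append(binds, b)
		}
		return binds, args[i+1:], nil
	}
	return nil, nil, errors.New("missing -- before command")
}

// run performs the bind mounts, then replaces this process with the command.
func run(args []string) error {
	binds, argv, err := splitArgs(args)
	if err != nil {
		return err
	}
	for _, b := range binds {
		if err := syscall.Mount(b.source, b.target, "", syscall.MS_BIND, ""); err != nil {
			return fmt.Errorf("bind %s on %s: %w", b.source, b.target, err)
		}
	}
	path, err := exec.LookPath(argv[0])
	if err != nil {
		return err
	}
	// On success Exec does not return: the real program takes over this process.
	return syscall.Exec(path, argv, os.Environ())
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: bindhelper SRC:DST... -- COMMAND [ARGS...]")
		return
	}
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, "bindhelper:", err)
		os.Exit(1)
	}
}
```

A parent would presumably start such a helper with CLONE_NEWNS set in SysProcAttr.Cloneflags, so that the mounts are visible only to the helper and the program it execs.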

It's not possible to have a function that runs between fork and exec, because many Go operations are not available during that time. Only very carefully written Go code can happen in that space. Permitting an arbitrary function would be an invitation to bugs and a severe restriction on what we could do going forward.

Is there any reason a bind mount has to happen between fork and execve?

rsc · 2015-11-24T17:28:26Z

@sargun, arbitrary func calls won't happen, for the reasons Ian explained. They'll be basically impossible to use correctly.

@ianlancetaylor, the argument you're making also applies to closing file descriptors, yet we do that. So there must be a slightly different line. My guess is that it's based on how common/complex the operations are. New mounts do seem a bit rare.

In Plan 9, rfork(2) let you change the current process; you only got a new process if you included the RFPROC bit. So on a Plan 9 system you could call rfork(RFNAMEG) to put the calling thread in its own name space group, do your bind mounts, and then fork/exec. Translated into Go, it would be something like

go func() {
    runtime.LockOSThread()
    syscall.Rfork(syscall.RFNAMEG)
    ... binds/mounts ...
    result = ForkExec(...)
    c <- result
}()
result := <-c

It would be very nice if there were some equivalent on Linux, but as far as I can tell that functionality (operating on the current thread) was dropped along the way from rfork to clone.

rsc · 2015-11-24T17:29:29Z

(Clarifying title, not making a decision.)

mdempsky · 2015-11-24T17:40:24Z

@rsc For what it's worth, Linux has unshare() and setns() system calls that operate on the current process/thread.

rsc · 2015-12-05T04:50:35Z

@mdempsky Thanks. Those look promising.

@sargun, can you take a look at those system calls and see if that works for you? The pattern would be something like:

runtime.LockOSThread()
unshare(2) to get private name space for thread
do bind/mounts
ForkExec or cmd.Start
setns(2) to reconnect to original name space
runtime.UnlockOSThread()

You can do the sequence in a goroutine if you want to make sure not to affect a possible thread lock in the caller.

corhere · 2022-10-03T21:09:31Z

While it is possible to unshare(2) a thread's mount namespace to set up mounts for the child process, it is not possible to restore the thread completely to its initial state afterwards. unshare(CLONE_NEWNS) implies unshare(CLONE_FS), unsharing the thread's working directory, root directory and umask attributes. The thread's mount namespace can be restored with setns(2), but there is no syscall to reverse the effects of unshare(CLONE_FS). Insidious bugs could manifest if arbitrary goroutines were to execute on a thread with unshared file system attributes, so the thread would have to be terminated. The changes to runtime.LockOSThread introduced in Go 1.10 make this possible, but the runtime would be down a thread, which will eventually have to be replaced. No thread would have to be terminated if the mounts could be performed between the fork (i.e. clone(CLONE_NEWNS)) and execve.

In addition, it would be amazing if all the mount operations could be configured, including overriding the mount which is currently performed unconditionally if Unshareflags has CLONE_NEWNS set. Setting the mount propagation mode recursively to MS_PRIVATE can prevent filesystems from being unmounted from the initial mount namespace due to "dangling" mounts in the unshared namespace preventing the filesystem from being released. Setting the propagation mode to MS_SLAVE would prevent such issues, but unconditionally changing it in the runtime would likely violate the Go 1 Compatibility Promise.

ianlancetaylor changed the title ~~Bind mounts for mount namespaces on Linux~~ syscall: Bind mounts for mount namespaces on Linux Aug 13, 2015

rsc added this to the Go1.6 milestone Oct 23, 2015

rsc added the Thinking label Oct 23, 2015

rsc changed the title ~~syscall: Bind mounts for mount namespaces on Linux~~ syscall: add bind/mount operations to SysProcAttr on Linux Nov 24, 2015

rsc modified the milestones: Unplanned, Go1.6 Dec 5, 2015

ianlancetaylor mentioned this issue Oct 15, 2017

syscall: Unshare with CLONE_NEWUSER fails #22283

Closed

ebfe mentioned this issue Nov 16, 2018

syscall: SysProcAttr.AmbientCaps fails when creating a new user namespace and creator is not root #23152

Closed

illiliti mentioned this issue Apr 27, 2021

TODO illiliti/king#1

Open

gopherbot added the compiler/runtime label (Issues related to the Go compiler and/or runtime.) Jul 7, 2022
