syscall: add bind/mount operations to SysProcAttr on Linux #12125

Open
sargun opened this issue Aug 13, 2015 · 6 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Thinking

@sargun

sargun commented Aug 13, 2015

It would be nice to be able to pass a list of bind mounts to ForkExec via SysProcAttr, which Go would perform after forking and before exec'ing: https://golang.org/pkg/syscall/#SysProcAttr. Alternatively, it would be nice to be able to pass it a function that makes some syscalls (or raw syscalls) before the exec.

@ianlancetaylor changed the title from "Bind mounts for mount namespaces on Linux" to "syscall: Bind mounts for mount namespaces on Linux" on Aug 13, 2015
@ianlancetaylor
Contributor

Our basic guideline for SysProcAttr is that we only add operations that must take place between the fork and execve calls. As far as I know bind mounts are not such a call: you could instead exec a small helper program that did the bind mounts, and then execed the real program. Running that small helper program is inconvenient, but better than adding new obscure features to the syscall package.
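
The helper-program approach described above can be sketched roughly as follows. This is only an illustration, not code from the Go tree: the `source:target` argument syntax, the `--` separator, and the names `bindSpec`, `parseBindSpec`, `splitArgs`, and `run` are all hypothetical. It assumes Linux, and that the parent starts the helper in its own mount namespace (for example with `Cloneflags: syscall.CLONE_NEWNS` in `SysProcAttr`) with enough privilege to mount.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
)

// bindSpec is one bind mount request (hypothetical type for this sketch).
type bindSpec struct {
	source, target string
}

// parseBindSpec parses the assumed "source:target" argument syntax.
func parseBindSpec(s string) (bindSpec, error) {
	source, target, ok := strings.Cut(s, ":")
	if !ok || source == "" || target == "" {
		return bindSpec{}, fmt.Errorf("bad bind spec %q, want source:target", s)
	}
	return bindSpec{source: source, target: target}, nil
}

// splitArgs separates the bind specs from the real command, which follows "--".
func splitArgs(args []string) (specs, command []string) {
	for i, a := range args {
		if a == "--" {
			return args[:i], args[i+1:]
		}
	}
	return args, nil
}

// run performs the bind mounts, then replaces this process with the real program.
func run(args []string) error {
	specs, command := splitArgs(args)
	if len(command) == 0 {
		return fmt.Errorf("no command given after --")
	}
	for _, s := range specs {
		b, err := parseBindSpec(s)
		if err != nil {
			return err
		}
		if err := syscall.Mount(b.source, b.target, "", syscall.MS_BIND, ""); err != nil {
			return fmt.Errorf("bind mount %s on %s: %w", b.source, b.target, err)
		}
	}
	path, err := exec.LookPath(command[0])
	if err != nil {
		return err
	}
	// On success Exec does not return: the real program takes over this process.
	return syscall.Exec(path, command, os.Environ())
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: helper source:target... -- command [args...]")
		return
	}
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, "helper:", err)
		os.Exit(1)
	}
}
```

Because the helper does ordinary work in an ordinary process, nothing here has to run in the restricted window between fork and execve; the cost is one extra exec and a second binary to ship.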

It's not possible to have a function that runs between fork and exec, because many Go operations are not available during that time. Only very carefully written Go code can happen in that space. Permitting an arbitrary function would be an invitation to bugs and a severe restriction on what we could do going forward.

Is there any reason a bind mount has to happen between fork and execve?

@rsc added this to the Go1.6 milestone on Oct 23, 2015
@rsc added the Thinking label on Oct 23, 2015
@rsc
Contributor

rsc commented Nov 24, 2015

@sargun, arbitrary func calls won't happen, for the reasons Ian explained. They'll be basically impossible to use correctly.

@ianlancetaylor, the argument you're making also applies to closing file descriptors, yet we do that. So there must be a slightly different line. My guess is that it's based on how common/complex the operations are. New mounts do seem a bit rare.

In Plan 9, rfork(2) let you change the current process; you only got a new process if you included the RFPROC bit. So on a Plan 9 system you could call rfork(RFNAMEG) to put the calling thread in its own name space group, do your bind mounts, and then fork/exec. Translated into Go, it would be something like

go func() {
    runtime.LockOSThread()
    syscall.Rfork(syscall.RFNAMEG)
    ... binds/mounts ...
    result = ForkExec(...)
    c <- result
}()
result := <-c

It would be very nice if there were some equivalent on Linux, but as far as I can tell that functionality (operating on the current thread) was dropped along the way from rfork to clone.

@rsc changed the title from "syscall: Bind mounts for mount namespaces on Linux" to "syscall: add bind/mount operations to SysProcAttr on Linux" on Nov 24, 2015
@rsc
Contributor

rsc commented Nov 24, 2015

(Clarifying title, not making a decision.)

@mdempsky
Member

@rsc For what it's worth, Linux has unshare() and setns() system calls that operate on the current process/thread.

@rsc
Contributor

rsc commented Dec 5, 2015

@mdempsky Thanks. Those look promising.

@sargun, can you take a look at those system calls and see if they work for you? The pattern would be something like:

runtime.LockOSThread()
unshare(2) to get private name space for thread
do bind/mounts
ForkExec or cmd.Start
setns(2) to reconnect to original name space
runtime.UnlockOSThread()

You can do the sequence in a goroutine if you want to make sure not to affect a possible thread lock in the caller.

@corhere

corhere commented Oct 3, 2022

While it is possible to unshare(2) a thread's mount namespace to set up mounts for the child process, it is not possible to restore the thread completely to its initial state afterwards. unshare(CLONE_NEWNS) implies unshare(CLONE_FS), which unshares the thread's working directory, root directory, and umask attributes. The thread's mount namespace can be restored with setns(2), but there is no syscall to reverse the effects of unshare(CLONE_FS). Insidious bugs could manifest if arbitrary goroutines were to execute on a thread with unshared file system attributes, so the thread would have to be terminated. The changes to runtime.LockOSThread introduced in Go 1.10 make this possible, but the runtime would be down a thread, which would eventually have to be replaced. No thread would have to be terminated if the mounts could be performed between the fork (i.e. clone(CLONE_NEWNS)) and execve.
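
For illustration, the "unshare on a thread, then throw the thread away" workaround might look roughly like the sketch below. The names `runOnDisposableThread` and `startWithBindMount` are made up for this example, and the recursive MS_SLAVE step is an assumption of the sketch (it keeps the bind mount from propagating back to the original namespace when `/` is a shared mount). It assumes Linux, Go 1.10 or later, and enough privilege to unshare and mount.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

// runOnDisposableThread runs fn on a goroutine that locks its OS thread and
// never unlocks it. Since Go 1.10 the runtime terminates such a thread when
// the goroutine exits, so no other goroutine ever runs on the modified thread.
func runOnDisposableThread(fn func() error) error {
	errc := make(chan error, 1)
	go func() {
		runtime.LockOSThread() // intentionally no UnlockOSThread
		errc <- fn()
	}()
	return <-errc
}

// startWithBindMount starts cmd so that it sees source bind-mounted on target
// in a mount namespace private to the disposable thread and the child.
func startWithBindMount(cmd *exec.Cmd, source, target string) error {
	return runOnDisposableThread(func() error {
		if err := syscall.Unshare(syscall.CLONE_NEWNS); err != nil {
			return fmt.Errorf("unshare: %w", err)
		}
		// Assumption of this sketch: stop mounts made here from propagating
		// back to the original namespace.
		if err := syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_SLAVE, ""); err != nil {
			return fmt.Errorf("set propagation: %w", err)
		}
		if err := syscall.Mount(source, target, "", syscall.MS_BIND, ""); err != nil {
			return fmt.Errorf("bind mount: %w", err)
		}
		// The child is forked from this thread, so it inherits the namespace.
		return cmd.Start()
	})
}

func main() {
	if len(os.Args) < 4 {
		fmt.Fprintln(os.Stderr, "usage: demo source target command [args...]")
		return
	}
	cmd := exec.Command(os.Args[3], os.Args[4:]...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := startWithBindMount(cmd, os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "start:", err) // e.g. EPERM without CAP_SYS_ADMIN
		os.Exit(1)
	}
	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "wait:", err)
		os.Exit(1)
	}
}
```

Each call costs the runtime one OS thread, which is the overhead that doing the mounts between fork and execve would avoid.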

In addition, it would be amazing if all the mount operations could be configured, including overriding the mount which is currently performed unconditionally if Unshareflags has CLONE_NEWNS set. Setting the mount propagation mode recursively to MS_PRIVATE can prevent filesystems from being unmounted from the initial mount namespace, because "dangling" mounts in the unshared namespace keep the filesystem from being released. Setting the propagation mode to MS_SLAVE would prevent such issues, but unconditionally changing it in the runtime would likely violate the Go 1 Compatibility Promise.
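
To make the two propagation modes concrete, here is roughly what a configurable propagation step amounts to at the syscall level. The mode names and the functions `propagationFlags` and `setRootPropagation` are invented for this sketch and are not an existing or proposed Go API. `setRootPropagation` is meant to be called only from inside an already unshared mount namespace (for example from a helper program), so `main` deliberately just prints the flag values instead of remounting anything.

```go
package main

import (
	"fmt"
	"syscall"
)

// propagationFlags maps a hypothetical mode name to the mount(2) flags that
// change the propagation type of a mount and everything below it.
func propagationFlags(mode string) (uintptr, error) {
	switch mode {
	case "private":
		// The mode described above as being applied unconditionally today.
		return syscall.MS_REC | syscall.MS_PRIVATE, nil
	case "slave":
		// Unmounts in the original namespace still propagate into this one.
		return syscall.MS_REC | syscall.MS_SLAVE, nil
	default:
		return 0, fmt.Errorf("unknown propagation mode %q", mode)
	}
}

// setRootPropagation applies mode recursively starting at "/" in the calling
// thread's current mount namespace. Source, fstype, and data are ignored by
// mount(2) for propagation changes.
func setRootPropagation(mode string) error {
	flags, err := propagationFlags(mode)
	if err != nil {
		return err
	}
	return syscall.Mount("", "/", "", flags, "")
}

func main() {
	// Only print the flags; changing propagation on a live system's root
	// outside a private namespace is not something a demo should do.
	for _, mode := range []string{"private", "slave"} {
		flags, err := propagationFlags(mode)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-7s -> mount flags %#x\n", mode, flags)
	}
}
```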
