Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: syscall: Support linux namespace fd's in SysProcAttr #56680

Open
cpuguy83 opened this issue Nov 9, 2022 · 8 comments · May be fixed by #59018
Open

proposal: syscall: Support linux namespace fd's in SysProcAttr #56680

cpuguy83 opened this issue Nov 9, 2022 · 8 comments · May be fixed by #59018
Labels
Milestone

Comments

@cpuguy83
Copy link

cpuguy83 commented Nov 9, 2022

I've been working on a package to help work on Linux namespaces (https://github.com/cpuguy83/gonso).
One of the things I'd like to be able to do is, as a library author, enable users of the library to run an executable in a given set of namespaces.
Today this would require the user of the library to have a helper process join those namespaces before executing the command.
In the library I could setup a thread with the namespaces and run the command but this doesn't really work with exec.Cmd since that can spin up its own goroutines not locked to the current thread.

I've also looked at creating essentially my own bad copy of exec.Cmd that would allow me to do this but ideally this would just work with the real exec.Cmd

What I propose is to add something like:

diff --git a/src/syscall/exec_linux.go b/src/syscall/exec_linux.go
index b61b51dff1..fd27257461 100644
--- a/src/syscall/exec_linux.go
+++ b/src/syscall/exec_linux.go
@@ -59,6 +59,11 @@ type SysProcIDMap struct {
 	Size        int // Size.
 }

+type LinuxNamespace struct {
+	Fd   int // Namespace file descriptor
+	Kind int // e.g. CLONE_NEWNS
+}
+
 type SysProcAttr struct {
 	Chroot     string      // Chroot.
 	Credential *Credential // Credential.
@@ -98,9 +103,10 @@ type SysProcAttr struct {
 	// This parameter is no-op if GidMappings == nil. Otherwise for unprivileged
 	// users this should be set to false for mappings work.
 	GidMappingsEnableSetgroups bool
-	AmbientCaps                []uintptr // Ambient capabilities (Linux only)
-	UseCgroupFD                bool      // Whether to make use of the CgroupFD field.
-	CgroupFD                   int       // File descriptor of a cgroup to put the new process into.
+	AmbientCaps                []uintptr        // Ambient capabilities (Linux only)
+	UseCgroupFD                bool             // Whether to make use of the CgroupFD field.
+	CgroupFD                   int              // File descriptor of a cgroup to put the new process into.
+	Namespaces                 []LinuxNamespace // Namespaces to join before exec
 }

 var (

I think this is a more generally useful addition since it allows anyone to set which namepsaces they want the command to run in without having to resort to using external tooling (such as nsenter) or re-exec/cgo init hacks.

@gopherbot gopherbot added this to the Proposal milestone Nov 9, 2022
@ianlancetaylor
Copy link
Contributor

CC @kolyshkin @avagin

@kolyshkin
Copy link
Contributor

I like the overall idea, but perhaps some higher-level way of specifying the namespaces is better.

Say something like

Namespaces []string

where each element is in the form "type:path". The type is one of "cgroup", "ipc", "mnt", "net", "pid", "user", "uts", and the path is an arbitrary filesystem path to open.

What the code would do is map those types to CLONE_XXX flags, open the files, and then call setns(2) for each one.

  1. Using a string instead of a CLONE_FLAG will make it slightly more portable (say we won't have to export more CLONE_XXX flags from the deprecated syscall package)
  2. I am not quite sure about using path vs file descriptor; one thing I can think of is path is a superset of fd, given than one can use /proc/self/fd/N to refer to fd N.

@cpuguy83
Copy link
Author

@kolyshkin I think that could work, though it is definitely different from other things, such as CgroupFD

I'm fine either way.

@cpuguy83
Copy link
Author

cpuguy83 commented Dec 3, 2022

@kolyshkin Is there any issue with having to do the parsing on that string in the syscall package?

@cpuguy83
Copy link
Author

Is there something we can do to move this discussion forward? Does this need to be in the active discussion list before a change should be submitted? /cc @rsc

@ianlancetaylor
Copy link
Contributor

You can send in a change any time but it won't be approved and submitted until this proposal is accepted. In this case actually I think seeing the code would help, as I don't really understand the proposal myself. Thanks.

I see there is a suggestion above for a different approach, and that doesn't seem to be settled. We aim to get consensus on proposals. @kolyshkin any more thoughts on the approach? Thanks.

@cpuguy83 cpuguy83 linked a pull request Mar 14, 2023 that will close this issue
@cpuguy83
Copy link
Author

I opened #59018 for this.

@gopherbot
Copy link

Change https://go.dev/cl/476095 mentions this issue: syscall: add support for setns after fork

cpuguy83 added a commit to cpuguy83/go that referenced this issue Jan 6, 2024
This adds a Namespaces field to Linux's SysProcAttr type.
When set, these namespaces will be entered after fork and before exec.
The format for this is [ns name]=[path], e.g. mnt=/some/path.

This allows users to exec a new process in a pre-defined set of
namespaces without having to resort to hacks or re-execs to bootstrap
these namespaces.

Closes golang#56680
cpuguy83 added a commit to cpuguy83/go that referenced this issue Jan 6, 2024
This adds a Namespaces field to Linux's SysProcAttr type.
When set, these namespaces will be entered after fork and before exec.
The format for this is [ns name]=[path], e.g. mnt=/some/path.

This allows users to exec a new process in a pre-defined set of
namespaces without having to resort to hacks or re-execs to bootstrap
these namespaces.

Closes golang#56680
cpuguy83 added a commit to cpuguy83/go that referenced this issue Jan 12, 2024
This adds a Namespaces field to Linux's SysProcAttr type.
When set, these namespaces will be entered after fork and before exec.

This allows users to exec a new process in a pre-defined set of
namespaces without having to resort to hacks or re-execs to bootstrap
these namespaces.

Closes golang#56680
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

Successfully merging a pull request may close this issue.

4 participants