Skip to content

proposal: syscall: add ExecFD options for Linux fexecve to SysProcAttr #72102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
criyle opened this issue Mar 4, 2025 · 10 comments
Open

proposal: syscall: add ExecFD options for Linux fexecve to SysProcAttr #72102

criyle opened this issue Mar 4, 2025 · 10 comments
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Milestone

Comments

@criyle
Copy link

criyle commented Mar 4, 2025

Proposal Details

Linux 3.19 added a new system call, execveat, which provides the proper implementation for POSIX's fexecve. fexecve requires mount of /proc system and previous example shows the implementation of fexecve over the /proc filesystem is not reliable given the reorganization of file descriptors before the execve call in the child process. #66654 #61751

It would be great for some software written in Go to benefit from this new functionality, especially for containerized software which makes copies of their binary to avoid escaping attacks.

Proposed implementation as follows:

diff --git a/src/syscall/exec_linux.go b/src/syscall/exec_linux.go
index 678bc84..ead763b 100644
--- a/src/syscall/exec_linux.go
+++ b/src/syscall/exec_linux.go
@@ -110,11 +110,14 @@ type SysProcAttr struct {
        // functionality is supported by the kernel, or -1. Note *PidFD is
        // changed only if the process starts successfully.
        PidFD *int
+
+       ExecFD int // File descriptor of a executable file.
 }
 
 var (
        none  = [...]byte{'n', 'o', 'n', 'e', 0}
        slash = [...]byte{'/', 0}
+       empty = [...]byte{0}
 
        forceClone3 = false // Used by unit tests only.
 )
@@ -241,7 +244,7 @@ func forkAndExecInChild1(argv0 *byte, argv, envv []*byte, chroot, dir *byte, att
        // by an otherwise-correct change in the compiler.
        var (
                err2                      Errno
-               nextfd                    int
+               nextfd, execfd            int
                i                         int
                caps                      caps
                fd1, flags                uintptr
@@ -573,6 +576,17 @@ func forkAndExecInChild1(argv0 *byte, argv, envv []*byte, chroot, dir *byte, att
                pipe = nextfd
                nextfd++
        }
+       if SysProcAttr.ExecFD > 0 {
+               execfd = SysProcAttr.ExecFD
+               if execfd < nextfd {
+                       _, _, err1 = RawSyscall(SYS_DUP3, uintptr(execfd), uintptr(nextfd), O_CLOEXEC)
+                       if err1 != 0 {
+                               goto childerror
+                       }
+                       execfd = nextfd
+                       nextfd++
+               }
+       }
        for i = 0; i < len(fd); i++ {
                if fd[i] >= 0 && fd[i] < i {
                        if nextfd == pipe { // don't stomp on pipe
@@ -663,10 +677,16 @@ func forkAndExecInChild1(argv0 *byte, argv, envv []*byte, chroot, dir *byte, att
        }
 
        // Time to exec.
-       _, _, err1 = RawSyscall(SYS_EXECVE,
-               uintptr(unsafe.Pointer(argv0)),
-               uintptr(unsafe.Pointer(&argv[0])),
-               uintptr(unsafe.Pointer(&envv[0])))
+       if execfd > 0 {
+               _, _, err1 = RawSyscall6(SYS_EXECVEAT, execfd,
+                       uintptr(unsafe.Pointer(&empty[0])),
+                       uintptr(unsafe.Pointer(&argv[0])),
+                       uintptr(unsafe.Pointer(&envv[0])), _AT_EMPTY_PATH, 0)
+       } else {
+               _, _, err1 = RawSyscall(SYS_EXECVE,
+                       uintptr(unsafe.Pointer(argv0)),
+                       uintptr(unsafe.Pointer(&argv[0])),
+                       uintptr(unsafe.Pointer(&envv[0])))
+       }
 
 childerror:
        // send error code on pipe
@criyle criyle added the Proposal label Mar 4, 2025
@gopherbot gopherbot added this to the Proposal milestone Mar 4, 2025
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Mar 4, 2025
@gabyhelp gabyhelp added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label Mar 4, 2025
@ianlancetaylor
Copy link
Member

CC @neild

It seems that it would make sense to consider this in conjunction with the new os.Root family of functions. For example, perhaps we should support os.(*Root).StartProcess(name string, argv []string, attr *ProcAttr). That suggests that rather than ExecFD we should have DirFD. I think it would have to be something like

    At      bool // use execveat with DirFD and AtFlags
    DirFD   int  // if At is true, dirfd argument to execveat
    AtFlags int  // if At is true, flags argument to execveat

In order to use this we would have to add AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW to the syscall package.

@criyle
Copy link
Author

criyle commented Mar 5, 2025

The execveat behaves like fexecve only when pathname is empty path uintptr(unsafe.Pointer(&empty[0])) and the _AT_EMPTY_PATH is supplied. To support the above use case, the pathname parameter of the syscall need to be adjusted, or controlled by AtFlags.

@ianlancetaylor
Copy link
Member

@criyle I can't tell if you are pointing out a problem or not. The pathname is under program control, along with the fields in SysProcAttr. So if a program wants the equivalent of fexecve, it can pass AT_EMPTY_PATH and an empty path name.

@criyle
Copy link
Author

criyle commented Mar 5, 2025

You are right, the user can pass empty string to argv0 directly using your proposed API like

syscall.ForkExec("", argv, &syscall.SysProcAttr{
    At: true, 
    DirFD: fileFD, 
    AtFlags: syscall.AT_EMPTY_PATH
})

It indeed made the implementation more generic to other use cases. Sorry that I remembered the os/exec.Cmd does not accept empty string Path, so I made a false assumption to assume the argv0 is not accepting that either.

@neild
Copy link
Contributor

neild commented Mar 5, 2025

I don't understand the point about "fexecve requires mount of /proc system". The example in #66654 seems to be using execve("/proc/self/fd/X", ...) to mimic the behavior of fexecve(X, ...), but I don't see why fexecve itself would require /proc.

If we're adding new API here, it seems like we should do it for all OSs that support the necessary syscalls. FreeBSD at least has fexecve.

os.(*Root).StartProcess(name string, argv []string, attr *ProcAttr) seems reasonable to me. If we do want to support that, perhaps we should also have a way to use it from os/exec.Cmd. Perhaps something like:

package exec

type Cmd struct {
  // ...

  // Root specifies the root the command is relative to.
  //
  // If Root is non-nil, Path is interpreted relative to the root.
  Root *os.Root
}

I'm not sure how Cmd.Root and Cmd.Dir should interact, though.

@criyle
Copy link
Author

criyle commented Mar 5, 2025

fexecve requires mount of /proc system

From the fexecve(3) man page, the POSIX's fexecve is implemented via proc system by glibc on systems that does not have execveat syscall, and the two issues I have linked showed that this kind of ancient implementation is not reliable. This proposal is to add support to fexecve using newer system interface execveat if available (kernel >= 3.19). Sorry that I should say "outdated implementation of fexecve requires mount of /proc system" instead of made it so generic.

@ianlancetaylor
Copy link
Member

Rather than a Root field in exec.Cmd, perhaps we could have a new function

// CommandRoot is like CommandContext, but the executable is relative to root.
func CommandRoot(ctx context.Context, root *os.Root, name string, arg ...string) *Cmd

I think that may be a more natural way to describe what we want to execute. It would set the SysProcAttr fields accordingly.

@neild
Copy link
Contributor

neild commented Mar 6, 2025

If we do add something, I think it should use fexecve on systems which have it but not execveat. That is: This shouldn't be a Linux-specific feature. (Even if we just add something to syscall, we should think about non-Linux systems.)

My one concern about CommandRoot is whether we need to be able to make Cmd.Dir root-relative, and if so, whether it needs to potentially be a different root than Cmd.Path. That seems like a very specialized need, but all of this is satisfying very specialized needs.

@ianlancetaylor
Copy link
Member

I think that if we take the CommandRoot path then Dir is not relative to the root. That seems like a separate thing.

Of course, we should think about some approach for setting Dir safely, which is a valid concern whether or not the executable is in a root or not. For example, it seems reasonable to run /bin/ls in a user-specified directory that must be within some root.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal
Projects
Status: Incoming
Development

No branches or pull requests

5 participants