syscall: swap use of read-write locks for ForkLock #54162

Closed
junchuan-tzh opened this issue Aug 1, 2022 · 28 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@junchuan-tzh

The ForkLock prevents forking while a new fd is being created and marked close-on-exec. Fd operations hold the read lock, and fork holds the write lock.

On kernels newer than 2.6, the two fd operations can be done in one atomic step, so the ForkLock is not needed there. Meanwhile, the ForkLock makes concurrent forks a bottleneck, since every fork must take the write lock.

Therefore, we propose swapping the read and write locks: fd operations hold the write lock, and fork holds the read lock. This is reasonable because fd operations "modify" process state, while fork only "reads" it. With this change, newer kernels no longer serialize concurrent forks, and older kernels still keep fd close-on-exec safe.
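
For concreteness, a minimal Linux-only sketch of the two patterns (illustrative only; socketTwoStep and socketOneStep are invented names, not the actual syscall package code):

package main

import (
    "fmt"
    "syscall"
)

// Old-kernel pattern: create the fd, then mark it close-on-exec in a
// second step. ForkLock.RLock keeps a fork from landing between the two
// steps and leaking a not-yet-CLOEXEC fd into the child.
func socketTwoStep() (int, error) {
    syscall.ForkLock.RLock()
    defer syscall.ForkLock.RUnlock()
    fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
    if err != nil {
        return -1, err
    }
    syscall.CloseOnExec(fd)
    return fd, nil
}

// Newer-kernel pattern (Linux 2.6.27+): SOCK_CLOEXEC sets the flag
// atomically at creation time, so no lock is needed for correctness.
func socketOneStep() (int, error) {
    return syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
}

func main() {
    if fd, err := socketOneStep(); err == nil {
        fmt.Println("opened fd", fd)
        syscall.Close(fd)
    }
}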

@gopherbot gopherbot added this to the Proposal milestone Aug 1, 2022
@ianlancetaylor
Contributor

This is not an API change so it doesn't have to go through the proposal process.

@ianlancetaylor ianlancetaylor changed the title proposal: syscall: exchange read-write locks for ForkLock syscall: use read-write locks for ForkLock Aug 2, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 2, 2022
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed compiler/runtime Issues related to the Go compiler and/or runtime. Proposal labels Aug 2, 2022
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Backlog Aug 2, 2022
@ianlancetaylor ianlancetaylor changed the title syscall: use read-write locks for ForkLock syscall: swap use of read-write locks for ForkLock Aug 2, 2022
@rittneje

rittneje commented Aug 2, 2022

This is a breaking change. Consumers that have to set close-on-exec on fds themselves may already be using syscall.ForkLock, and you cannot force everyone to replace Lock with RLock and vice versa.

@junchuan-tzh
Author

syscall.ForkLock is only used in a few places (Pipe, Socket, Accept, Open, Dup), and swapping Lock and RLock in just those places is enough. Higher-level APIs and consumers would not notice the change.

@rittneje

rittneje commented Aug 2, 2022

No, syscall.ForkLock is a public property that anyone can use. And this is required when you are working with lower-level APIs (e.g., syscall.Socketpair) and have to set close-on-exec yourself.

Also, your statement that kernels newer than 2.6 can do this atomically (via O_CLOEXEC, etc.) only applies to Linux. For example, macOS does not support that flag in all cases. In other words, this change would cause performance issues for macOS applications that create new fds more often than they fork, which I suspect is the majority case.

@junchuan-tzh
Author

junchuan-tzh commented Aug 5, 2022

Would using an either-or lock be OK? See #23558.

Like this (only the first fork holds the write lock):

close-on-exec:

    ForkLock.RLock()
    set_fd_close_on_exec()
    ForkLock.RUnlock()

fork (with a refcnt counter):

    Mutex.Lock()
    if refcnt == 0:
        // get write lock for the first fork (if a non-fork op holds the write lock, it is also ok)
        ForkLock.Lock()
    refcnt += 1
    Mutex.Unlock()

    do_forkexec()

    Mutex.Lock()
    refcnt -= 1
    if refcnt == 0:
        // release write lock for the last fork
        ForkLock.UnLock()
    Mutex.Unlock()

@rittneje

rittneje commented Aug 5, 2022

I'm assuming you meant to write ForkLock.Unlock() for the second block. I think that will work.

@junchuan-tzh
Author

Yes, that was a typo.

There is a potential problem: the latency to get ForkLock.RLock() can be very long if there are too many forks.
Is there a better way to implement an either-or lock?

@rittneje

rittneje commented Aug 5, 2022

Unfortunately, I think you'd have to be able to ask the ForkLock whether there are pending readers in order to avoid that, and currently that information is not exposed.

@ianlancetaylor
Contributor

Perhaps we should close this issue as a dup of #23558, as they are about the same problem.

@ianlancetaylor
Contributor

Let me add that both this and #23558 have the same problem: we unfortunately exported syscall.ForkLock, and it is at least possible that people are using it, and that makes it difficult to change.

@ianlancetaylor
Contributor

OK, I think I see a way: https://go.dev/cl/421441. Anybody see a way to improve that code?

@gopherbot

Change https://go.dev/cl/421441 mentions this issue: syscall: avoid serializing forks on ForkLock

@rittneje

rittneje commented Aug 6, 2022

@ianlancetaylor This is the approach that @junchuan-tzh mentioned above. As discussed, it has an issue: if forkExec is being invoked continually, syscall.ForkLock.RLock() may never unblock.

@ianlancetaylor
Contributor

@rittneje My apologies for not reading more carefully.

I've updated https://go.dev/cl/421441 to avoid that problem. But it may be too complicated now. What do you think of that approach? Thanks.

@rittneje

rittneje commented Aug 6, 2022

Indeed, it is fairly complex. The choice of 10 as the concurrency limit (sort of) also seems very arbitrary.

Perhaps sync.Cond would be a better choice than adding a channel? Just to reduce the number of players.

var (
    forkingCond = sync.NewCond(new(sync.Mutex))
    forking     int // number of forks in flight, guarded by forkingCond.L
)

...

forkingCond.L.Lock()
if forking > 10 {
    // Roughly cap the size of the forking group so the write lock is
    // eventually released and blocked readers get a chance to run.
    forkingCond.Wait()
}
if forking == 0 {
    // First forker takes the write lock on behalf of the whole group.
    syscall.ForkLock.Lock()
}
forking++
forkingCond.L.Unlock()

defer func() {
    forkingCond.L.Lock()
    forking--
    if forking == 0 {
        // Last forker out releases the write lock and wakes any waiters.
        syscall.ForkLock.Unlock()
        forkingCond.Broadcast()
    }
    forkingCond.L.Unlock()
}()

@ianlancetaylor
Contributor

My personal feeling is that sync.Cond never makes things clearer. See also #21165.
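
For comparison, a rough sketch of the same throttling idea with a buffered channel used as a semaphore instead of sync.Cond (illustrative only, not the code in the CL; all names here are invented):

package forksketch

import (
    "sync"
    "syscall"
)

var (
    forkSlots = make(chan struct{}, 10) // admits at most 10 forks at a time
    forkingMu sync.Mutex
    forking   int // forks in flight, guarded by forkingMu
)

func beginFork() {
    forkSlots <- struct{}{} // blocks while 10 forks are already in flight
    forkingMu.Lock()
    if forking == 0 {
        // First forker takes the write lock on behalf of the whole group.
        syscall.ForkLock.Lock()
    }
    forking++
    forkingMu.Unlock()
}

func endFork() {
    forkingMu.Lock()
    forking--
    if forking == 0 {
        // Last forker out releases the write lock.
        syscall.ForkLock.Unlock()
    }
    forkingMu.Unlock()
    <-forkSlots
}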

@ianlancetaylor
Contributor

I've updated https://go.dev/cl/421441 to use a different approach, using internal secret knowledge about sync.RWMutex. Does anybody see a problem with this approach? Thanks.

@rittneje

@ianlancetaylor I am confused by the call to runtime.Gosched(). In order for the preceding call to ForkLock.RLock() to unblock, whatever was holding the write lock must have unlocked it. That means (I assume) that whatever calls ForkLock.Lock() next would have to wait for those readers to finish anyway.

@ianlancetaylor
Contributor

@rittneje Thanks, you're right, that's not needed. I was thinking that the RLock callers would be woken up but would still have to acquire the read lock, but that's not how it works. The read lock is already held when they are woken up. I will remove the Gosched call.

@rittneje

@ianlancetaylor I think there may still be a starvation potential hidden here. There is no guarantee that goroutines will acquire the (logical) fork lock in order. For example:

  1. Goroutine A calls ForkLock.Lock().
  2. Goroutine B calls ForkLock.RLock(), which blocks.
  3. Goroutine C tries to acquire fork lock. It sees there is a pending reader (B), so it waits.
  4. Goroutine A releases the write lock.
  5. Goroutine B acquires then releases the read lock.
  6. Goroutine D calls ForkLock.Lock().
  7. Goroutine E calls ForkLock.RLock(), which blocks.
  8. Goroutine C is finally scheduled, and tries to acquire fork lock again. Once again, it is thwarted by pending reader (E), so it waits.

In short, there is a possibility that a goroutine that is intending to fork keeps losing out to other goroutines indefinitely.

@ianlancetaylor
Contributor

@rittneje I don't think that can happen with the current RWMutex implementation. When goroutine C blocks trying to acquire the read lock, it increments readerCount before it blocks. The subsequent call to Lock in D will block until readerCount drops back to zero.

@ianlancetaylor
Contributor

To put it another way, I think that if starvation is possible with this CL, then it is possible with any use of RWMutex, and in particular it's possible today without this CL.

@rittneje

rittneje commented May 23, 2023

When goroutine C blocks trying to acquire the read lock, it increments readerCount before it blocks. The subsequent call to Lock in D will block until readerCount drops back to zero.

But that doesn't matter. Even if D blocks, when C gets scheduled it will immediately release the read lock, which unblocks D. Then, when C goes back to the top of the for loop, it will observe that ForkLock is already taken and will once again fall through to the pending-readers check.

I think the correct fix is to remove the for loop. Instead, it should unconditionally acquire the logical fork lock after waiting for pending readers once.
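
To illustrate that shape, a rough sketch (not the actual CL code: hasPendingReaders is a made-up stand-in for the internal sync.RWMutex knowledge the CL relies on, and forkingLock/forking are local to this sketch):

package forklock

import (
    "sync"
    "syscall"
)

var (
    forkingLock sync.Mutex
    forking     int // forks in flight, guarded by forkingLock
)

// hasPendingReaders stands in for the internal check the CL uses; there is
// no exported sync API that reports waiting readers.
func hasPendingReaders(rw *sync.RWMutex) bool { return false }

func acquireForkLock() {
    forkingLock.Lock()
    defer forkingLock.Unlock()

    if forking == 0 {
        // No fork in flight: take the real write lock.
        syscall.ForkLock.Lock()
        forking++
        return
    }

    // ForkLock is already held for writing on behalf of another fork.
    if hasPendingReaders(&syscall.ForkLock) {
        // Wait for the pending readers once: briefly give up the logical
        // write lock by taking and releasing a read lock ourselves.
        forkingLock.Unlock()
        syscall.ForkLock.RLock()
        syscall.ForkLock.RUnlock()
        forkingLock.Lock()
    }

    // Then acquire the logical fork lock unconditionally (no retry loop).
    if forking == 0 {
        syscall.ForkLock.Lock()
    }
    forking++
}

func releaseForkLock() {
    forkingLock.Lock()
    defer forkingLock.Unlock()
    forking--
    if forking == 0 {
        syscall.ForkLock.Unlock()
    }
}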

@ianlancetaylor
Contributor

Oh, sorry, I did misunderstand. I see what you mean. Done.

@rittneje

Thanks, I think it looks good now. One question I do have is whether it would be worth checking for pending writers in addition to pending readers. Otherwise, if there is code outside syscall doing ForkLock.Lock() (which is pretty unlikely but not impossible), it could get starved now.

@zhuangqh

Thanks, I think it looks good now. One question I do have is whether it would be worth checking for pending writers in addition to pending readers. Otherwise, if there is code outside syscall doing ForkLock.Lock() (which is pretty unlikely but not impossible), it could get starved now.

I think that is to be expected if the user misuses ForkLock.

@ianlancetaylor
Contributor

I'm OK with assuming that no program calls syscall.ForkLock.Lock(). Even if there are such programs, they will continue to work unless they also fork new processes continuously. So I'm willing to wait to see if anybody reports a bug.

@gopherbot

Change https://go.dev/cl/507355 mentions this issue: syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic

gopherbot pushed a commit that referenced this issue Jul 10, 2023
syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic

In CL 421441, we changed syscall to allow concurrent calls to
forkExec.

On platforms that support the pipe2 syscall that is the right
behavior, because pipe2 atomically opens the pipe with CLOEXEC already
set.

However, on platforms that do not support pipe2 (currently aix and
darwin), syscall.forkExecPipe is not atomic, and the pipes do not
initially have CLOEXEC set. If two calls to forkExec proceed
concurrently, a pipe intended for one child process can be
accidentally inherited by the other. If the process is long-lived, the
pipe can be held open unexpectedly and prevent the parent process from
reaching EOF reading the child's status from the pipe.

Fixes #61080.
Updates #23558.
Updates #54162.

Change-Id: I83edcc80674ff267a39d06260c5697c654ff5a4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/507355
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
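
For reference, a rough, Linux-flavored sketch of the difference the commit message describes (illustrative only; pipeAtomic and pipeTwoStep are invented names, not the actual forkExecPipe code):

package main

import "syscall"

// With pipe2, the pipe fds are created with CLOEXEC already set, so there
// is no window in which a concurrently forked child can inherit them.
func pipeAtomic(p []int) error {
    return syscall.Pipe2(p, syscall.O_CLOEXEC)
}

// Without pipe2 (e.g. aix, darwin), there is a window between Pipe and
// CloseOnExec in which a concurrently forked child can inherit the fds,
// so these two steps must be serialized against fork.
func pipeTwoStep(p []int) error {
    if err := syscall.Pipe(p); err != nil {
        return err
    }
    syscall.CloseOnExec(p[0])
    syscall.CloseOnExec(p[1])
    return nil
}

func main() {
    p := make([]int, 2)
    if err := pipeAtomic(p); err == nil {
        syscall.Close(p[0])
        syscall.Close(p[1])
    }
    _ = pipeTwoStep // two-step fallback shown for comparison only
}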

bradfitz pushed a commit to tailscale/go that referenced this issue Jul 15, 2023
syscall: serialize locks on ForkLock on platforms where forkExecPipe is not atomic
zhuangqh pushed a commit to zhuangqh/go that referenced this issue Aug 2, 2023
Fixes golang#23558
Fixes golang#54162

Change-Id: I3cf6efe466080cdb17e171218e9385ccb272c301
Reviewed-on: https://go-review.googlesource.com/c/go/+/421441
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
zhuangqh pushed a commit to zhuangqh/go that referenced this issue Oct 7, 2023