Skip to content

syscall: effect of Chdir in Plan 9 may not be simultaneous to all goroutines #58802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
millerresearch opened this issue Mar 1, 2023 · 6 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9
Milestone

Comments

@millerresearch
Copy link
Contributor

CL 6350 added a syscall.Fixwd routine to ensure that a Chdir in one goroutine is propagated to other goroutines before the working directory is referenced in a subsequent syscall. This is needed because each M in Plan 9 runs as a separate OS process with its own working directory.

This mechanism turns out to be not quite sufficient to guarantee that Chdir is effectively program wide. It may be frustrated by preemption of a goroutine and rescheduling onto a different M at an inopportune moment.

Circumstantial evidence includes test failures like this with messages like './testdata' file does not exist', and the observation by @psilva261 in the discussion of #57540 that inserting some runtime.LockOSThread calls around Fixwd makes "fs related tests ... much more stable".

I've also found that if the os.TestProgWideChdir test is made slightly more aggressive, it will
sometimes report errors.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 1, 2023
@bcmills bcmills added OS-Plan9 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 1, 2023
@bcmills bcmills added this to the Backlog milestone Mar 1, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/472555 mentions this issue: os: make TestProgWideChdir detect more possible failure cases

@psilva261
Copy link
Member

I could create a patch for this, so far I've tried adding additional locking in Fixwd(), fixwd(), Getwd() and Chdir() although it's probably not needed everywhere.

@millerresearch
Copy link
Contributor Author

I think your patch mentioned in #57540 might be too much and too little. For example fixwd() doesn't need to lock to the OS thread because it only makes one call to Fixwd(), which already does the locking.

On the other hand, consider this sequence (with your patch):

  1. syscall.Open is called with a relative pathname
  2. Open calls fixwd, which calls runtime.LockOSThread
  3. fixwd calles Fixwd which calls fixwdLocked which calls chdir, changing the directory for the current M
  4. Before returning, fixwd calls runtime.UnlockOSThread
  5. syscall.Open calls open which opens the file in the correct (we hope) directory

But what if the goroutine gets preempted between 4. and 5. and is rescheduled on a different M where Fixwd hasn't yet been called since the most recent Chdir?

I think for safety the goroutine needs to be locked to the OS thread from the start of fixwd until after the syscall which the fixwd protects. This only needs to be done if the syscall uses a relative pathname, so maybe fixwd could return a bool to indicate if (deferred) UnlockOSThread is required.

@millerresearch
Copy link
Contributor Author

I think for safety the goroutine needs to be locked to the OS thread from the start of fixwd until after the syscall which the fixwd protects. This only needs to be done if the syscall uses a relative pathname, so maybe fixwd could return a bool to indicate if (deferred) UnlockOSThread is required.

@psilva261 I've tried something along these lines and it seems to work. Shall I submit it, or would you like to develop your patch (it was your idea first)?

@psilva261
Copy link
Member

@psilva261 I've tried something along these lines and it seems to work. Shall I submit it, or would you like to develop your patch (it was your idea first)?

@millerresearch Yes, it would be great if you would submit your patch. This takes more corner-cases into account. Also thanks a lot for the explanation.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/474055 mentions this issue: syscall: avoid race in plan9 while syncing Chdir across goroutines

@mknyszek mknyszek moved this to All-But-Submitted in Go Compiler / Runtime Mar 8, 2023
@github-project-automation github-project-automation bot moved this from All-But-Submitted to Done in Go Compiler / Runtime Mar 9, 2023
gopherbot pushed a commit that referenced this issue May 24, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
This test is meant to detect the effect of Chdir not being
observed in other concurrent goroutines, possible in Plan 9
because each M runs in a separate OS process with its own
working directory. The test depends on Getwd to report the
correct working directory, but if Chdir fails then Getwd
may fail for the same reasons. We add a consistency check
that Stat(Getwd()) and Stat(".") refer to the same file.

Also change channel usage and add a sync.WaitGroup to
ensure test goroutines are not left blocked or running
when the main test function exits.

For #58802

Change-Id: I80d554fcf3617427c28bbe16e5e396367dcfe673
Reviewed-on: https://go-review.googlesource.com/c/go/+/472555
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: David du Colombier <0intro@gmail.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Mar 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9
Projects
None yet
Development

No branches or pull requests

4 participants