Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls/internal/lsp/filecache: TestConcurrency failure on windows #57747

Closed
findleyr opened this issue Jan 11, 2023 · 9 comments
Closed
Assignees
Labels
FrozenDueToAge gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@findleyr
Copy link
Contributor

findleyr commented Jan 11, 2023

#!watchflakes
post <- goos == "windows" && pkg == "golang.org/x/tools/gopls/internal/lsp/filecache"

Just noticed this failure on windows:
https://build.golang.org/log/bff6275bfcc6f47e62fef12d9b1607ddfe68a0ca

--- FAIL: TestConcurrency (10.81s)
    filecache_test.go:99: rename C:\Users\gopher\AppData\Local\gopls\606f95f2\TestConcurrency\ae\aef468845cf8c257a7f1cf8fddd43ab5ebdbd0eb629eff15d05306cdcb975ccc.tmp.557174086837030208 C:\Users\gopher\AppData\Local\gopls\606f95f2\TestConcurrency\ae\aef468845cf8c257a7f1cf8fddd43ab5ebdbd0eb629eff15d05306cdcb975ccc: Access is denied.
FAIL
FAIL	golang.org/x/tools/gopls/internal/lsp/filecache	12.369s

Could this be an artifact of the builder, or a problem with the filecache design on windows? We need to understand this, as the filecache is a critical new component of gopls.

CC @adonovan

@findleyr findleyr added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 11, 2023
@findleyr findleyr added this to the gopls/v0.12.0 milestone Jan 11, 2023
@gopherbot gopherbot added Tools This label describes issues relating to any tools in the x/tools repository. gopls Issues related to the Go language server, gopls. labels Jan 11, 2023
@adonovan adonovan self-assigned this Jan 11, 2023
@bcmills
Copy link
Contributor

bcmills commented Jan 11, 2023

(CC @golang/windows)

From what I understand, guaranteed-atomic renames are not really a thing in the Win32 API.

The caches in cmd/go instead use a combination of idempotent writes (in the build cache) and file-locking (in the module cache). Both of those do appear to work reliably. Note that for idempotent writes you must avoid os.WriteFile, because that begins by truncating away the existing contents (and truncating away previously-written bytes is not idempotent!).

We also have robustio.Rename, which will generally retry past a rename failure on Windows but seems like kind of a code smell. (It papers over the failure with a retry rather than avoiding it in the first place, and as a result it can add a lot of latency as compared with a more principled approach.)

@adonovan
Copy link
Member

adonovan commented Jan 12, 2023

We also have robustio.Rename, which will generally retry past a rename failure on Windows but seems like kind of a code smell. (It papers over the failure with a retry rather than avoiding it in the first place, and as a result it can add a lot of latency as compared with a more principled approach.)

filecache is already using robustio.Rename, so this issue is really just an indication that its probabilistic workaround isn't as reliable as we would like.

The caches in cmd/go instead use a combination of idempotent writes (in the build cache)

By idempotent write, I assume you mean that if you write a file whose name is sha256(s) then its contents will be exactly s. We don't currently have that property in filecache, since it would require that you know the hash of the value before you access the cache, whereas in our case we only know the hash of the recipe, not of the cake that would be produced by executing the recipe, and we don't guarantee that recipes are fully deterministic: so long as the cake has the right density, we don't care where the CO2 bubbles are.

and file-locking (in the module cache).

Thanks, I'll look into locking.

From what I understand, guaranteed-atomic renames are not really a thing in the Win32 API.

Yeah. It's sad that this still hasn't been addressed in the quarter century since I first hit this problem. It's not as if Windows exploits the lack of transactional renames to make its directory operations faster than POSIX---quite the opposite in fact, it is notoriously slow. (Some of that is no doubt due to hooks for Antivirus, backup, GUIs, search index updates, and other things that Linux doesn't support, of course.)

Apparently, support for file system transactions (more general than just rename within a directory) was added to MS Windows Vista (2007) but has since been deprecated.

@adonovan
Copy link
Member

adonovan commented Jan 12, 2023

No less an authority than the late Jim Gray of MSR says in To BLOB or Not To BLOB:
Large Object Storage in a Database or a Filesystem?
that

To perform a safe write, a new version of the file
is created with a temporary name. Next, the new data
is written to the temporary file and those writes are
flushed to disk. Finally, the new file is renamed to the
permanent file name, thereby deleting the file with the
older data. Under UNIX, rename() is guaranteed to
atomically overwrite the old version of the file. Under
Windows, the ReplaceFile() call is used to atomically
replace one file with another.

But I'm unable to reconcile that with these terrifying error codes documented for ReplaceFile:

ERROR_UNABLE_TO_MOVE_REPLACEMENT1176 (0x498) | The replacement file could not be renamed. If lpBackupFileName was specified, the replaced and replacement files retain their original file names. Otherwise, the replaced file no longer exists and the replacement file exists under its original name.
ERROR_UNABLE_TO_MOVE_REPLACEMENT_21177 (0x499) | The replacement file could not be moved. The replacement file still exists under its original name; however, it has inherited the file streams and attributes from the file it is replacing. The file to be replaced still exists with a different name. If lpBackupFileName is specified, it will be the name of the replaced file.

Perhaps the operation is atomic within a directory, ACLs permitting, and not otherwise?

@bcmills
Copy link
Contributor

bcmills commented Jan 12, 2023

It's complicated. 😅
(See previously #8914.)

Per https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-replacefilew#remarks,

The ReplaceFile function combines several steps within a single function. An application can call ReplaceFile instead of calling separate functions to save the data to a new file, rename the original file using a temporary name, rename the new file to have the same name as the original file, and delete the original file.

From that description, I would suppose that if the file exists, it does have either the old contents or the new contents. However, if the two renames are not a single transaction then there may be an interval after the “rename the original file” step but before the “rename the new file” step in which the file ceases to exist entirely. 🤔

If that gap is possible, then it seems that simultaneous ReplaceFile calls could cause at least one of the calls to fail.
Consider the interleaving:

  • A: rename away old file
  • B: rename away old file (Does this fail because the path no longer refers to a file? Or is it a no-op?)
  • B: rename new file
  • A: rename new file (Does this fail because the path now contains the file renamed by B?)

So I would expect that ReplaceFile would still at least be prone to spurious failures, which would still have the latency problems associated with retries.

@gopherbot
Copy link

Change https://go.dev/cl/461855 mentions this issue: internal/robustio: use SetFileInformationByHandle(FileRenameInfo)

@adonovan
Copy link
Member

Inspired by https://www.youtube.com/watch?v=uhRWMGBjlO8&t=2161s I tried an approach based on SetFileInformationByHandle(FileRenameInfo), but that doesn't seem to be any better. See https://go.dev/cl/461855.

I notice that LLVM's solution to the same problem involved trying up to 200 retries (!!):

   // We normally expect this loop to succeed after a few iterations. If it
   // requires more than 200 tries, it's more likely that the failures are due to
   // a true error, so stop trying.
   for (unsigned Retry = 0; Retry != 200; ++Retry) {

Sigh.

@gopherbot
Copy link

Change https://go.dev/cl/461800 mentions this issue: gopls/internal/lsp/filecache: use file locking on Windows

@gopherbot
Copy link

Change https://go.dev/cl/463776 mentions this issue: internal/lockedfile: fix build constraints on solaris

gopherbot pushed a commit to golang/tools that referenced this issue Jan 27, 2023
The version of this file in the main repo uses a "!unix" build
constraint, which had to be ported to a longer list for inclusion in
gopls (which still supports Go versions that predate the "unix" tag).

Solaris is a Unix derivative, and its implementation is provided in
the "fcntl" implementation.

Updates golang/go#57747.

Change-Id: Ibde8ce55dadc03ad3cb797b5320f5b75580f639f
Reviewed-on: https://go-review.googlesource.com/c/tools/+/463776
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
gopls-CI: kokoro <noreply+kokoro@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
@golang golang locked and limited conversation to collaborators Jan 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows Tools This label describes issues relating to any tools in the x/tools repository.
Projects
Archived in project
Development

No branches or pull requests

4 participants