Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/race: update race windows syso to new LLVM version #53539

Closed
thanm opened this issue Jun 24, 2022 · 21 comments
Closed

runtime/race: update race windows syso to new LLVM version #53539

thanm opened this issue Jun 24, 2022 · 21 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@thanm
Copy link
Contributor

thanm commented Jun 24, 2022

The syso that supports the race detector for windows is currently still at an old version, since we had been waiting for new windows builders with updated compilers:

https://go.googlesource.com/go/+/6bad7e82430bb1eb927a2901f44f9664637db27d/src/runtime/race/README#12

I took a stab at doing the update today but ran into some additional problems; filing this issue to note the problems and track the work needed to do the update.

The LLVM support library (llvm-project/compiler-rt/lib/tsan and related) has undergone a fair number of changes since we last did an update, and it looks as though there are now some calls in the windows version to new synchronization routines. Specifically if you run racebuild with the new builders and then race.bat, you get unsatisfied symbols.

With a slightly modified version of racebuild.go that doesn't use GCC 5.3 for the build.

./racebuild --goroot=/ssd2/xgo --rev=41cb504b7c4b18ac15830107431a0c1eec73a6b2 \
  --gorev=851ecea4cc99ab276109493477b2c7e30c253ea8 --platforms=windows/amd64
...
# go install -race std
# runtime/race
..../x86_64-w64-mingw32/bin/ld.exe: runtime\race\race_windows_amd64.syso:gotsan.cpp:(.text+0x3bfa): undefined reference to `WaitOnAddress'
...
..../x86_64-w64-mingw32/bin/ld.exe: runtime\race\race_windows_amd64.syso:gotsan.cpp:(.text+0x91cf): undefined reference to `WakeByAddressSingle'
...
...../x86_64-w64-mingw32/bin/ld.exe: runtime\race\race_windows_amd64.syso:gotsan.cpp:(.text+0x137e2): undefined reference to `WakeByAddressAll'

Looks like this is coming from this code:

https://github.com/llvm/llvm-project/blob/41cb504b7c4b18ac15830107431a0c1eec73a6b2/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp#L48

This implies that if we want to support -race + internal linking on windows we need to teach the Go linker to import this lib.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 24, 2022
@thanm thanm added this to the Go1.20 milestone Jun 24, 2022
@thanm thanm self-assigned this Jun 24, 2022
@gopherbot
Copy link

Change https://go.dev/cl/413817 mentions this issue: cmd/link: link against libsynchronization.a for -race on windows

@thanm
Copy link
Contributor Author

thanm commented Jun 24, 2022

OK, looks like CL 413817 helps with that, but we're on to a new problem:

C:\workdir\go\pkg\tool\windows_amd64\link.exe: running gcc failed: exit status 1
c:/programdata/chocolatey/lib/mingw/tools/install/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/11.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\Users\gopher\AppData\Local\Temp\go-link-3661115345\000002.o:gotsan.cpp:(.text+0xf949): undefined reference to `__sanitizer::DumpProcessMap()'
collect2.exe: error: ld returned 1 exit status

gopherbot pushed a commit that referenced this issue Jun 27, 2022
As of LLVM rev 41cb504b7c4b18ac15830107431a0c1eec73a6b2, the
race detector runtime now refers to things in the windows
synchronization library, hence when doing windows internal
linking, at that library to the list of host archives that
we visit. The tsan code that makes the reference is here:

https://github.com/llvm/llvm-project/blob/41cb504b7c4b18ac15830107431a0c1eec73a6b2/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp#L48
https://github.com/llvm/llvm-project/blob/41cb504b7c4b18ac15830107431a0c1eec73a6b2/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp#L834

Note that libsynchronization.a is not guaranteed to be available on
all windows systems, so in the external linking case, check for its
existence before adding "-lsynchronization" to the external linker
args.

Updates #53539.

Change-Id: I433c95c869915693d59e9c1082d5b8a11da1fc8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/413817
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/414514 mentions this issue: runtime/race: update race_windows_amd64.syso

@thanm
Copy link
Contributor Author

thanm commented Jun 27, 2022

Update: I sent D128641 to LLVM upstream with a fix, now submitted as 13fb97d68821.

Still having trouble getting a clean race.bat run after running racebuild. My latest run is:

./racebuild --goroot=/ssd2/xgo --rev=13fb97d68821e1948d176057ebee94f50cb05b62 --gorev=68289f39f0019ff5c03e047d68e5b7f6a9f9e9e2 --platforms=windows/amd64

in combination with CL 414475, which picks GCC 11.2 instead of 5.3. The resulting race.bat run is still not passing however. Representative error:

##### Testing packages.
ok  	archive/tar	1.173s
ok  	archive/zip	3.534s
ok  	bufio	1.519s
==2580==ERROR: ThreadSanitizer failed to allocate 0x020000000000 (2199023255552) bytes at 0x010000000000 (error code: 487)
failed to reset shadow memory
FAIL	bytes	1.190s
ok  	compress/bzip2	1.807s
ok  	compress/flate	7.184s
ok  	compress/gzip	1.653s
ok  	compress/lzw	1.783s
ok  	compress/zlib	6.271s

I get this same symptom on both windows-amd64-race and windows-amd64-race-newcc.

@dvyukov
Copy link
Member

dvyukov commented Jun 28, 2022

This error:

==2580==ERROR: ThreadSanitizer failed to allocate 0x020000000000 (2199023255552) bytes at 0x010000000000 (error code: 487)

comes from here:
https://github.com/llvm/llvm-project/blob/241557fb0600fcef67bdd0698fbbd2fb9a3459e2/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp#L200
when we try to zero the whole shadow memory range.

Previously we did not do it on Windows since it was optional, but now it's required to zero the whole range.

But since we map shadow on Windows lazily (otherwise we would get the same error earlier):
https://github.com/llvm/llvm-project/blob/241557fb0600fcef67bdd0698fbbd2fb9a3459e2/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp#L562-L570

We don't need to remap the whole range, we only need to remap what was mapped by MapShadow function.

We probably need to add mapped_shadow_begin/end for Go, update them in MapShadow and re-map on that part in DoReset.

@thanm
Copy link
Contributor Author

thanm commented Jun 30, 2022

We probably need to add mapped_shadow_begin/end for Go, update them in MapShadow and re-map on that part in DoReset.

Thanks, I tried coding up this suggestion. Tentative/experimental CL at https://reviews.llvm.org/D128909.

This does seem to eliminate the "failed to allocate 0x020000000000" problem, but now I seem to be on to a different failure mode, e.g.

##### Testing packages.
==3348==ERROR: ThreadSanitizer failed to allocate 0x000000501000 (5246976) bytes at 0x100eccf98e000 (error code: 87)
FAIL	archive/tar	0.054s
==2036==ERROR: ThreadSanitizer failed to allocate 0x0000004d1000 (5050368) bytes at 0x100edaccf6000 (error code: 87)
FAIL	archive/zip	0.056s
==3584==ERROR: ThreadSanitizer failed to allocate 0x0000003ad000 (3854336) bytes at 0x100eca5bfe000 (error code: 87)
FAIL	bufio	0.046s
==3440==ERROR: ThreadSanitizer failed to allocate 0x0000003cd000 (3985408) bytes at 0x100ef1b2da000 (error code: 87)
FAIL	bytes	0.054s
==660==ERROR: ThreadSanitizer failed to allocate 0x00000037d000 (3657728) bytes at 0x100eeb308e000 (error code: 87)
FAIL	compress/bzip2	0.055s
...

This actually looks pretty similar to some previous issues, e.g. 48231 and 46099.

I see from this comment that the problem may be due to a new compiler that defaults to a PIE binary. I can pursue this angle (maybe with Go linker CL). @dvyukov , can you confirm that we need to avoid PIE in order for the race detector to function properly on windows/amd64? Thanks.

@dvyukov
Copy link
Member

dvyukov commented Jun 30, 2022

87 is ERROR_INVALID_PARAMETER 87 (0x57) The parameter is incorrect.
https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-

I see the addresses are not 64K aligned, that's probably what causes the error.
What line in tsan runtime does cause this?

@thanm
Copy link
Contributor Author

thanm commented Jun 30, 2022

I see the addresses are not 64K aligned, that's probably what causes the error. What line in tsan runtime does cause this?

Ah, ok, I guess this would make sense. I'll see if I can collect a stack trace.

@thanm
Copy link
Contributor Author

thanm commented Jun 30, 2022

Looks like the offending call is from MapShadow itself (e.g. https://github.com/llvm/llvm-project/blob/13fb97d68821e1948d176057ebee94f50cb05b62/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp#L568).

64K: I tried upping the alignment to 64k, but that doesn't seem to help (I still get the same error).

@thanm
Copy link
Contributor Author

thanm commented Jun 30, 2022

OK, looks like the most recent batch of problems all go away (at least for the hand tests I tried) when PIE is turned off. I will work on putting together a linker CL that disables PIE when -race is in effect. Stay tuned.

@dvyukov
Copy link
Member

dvyukov commented Jul 1, 2022

Looks like the offending call is from MapShadow itself

Right, we only align it to 4K (page size), but not to 64K.

64K: I tried upping the alignment to 64k, but that doesn't seem to help (I still get the same error).

Humm... this is strange. You aligned both start address and size, right?
The other reason for this I can think of is that since we extend the region while rounding it to 64K, we can ask the kernel to allocate overlapping ranges. Maybe that fails?
Since we will need to track mapped_shadow_begin/end for DoReset, it should be relatively easy to avoid double mapping.

OK, looks like the most recent batch of problems all go away (at least for the hand tests I tried) when PIE is turned off.

The only way I see PIE can affect things is that some ranges become only 4K aligned, whereas w/o PIE they are 64K aligned.
If so, then carefully tracking mapped_shadow_begin/end, aligning everything to 64K and avoiding double mapping of ranges may solve all problems (w/o prohibiting PIE).

@gopherbot
Copy link

Change https://go.dev/cl/415674 mentions this issue: cmd/racebuild: support cherry-picking CL into Go repo

@gopherbot
Copy link

Change https://go.dev/cl/415675 mentions this issue: cmd/racebuild: add -copyonfail debugging flag

@gopherbot
Copy link

Change https://go.dev/cl/414475 mentions this issue: cmd/racebuild: update compilers used for windows syso build

@gopherbot
Copy link

Change https://go.dev/cl/415676 mentions this issue: cmd/link: explicitly disable PIE for windows/amd64 -race mode

@gopherbot
Copy link

Change https://go.dev/cl/416174 mentions this issue: cmd/go: default to "exe" build mode for windows -race

gopherbot pushed a commit that referenced this issue Jul 7, 2022
This patch changes the default build mode from "pie" to "exe" when
building programs on windows with "-race" in effect. The Go command
already issues an error if users explicitly ask for -buildmode=pie in
combination with -race on windows, but wasn't revising the default
"pie" build mode if a specific buildmode was not requested.

Updates #53539.
Updates #35006.

Change-Id: I2f81a41a1d15a0b4f5ae943146175c5a1202cbe0
Reviewed-on: https://go-review.googlesource.com/c/go/+/416174
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
gopherbot pushed a commit that referenced this issue Jul 7, 2022
Turn off PIE explicitly for windows/amd64 when -race is in effect,
since at the moment the race detector runtime doesn't seem to handle
PIE binaries correctly. Note that newer C compilers on windows
produce PIE binaries by default, so the Go linker needs to explicitly
turn off PIE when invoking the external linker in this case.

Updates #53539.

Change-Id: Ib990621f22cf61a5fa383584bab81d3dfd7552e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/415676
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
@thanm
Copy link
Contributor Author

thanm commented Jul 7, 2022

For tracking purposes (this is more of a mental note / reminder), the updated windows race syso is dependent on functions from libsynchronization.a (which are not available on our existing Go windows builders), meaning that we can't check that in until we've transitioned over to the new ones. Otherwise we'll get unsats on WakeByAddressSingle/WakeByAddressAll/WaitOnAddress.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@bcmills
Copy link
Contributor

bcmills commented Jul 11, 2022

Is this dashboard failure related?

2022-07-06T18:36:43-d69bac6-77cc1c0/windows-amd64-race:

==4712==ERROR: ThreadSanitizer failed to allocate 0x000000040000 (262144) bytes at 0x056157aa0000 (error code: 1455)
FATAL: ThreadSanitizer can not mmap thread trace (0x056157aa0000/0x000000040000)
FAIL	golang.org/x/tools/refactor/importgraph	0.276s

@thanm
Copy link
Contributor Author

thanm commented Jul 11, 2022

Interesting, I haven't seem one of those before. Error code 1455 is 'The paging file is too small for this operation to complete'.

I would guess that it's unrelated, the errors I am working on at the moment for V3 are 487 ("Attempt to access invalid address").

thanm added a commit to llvm/llvm-project that referenced this issue Jul 26, 2022
Capture the computed shadow begin/end values at the point where the
shadow is first created and reuse those values on reset. Introduce new
windows-specific function "ZeroMmapFixedRegion" for zeroing out an
address space region previously returned by one of the MmapFixed*
routines; call this function (on windows) from DoResetImpl
tsan_rtl.cpp instead of MmapFixedSuperNoReserve.

See golang/go#53539 (comment)
for context; intended to help with updating the syso for Go's
windows/amd64 race detector.

Differential Revision: https://reviews.llvm.org/D128909
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
As of LLVM rev 41cb504b7c4b18ac15830107431a0c1eec73a6b2, the
race detector runtime now refers to things in the windows
synchronization library, hence when doing windows internal
linking, at that library to the list of host archives that
we visit. The tsan code that makes the reference is here:

https://github.com/llvm/llvm-project/blob/41cb504b7c4b18ac15830107431a0c1eec73a6b2/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp#L48
https://github.com/llvm/llvm-project/blob/41cb504b7c4b18ac15830107431a0c1eec73a6b2/compiler-rt/lib/sanitizer_common/sanitizer_win.cpp#L834

Note that libsynchronization.a is not guaranteed to be available on
all windows systems, so in the external linking case, check for its
existence before adding "-lsynchronization" to the external linker
args.

Updates golang#53539.

Change-Id: I433c95c869915693d59e9c1082d5b8a11da1fc8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/413817
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
This patch changes the default build mode from "pie" to "exe" when
building programs on windows with "-race" in effect. The Go command
already issues an error if users explicitly ask for -buildmode=pie in
combination with -race on windows, but wasn't revising the default
"pie" build mode if a specific buildmode was not requested.

Updates golang#53539.
Updates golang#35006.

Change-Id: I2f81a41a1d15a0b4f5ae943146175c5a1202cbe0
Reviewed-on: https://go-review.googlesource.com/c/go/+/416174
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Turn off PIE explicitly for windows/amd64 when -race is in effect,
since at the moment the race detector runtime doesn't seem to handle
PIE binaries correctly. Note that newer C compilers on windows
produce PIE binaries by default, so the Go linker needs to explicitly
turn off PIE when invoking the external linker in this case.

Updates golang#53539.

Change-Id: Ib990621f22cf61a5fa383584bab81d3dfd7552e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/415676
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Capture the computed shadow begin/end values at the point where the
shadow is first created and reuse those values on reset. Introduce new
windows-specific function "ZeroMmapFixedRegion" for zeroing out an
address space region previously returned by one of the MmapFixed*
routines; call this function (on windows) from DoResetImpl
tsan_rtl.cpp instead of MmapFixedSuperNoReserve.

See golang/go#53539 (comment)
for context; intended to help with updating the syso for Go's
windows/amd64 race detector.

Differential Revision: https://reviews.llvm.org/D128909
@thanm thanm modified the milestones: Go1.20, Go1.21 Nov 30, 2022
@thanm
Copy link
Contributor Author

thanm commented Nov 30, 2022

The work to finish out this task is now basically complete (see http://go.dev/cl/c/go/+/420197), but the CL didn't quite make it into the Go 1.20 release (it was blocked by the work to update the windows builders, which ran very late). I'm going to move this bug from 1.20 to 1.21 as a result; we should be able to submit the CL early in 1.21.

gopherbot pushed a commit to golang/build that referenced this issue Jan 12, 2023
Revise the recipe for building a windows race syso slightly to use
the existing compilers on the windows machine (in C:\godep\gcc64\bin)
as opposed to downloading GCC 5.X via "choco install". This requires
updating PATH following the refresh env.

Updates golang/go#35006.
Updates golang/go#53539.

Change-Id: I14c8491159f421f688f8d4b7c84250768d69ea42
Reviewed-on: https://go-review.googlesource.com/c/build/+/414475
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Jan 12, 2023
When testing a change to the Go repo that is needed to help build/test
a new version of the race detector, it helps to be able to do a
racebuild run from a specific Go revision plus a cherry-picked CL on
top of that revision. This patch adds a new flag ("-cherrypick") to
support that, also a "-checkout" that can be used to check out a
stack of pending changes on Gerrit.

Updates golang/go#35006.
Updates golang/go#53539.

Change-Id: Id6c508f21e11a445c89df8457dc6a65020eee0fb
Reviewed-on: https://go-review.googlesource.com/c/build/+/415674
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jan 12, 2023
When debugging new versions of the race detector runtime, it can be
useful to copy the newly built syso back to the local machine's Go
repo (from the gomote) even if race.bat/race.bash fails, so as to
analyze the syso or run other tests with it. Add a command line option
"-copyonfail" that attempts to perform the copy even if the script run
fails.

Updates golang/go#35006.
Updates golang/go#53539.

Change-Id: I688b8673b444d1b6d948f10ca2fa4ab109eade44
Reviewed-on: https://go-review.googlesource.com/c/build/+/415675
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@thanm
Copy link
Contributor Author

thanm commented Feb 9, 2023

Closing out this issue; we've now upgraded all of the *.syso's (including windows)

@thanm thanm closed this as completed Feb 9, 2023
arichardson pushed a commit to CTSRD-CHERI/compiler-rt that referenced this issue Sep 12, 2023
Capture the computed shadow begin/end values at the point where the
shadow is first created and reuse those values on reset. Introduce new
windows-specific function "ZeroMmapFixedRegion" for zeroing out an
address space region previously returned by one of the MmapFixed*
routines; call this function (on windows) from DoResetImpl
tsan_rtl.cpp instead of MmapFixedSuperNoReserve.

See golang/go#53539 (comment)
for context; intended to help with updating the syso for Go's
windows/amd64 race detector.

Differential Revision: https://reviews.llvm.org/D128909
arichardson pushed a commit to CTSRD-CHERI/llvm-project that referenced this issue Jan 9, 2024
Capture the computed shadow begin/end values at the point where the
shadow is first created and reuse those values on reset. Introduce new
windows-specific function "ZeroMmapFixedRegion" for zeroing out an
address space region previously returned by one of the MmapFixed*
routines; call this function (on windows) from DoResetImpl
tsan_rtl.cpp instead of MmapFixedSuperNoReserve.

See golang/go#53539 (comment)
for context; intended to help with updating the syso for Go's
windows/amd64 race detector.

Differential Revision: https://reviews.llvm.org/D128909
@golang golang locked and limited conversation to collaborators Feb 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants