Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/test: 'signal arrived during external code execution' with updated windows/386 builder #53540

Closed
thanm opened this issue Jun 24, 2022 · 10 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@thanm
Copy link
Contributor

thanm commented Jun 24, 2022

What version of Go are you using (go version)?

main branch tip
go version go1.19-pre4 cl/455575533 +12f49fe0ed

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

windows/386, specifically the new builder with update windows image and compilers, for example windows-386-2012-newcc.

What did you do?

ran the misc/cgo/test as part of all.bat

What did you expect to see?

clean run

What did you see instead?

Representative slowbot failure:

https://storage.googleapis.com/go-build-log/6a2e5140/windows-386-2012-newcc_2be6fdb8.log

The test fails with this failure mode:

test run output with stack traces
../misc/cgo/test

ok misc/cgo/test 12.649s
Exception 0xc0000005 0x8 0x424448b 0x424448b
PC=0x424448b
signal arrived during external code execution

runtime.cgocall(0x403170, 0x128a4f70)
C:/workdir/go/src/runtime/cgocall.go:158 +0x58 fp=0x128a4f5c sp=0x128a4f44 pc=0x40d9f8
misc/cgo/test._Cfunc_scatter()
_cgo_gotypes.go:2043 +0x2f fp=0x128a4f70 sp=0x128a4f5c pc=0x52134f
misc/cgo/test.test1635(0x129083c0)
C:/workdir/go/misc/cgo/test/test.go:1254 +0x1e fp=0x128a4f94 sp=0x128a4f70 pc=0x542b0e
misc/cgo/test.Test1635(0x129083c0)
C:/workdir/go/misc/cgo/test/cgo_test.go:15 +0x21 fp=0x128a4f9c sp=0x128a4f94 pc=0x51ceb1
testing.tRunner(0x129083c0, 0x584428)
C:/workdir/go/src/testing/testing.go:1446 +0x113 fp=0x128a4fe4 sp=0x128a4f9c pc=0x4e0693
testing.(*T).Run.func1()
C:/workdir/go/src/testing/testing.go:1493 +0x2e fp=0x128a4ff0 sp=0x128a4fe4 pc=0x4e148e
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x128a4ff4 sp=0x128a4ff0 pc=0x473291
created by testing.(*T).Run
C:/workdir/go/src/testing/testing.go:1493 +0x374

goroutine 1 [chan receive]:
runtime.gopark(0x5859cc, 0x12902270, 0xe, 0x17, 0x2)
C:/workdir/go/src/runtime/proc.go:363 +0xff fp=0x1292bcd4 sp=0x1292bcc0 pc=0x446bdf
runtime.chanrecv(0x12902240, 0x1292bd53, 0x1)
C:/workdir/go/src/runtime/chan.go:583 +0x3f2 fp=0x1292bd1c sp=0x1292bcd4 pc=0x410282
runtime.chanrecv1(0x12902240, 0x1292bd53)
C:/workdir/go/src/runtime/chan.go:442 +0x1c fp=0x1292bd30 sp=0x1292bd1c pc=0x40fe8c
testing.(*T).Run(0x128840f0, {0x576e27, 0x8}, 0x584428)
C:/workdir/go/src/testing/testing.go:1494 +0x393 fp=0x1292bd7c sp=0x1292bd30 pc=0x4e1413
testing.runTests.func1(0x128840f0)
C:/workdir/go/src/testing/testing.go:1846 +0x5e fp=0x1292bda0 sp=0x1292bd7c pc=0x4e31ae
testing.tRunner(0x128840f0, 0x1292be38)
C:/workdir/go/src/testing/testing.go:1446 +0x113 fp=0x1292bde8 sp=0x1292bda0 pc=0x4e0693
testing.runTests(0x12898000, {0x6af760, 0x59, 0x59}, {0xc0a55481db90a338, 0x8bb304cc19, 0x6b2260})
C:/workdir/go/src/testing/testing.go:1844 +0x3a6 fp=0x1292be4c sp=0x1292bde8 pc=0x4e3096
testing.(*M).Run(0x12894000)
C:/workdir/go/src/testing/testing.go:1726 +0x680 fp=0x1292bf88 sp=0x1292be4c pc=0x4e1c30
main.main()
_testmain.go:229 +0x185 fp=0x1292bfc4 sp=0x1292bf88 pc=0x549785
runtime.main()
C:/workdir/go/src/runtime/proc.go:250 +0x231 fp=0x1292bff0 sp=0x1292bfc4 pc=0x446821
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x1292bff4 sp=0x1292bff0 pc=0x473291

goroutine 2 [force gc (idle)]:
runtime.gopark(0x585aa4, 0x6b2020, 0x11, 0x14, 0x1)
C:/workdir/go/src/runtime/proc.go:363 +0xff fp=0x1282ffdc sp=0x1282ffc8 pc=0x446bdf
runtime.goparkunlock(...)
C:/workdir/go/src/runtime/proc.go:369
runtime.forcegchelper()
C:/workdir/go/src/runtime/proc.go:302 +0xc3 fp=0x1282fff0 sp=0x1282ffdc pc=0x446a63
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x1282fff4 sp=0x1282fff0 pc=0x473291
created by runtime.init.5
C:/workdir/go/src/runtime/proc.go:290 +0x23

goroutine 3 [GC sweep wait]:
runtime.gopark(0x585aa4, 0x6b21c0, 0xc, 0x14, 0x1)
C:/workdir/go/src/runtime/proc.go:363 +0xff fp=0x12830fd4 sp=0x12830fc0 pc=0x446bdf
runtime.goparkunlock(...)
C:/workdir/go/src/runtime/proc.go:369
runtime.bgsweep(0x12816100)
C:/workdir/go/src/runtime/mgcsweep.go:278 +0x96 fp=0x12830fe8 sp=0x12830fd4 pc=0x42e916
runtime.gcenable.func1()
C:/workdir/go/src/runtime/mgc.go:178 +0x27 fp=0x12830ff0 sp=0x12830fe8 pc=0x420577
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x12830ff4 sp=0x12830ff0 pc=0x473291
created by runtime.gcenable
C:/workdir/go/src/runtime/mgc.go:178 +0x7c

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x585aa4, 0x6b2320, 0xd, 0x14, 0x2)
C:/workdir/go/src/runtime/proc.go:363 +0xff fp=0x12831fb8 sp=0x12831fa4 pc=0x446bdf
runtime.goparkunlock(...)
C:/workdir/go/src/runtime/proc.go:369
runtime.(*scavengerState).park(0x6b2320)
C:/workdir/go/src/runtime/mgcscavenge.go:389 +0x5f fp=0x12831fcc sp=0x12831fb8 pc=0x42c41f
runtime.bgscavenge(0x12816100)
C:/workdir/go/src/runtime/mgcscavenge.go:617 +0x4b fp=0x12831fe8 sp=0x12831fcc pc=0x42caab
runtime.gcenable.func2()
C:/workdir/go/src/runtime/mgc.go:179 +0x27 fp=0x12831ff0 sp=0x12831fe8 pc=0x420537
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x12831ff4 sp=0x12831ff0 pc=0x473291
created by runtime.gcenable
C:/workdir/go/src/runtime/mgc.go:179 +0xc1

goroutine 18 [finalizer wait]:
runtime.gopark(0x585aa4, 0x6d90bc, 0x10, 0x14, 0x1)
C:/workdir/go/src/runtime/proc.go:363 +0xff fp=0x1282ef90 sp=0x1282ef7c pc=0x446bdf
runtime.goparkunlock(...)
C:/workdir/go/src/runtime/proc.go:369
runtime.runfinq()
C:/workdir/go/src/runtime/mfinal.go:180 +0xfb fp=0x1282eff0 sp=0x1282ef90 pc=0x41f7cb
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x1282eff4 sp=0x1282eff0 pc=0x473291
created by runtime.createfing
C:/workdir/go/src/runtime/mfinal.go:157 +0x54

goroutine 19 [syscall]:
runtime.cgocall(0x4034c0, 0x1282afe4)
C:/workdir/go/src/runtime/cgocall.go:158 +0x58 fp=0x1282afd0 sp=0x1282afb8 pc=0x40d9f8
misc/cgo/test._Cfunc_usleep(0x2710)
_cgo_gotypes.go:2373 +0x37 fp=0x1282afe4 sp=0x1282afd0 pc=0x5217a7
misc/cgo/test.lockOSThreadCallback.func1()
C:/workdir/go/misc/cgo/test/testx.go:359 +0x28 fp=0x1282aff0 sp=0x1282afe4 pc=0x548988
runtime.goexit()
C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x1282aff4 sp=0x1282aff0 pc=0x473291
created by misc/cgo/test.lockOSThreadCallback
C:/workdir/go/misc/cgo/test/testx.go:359 +0x87
eax 0x18feb8
ebx 0x128a4f70
ecx 0x6b2854
edx 0x128a4f40
edi 0xc0
esi 0x5b502a
ebp 0x6b2820
esp 0x18fea0
eip 0x424448b
eflags 0x10206
cs 0x23
fs 0x53
gs 0x2b
FAIL misc/cgo/test 0.142s
FAIL
2022/06/23 21:16:43 Failed: exit status 1
ok misc/cgo/test 12.454s
ok misc/cgo/test 12.651s
ok misc/cgo/test 5.689s
skipped due to earlier error
go tool dist: FAILED

I am not sure what the root cause is here, needs to be debugged.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 24, 2022
@thanm thanm added this to the Go1.20 milestone Jun 24, 2022
@thanm thanm self-assigned this Jun 24, 2022
@thanm
Copy link
Contributor Author

thanm commented Jun 29, 2022

Cherry helped me with some debugging and was able to reduce the test case down to this:

package main

// #include <stdio.h>
// int x = 123;
// void F() { printf("hello %d\n", x); }
import "C"

func main() {
  C.F()
}

By comparing external linking (works) with internal linking (crashes) she found that the C function printf generated from the C fragment above contains an indirect call to the data symbol "__imp____acrt_iob_func". Here's the code:

$ llvm-objdump-13 -t -l -d -r xxx/tmpdir/go-build2751358927/b001/_x002.o
...
00000020 <_printf>:
; _printf():
; C:/godep/gcc32/include/stdio.h:369
      20: 50                           	pushl	%eax
      21: 8d 44 24 0c                  	leal	12(%esp), %eax
; C:/godep/gcc32/include/stdio.h:371
      25: 89 04 24                     	movl	%eax, (%esp)
; C:/godep/gcc32/include/stdio.h:372
      28: 6a 01                        	pushl	$1
      2a: ff 15 00 00 00 00            	calll	*0
			0000002c:  IMAGE_REL_I386_DIR32	__imp____acrt_iob_func
      30: 83 c4 04                     	addl	$4, %esp
      33: ff 34 24                     	pushl	(%esp)
      36: 68 00 00 00 00               	pushl	$0
			00000037:  IMAGE_REL_I386_DIR32	.rdata
      3b: 50                           	pushl	%eax
      3c: e8 00 00 00 00               	calll	0x41 <_printf+0x21>
			0000003d:  IMAGE_REL_I386_REL32	___mingw_vfprintf
      41: 83 c4 0c                     	addl	$12, %esp
; C:/godep/gcc32/include/stdio.h:374
      44: 58                           	popl	%eax
      45: c3                           	retl
      46: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%eax,%eax)

Note the call at _printf+0x2a: it loads the data symbol and then branches to the loaded value.

However this data symbol is being renamed to "__acrt_iob_func" (a text symbol) as a result of CL 382837. Prior to CL 382837 the host object loader would always rename "__imp_XXX" to "XXX" except for the specific case of "__imp____acrt_iob_func"; after 382837 we always do the renaming except for when there is a local def in the same PE file.

Because of the bad rename, the code loads up data from the function itself and then tries to branch to it (crash).

I tried restoring the special case for "__imp____acrt_iob_func" that was present before CL 382837. With that change in place, there is no longer a crash at that spot, but a new crash with a very similar failure mode:

C:\workdir\xxx>cgohimom.exe
Exception 0xc0000005 0x8 0x4d8208 0x4d8208
PC=0x4d8208
signal arrived during external code execution

runtime.cgocall(0x401130, 0x1243bfc0)
	C:/workdir/go/src/runtime/cgocall.go:158 +0x58 fp=0x1243bfac sp=0x1243bf94 pc=0x4098b8
main._Cfunc_F()
	_cgo_gotypes.go:41 +0x2f fp=0x1243bfc0 sp=0x1243bfac pc=0x46a05f
main.main()
	C:/workdir/xxx/cgohimom.go:9 +0x17 fp=0x1243bfc4 sp=0x1243bfc0 pc=0x46a087
runtime.main()
	C:/workdir/go/src/runtime/proc.go:250 +0x231 fp=0x1243bff0 sp=0x1243bfc4 pc=0x43b461
runtime.goexit()
	C:/workdir/go/src/runtime/asm_386.s:1326 +0x1 fp=0x1243bff4 sp=0x1243bff0 pc=0x45f021

I inspected the resulting executable. The call to "0x4d8208" is happening from the mingw function ___mingw_pformat. Here's a partial disassembly of the offending function from the linked binary:

004015e0 <libmingwex(.text)>:
; __mingw_pformat():
  4015e0: 55                           	pushl	%ebp
  4015e1: 89 e5                        	movl	%esp, %ebp
  4015e3: 53                           	pushl	%ebx
  4015e4: 57                           	pushl	%edi
  4015e5: 56                           	pushl	%esi
  4015e6: 83 e4 f8                     	andl	$-8, %esp
  4015e9: 81 ec b8 00 00 00            	subl	$184, %esp
  4015ef: 8b 45 18                     	movl	24(%ebp), %eax
  4015f2: 89 44 24 10                  	movl	%eax, 16(%esp)
  4015f6: 8b 5d 14                     	movl	20(%ebp), %ebx
  4015f9: 8b 75 10                     	movl	16(%ebp), %esi
  4015fc: 8b 7d 0c                     	movl	12(%ebp), %edi
  4015ff: e8 04 6c 0d 00               	calll	0x4d8208
  401604: 8b 00                        	movl	(%eax), %eax
  401606: 89 44 24 7c                  	movl	%eax, 124(%esp)
...

I tracked down the object file that this is coming from in libmingwex.a, and dumped out the function there with relocations:

00000000 <___mingw_pformat>:
; ___mingw_pformat():
; /build/mingw-w64/mingw-w64-crt/build-i686/../stdio/mingw_pformat.c:2413
       0: 55                           	pushl	%ebp
       1: 89 e5                        	movl	%esp, %ebp
       3: 53                           	pushl	%ebx
       4: 57                           	pushl	%edi
       5: 56                           	pushl	%esi
       6: 83 e4 f8                     	andl	$-8, %esp
       9: 81 ec b8 00 00 00            	subl	$184, %esp
       f: 8b 45 18                     	movl	24(%ebp), %eax
      12: 89 44 24 10                  	movl	%eax, 16(%esp)
      16: 8b 5d 14                     	movl	20(%ebp), %ebx
      19: 8b 75 10                     	movl	16(%ebp), %esi
      1c: 8b 7d 0c                     	movl	12(%ebp), %edi
; /build/mingw-w64/mingw-w64-crt/build-i686/../stdio/mingw_pformat.c:2415
      1f: e8 00 00 00 00               	calll	0x24 <___mingw_pformat+0x24>
			00000020:  IMAGE_REL_I386_REL32	__errno
      24: 8b 00                        	movl	(%eax), %eax
      26: 89 44 24 7c                  	movl	%eax, 124(%esp)
; /build/mingw-w64/mingw-w64-crt/build-i686/../stdio/mingw_pformat.c:2418
...

I don't have a good idea as to what went wrong here. The relocation is to "__errno", but it looks as though we already have a SDYNIMPORT symbol for "_errno" set up... and I don't see any other bogus _errno definitions. The address in question (0x4d8208) is somewhere in the .data section apparently:

Sections:
Idx Name               Size     VMA      Type
  0 .text              00069190 00401000 TEXT
  1 .rdata             0006c7f4 0046b000 DATA
  2 .data              00002a00 004d8000 DATA
  3 .zdebug_abbrev     00000127 00507000 DATA
...

As to why the relocation targeting _errno is being resolved to some place in the data section, I am really not sure (scratching my head over that one).

@thanm
Copy link
Contributor Author

thanm commented Jun 29, 2022

I looked at what the external linker is doing for the __errno reference in ___mingw_pformat. It appears to be creating a thunk and then calling the thunk:

// the thunk:
0046a1c8 <__errno>:
; __errno():
  46a1c8: ff 25 c0 5c 4d 00            	jmpl	*5070016

// the code in __mingw_pformat:
...
  4644f6: 8b 5d 14                     	movl	20(%ebp), %ebx
  4644f9: 8b 75 10                     	movl	16(%ebp), %esi
  4644fc: 8b 7d 0c                     	movl	12(%ebp), %edi
; /build/mingw-w64/mingw-w64-crt/build-i686/../stdio/mingw_pformat.c:2415
  4644ff: e8 c4 5c 00 00               	calll	0x46a1c8 <__errno>
  464504: 8b 00                        	movl	(%eax), %eax
  464506: 89 44 24 7c                  	movl	%eax, 124(%esp)

As far as I know there isn't any code in the Go linker that creates these sorts of import thunks for x86. Hmm.

@cherrymui
Copy link
Member

It looks similar to generating PLT stubs for ELF, which the Go linker can do. Maybe we could do something similar?

@thanm
Copy link
Contributor Author

thanm commented Jul 7, 2022

It looks similar to generating PLT stubs for ELF, which the Go linker can do. Maybe we could do something similar?

Agree. I am looking over the corresponding code in lld, e.g. https://github.com/llvm/llvm-project/blob/83d59e05b201760e3f364ff6316301d347cbad95/lld/COFF/InputFiles.cpp#L974 to see if I can use that as a model as well.

@gopherbot
Copy link

Change https://go.dev/cl/417554 mentions this issue: cmd/link: create import thunks when reading PE host objects

@gopherbot
Copy link

Change https://go.dev/cl/417556 mentions this issue: cmd/link: revise DLL import symbol handling

@gopherbot
Copy link

Change https://go.dev/cl/451738 mentions this issue: cmd/link: revise DLL import symbol handling

@gopherbot
Copy link

Change https://go.dev/cl/451816 mentions this issue: misc/cgo/testcshared: handle unsuffixed dlltool path

gopherbot pushed a commit that referenced this issue Nov 19, 2022
Adapt the testcshared tests to handle the case where the path output
by invoking

  gcc -print-prog-name=dlltool

is a path lacking the final ".exe" suffix (this seems to be what clang
is doing); tack it on before using if this is the case.

Updates #35006.
Updates #53540.

Change-Id: I04fb7b9fc90677880b1ced4a4ad2a8867a3f5f86
Reviewed-on: https://go-review.googlesource.com/c/go/+/451816
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Nov 19, 2022
This patch reworks the handling of DLL import symbols in the PE host
object loader to ensure that the Go linker can deal with them properly
during internal linking.

Prior to this point the strategy was to immediately treat an import
symbol reference of the form "__imp__XXX" as if it were a reference to
the corresponding DYNIMPORT symbol XXX, except for certain special
cases. This worked for the most part, but ran into problems in
situations where the target ("XXX") wasn't a previously created
DYNIMPORT symbol (and when these problems happened, the root cause was
not always easy to see).

The new strategy is to not do any renaming or forwarding immediately,
but to delay handling until host object loading is complete. At that
point we make a scan through the newly introduced text+data sections
looking at the relocations that target import symbols, forwarding
the references to the corresponding DYNIMPORT sym where appropriate
and where there are direct refs to the DYNIMPORT syms, tagging them
for stub generation later on.

Updates #35006.
Updates #53540.

Change-Id: I2d42b39141ae150a9f82ecc334001749ae8a3b4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/451738
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/452675 mentions this issue: misc/cgo/testcshared: reapply CL 451816

gopherbot pushed a commit that referenced this issue Nov 22, 2022
I accidentally reverted its edits with a bad cherry-pick in CL 452457.

This should re-fix the windows-.*-newcc builders that regressed at
that change.

Updates #47257.
Updates #35006.
Updates #53540.

Change-Id: I5818416af7c4c8c1593c36aa0198331b42b6c7d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/452675
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@thanm
Copy link
Contributor Author

thanm commented Dec 6, 2022

Closing out this bug; CL 451738 takes care of it.

@thanm thanm closed this as completed Dec 6, 2022
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants