Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: setevent failed; errno=6, fatal error: runtime.semawakeup #43720

Closed
egonelbre opened this issue Jan 15, 2021 · 46 comments
Closed

runtime: setevent failed; errno=6, fatal error: runtime.semawakeup #43720

egonelbre opened this issue Jan 15, 2021 · 46 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows release-blocker
Milestone

Comments

@egonelbre
Copy link
Contributor

egonelbre commented Jan 15, 2021

While trying to reproduce #42637 locally and ran into crashes, I was running os tests on Windows with:

go test -run . -count 100000  -failfast

Using e125ccd:

AMD Ryzen Threadripper 2950X 16-Core Processor    3.50 GHz
Windows 10 Pro
OS build	19042.746

The tests failed with:

c:\Go.tip\src\os>go test -run . -count 100000  -failfast
runtime: setevent failed; errno=6
fatal error: runtime.semawakeup

runtime stack:
runtime.throw(0xb02470, 0x12)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semawakeup.func1()
        C:/Go.tip/src/runtime/os_windows.go:796 +0x85
runtime.semawakeup(0xc0003d6000)
        C:/Go.tip/src/runtime/os_windows.go:794 +0x51
runtime.notewakeup(0xc0003d6120)
        C:/Go.tip/src/runtime/lock_sema.go:161 +0x65
runtime.startm(0xc000022000, 0x959000)
        C:/Go.tip/src/runtime/proc.go:2318 +0x1d9
runtime.handoffp(0xc000022000)
        C:/Go.tip/src/runtime/proc.go:2338 +0x305
runtime.retake(0x125d81032f0, 0xc8)
        C:/Go.tip/src/runtime/proc.go:5287 +0x17d
runtime.sysmon()
        C:/Go.tip/src/runtime/proc.go:5195 +0x1a5
runtime.mstart1()
        C:/Go.tip/src/runtime/proc.go:1275 +0xda
runtime.mstart()
        C:/Go.tip/src/runtime/proc.go:1240 +0x6d

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000f3500, 0xaff734, 0xb, 0xb0f9c0, 0x9ab001)
        C:/Go.tip/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc000505680)
        C:/Go.tip/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc000505680, 0xc0002f9de0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
testing.runTests(0xc000004078, 0xc2c020, 0x8f, 0x8f, 0xbff87bdac4307fc4, 0x8bb2f1ec11, 0xc2f520, 0xb0056c)
        C:/Go.tip/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc0000c2080, 0x0)
        C:/Go.tip/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 1028 [chan receive]:
testing.(*T).Parallel(0xc0000cd680)
        C:/Go.tip/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc0000cd680)
        C:/Go.tip/src/os/os_test.go:2661 +0x47
testing.tRunner(0xc0000cd680, 0xb0fbc0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3

goroutine 1481 [sleep]:
time.Sleep(0x4c4b40)
        C:/Go.tip/src/runtime/time.go:193 +0xe5
os.(*Process).wait(0xc00028d1a0, 0xc000500000, 0x0, 0x0)
        C:/Go.tip/src/os/exec_windows.go:45 +0x41b
os.(*Process).Wait(...)
        C:/Go.tip/src/os/exec.go:129
os/exec.(*Cmd).Wait(0xc0005cedc0, 0x0, 0x0)
        C:/Go.tip/src/os/exec/exec.go:507 +0x6a
os/exec.(*Cmd).Run(0xc0005cedc0, 0xc0000aca50, 0xc0005cedc0)
        C:/Go.tip/src/os/exec/exec.go:341 +0x66
os/exec.(*Cmd).CombinedOutput(0xc0005cedc0, 0x26, 0xc00008b390, 0x3, 0x3, 0xc0005cedc0)
        C:/Go.tip/src/os/exec/exec.go:567 +0x9b
os_test.TestCmdArgs(0xc0000f3500)
        C:/Go.tip/src/os/os_windows_test.go:896 +0x76d
testing.tRunner(0xc0000f3500, 0xb0f9c0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3

goroutine 1522 [runnable]:
os/exec.(*Cmd).Start.func1(0xc0005cedc0, 0xc00044e020)
        C:/Go.tip/src/os/exec/exec.go:440
created by os/exec.(*Cmd).Start
        C:/Go.tip/src/os/exec/exec.go:440 +0x6b0
exit status 2
FAIL    os      7.459s

It seems to take some time before it fails, here are two additional traces:

runtime: setevent failed; errno=6
fatal error: runtime.semawakeup

runtime stack:
runtime.throw(0x1162470, 0x12)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semawakeup.func1()
        C:/Go.tip/src/runtime/os_windows.go:796 +0x85
runtime.semawakeup(0xc0008af800)
        C:/Go.tip/src/runtime/os_windows.go:794 +0x51
runtime.notewakeup(0xc0008af920)
        C:/Go.tip/src/runtime/lock_sema.go:161 +0x65
runtime.startm(0x0, 0xc000020001)
        C:/Go.tip/src/runtime/proc.go:2318 +0x1d9
runtime.wakep()
        C:/Go.tip/src/runtime/proc.go:2398 +0x6d
runtime.resetspinning()
        C:/Go.tip/src/runtime/proc.go:2941 +0x69
runtime.schedule()
        C:/Go.tip/src/runtime/proc.go:3097 +0x2b9
runtime.park_m(0xc000485500)
        C:/Go.tip/src/runtime/proc.go:3239 +0xb2
runtime.mcall(0x0)
        C:/Go.tip/src/runtime/asm_amd64.s:323 +0x5e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc00063c780, 0x115f734, 0xb, 0x116f9c0, 0x100b001)
        C:/Go.tip/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc00068a180)
        C:/Go.tip/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc00068a180, 0xc0005a5de0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
testing.runTests(0xc000004078, 0x128c020, 0x8f, 0x8f, 0xbff87be3e8de6b4c, 0x8bb2f15161, 0x128f520, 0x116056c)
        C:/Go.tip/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc0000c2080, 0x0)
        C:/Go.tip/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 10836 [chan receive]:
testing.(*T).Parallel(0xc00088d980)
        C:/Go.tip/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc00088d980)
        C:/Go.tip/src/os/os_test.go:2661 +0x47
testing.tRunner(0xc00088d980, 0x116fbc0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3

goroutine 11301 [sleep]:
time.Sleep(0x4c4b40)
        C:/Go.tip/src/runtime/time.go:193 +0xe5
os.(*Process).wait(0xc0009e4000, 0xc000596000, 0x0, 0x0)
        C:/Go.tip/src/os/exec_windows.go:45 +0x41b
os.(*Process).Wait(...)
        C:/Go.tip/src/os/exec.go:129
os/exec.(*Cmd).Wait(0xc000106b00, 0x0, 0x0)
        C:/Go.tip/src/os/exec/exec.go:507 +0x6a
os/exec.(*Cmd).Run(0xc000106b00, 0xc00087a390, 0xc000106b00)
        C:/Go.tip/src/os/exec/exec.go:341 +0x66
os/exec.(*Cmd).CombinedOutput(0xc000106b00, 0x26, 0xc000270d50, 0x3, 0x3, 0xc000106b00)
        C:/Go.tip/src/os/exec/exec.go:567 +0x9b
os_test.TestCmdArgs(0xc00063c780)
        C:/Go.tip/src/os/os_windows_test.go:896 +0x76d
testing.tRunner(0xc00063c780, 0x116f9c0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3

goroutine 11330 [runnable]:
os/exec.(*Cmd).Start.func1(0xc000106b00, 0xc000280bc0)
        C:/Go.tip/src/os/exec/exec.go:440
created by os/exec.(*Cmd).Start
        C:/Go.tip/src/os/exec/exec.go:440 +0x6b0
exit status 2
FAIL    os      65.757s
runtime: setevent failed; errno=6
fatal error: runtime.semawakeup
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed
panic during panic

runtime stack:
runtime.throw(0x387328, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0x0)
        C:/Go.tip/src/runtime/os_windows.go:776 +0x1ec
runtime.lock2(0x4f4310)
        C:/Go.tip/src/runtime/lock_sema.go:89 +0x145
runtime.lockWithRank(0x4f4310, 0x0)
        C:/Go.tip/src/runtime/lockrank_off.go:22 +0x32
runtime.lock(0x4f4310)
        C:/Go.tip/src/runtime/lock_sema.go:36 +0x3b
runtime.startpanic_m(0x1)
        C:/Go.tip/src/runtime/panic.go:1256 +0x145
runtime.fatalthrow.func1()
        C:/Go.tip/src/runtime/panic.go:1165 +0x45
runtime.fatalthrow()
        C:/Go.tip/src/runtime/panic.go:1164 +0x5e
runtime.throw(0x387328, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0xffffffff)
        C:/Go.tip/src/runtime/os_windows.go:776 +0x1ec
runtime.notetsleep_internal(0x4af7b8, 0x186a0, 0xc000429500, 0x173d2e0e394, 0x1b4a65b0598)
        C:/Go.tip/src/runtime/lock_sema.go:266 +0x1c5
runtime.notetsleep(0x4af7b8, 0x186a0, 0x0)
        C:/Go.tip/src/runtime/lock_sema.go:283 +0x66
runtime.forEachP(0x390190)
        C:/Go.tip/src/runtime/proc.go:1493 +0x332
runtime.gcMarkTermination.func4()
        C:/Go.tip/src/runtime/mgc.go:1765 +0x34
runtime.systemstack(0x0)
        C:/Go.tip/src/runtime/asm_amd64.s:375 +0x6b
runtime.mstart()
        C:/Go.tip/src/runtime/proc.go:1219

goroutine 117 [running]:
runtime.systemstack_switch()
        C:/Go.tip/src/runtime/asm_amd64.s:335 fp=0xc000599d50 sp=0xc000599d48 pc=0x200d20
runtime.gcMarkTermination(0x3ff8cb29ca5050fc)
        C:/Go.tip/src/runtime/mgc.go:1764 +0x436 fp=0xc000599f08 sp=0xc000599d50 pc=0x1ada36
runtime.gcMarkDone()
        C:/Go.tip/src/runtime/mgc.go:1623 +0x285 fp=0xc000599f60 sp=0xc000599f08 pc=0x1ad545
runtime.gcBgMarkWorker()
        C:/Go.tip/src/runtime/mgc.go:2045 +0x2e5 fp=0xc000599fe0 sp=0xc000599f60 pc=0x1ae6a5
runtime.goexit()
        C:/Go.tip/src/runtime/asm_amd64.s:1367 +0x1 fp=0xc000599fe8 sp=0xc000599fe0 pc=0x202bc1
created by runtime.gcBgMarkStartWorkers
        C:/Go.tip/src/runtime/mgc.go:1835 +0x3e

goroutine 1 [chan receive]:

runtime stack:
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failedruntime: setevent failed; errno=6
fatal error: runtime.semawakeup
panic during panic

runtime stack:

stack trace unavailable
exit status 4
FAIL    os      119.629s

CC: @alexbrainman, @zx2c4

@egonelbre
Copy link
Contributor Author

Another slightly different one:

c:\Go.tip\src\os>go test -run . -count 100000  -failfast
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed

runtime stack:
runtime.throw(0xaa7328, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0x8f4cec)
        C:/Go.tip/src/runtime/os_windows.go:776 +0x1ec
runtime.notesleep(0xc00097a120)
        C:/Go.tip/src/runtime/lock_sema.go:181 +0xee
runtime.mPark()
        C:/Go.tip/src/runtime/proc.go:1309 +0x45
runtime.stopm()
        C:/Go.tip/src/runtime/proc.go:2222 +0xa5
runtime.findrunnable(0xc000029800, 0x0)
        C:/Go.tip/src/runtime/proc.go:2881 +0x74e
runtime.schedule()
        C:/Go.tip/src/runtime/proc.go:3090 +0x2d7
runtime.park_m(0xc000072900)
        C:/Go.tip/src/runtime/proc.go:3239 +0xb2
runtime.mcall(0x0)
        C:/Go.tip/src/runtime/asm_amd64.s:323 +0x5e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0004f6480, 0xa9f734, 0xb, 0xaaf9c0, 0x94b001)
        C:/Go.tip/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc000270300)
        C:/Go.tip/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc000270300, 0xc0004e7de0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
testing.runTests(0xc00011c060, 0xbcc020, 0x8f, 0x8f, 0xbff87cf38d452588, 0x8bb2f967b1, 0xbcf520, 0xaa056c)
        C:/Go.tip/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc000154080, 0x0)
        C:/Go.tip/src/testing/testing.go:1418 +0x1f5
runtime: setevent failed; errno=6
runtime: waitforsingleobject wait_failed; errno=6runtime: setevent failed; errno=6
fatal error: runtime.semawakeup

fatal error: runtime.semasleep wait_failed
panic during panic

runtime stack:
runtime.throw(0xaa7328, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0x0)
        C:/Go.tip/src/runtime/os_windows.go:776 +0x1ec
runtime.lock2(0xc14298)
        C:/Go.tip/src/runtime/lock_sema.go:89 +0x145
runtime.lockWithRank(...)
        C:/Go.tip/src/runtime/lockrank_off.go:22
runtime.lock(...)
        C:/Go.tip/src/runtime/lock_sema.go:36
runtime.printlock()
        C:/Go.tip/src/runtime/print.go:71 +0x78
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000072000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        C:/Go.tip/src/runtime/traceback.go:434 +0x76a
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000072000, 0x0)
        C:/Go.tip/src/runtime/traceback.go:732 +0x105
runtime.traceback(...)
        C:/Go.tip/src/runtime/traceback.go:686
runtime.tracebackothers(0xc000586000)
        C:/Go.tip/src/runtime/traceback.go:951 +0x173
runtime.dopanic_m(0xc000586000, 0x8eb479, 0x57d7ffbf8, 0x1)
        C:/Go.tip/src/runtime/panic.go:1312 +0x1a5
runtime.fatalthrow.func1()
        C:/Go.tip/src/runtime/panic.go:1167 +0x67
runtime.fatalthrow()
        C:/Go.tip/src/runtime/panic.go:1164 +0x5e
runtime.throw(0xaa7328, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0x8f4cec)
        C:/Go.tip/src/runtime/os_windows.go:776 +0x1ec
runtime.notesleep(0xc00097a120)
        C:/Go.tip/src/runtime/lock_sema.go:181 +0xee
runtime.mPark()
        C:/Go.tip/src/runtime/proc.go:1309 +0x45
runtime.stopm()
        C:/Go.tip/src/runtime/proc.go:2222 +0xa5
runtime.findrunnable(0xc000029800, 0x0)
        C:/Go.tip/src/runtime/proc.go:2881 +0x74e
runtime.schedule()
        C:/Go.tip/src/runtime/proc.go:3090 +0x2d7
runtime.park_m(0xc000072900)
        C:/Go.tip/src/runtime/proc.go:3239 +0xb2
runtime.mcall(0x0)
        C:/Go.tip/src/runtime/asm_amd64.s:323 +0x5e
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed
stack trace unavailable
exit status 4
FAIL    os      60.762s

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

runtime: setevent failed; errno=6

That's the error for invalid handle. Are we setevent'ing a closed handle somehow?

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

I can't reproduce this so far. If you're able to trigger it easily, would you mind pasting the output for:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d389d38ab9..ad4e744759 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -792,7 +792,7 @@ func semasleep(ns int64) int32 {
 func semawakeup(mp *m) {
 	if stdcall1(_SetEvent, mp.waitsema) == 0 {
 		systemstack(func() {
-			print("runtime: setevent failed; errno=", getlasterror(), "\n")
+			print("runtime: setevent failed; errno=", getlasterror(), ", mp.waitsema=", mp.waitsema, "\n")
 			throw("runtime.semawakeup")
 		})
 	}

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

I reproduced it too finally. (Will try again with my own patch now.)

== RUN   TestCmdArgs
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed

runtime stack:
runtime.throw(0xab7328, 0x1d)
        C:/Go/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0x186a0, 0x8e782e)
        C:/Go/src/runtime/os_windows.go:776 +0x1ec
runtime.notetsleep_internal(0xbdf7b8, 0x186a0, 0xc0008bb380, 0x32e943ed9cc, 0xc00002ae98)
        C:/Go/src/runtime/lock_sema.go:232 +0x117
runtime.notetsleep(0xbdf7b8, 0x186a0, 0x7fffffffffffffff)
        C:/Go/src/runtime/lock_sema.go:283 +0x66
runtime.forEachP(0xac0168)
        C:/Go/src/runtime/proc.go:1493 +0x332
runtime.gcMarkDone.func1()
        C:/Go/src/runtime/mgc.go:1524 +0x6c
runtime.systemstack(0x0)
        C:/Go/src/runtime/asm_amd64.s:375 +0x6b
runtime.mstart()
        C:/Go/src/runtime/proc.go:1219

goroutine 107 [GC worker (idle)]:
runtime.systemstack_switch()
        C:/Go/src/runtime/asm_amd64.s:335 fp=0xc000515f08 sp=0xc000515f00 pc=0x930d20
runtime.gcMarkDone()
        C:/Go/src/runtime/mgc.go:1517 +0x105 fp=0xc000515f60 sp=0xc000515f08 pc=0x8dd3c5
runtime.gcBgMarkWorker()
        C:/Go/src/runtime/mgc.go:2045 +0x2e5 fp=0xc000515fe0 sp=0xc000515f60 pc=0x8de6a5
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:1367 +0x1 fp=0xc000515fe8 sp=0xc000515fe0 pc=0x932bc1
created by runtime.gcBgMarkStartWorkers
        C:/Go/src/runtime/mgc.go:1835 +0x3e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000a30c00, 0xaaf734, 0xb, 0xabf9c0, 0x95b001)
        C:/Go/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc000270000)
        C:/Go/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc000270000, 0xc000439de0)
        C:/Go/src/testing/testing.go:1194 +0xef
testing.runTests(0xc000004078, 0xbdc020, 0x8f, 0x8f, 0xbff87dd43b358b40, 0x8bb3133975, 0xbdf520, 0xab056c)
        C:/Go/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc000020100, 0x0)
        C:/Go/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 20353 [chan receive]:
testing.(*T).Parallel(0xc0008ba600)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc0008ba600)
        C:/Go/src/os/os_test.go:2660 +0x47
testing.tRunner(0xc0008ba600, 0xabfbc0)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 21001 [runnable]:
os/exec.(*Cmd).Wait(0xc00083cdc0, 0x0, 0x0)
        C:/Go/src/os/exec/exec.go:498 +0x33b
os/exec.(*Cmd).Run(0xc00083cdc0, 0xc000078d80, 0xc00083cdc0)
        C:/Go/src/os/exec/exec.go:341 +0x66
os/exec.(*Cmd).CombinedOutput(0xc00083cdc0, 0x28, 0xc00042f710, 0x1, 0x1, 0xc00083cdc0)
        C:/Go/src/os/exec/exec.go:567 +0x9b
os_test.TestCmdArgs(0xc000a30c00)
        C:/Go/src/os/os_windows_test.go:896 +0x76d
testing.tRunner(0xc000a30c00, 0xabf9c0)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 21138 [runnable]:
os/exec.(*Cmd).Start.func1(0xc00083cdc0, 0xc0003f8020)
        C:/Go/src/os/exec/exec.go:440
created by os/exec.(*Cmd).Start
        C:/Go/src/os/exec/exec.go:440 +0x6b0
FAIL    os      196.865s
FAIL

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

Got it.

)
=== RUN   TestRemoveAllLarge
runtime: setevent failed; errno=6, mp.waitsema=864
fatal error: runtime.semawakeup

runtime stack:
runtime.throw(0xa0251e, 0x12)
        C:/Go/src/runtime/panic.go:1112 +0x79
runtime.semawakeup.func1()
        C:/Go/src/runtime/os_windows.go:796 +0xc8
runtime.semawakeup(0xc000055000)
        C:/Go/src/runtime/os_windows.go:794 +0x6d
runtime.notewakeup(0xc000055120)
        C:/Go/src/runtime/lock_sema.go:161 +0x65
runtime.startm(0x0, 0xc000020001)
        C:/Go/src/runtime/proc.go:2318 +0x1d9
runtime.wakep()
        C:/Go/src/runtime/proc.go:2398 +0x6d
runtime.resetspinning()
        C:/Go/src/runtime/proc.go:2941 +0x69
runtime.schedule()
        C:/Go/src/runtime/proc.go:3097 +0x2b9
runtime.park_m(0xc0003ef800)
        C:/Go/src/runtime/proc.go:3239 +0xb2
runtime.mcall(0x0)
        C:/Go/src/runtime/asm_amd64.s:323 +0x5e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000590d80, 0xa02338, 0x12, 0xa0fda0, 0x8ab001)
        C:/Go/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc00038a000)
        C:/Go/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc00038a000, 0xc0004f7de0)
        C:/Go/src/testing/testing.go:1194 +0xef
testing.runTests(0xc00009c060, 0xb2c020, 0x8f, 0x8f, 0xbff87e2d1fcbf22c, 0x8bb2fe0c5d, 0xb2f520, 0xa0060c)
        C:/Go/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc0000d6080, 0x0)
        C:/Go/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 3416 [chan receive]:
testing.(*T).Parallel(0xc000590480)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc000590480)
        C:/Go/src/os/os_test.go:2660 +0x47
testing.tRunner(0xc000590480, 0xa0fc68)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 4098 [chan receive]:
testing.(*T).Parallel(0xc000590180)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestFdReadRace(0xc000590180)
        C:/Go/src/os/pipe_test.go:435 +0x59
testing.tRunner(0xc000590180, 0xa0fb50)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 3953 [chan receive]:
testing.(*T).Parallel(0xc0001a6300)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestSymlinkCreation(0xc0001a6300)
        C:/Go/src/os/os_windows_test.go:971 +0x79
testing.tRunner(0xc0001a6300, 0xa0fec0)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 3970 [chan receive]:
testing.(*T).Parallel(0xc0001a6480)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestRootRelativeDirSymlink(0xc0001a6480)
        C:/Go/src/os/os_windows_test.go:1015 +0x7f
testing.tRunner(0xc0001a6480, 0xa0fe18)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 4099 [chan receive]:
testing.(*T).Parallel(0xc000590300)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestRawConnReadWrite(0xc000590300)
        C:/Go/src/os/rawconn_test.go:17 +0x5f
testing.tRunner(0xc000590300, 0xa0fce0)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 4105 [runnable]:
os.removeAll(0xc000800000, 0x3f, 0x37, 0xa3c83c)
        C:/Go/src/os/removeall_noat.go:15 +0x945
os.RemoveAll(...)
        C:/Go/src/os/path.go:67
os.removeAll(0xc000262300, 0x37, 0x0, 0x1b6)
        C:/Go/src/os/removeall_noat.go:69 +0x7e5
os.RemoveAll(...)
        C:/Go/src/os/path.go:67
os_test.TestRemoveAllLarge(0xc000590d80)
        C:/Go/src/os/removeall_test.go:151 +0x41d
testing.tRunner(0xc000590d80, 0xa0fda0)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3
FAIL    os      38.545s
FAIL

@egonelbre
Copy link
Contributor Author

I added the debug line to semasleep as well:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d389d38ab9..d893e68369 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -737,8 +737,10 @@ func semasleep(ns int64) int32 {
        )

        var result uintptr
+       var waitsema uintptr
        if ns < 0 {
-               result = stdcall2(_WaitForSingleObject, getg().m.waitsema, uintptr(_INFINITE))
+               waitsema = getg().m.waitsema
+               result = stdcall2(_WaitForSingleObject, waitsema, uintptr(_INFINITE))
        } else {
                start := nanotime()
                elapsed := int64(0)
@@ -747,8 +749,9 @@ func semasleep(ns int64) int32 {
                        if ms == 0 {
                                ms = 1
                        }
+                       waitsema = getg().m.waitsema
                        result = stdcall4(_WaitForMultipleObjects, 2,
-                               uintptr(unsafe.Pointer(&[2]uintptr{getg().m.waitsema, getg().m.resumesema})),
+                               uintptr(unsafe.Pointer(&[2]uintptr{waitsema, getg().m.resumesema})),
                                0, uintptr(ms))
                        if result != _WAIT_OBJECT_0+1 {
                                // Not a suspend/resume event
@@ -774,13 +777,13 @@ func semasleep(ns int64) int32 {

        case _WAIT_FAILED:
                systemstack(func() {
-                       print("runtime: waitforsingleobject wait_failed; errno=", getlasterror(), "\n")
+                       print("runtime: waitforsingleobject wait_failed; errno=", getlasterror(), ", waitsema=", waitsema, "\n")
                        throw("runtime.semasleep wait_failed")
                })

        default:
                systemstack(func() {
-                       print("runtime: waitforsingleobject unexpected; result=", result, "\n")
+                       print("runtime: waitforsingleobject unexpected; result=", result, ", waitsema=", waitsema, "\n")
                        throw("runtime.semasleep unexpected")
                })
        }
runtime: waitforsingleobject wait_failed; errno=6, waitsema=1520
fatal error: runtime.semasleep wait_failed

runtime stack:
runtime.throw(0x8473e1, 0x1d)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go.tip/src/runtime/os_windows.go:781 +0xb6
runtime.semasleep(0xffffffffffffffff, 0x0)
        C:/Go.tip/src/runtime/os_windows.go:779 +0x226
runtime.notesleep(0xc00007c920)
        C:/Go.tip/src/runtime/lock_sema.go:181 +0xee
runtime.mPark()
        C:/Go.tip/src/runtime/proc.go:1309 +0x45
runtime.stopm()
        C:/Go.tip/src/runtime/proc.go:2222 +0xa5
runtime.findrunnable(0xc000022000, 0x0)
        C:/Go.tip/src/runtime/proc.go:2881 +0x74e
runtime.schedule()
        C:/Go.tip/src/runtime/proc.go:3090 +0x2d7
runtime.park_m(0xc000509c80)
        C:/Go.tip/src/runtime/proc.go:3239 +0xb2
runtime.mcall(0x0)
        C:/Go.tip/src/runtime/asm_amd64.s:323 +0x5e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000402600, 0x83f7df, 0xb, 0x84fa78, 0x6eb101)
        C:/Go.tip/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc000506300)
        C:/Go.tip/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc000506300, 0xc000385de0)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
testing.runTests(0xc000004078, 0x96c020, 0x8f, 0x8f, 0xbff87e029639338c, 0x8bb2f1b435, 0x96f520, 0x840617)
        C:/Go.tip/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc0000c2080, 0x0)
        C:/Go.tip/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 1814 [chan receive]:
testing.(*T).Parallel(0xc000706c00)
        C:/Go.tip/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc000706c00)
        C:/Go.tip/src/os/os_test.go:2661 +0x47
testing.tRunner(0xc000706c00, 0x84fc78)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3

goroutine 2330 [sleep]:
time.Sleep(0x4c4b40)
        C:/Go.tip/src/runtime/time.go:193 +0xe5
os.(*Process).wait(0xc00044c060, 0xc0003e4000, 0x0, 0x0)
        C:/Go.tip/src/os/exec_windows.go:45 +0x41b
os.(*Process).Wait(...)
        C:/Go.tip/src/os/exec.go:129
os/exec.(*Cmd).Wait(0xc000710000, 0x0, 0x0)
        C:/Go.tip/src/os/exec/exec.go:507 +0x6a
os/exec.(*Cmd).Run(0xc000710000, 0xc0003ba090, 0xc000710000)
        C:/Go.tip/src/os/exec/exec.go:341 +0x66
os/exec.(*Cmd).CombinedOutput(0xc000710000, 0x26, 0xc000142210, 0x2, 0x3, 0xc000710000)
        C:/Go.tip/src/os/exec/exec.go:567 +0x9b
os_test.TestCmdArgs(0xc000402600)
        C:/Go.tip/src/os/os_windows_test.go:896 +0x76d
testing.tRunner(0xc000402600, 0x84fa78)
        C:/Go.tip/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go.tip/src/testing/testing.go:1239 +0x2b3
exit status 2
FAIL    os      13.884s

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

@bcmills This seems like it might be a 1.16 blocker.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

When do we actually close these event objects? It seems like these might just leak every time the thread exits?

@egonelbre
Copy link
Contributor Author

egonelbre commented Jan 15, 2021

I also found a remark in WaitForSingleObject documentation https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject

Use caution when calling the wait functions and code that directly or indirectly creates windows. If a thread creates any windows, it must process messages. Message broadcasts are sent to all windows in the system. A thread that uses a wait function with no time-out interval may cause the system to become deadlocked. Two examples of code that indirectly creates windows are DDE and the CoInitialize function. Therefore, if you have a thread that creates windows, use MsgWaitForMultipleObjects or MsgWaitForMultipleObjectsEx, rather than WaitForSingleObject.

I'm not sure, how relevant this is. I'm not aware that any thread would be creating any windows.

@egonelbre
Copy link
Contributor Author

I found an old forum post with a similar problem https://groups.google.com/g/golang-nuts/c/9Q70fcygg-c

@gopherbot
Copy link

Change https://golang.org/cl/284132 mentions this issue: runtime: close Windows events on unminit

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

Change https://golang.org/cl/284132 mentions this issue: runtime: close Windows events on unminit

This doesn't appear to fully fix the issue, but it's at least related.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

I got an even crazier crash this time:

)
=== RUN   TestCmdArgs
runtime: waitforsingleobject wait_failed; errno=6
fatal error: runtime.semasleep wait_failed

runtime stack:
runtime.throw(0xa273d6, 0x1d)
        C:/Go/src/runtime/panic.go:1112 +0x79
runtime.semasleep.func2()
        C:/Go/src/runtime/os_windows.go:778 +0x85
runtime.semasleep(0xffffffffffffffff, 0x0)
        C:/Go/src/runtime/os_windows.go:776 +0x1ec
runtime.notesleep(0xc0004d4520)
        C:/Go/src/runtime/lock_sema.go:181 +0xee
runtime.mPark()
        C:/Go/src/runtime/proc.go:1309 +0x45
runtime.stopm()
        C:/Go/src/runtime/proc.go:2222 +0xa5
runtime.findrunnable(0xc000024800, 0x0)
        C:/Go/src/runtime/proc.go:2881 +0x74e
runtime.schedule()
        C:/Go/src/runtime/proc.go:3090 +0x2d7
runtime.goschedImpl(0xc00004a900)
        C:/Go/src/runtime/proc.go:3254 +0xf5
runtime.gosched_m(0xc00004a900)
        C:/Go/src/runtime/proc.go:3262 +0x3b
runtime.mcall(0x0)
        C:/Go/src/runtime/asm_amd64.s:323 +0x5e

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000481200, 0xa1f7d4, 0xb, 0xa2fa68, 0x8cb101)
        C:/Go/src/testing/testing.go:1240 +0x2da
testing.runTests.func1(0xc0002aed80)
        C:/Go/src/testing/testing.go:1512 +0x7f
testing.tRunner(0xc0002aed80, 0xc0005cdde0)
        C:/Go/src/testing/testing.go:1194 +0xef
testing.runTests(0xc000004078, 0xb4c020, 0x8f, 0x8f, 0xbff87fa0943b6758, 0x8bb3151155, 0xb4f520, 0xa2060c)
        C:/Go/src/testing/testing.go:1510 +0x310
testing.(*M).Run(0xc000020100, 0x0)
        C:/Go/src/testing/testing.go:1418 +0x1f5
main.main()
        _testmain.go:361 +0x145

goroutine 12911 [syscall, locked to thread]:
syscall.Syscall12(0x7ff851f5c2c0, 0xa, 0xc000662aa0, 0xc0000a5ab0, 0x0, 0x0, 0x1, 0x400, 0xc0005d0a00, 0x0, ...)
        C:/Go/src/runtime/syscall_windows.go:369 +0xf2
syscall.CreateProcess(0xc000662aa0, 0xc0000a5ab0, 0x0, 0x0, 0x40000000001, 0xc0005d0a00, 0x0, 0xc00006d778, 0xc00006d688, 0xc0006c8170, ...)
        C:/Go/src/syscall/zsyscall_windows.go:549 +0x18b
syscall.StartProcess(0xc00046daa0, 0x27, 0xc000136bc0, 0xa2f890, 0xc00068a700, 0xc00001c6f0, 0x0, 0x83e265, 0x2a9acaf1548, 0x0)
        C:/Go/src/syscall/exec_windows.go:347 +0x674
runtime: unexpected return pc for os.startProcess called from 0x0
stack: frame={sp:0xc00006d7f0, fp:0xc00006d8e0} stack=[0xc00006a000,0xc00006e000)
000000c00006d6f0:  000000c000031000  000000c00006d730
000000c00006d700:  0000000000834993 <runtime.cgocall+211>  00000000008a4420 <runtime.asmstdcall+0>
000000c00006d710:  000000c0005502d8  0000000000000000
000000c00006d720:  0000000087461c68  000000c000550000
000000c00006d730:  000000c00006d770  000000000089f85c <syscall.Syscall+284>
000000c00006d740:  0000000000000000  0000000000000005
000000c00006d750:  000000c00006d7a8  0000000000849213 <runtime.(*mcentral).cacheSpan+435>
000000c00006d760:  000002a9ac7d2cb8  0000000000000000
000000c00006d770:  0000000000000000  0000000000000006
000000c00006d780:  000000c00006d7d8  0000000000849213 <runtime.(*mcentral).cacheSpan+435>
000000c00006d790:  000002a9acaf1548  0000000000000000
000000c00006d7a0:  0000000000000000  0000000000000009
000000c00006d7b0:  00000000000000c8  0000000000b98dc8
000000c00006d7c0:  000002a9acaf1548  0000000000000000
000000c00006d7d0:  0000000000000000  000000c00006d820
000000c00006d7e0:  00000000008488c5 <runtime.(*mcache).refill+325>  0000000000b98dc8
000000c00006d7f0: <0000000000b98dc8  0000000000a5a3c8
000000c00006d800:  000002a9ac7d2cb8  0000000000000200
000000c00006d810:  000002a9acaf1548  0000000000000001
000000c00006d820:  000000c00006d858  000000c00006d860
000000c00006d830:  0000000000875cd0 <runtime.reentersyscall+400>  0000000000834915 <runtime.cgocall+85>
000000c00006d840:  000000c00006d890  000000c000085500
000000c00006d850:  000002a9acaf1548  000000c00006d8e0
000000c00006d860:  000000c00006d880  000000000089d3a6 <runtime.entersyscall+38>
000000c00006d870:  0000000000834915 <runtime.cgocall+85>  000000c00006d890
000000c00006d880:  00000000008a2867 <runtime.asmcgocall+71>  000000000083496b <runtime.cgocall+171>
000000c00006d890:  00000000008a4420 <runtime.asmstdcall+0>  000000c0005502d8
000000c00006d8a0:  0000000000000010  0000000000000010
000000c00006d8b0:  000000c000550000  000000c00006d8f8
000000c00006d8c0:  000000000089f829 <syscall.Syscall+233>  00000000008a4420 <runtime.asmstdcall+0>
000000c00006d8d0:  000000c0005502d8 !0000000000000000
000000c00006d8e0: >010000c00006d928  000000c000550000
000000c00006d8f0:  0000000000a30440  000000c00006d958
000000c00006d900:  00000000008bb9af <syscall.WaitForSingleObject+143>  00007ff851f63600
000000c00006d910:  0000000000000002  0000000000000370
000000c00006d920:  00000000ffffffff  0000000000000000
000000c00006d930:  0000000000000000  0000000000000000
000000c00006d940:  0000000000000000  000000c0000aac60
000000c00006d950:  000000c00001d4d0  000000c00006d9e8
000000c00006d960:  00000000008e3e25 <os.(*Process).wait+133>  0000000000000370
000000c00006d970:  00000000ffffffff  0000000000000000
000000c00006d980:  0000000000000004  0000000000000004
000000c00006d990:  0000000002030000  0000000000000000
000000c00006d9a0:  0000000000000001  0000000000000370
000000c00006d9b0:  0000000000000160  0000000000000148
000000c00006d9c0:  0000000000a17760  000000c000136bc0
000000c00006d9d0:  0000000000000000  0000000000000000
os.startProcess(0x10000c00006d928, 0xc000550000, 0xa30440, 0xc00006d958, 0x8bb9af, 0x7ff851f63600, 0x2, 0x370, 0xffffffff)
        C:/Go/src/os/exec_posix.go:53 +0x265
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 12435 [chan receive]:
testing.(*T).Parallel(0xc0005d2d80)
        C:/Go/src/testing/testing.go:1060 +0x119
os_test.TestOpenFileKeepsPermissions(0xc0005d2d80)
        C:/Go/src/os/os_test.go:2660 +0x47
testing.tRunner(0xc0005d2d80, 0xa2fc68)
        C:/Go/src/testing/testing.go:1194 +0xef
created by testing.(*T).Run
        C:/Go/src/testing/testing.go:1239 +0x2b3

goroutine 11545 [runnable]:
os/exec.(*Cmd).Start.func1(0xc0000aac60, 0xc0004bed40)
        C:/Go/src/os/exec/exec.go:440
created by os/exec.(*Cmd).Start
        C:/Go/src/os/exec/exec.go:440 +0x6b0
FAIL    os      122.163s
FAIL

@egonelbre
Copy link
Contributor Author

I was trying whether Application Verifier gives me more information, but it doesn't look like it:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<avrf:logfile xmlns:avrf="Application Verifier">
	<avrf:logSession TimeStarted="2021-01-15 : 14:23:02" PID="19908" Version="2">
		<avrf:logEntry Time="2021-01-15 : 14:23:12" LayerName="Handles" StopCode="0x300" Severity="Error">
			<avrf:message>Invalid handle exception for current stack trace.</avrf:message>
			<avrf:parameter1>c0000008 - Exception code.</avrf:parameter1>
			<avrf:parameter2>2ddb5ff6f0 - Exception record. Use .exr to display it.</avrf:parameter2>
			<avrf:parameter3>2ddb5fef20 - Context record. Use .cxr to display it.</avrf:parameter3>
			<avrf:parameter4>0 - Not used.</avrf:parameter4>
			<avrf:stackTrace>
				<avrf:trace>vfbasics!+7fff64942c40 ( @ 0)</avrf:trace>
				<avrf:trace>ntdll!RtlDeleteAce+3cc ( @ 0)</avrf:trace>
				<avrf:trace>ntdll!RtlRaiseException+256 ( @ 0)</avrf:trace>
				<avrf:trace>ntdll!RtlRaiseException+185 ( @ 0)</avrf:trace>
				<avrf:trace>ntdll!KiRaiseUserExceptionDispatcher+3a ( @ 0)</avrf:trace>
				<avrf:trace>vfbasics!+7fff64957660 ( @ 0)</avrf:trace>
				<avrf:trace>KERNELBASE!CloseHandle+45 ( @ 0)</avrf:trace>
				<avrf:trace>vfbasics!+7fff64959799 ( @ 0)</avrf:trace>
				<avrf:trace>vfbasics!+7fff6495983d ( @ 0)</avrf:trace>
				<avrf:trace>vfbasics!+7fff64959799 ( @ 0)</avrf:trace>
				<avrf:trace>vfbasics!+7fff64959808 ( @ 0)</avrf:trace>
				<avrf:trace>os.test!+a5451e ( @ 0)</avrf:trace>
			</avrf:stackTrace>
		</avrf:logEntry>
	</avrf:logSession>
</avrf:logfile>

@egonelbre
Copy link
Contributor Author

egonelbre commented Jan 15, 2021

So based on that CloseHandle was failing, I added debugging to all close handles, and got a failure in:

	// Suspend the thread.
	if int32(stdcall1(_SuspendThread, thread)) == -1 {
		unlock(&suspendLock)
		if stdcall1(_CloseHandle, thread) != 1 {
			print("CloseHandle failed: ", getlasterror(), "\n")
			throw("CloseHandle failed")
		}
		atomic.Store(&mp.preemptExtLock, 0)
		// The thread no longer exists. This shouldn't be
		// possible, but just acknowledge the request.
		atomic.Xadd(&mp.preemptGen, 1)
		return
	}

Edit: this seems to be failing because, previously DuplicateHandle failed:

	var thread uintptr
	if stdcall7(_DuplicateHandle, currentProcess, mp.thread, currentProcess, uintptr(unsafe.Pointer(&thread)), 0, 0, _DUPLICATE_SAME_ACCESS) != 1 {
		print("DuplicateHandle failed: ", getlasterror(), "\n")
		throw("DuplicateHandle failed")
	}
	unlock(&mp.threadLock)

c:\Go.tip\src\os>go test -buildmode=exe -count 10000 .
DuplicateHandle failed: 6
fatal error: DuplicateHandle failed

runtime stack:
runtime.throw(0x5f4012, 0x16)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79
runtime.preemptM(0xc000372000)
        C:/Go.tip/src/runtime/os_windows.go:1260 +0x528

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 15, 2021
@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

Edit: this seems to be failing because, previously DuplicateHandle failed:

Unclear yet whether it's related, but it would be a good thing to fix anyway. Can you send a CL with your error checking? That should squash other things down the line. I can see that bug being hit in api monitor too.

@egonelbre
Copy link
Contributor Author

@bcmills bcmills added this to the Go1.16 milestone Jan 15, 2021
@gopherbot
Copy link

Change https://golang.org/cl/284133 mentions this issue: runtime: debug code for windows

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

At the moment I suspect that the handle is being closed by CreateProcess.

@egonelbre
Copy link
Contributor Author

So, what I've gathered is that the failures seem to happen where the .thread has been closed with CloseEvent. If you replace mp.thread = 0 in unminit with something like mp.thread = 0xFFFFFFFFFF, you can then see it in the respective failures.

@gopherbot
Copy link

Change https://golang.org/cl/284135 mentions this issue: runtime: detect errors in DuplicateHandle

@egonelbre
Copy link
Contributor Author

@zx2c4, so one thing I'm not understanding:

  1. Here is unminit called: https://github.com/golang/go/blob/master/src/runtime/proc.go#L1353
  2. unminit frees thread (and events in https://go-review.googlesource.com/c/go/+/284132/) https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L947
  3. lock is called in https://github.com/golang/go/blob/master/src/runtime/proc.go#L1366
  4. which may call into semasleep
    semasleep(-1)
  5. which may wait on the waitsema:
    var result uintptr

Maybe, by closing the thread, it'll also take down the events that are used for waiting on locks?

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

Does moving unminit below the lock taking fix it? Especially with https://go-review.googlesource.com/c/go/+/284132/ that's probably a good idea.

@gopherbot
Copy link

Change https://golang.org/cl/284137 mentions this issue: runtime: defer unminit until after last lock is dropped

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

@egonelbre I can still repro after the CL I just pushed. I still think that CL is a good idea though.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

I can reproduce the problem even with this:

diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d66e30ee3c..30bc978f95 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -942,6 +942,7 @@ func minit() {
 // Called from dropm to undo the effect of an minit.
 //go:nosplit
 func unminit() {
+       return
        mp := getg().m
        lock(&mp.threadLock)
        stdcall1(_CloseHandle, mp.thread)

So I don't think it's related to unminit.

@egonelbre
Copy link
Contributor Author

I still got CloseHandle failed on resumesema after moving unminit. Maybe closing thread needs to be the last thing in unminit?

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

We posted at the same time. Objects created by the thread should keep existing after the thread is closed anyway. I don't think this is related to unminit.

@egonelbre
Copy link
Contributor Author

egonelbre commented Jan 15, 2021

I'm slowly running out of ideas. I'm currently exploring, maybe the scheduler is scheduling on dead m-s, however since windows seems to be the exception with regards to using waitsema and resumesema for locking, it's the only one that fails with using dropped m.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

I wonder if there's some code that calls CloseHandle twice, which races with an interleaved semacreate, perhaps in exec/. Or any other code that calls CloseHandle twice erroneously.

@egonelbre
Copy link
Contributor Author

Hmm, so both mexit and dropm call unminit on the same thread.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

But unminit takes a lock. And we've already seen that nop'ing out unminit doesn't prevent the bug.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 15, 2021

@aclements @ianlancetaylor I suspect that @egonelbre and I could use a fresh set of eyes on this one... We're both running out of ideas.

@egonelbre
Copy link
Contributor Author

Yeah, I think I've been looking for too long this and going slowly crazy with suspecting every tiny thing. :D

gopherbot pushed a commit that referenced this issue Jan 15, 2021
These functions rely on DuplicateHandle succeeding, but they don't check
the return value, which might be masking subtle bugs that cause other
problems down the line.

Updates #43720.

Change-Id: I77f0e6645affa534777ffc173144a52e4afa5f81
Reviewed-on: https://go-review.googlesource.com/c/go/+/284135
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Alex Brainman <alex.brainman@gmail.com>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
@zx2c4
Copy link
Contributor

zx2c4 commented Jan 16, 2021

It really smells like getg().m is returning the wrong value. We had some reports of this happening already leading to this quick fix -- https://go-review.googlesource.com/c/go/+/271907 . In this case it was more drastic; it was being zeroed out. But what if the issue here is that it's returning bogus ms or bogus gs?

Really grasping at straws here.

@egonelbre
Copy link
Contributor Author

After running with static lock ranking, I got:

go test -tags goexperiment.staticlockranking -count 100000 -failfast .
0  ======
0 : LEAF 1000 0x10a8dc0
1 : mheap 38 0x10c0500
fatal error: lock ordering problem

runtime stack:
runtime.throw(0xf66042, 0x15)
        C:/Go.tip/src/runtime/panic.go:1112 +0x79 fp=0xe890fff208 sp=0xe890fff1d8 pc=0xdac959
runtime.checkRanks(0x10924c0, 0x3e8, 0x26)
        C:/Go.tip/src/runtime/lockrank_on.go:165 +0x111 fp=0xe890fff230 sp=0xe890fff208 pc=0xd7d991
runtime.lockWithRank.func1()
        C:/Go.tip/src/runtime/lockrank_on.go:92 +0xbc fp=0xe890fff260 sp=0xe890fff230 pc=0xdd957c
runtime.lockWithRank(0x10c0500, 0x26)
        C:/Go.tip/src/runtime/lockrank_on.go:81 +0xb2 fp=0xe890fff298 sp=0xe890fff260 pc=0xd7d612
runtime.lock(...)
        C:/Go.tip/src/runtime/lock_sema.go:36
runtime.(*mheap).allocSpan(0x10c0500, 0x1, 0xa00, 0x62dcfd3a0427)
        C:/Go.tip/src/runtime/mheap.go:1162 +0x50a fp=0xe890fff308 sp=0xe890fff298 pc=0xd9a42a
runtime.(*mheap).alloc.func1()
        C:/Go.tip/src/runtime/mheap.go:910 +0x5f fp=0xe890fff358 sp=0xe890fff308 pc=0xddb65f
runtime.(*mheap).alloc(0x10c0500, 0x1, 0x7fff6848010a, 0x100000009)
        C:/Go.tip/src/runtime/mheap.go:904 +0x88 fp=0xe890fff3a8 sp=0xe890fff358 pc=0xd99968
runtime.(*mcentral).grow(0x10d1340, 0x0)
        C:/Go.tip/src/runtime/mcentral.go:232 +0x88 fp=0xe890fff3f0 sp=0xe890fff3a8 pc=0xd8a0c8
runtime.(*mcentral).cacheSpan(0x10d1340, 0x1)
        C:/Go.tip/src/runtime/mcentral.go:158 +0x31f fp=0xe890fff448 sp=0xe890fff3f0 pc=0xd89e9f
runtime.(*mcache).refill(0x29b40e70108, 0xa)
        C:/Go.tip/src/runtime/mcache.go:162 +0xb5 fp=0xe890fff490 sp=0xe890fff448 pc=0xd89375
runtime.(*mcache).nextFree(0x29b40e70108, 0xde6b0a, 0x1007fff69041d2c, 0x29b40e79498, 0x0)
        C:/Go.tip/src/runtime/malloc.go:882 +0xa5 fp=0xe890fff4c8 sp=0xe890fff490 pc=0xd7ed85
runtime.mallocgc(0x30, 0xf53940, 0xd7d601, 0xe890fff570)
        C:/Go.tip/src/runtime/malloc.go:1069 +0x890 fp=0xe890fff550 sp=0xe890fff4c8 pc=0xd7f7d0
runtime.newobject(0xf53940, 0x10a8dc0)
        C:/Go.tip/src/runtime/malloc.go:1177 +0x3f fp=0xe890fff580 sp=0xe890fff550 pc=0xd7fa3f
runtime.makemap_small(0xf35320)
        C:/Go.tip/src/runtime/map.go:293 +0x34 fp=0xe890fff5a0 sp=0xe890fff580 pc=0xd80694
syscall.compileCallback(0xf34d20, 0xf72980, 0x1, 0x27)
        C:/Go.tip/src/runtime/syscall_windows.go:196 +0x517 fp=0xe890fff698 sp=0xe890fff5a0 pc=0xde18b7
runtime.monitorSuspendResume()
        C:/Go.tip/src/runtime/os_windows.go:323 +0x147 fp=0xe890fff738 sp=0xe890fff698 pc=0xda6b47
runtime.goenvs()
        C:/Go.tip/src/runtime/os_windows.go:608 +0x1b6 fp=0xe890fff790 sp=0xe890fff738 pc=0xda7716
runtime.schedinit()
        C:/Go.tip/src/runtime/proc.go:648 +0x1ae fp=0xe890fff7f0 sp=0xe890fff790 pc=0xdb03ee
runtime.rt0_go(0xe890fff820, 0x7fff6a2c7034, 0xe890fff820, 0x0, 0x7fff6a2c7034, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        C:/Go.tip/src/runtime/asm_amd64.s:216 +0x13d fp=0xe890fff7f8 sp=0xe890fff7f0 pc=0xde33bd
FAIL    os      0.054s
FAIL

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 17, 2021

Seems unrelated. That happens at startup only.

@egonelbre
Copy link
Contributor Author

If I disable TestReadStdin then I haven't, yet, been able to reproduce the failures.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 18, 2021

I think I see the bug... syscall.Stdin winds up being closed, and never reassigned. I'll send a fix.

@gopherbot
Copy link

Change https://golang.org/cl/284140 mentions this issue: os: do not close os.Stdin in TestReadStdin

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 18, 2021

My fix seems correct to me, but it doesn't actually fix the issue after enough iterations...

@egonelbre
Copy link
Contributor Author

Ah, finally happened to me as well.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 18, 2021

I just pushed a small change that uses DuplicateHandle instead of GetStdHandle. Can you test that? So far so good for me.

@egonelbre
Copy link
Contributor Author

egonelbre commented Jan 18, 2021

Seems to be stable so far.

After running 5x go test -count 10000 in parallel for ~5min it was still stable.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 18, 2021

Likewise. I can't get it to crash. It just times out after 10 minutes per usual every time.

@zx2c4
Copy link
Contributor

zx2c4 commented Jan 18, 2021

After a few hours of this, it seems safe to conclude that https://golang.org/cl/284140 fixes the issue.

gopherbot pushed a commit that referenced this issue Jan 18, 2021
Calls to lock may need to use global members of mOS that also need to be
cleaned up before the thread exits. Before this commit, these resources
would leak. Moving them to be cleaned up in unminit, however, would race
with gstack on unix. So this creates a new helper, mdestroy, to release
resources that must be destroyed only after locks are no longer
required. We also move highResTimer lifetime to the same semantics,
since it doesn't help to constantly acquire and release the timer object
during dropm.

Updates #43720.

Change-Id: Ib3f598f3fda1b2bbcb608099616fa4f85bc1c289
Reviewed-on: https://go-review.googlesource.com/c/go/+/284137
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Alex Brainman <alex.brainman@gmail.com>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
@golang golang locked and limited conversation to collaborators Jan 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants
@zx2c4 @egonelbre @ALTree @bcmills @gopherbot and others