-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: netpoll responding to I/O events owned by client code crashes #21172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@ianlancetaylor wrote in #21153 (comment) :
|
Unfortunately I agree with what Ian said. Nothing more to add. Alex |
Unfortunately this is really hard (and really slow) to test in an automated fashion, since it involves building new VM images (itself slow) and then booting them on GCE and watching serial console output. This is probably only reasonably debuggable by hand on an actual Windows machine. |
Change https://golang.org/cl/52490 mentions this issue: |
@ianlancetaylor, we have a theory this might be related to the internal runtime poller interacting badly with github.com/tarm/serial on Windows. Maybe the serial port FD is in a weird state and the http request in the backtrace gets into the runtime poller code and finds the serial port readable/writeable but in a weird state and explodes. Will try to write a test later today. |
go1.8.3: no crash
go1.9rc1_cl163855560: crash
console:
|
I still don't understand what is happening here, but can you try https://golang.org/cl/52551 ? |
Change https://golang.org/cl/52551 mentions this issue: |
I did look at the code, and I agree that it is COM port file handle that might be causing problems. The github.com/tarm/serial is designed to do its own IO, but we, probably, include COM port file handle into netpoll monitor. github.com/tarm/serial opens the COM port with Windows CreateFile (as everything else on Windows), and then to read it calls Windows ReadFile passing it not nil OVERLAPPED structure that has hEvent field set to an Windows Event object. This will make ReadFile return immediately, and github.com/tarm/serial calls Windows GetOverlappedResult with the same OVERLAPPED structure. The GetOverlappedResult will block until some data arrives to COM port, which will trigger the Windows Event that hEvent points to, and that will make blocked GetOverlappedResult return. This is how the authors of github.com/tarm/serial expect things to work. netpoller on the other hand uses different strategy to do IO. It calls CreateIoCompletionPort to associate many file / IO handles with single "completion port". Once setup you make one single thread wait for "io is completed" events to be reported to that "completion port" - you call GetQueuedCompletionStatus in a loop. The GetQueuedCompletionStatus only returns when some IO is completed, and it uses lpOverlapped parameter to report which IO is completed. The lpOverlapped parameter is set to whatever you pass into Windows ReadFile or WriteFile when you start IO. So what is happening here, I suspect, github.com/tarm/serial calls os.NewFile with its file handle, and this calls CreateIoCompletionPort and associates COM port with our netpoller. Since COM port is now monitored by netpoller, when GetQueuedCompletionStatus (in netpoller) returns, we get what github.com/tarm/serial has passed to syscall.WriteFile - syscall.Overlapped, while netpoller expect that to be internal/poll.operation. Therefore memory corruption and crash. Or nice stack trace, if we are luck and memory is filled with zeros. Alex |
@ianlancetaylor that patch fixes the crash which I think further shines light into @alexbrainman 's excellent explanation. This bug doesn't actually require a fresh VM, I can RDP in, execute it and it repros. Nothing special about boot time/network. |
to quote @alexbrainman from the CL
It sounds like some additional plumbing is needed and the default behavior should be not to wait on files unless explicitly indicated. It also sounds like this code hasn't changed so this crash could be a diffrence in undefined behavior rather than an true regression. If that's all true then this shouldn't block rc1 but the problem demonstrated is a bug in implementation that should be addressed. |
Change https://golang.org/cl/52857 mentions this issue: |
Updates #21172 Change-Id: I0fec6e645328bbc85f3e47f4f71dd8d1d68c75ab Reviewed-on: https://go-review.googlesource.com/52551 Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Change https://golang.org/cl/52930 mentions this issue: |
The basic problem seems to be fixed in 1.9. Pushing to 1.10 for a cleaner fix and a good test. (We can still put a test into 1.9 if it comes soon enough.) |
Change https://golang.org/cl/53530 mentions this issue: |
I also wanted to test net sockets, but I do not know how to access their file handles. So I did not implement socket tests. Updates #21172 Change-Id: I5062c0e65a817571d755397d60762c175f9791ce Reviewed-on: https://go-review.googlesource.com/53530 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change https://golang.org/cl/65810 mentions this issue: |
Change https://golang.org/cl/71131 mentions this issue: |
Change https://golang.org/cl/71132 mentions this issue: |
…serial ports I also wanted to test net sockets, but I do not know how to access their file handles. So I did not implement socket tests. Updates #21172 Change-Id: I5062c0e65a817571d755397d60762c175f9791ce Reviewed-on: https://go-review.googlesource.com/53530 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-on: https://go-review.googlesource.com/71131 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
…netpoller internal/poll package assumes that only net sockets use runtime netpoller on windows. We get memory corruption if other file handles are passed into runtime poller. Make FD.Init receive and use useNetpoller argument, so FD.Init caller is explicit about using runtime netpoller. Fixes #21172 Change-Id: I60e2bfedf9dda9b341eb7a3e5221035db29f5739 Reviewed-on: https://go-review.googlesource.com/65810 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-on: https://go-review.googlesource.com/71132 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
In #21153 @johnsonj and I discovered that Go 1.9 crashes on Windows (during early boot before the network is up? related?), but Go 1.8 does not.
Relevant crash:
the line we're crashing on is
src/runtime/netpoll_windows.go:105
at:And
0xc0000005
isEXCEPTION_ACCESS_VIOLATION
I believe.Is
op
orop.fd
nil?The only interesting and unique thing about this Go program on Windows is that it's running before/while the network is coming up, which is something we've never tested before.
/cc @alexbrainman @ianlancetaylor @aclements
The text was updated successfully, but these errors were encountered: