runtime: unable to acquire - semaphore out of sync #16646

Closed
karalabe opened this issue Aug 9, 2016 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Comments

@karalabe
Contributor

karalabe commented Aug 9, 2016

Today one of our CI tests failed on AppVeyor (Windows, Go 1.6.2) with the error message in the title. I don't have a reliable way to reproduce it, and it's inside a huge project, but here's the complete stack dump (https://ci.appveyor.com/project/tgerring/go-ethereum/build/develop.298#L236) in case it helps, at least to hint whether the problem is in our code or in Go. (I assume the fault is on our end, but I have never seen this message before, so I can't trace it properly.)

@davecheney
Contributor

Are you positive this codebase contains no data races?

On Tue, 9 Aug 2016, 22:27 Péter Szilágyi notifications@github.com wrote:

Today one of our CI tests failed on AppVeyor, Windows, Go 1.6.2 with the
error message seen in the title. I don't have a reliable way to reproduce
it, it's inside a huge project, but here's the complete stack dump
https://ci.appveyor.com/project/tgerring/go-ethereum/build/develop.298#L236
if it helps, at least to provide some hints whether it's our code or Go. (I
assume the fault is on our end but I haven't ever seen this message so
can't trace it properly).



@fjl

fjl commented Aug 9, 2016

We are reasonably confident that the codebase doesn't contain data races. But of course there is no way to prove it to you ;)

@karalabe
Contributor Author

karalabe commented Aug 9, 2016

@fjl I did merge in one of your PRs a few hours ago, so it's worth a double check, though it's a very strange error that I haven't seen before.

@davecheney
Contributor

Thank you for confirming. Go 1.7 will be released next week, so any fix to this issue will land in Go 1.7 if it is not already fixed. Can you please test with the latest Go 1.7 release candidate?

On Tue, 9 Aug 2016, 22:56 Felix Lange notifications@github.com wrote:

We are reasonably confident that the codebase doesn't contain data races.



@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Aug 9, 2016
@ianlancetaylor
Contributor

You should have gotten a stack backtrace with the error. Can you attach it here?

@ianlancetaylor
Contributor

Oh, sorry, I see you provided a link to the stack trace above.

@ianlancetaylor
Contributor

I don't know what the bug is, but here is what is going on. Your program is entering the stop-the-world phase of a garbage collection. The goroutine that started the collection is telling all the other goroutines to stop. It is sleeping on a note, waiting for a notification that a goroutine has stopped, with a deadline of 100us. The code (notetsleep_internal) calls WaitForSingleObject with that 100us deadline. WaitForSingleObject returns an error, which is assumed to indicate a timeout, meaning that the deadline has expired. When the goroutine then checks the note, it finds that the note has been woken up. It calls WaitForSingleObject again, this time with no deadline, expecting to acquire the semaphore. That call fails unexpectedly.

A call to WaitForSingleObject with no deadline should not fail. I think what we need to do is modify os_windows.go to report the actual failure in that case. That might help clarify what has happened here.
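
For readers following along, the sequence described above can be sketched roughly as follows. This is not the runtime's code: `waitFunc`, `sleepOnNote`, and `noteWoken` are hypothetical names that stand in for WaitForSingleObject and the note check, so the sketch runs on any platform. The constants are the documented Windows values. The point it illustrates is the suggestion in the last paragraph: when the untimed wait fails, surface the raw return code instead of a generic message.

```go
package main

import "fmt"

// Documented Windows values; everything else in this file is hypothetical.
const (
	waitObject0 uint32 = 0x00000000 // the object was signaled
	waitTimeout uint32 = 0x00000102 // the timeout elapsed
	waitFailed  uint32 = 0xFFFFFFFF // the call itself failed
	infinite    uint32 = 0xFFFFFFFF // "no deadline" timeout value
)

// waitFunc stands in for WaitForSingleObject on one semaphore handle.
// It takes a timeout in milliseconds and returns the raw return code.
type waitFunc func(timeoutMillis uint32) uint32

// sleepOnNote follows the simplified sequence from the comment above:
//  1. wait with a deadline;
//  2. if that wait did not succeed but the note was woken anyway,
//     wait again with no deadline to consume the pending wakeup;
//  3. if the second wait fails, report the actual return code.
func sleepOnNote(wait waitFunc, noteWoken func() bool, timeoutMillis uint32) (bool, error) {
	if ret := wait(timeoutMillis); ret == waitObject0 {
		return true, nil
	}
	if !noteWoken() {
		return false, nil // nobody woke the note: treat as a timeout
	}
	if ret := wait(infinite); ret != waitObject0 {
		return false, fmt.Errorf("untimed wait failed: returned 0x%x", ret)
	}
	return true, nil
}

func main() {
	// Simulate the failure from this issue: the timed wait reports a
	// timeout, the note is found woken, and the untimed wait then fails.
	calls := 0
	wait := func(uint32) uint32 {
		calls++
		if calls == 1 {
			return waitTimeout
		}
		return waitFailed
	}
	ok, err := sleepOnNote(wait, func() bool { return true }, 1)
	fmt.Println(ok, err)
}
```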

@jboelter

jboelter commented Aug 9, 2016

It looks like a few things could be cleaned up here. The return value of CreateEvent isn't checked, so a failed call could leave us with a null event handle.

semasleep assumes that any non-zero return value from WaitForSingleObject is a timeout, but there's some nuance here: a timeout is 0x0102 (WAIT_TIMEOUT), and other values may be returned for other outcomes.
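
To make the nuance concrete, here is a minimal sketch of telling the documented WaitForSingleObject return values apart rather than treating every non-zero value as a timeout. The constants are the values from the Windows API documentation; `classifyWait` is a hypothetical helper for illustration and is not a function in the Go runtime.

```go
package main

import "fmt"

// Return values of WaitForSingleObject as documented for the Windows API.
const (
	waitObject0   uint32 = 0x00000000 // the object was signaled
	waitAbandoned uint32 = 0x00000080 // an abandoned mutex was acquired
	waitTimeout   uint32 = 0x00000102 // the timeout elapsed
	waitFailed    uint32 = 0xFFFFFFFF // the call failed; see GetLastError
)

// classifyWait (hypothetical) maps a raw return code to a description,
// so that only 0x0102 is reported as a timeout.
func classifyWait(ret uint32) string {
	switch ret {
	case waitObject0:
		return "signaled"
	case waitAbandoned:
		return "abandoned"
	case waitTimeout:
		return "timeout"
	case waitFailed:
		return "failed"
	default:
		return fmt.Sprintf("unexpected (0x%x)", ret)
	}
}

func main() {
	for _, ret := range []uint32{waitObject0, waitAbandoned, waitTimeout, waitFailed} {
		fmt.Printf("0x%08x: %s\n", ret, classifyWait(ret))
	}
}
```

With a distinction like this, a caller could treat "timeout" as the expected outcome of a timed wait and anything else that is not "signaled" as a fatal error worth printing.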

@gopherbot

CL https://golang.org/cl/26655 mentions this issue.

@fjl

fjl commented Aug 16, 2016

It has happened again, in the same unit test: https://ci.appveyor.com/project/tgerring/go-ethereum/build/develop.308#L236

@jboelter

Are you able to test with a custom build from the CL above? It should give you better insight into the failure.

@fjl

fjl commented Aug 16, 2016

Not really. I'll try to get Go built with the CL onto AppVeyor this week.

I haven't been able to reproduce the failure locally; the test passes 300+ iterations in my Windows VM.

@quentinmit quentinmit added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 11, 2016
gopherbot pushed a commit that referenced this issue Oct 12, 2016
Add checks for failure of CreateEvent, SetEvent or
WaitForSingleObject. Any failures are considered fatal and
will throw() after printing an informative message.

Updates #16646

Change-Id: I3bacf9001d2abfa8667cc3aff163ff2de1c99915
Reviewed-on: https://go-review.googlesource.com/26655
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc rsc modified the milestones: Go1.9, Go1.8 Nov 11, 2016
@aclements
Member

Hi @fjl. CL 26655 was released as part of Go 1.8. Have you had any failures of this sort on 1.8?

@fjl

fjl commented Jun 14, 2017

No, it hasn't happened again. You can close.

@aclements
Member

Thanks!

@golang golang locked and limited conversation to collaborators Jun 14, 2018
9 participants