Skip to content

runtime: stray println in runtime epoll function #11498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mdwhatcott opened this issue Jul 1, 2015 · 12 comments
Closed

runtime: stray println in runtime epoll function #11498

mdwhatcott opened this issue Jul 1, 2015 · 12 comments
Milestone

Comments

@mdwhatcott
Copy link

After seeing the following message (that looks like an error) I found a stray println that doesn't seem to be very helpful since there is retry logic in place. The program's exit code was 0 and the results were as expected so there actually wasn't a problem.

https://github.com/golang/go/blob/master/src/runtime/netpoll_epoll.go#L72

I'm proposing that the stray println be removed unless it is otherwise helpful or necessary.

@adg adg changed the title Stray println in runtime epoll function runtime: stray println in runtime epoll function Jul 1, 2015
@adg adg added this to the Go1.5Maybe milestone Jul 1, 2015
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/12047 mentions this issue.

@ianlancetaylor
Copy link
Member

@mdwhatcott You neglected to include the actual error message. Do you still have it?

CC @dvyukov

@mdwhatcott
Copy link
Author

@ianlancetaylor:

The exact error text was:

runtime: epollwait on fd 7 failed with 9

I believe it was generated here:

https://github.com/golang/go/blob/master/src/runtime/netpoll_epoll.go#L72

@ianlancetaylor
Copy link
Member

Thanks. @dvyukov This is failing with EBADF. I admit that I don't know how that could happen.

I continue to believe that we must either remove this println or add a throw after it. We can not have Go programs print out random messages in the middle of their execution. Which approach do you prefer?

@dvyukov
Copy link
Member

dvyukov commented Jul 11, 2015

@ianlancetaylor That can happen if the program closes random file descriptors (due to a race for example).
I think we need to throw, but it is somewhat risky just before release. I would prefer to do it early in 1.6 cycle. It is possible that windows/darwin/openbsd return some legitimate temp failures other than EINTR. This does not look very urgent as it is most likely a bug in user program.

@mdwhatcott
Copy link
Author

It could very well be a bug, but it's only happened for one user running Linux version 2.6.18-400.1.1.el5 (mockbuild@x86-012.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-55)). We have many users running this program across various versions of windows, linux (x64 and x86), and mac.

I think either plan that was suggested (print and throw or just remove the print) is better than the current state of affairs.

@ianlancetaylor
Copy link
Member

@dvyukov I agree that throwing is risky just before the release. In that case I think we should commit http://golang.org/cl/12047 and revisit after the release. The current code is not what we want. Can you +2 that CL?

(I actually don't see how epoll_wait can return EBADF even if the program closes random file descriptors. epoll_wait doesn't care whether the file descriptors it uses have been closed. I can't see any kernel path that returns EBADF, except of course if the epoll descriptor itself is closed.)

@dvyukov
Copy link
Member

dvyukov commented Jul 12, 2015

@ianlancetaylor Yes, I mean that the epoll fd is closed.
In the case of EBADF the thread that calls netpoll will hang in an active spin, and the program will most likely hang. I agree that proper error handling of any transient errors and a throw on non-transient failures would be better. But if we now choose between a silent non-diagnosable hang and a "runtime: epollwait on fd 7 failed with 9" error message before hang; it is not clear to me what is better. And note that this print should not happen at all in correct programs (I don't remember any mentions of this print in the last two years). For example, this issue would not be failed if we would removed the print, which is bad. So what exactly situation you are trying to solve/improve by removing the print?

@ianlancetaylor
Copy link
Member

How could the epoll FD be closed? Nothing ever closes it.

The original issue report says that the program completed normally. That is the situation I am trying to improve: the one reported in this issue. The program ran normally, except that it printed out a meaningless message.

If the choice is a meaningless message and a hanging program, or a hanging program without a meaningless message, that is no choice at all. If those are the real choices--which if this issue is correct is not actually the case--then I choose the throw.

@ianlancetaylor
Copy link
Member

In fact, if the epoll FD is closed, then as far as I can tell by looking at the code we aren't going to see a single message. We're going to see a program that loops endlessly printing the same message over and over. So, again, we should do the throw.

@ianlancetaylor
Copy link
Member

Oh, I see, netpolllasterr avoids the endless print.

I'm going to change the CL to throw.

@dvyukov
Copy link
Member

dvyukov commented Jul 12, 2015

How could the epoll FD be closed? Nothing ever closes it.

Well, that's trivial, you just do:

syscall.Close(7)

We're going to see a program that loops endlessly printing the same message over and over.

That's what netpolllasterr for.

The original issue report says that the program completed normally.

I think normal completion is pure luck in this case. If it would wait on some network IO, it would hang because epoll is broken.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.6, Go1.5Maybe Jul 13, 2015
@golang golang locked and limited conversation to collaborators Sep 22, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants