Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: buffer not flushed correctly #25631

Closed
theute opened this issue May 29, 2018 · 15 comments
Closed

os: buffer not flushed correctly #25631

theute opened this issue May 29, 2018 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@theute
Copy link

theute commented May 29, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

I ran the code on multiple versions:
It works on v1.7.5 and v1.8.7
It fails on v1.9, v1.9.4, v1.9.6 and v1.10.2

Does this issue reproduce with the latest release?

Yes (1.10.2)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/theute/.go"
GORACE=""
GOROOT="/home/theute/Tools/Go/latest"
GOTOOLDIR="/home/theute/Tools/Go/latest/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build923971487=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I'm writing to the USB device of my linux machine /dev/usb/lp0 to a printer. (I tried on my laptop and a raspberry)
The code is https://play.golang.org/p/7J8srsKSzSf

What did you expect to see?

It should have printed up to "test-11"

What did you see instead?

It stopped printing at "test-8" with no error.
If I compile with v1.7.5 or v1.8.7 then it prints everything

Note: If I add a 5s sleep at the end of the print method then it prints everything. But if I call the print method followed by a call to Sleep it will not help.

@ianlancetaylor ianlancetaylor changed the title Buffer not flushed correctly os: buffer not flushed correctly May 29, 2018
@ianlancetaylor
Copy link
Contributor

You aren't checking the error return of the Close method. Please do so, although I don't know that it will make any difference.

Do you see the same behavior from an identical C program?

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 29, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone May 29, 2018
@bradfitz
Copy link
Contributor

Please post strace -f output too.

@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 30, 2018
@theute
Copy link
Author

theute commented May 31, 2018

So here is the code with the error check on Close (there is no error showing up) https://play.golang.org/p/9vEmEpSuXK_b
Here is the "strace -f" when it stops printing at "test-8":
https://pastebin.com/89mjEEvQ

I didn't try in C as I wouldn't remember how to do that ;)

@theute
Copy link
Author

theute commented May 31, 2018

BTW by writing straight to the file it works (os.File.WriteString()), it's just through the buffer that it eats the end even though the code is executed and I see the iobuf.Buffered() growing as expected.
It sounds like the flushing is killed prematurely.

@kostix
Copy link

kostix commented May 31, 2018

May it be due to this change introduced in 1.9 IIUC?

I mean, from the strace output, it looks like the syscall on line 212

epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc42004fd6c <unfinished ...>

is operating on the FD 3, which is the same FD, write(2) output the data into;
so it looks like that FD gone through the poller somehow.

@theute
Copy link
Author

theute commented May 31, 2018

The behaviour definitely changed from 1.9 so it's very possible

@ghost
Copy link

ghost commented May 31, 2018

[pid 26602] write(3, "\33@test-1\ntest-2\ntest-3\ntest-4\nte"..., 81 <unfinished ...>
[pid 26602] <... write resumed> ) = 81

Looks like all the 81 bytes were written.

@bradfitz bradfitz removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 31, 2018
@bradfitz
Copy link
Contributor

@ianlancetaylor, thoughts on how the runtime poller for files would affect this?

@kostix
Copy link

kostix commented May 31, 2018

I wonder how the strace output looks under Go < 1.9.

May it be that there it would not be written as a single chunk?

@ianlancetaylor
Copy link
Contributor

I agree with @opennota : the strace output clearly shows that all 81 bytes were written to the file, and then the file is closed without error.

The effect of the change to use the runtime poller is that the file is put into nonblocking mode which shows up in the strace as fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE). Since this is an unusual file (/dev/usb/lp0) that could be related. However, that is not consistent with the fact that calling the WriteString method reportedly does work. The fact that the code uses bufio.NewWriter should be irrelevant, and, according to the strace, it is irrelevant. The bytes are definitely being written to the file descriptor, and the write system call is successfully writing all the bytes.

If it weren't for that discrepancy, I would guess that there is a bug in the kernel driver when the descriptor is in non-blocking mode: I would guess that it is incorrectly reporting that all the bytes were written, when only some were. It is also possible that in nonblocking mode the kernel queues the bytes for writing, returns before they are sent to the hardware, and then aborts the queued bytes when the descriptor is closed. However that too is not consistent with different behavior when using bufio.NewWriter.

The simple workaround may be to call the Fd method to put the descriptor back into blocking mode. But I don't know what the real problem is.

@kostix
Copy link

kostix commented Jun 1, 2018

@ianlancetaylor, do I understand you correctly that by saying

bug in the kernel driver when the descriptor is in non-blocking mode: <…> it is incorrectly reporting that all the bytes were written, when only some were.

you mean that it had to return EAGAIN or EWOULDBLOCK so that the Go runtime would put that FD to the poller and wait on it?

@ianlancetaylor
Copy link
Contributor

I should make clear that I am only speculating, I haven't looked at the code.

And thinking about it a bit more, my speculation can't be right, since all the bytes are written out if the program waits a few seconds. So maybe the problem is that the close system call is aborting any queued bytes, when it should instead wait until they are written. But again I am only speculating. I don't know what is happening. But I do still suspect that there is some bug in the kernel driver.

@kostix
Copy link

kostix commented Jun 1, 2018

So, may be the answer is to make sure that the FD is put back into the blocking mode
(via fcntl(2) on Linux) right before calling close(2)?

The reasoning is as follows:
the open(2) manual page for Linux says:

O_NONBLOCK or O_NDELAY

When possible, the file is opened in nonblocking mode.
Neither the open() nor any subsequent operations on the file
descriptor
which is returned will cause the calling process to wait.

(Emphasis mine.) And this suggests close(2) is also non-blocking (though I admit
its manual page does not mention how it supposed to behave in this case).

The close(2) manual of POSIX
mentions O_NONBLOCK only in relation with the STREAMS subsystem and sockets;
its open(2) manual says that the call itself should not block and the behaviour of all
the other operations is "device-specific".


A quick test would be to copy the implementation of SetNonblock() from there and call it right before invoking the CloseFunc on the FD here.

@ianlancetaylor
Copy link
Contributor

It seems to me that even if close is non-blocking, it should not abort writing any queued up data to the descriptor. The bytes should continue to flow out even after the descriptor is closed. For example, that is how network sockets work (as adjustable using SO_LINGER).

@ianlancetaylor
Copy link
Contributor

Since the strace shows that all the bytes are written to the kernel, I'm very skeptical that there is anything we can do in Go to fix this. I'm going to close this issue. Please comment if you disagree.

@golang golang locked and limited conversation to collaborators Jun 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants