os: File.Close throws interrupted system call (EINTR) error #44964

Closed
meetme2meat opened this issue Mar 12, 2021 · 10 comments
Labels: FrozenDueToAge, NeedsInvestigation, release-blocker

@meetme2meat

meetme2meat commented Mar 12, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16 darwin/amd64

Does this issue reproduce with the latest release?

The binary was built using go1.16 (installed on my Mac); our production binaries are built using go 1.15.x.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/admin/Library/Caches/go-build"
GOENV="/Users/admin/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/admin/Documents/goProject/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/Users/admin/Documents/goProject"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.16/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.16/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="0"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/s8/zq3ly8712h7b9t9trrrxfphw0000gn/T/go-build3907506663=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I built the binary from the following program: https://play.golang.org/p/U-SEH0ACtZ2

export GOOS=linux
export GOARCH=amd64
go build -o filed3 file.go

Started a Docker container running Ubuntu:

docker run  --cap-add SYS_ADMIN --cap-add DAC_READ_SEARCH  -it ubuntu /bin/bash

Mounted the Azure File share at /data/cdrs as described here

Note: This is exactly how our production environment is set up (where we are seeing a similar problem), except that there we build the binary using a multi-stage Docker image.

Copied the binary into the Docker container:

docker cp filed3 73795394b327:/

What did you expect to see?

No error while writing the file to disk

What did you see instead?

./filed3 random01.txt
439
 Written 50 Lac recordsErr: Closing file close /data/cdrs/random01.txt: interrupted system call

The problem is easier to trigger when the terminal window is continuously resized (this triggers SIGWINCH signals).

Because of the EINTR error, we sometimes see corrupted files (incomplete writes) on disk.

@AlexRouSg
Contributor

Does calling handler.Sync(), possibly in a loop until err is not EINTR, help?

@bcmills added the NeedsInvestigation label Mar 12, 2021
@bcmills added this to the Go1.17 milestone Mar 12, 2021
@bcmills
Contributor

bcmills commented Mar 12, 2021

CC @ianlancetaylor

@dmitshur changed the title from "file.Close throws interrupted system call(EINTR) error" to "os: File.Close throws interrupted system call (EINTR) error" Mar 13, 2021
@ianlancetaylor
Contributor

ianlancetaylor commented Mar 15, 2021

This is an interesting case. POSIX does not define what happens if close returns EINTR: the descriptor might be closed, or it might remain open. So we definitely can't loop on EINTR as we usually do. If we looped, and the kernel did close the descriptor, then the next call to close might close a descriptor that had just been opened by some other goroutine. In particular, the Linux kernel does promise to close the file descriptor, so we definitely can't loop on GNU/Linux.

Presumably we see EINTR because most kernels do not treat close as a restartable call. Since this is a pure Go program, the SIGWINCH signal handler will have been installed with the SA_RESTART flag set, but the kernel may still decline to restart a close call that a signal interrupts.

You report that you are sometimes seeing corrupted files on disk, so there really was an error. The close operation really was interrupted by a signal, the system call was not restarted, and EINTR is reporting a real problem. But we definitely can't call close again.

So I don't see any solution. I'm open to suggestions.

I would imagine that you could create the same problem with a pure C program, and would have the same difficulty in how to address it. It seems like a bug in how the mount of the Azure file system handles signals, one with no obvious workaround.
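
The descriptor-reuse hazard mentioned above can be seen in a small Unix-only sketch. It relies on the POSIX rule that open returns the lowest-numbered free descriptor; the function name `reopenAfterClose` and the use of `/dev/null` are choices made for this illustration. If a retried close ran after another goroutine's open had received the recycled number, it would close that goroutine's file.

```go
package main

import (
	"fmt"
	"syscall"
)

// reopenAfterClose (hypothetical helper) opens path, closes it, opens it
// again, and returns the descriptor numbers of both opens.
func reopenAfterClose(path string) (first, second int, err error) {
	first, err = syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return
	}
	if err = syscall.Close(first); err != nil {
		return
	}
	// The number just released is free again, so the next open
	// normally receives it (assuming nothing else opens a file in between).
	second, err = syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return
	}
	err = syscall.Close(second)
	return
}

func main() {
	first, second, err := reopenAfterClose("/dev/null")
	if err != nil {
		panic(err)
	}
	fmt.Printf("first fd=%d, second fd=%d, reused=%v\n", first, second, first == second)
}
```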

@meetme2meat
Author

@ianlancetaylor I don't know much C, sorry about that. I'll see if I can build a C program to test this with.

@meetme2meat
Author

@ianlancetaylor I tried the sample C code. It did not yield an EINTR error when I resized the terminal, so I wrote a Go program to send SIGWINCH to the PID of the running C program.

I ran the C executable multiple times. The only time it failed, it failed with "Input/output error" (EIO). I'm not very confident in my C code, so I can't really say why it did not fail with EINTR.

In case it helps, the Go program that sent SIGWINCH to the running C program's PID looked roughly like this:

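package main

import (
	"os"
	"strconv"
	"syscall"
	"time"
)

func main() {
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(err)
	}

	// Send SIGWINCH to the target every 10ms; panics once Kill fails.
	for {
		err = syscall.Kill(pid, syscall.SIGWINCH)
		if err != nil {
			panic(err)
		}
		time.Sleep(10 * time.Millisecond)
	}
}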

@networkimprov

A user-code workaround may be to call f.Sync() before f.Close(). If either of those fails, you should abort.

@meetme2meat
Author

@networkimprov That could be a possible solution, but because of the EINTR error on Close (even when all data has been flushed), the mounted drive still holds a reference to the file, so creating the same file again never succeeds.

Just an update: I'm also speaking with Azure tech support about this, since @ianlancetaylor suggested it could be a bug in how the Azure file system handles signals.

@networkimprov

Have you tried writing the file at a temporary path, then f.Sync() and rename it? On failure, remove the temporary path if it exists, then start over if the renamed path doesn't exist. See also:

https://danluu.com/file-consistency/
https://danluu.com/deconstruct-files/

@ianlancetaylor
Contributor

@meetme2meat To get equivalent C code the C program needs to use the sigaction function to install a signal handler for SIGWINCH. Without a signal handler you won't get an EINTR error.

@ianlancetaylor
Contributor

At this point I don't see anything that we could change in Go to fix this problem. So I am going to close this issue.

I am definitely open to suggestions if anybody has any ideas, but for now I think this can only be regarded as a bug in the file system implementation, or possibly the kernel.

@golang locked and limited conversation to collaborators Apr 1, 2022