Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: data race in TestStdinClose #17647

Closed
dvyukov opened this issue Oct 28, 2016 · 13 comments
Closed

os/exec: data race in TestStdinClose #17647

dvyukov opened this issue Oct 28, 2016 · 13 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Oct 28, 2016

While testing new race runtime I've hit:

WARNING: DATA RACE
Read at 0x00c4200f8900 by goroutine 28:
  os.(*File).Write()
      /tmp/workdir/go/src/os/file.go:139 +0x70
  os.(*File).WriteString()
      /tmp/workdir/go/src/os/file.go:200 +0x7b
  os/exec.(*closeOnce).WriteString()
      <autogenerated>:43 +0x77
  io.WriteString()
      /tmp/workdir/go/src/io/io.go:289 +0xa5
  strings.(*Reader).WriteTo()
      /tmp/workdir/go/src/strings/reader.go:133 +0x172
  io.copyBuffer()
      /tmp/workdir/go/src/io/io.go:380 +0x432
  io.Copy()
      /tmp/workdir/go/src/io/io.go:360 +0x7e
  os/exec_test.TestStdinClose.func2()
      /tmp/workdir/go/src/os/exec/exec_test.go:241 +0xff

Previous write at 0x00c4200f8900 by goroutine 27:
  os.(*file).close()
      /tmp/workdir/go/src/os/file_unix.go:138 +0x10a
  os.(*File).Close()
      /tmp/workdir/go/src/os/file_unix.go:127 +0x55
  os/exec.(*closeOnce).close()
      /tmp/workdir/go/src/os/exec/exec.go:536 +0x50
  os/exec.(*closeOnce).(os/exec.close)-fm()
      /tmp/workdir/go/src/os/exec/exec.go:531 +0x41
  sync.(*Once).Do()
      /tmp/workdir/go/src/sync/once.go:44 +0xe1
  os/exec.(*closeOnce).Close()
      /tmp/workdir/go/src/os/exec/exec.go:531 +0x79
  os/exec.(*closeOnce).safeClose()
      /tmp/workdir/go/src/os/exec/exec.go:569 +0x55
  os/exec.(*closeOnce).(os/exec.safeClose)-fm()
      /tmp/workdir/go/src/os/exec/exec.go:518 +0x41
  os/exec.closerFunc.Close()
      /tmp/workdir/go/src/os/exec/exec.go:541 +0x34
  os/exec.(*Cmd).closeDescriptors()
      /tmp/workdir/go/src/os/exec/exec.go:262 +0x67
  os/exec.(*Cmd).Wait()
      /tmp/workdir/go/src/os/exec/exec.go:447 +0x2b8
  os/exec_test.TestStdinClose()
      /tmp/workdir/go/src/os/exec/exec_test.go:246 +0x21b
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd

Goroutine 28 (running) created at:
  os/exec_test.TestStdinClose()
      /tmp/workdir/go/src/os/exec/exec_test.go:245 +0x20d
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd

Goroutine 27 (running) created at:
  testing.(*T).Run()
      /tmp/workdir/go/src/testing/testing.go:676 +0x535
  testing.runTests.func1()
      /tmp/workdir/go/src/testing/testing.go:832 +0xaa
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd
  testing.runTests()
      /tmp/workdir/go/src/testing/testing.go:838 +0x4a8
  testing.(*M).Run()
      /tmp/workdir/go/src/testing/testing.go:773 +0x12f
  main.main()
      os/exec/_test/_testmain.go:98 +0x1b8

I can't reproduce it with current race runtime, but the race looks very real. Introduced while fixing another race in os/exec #6270. One does not simply use os/exec without races.

@dvyukov
Copy link
Member Author

dvyukov commented Oct 28, 2016

It's not OK to call Wait while working with stdin/stdout.

@bradfitz bradfitz assigned rsc and unassigned adg Oct 28, 2016
@bradfitz bradfitz added this to the Go1.8 milestone Oct 28, 2016
@rsc
Copy link
Contributor

rsc commented Oct 28, 2016

@dvyukov, I don't understand why this is a race. I will explain why I think it's not a race. Please show me why I am wrong.

The subprocess is running case "stdinClose" at line 660:

    case "stdinClose":
            b, err := ioutil.ReadAll(os.Stdin)
            if err != nil {
                    fmt.Fprintf(os.Stderr, "Error: %v\n", err)
                    os.Exit(1)
            }
            if s := string(b); s != stdinCloseTestString {
                    fmt.Fprintf(os.Stderr, "Error: Read %q, want %q", s, stdinCloseTestString)
                    os.Exit(1)
            }
            os.Exit(0)

That is, the subprocess consumes the entire stdin and then exits.

TestStdinClose does (simplifying):

    cmd := helperCommand(t, "stdinClose")
    stdin, err := cmd.StdinPipe()
    check("StdinPipe", err)
    check("Start", cmd.Start())
    go func() {
            _, err := io.Copy(stdin, strings.NewReader(stdinCloseTestString))
            check("Copy", err)
            check("Close", stdin.Close())
    }()
    check("Wait", cmd.Wait())

The new goroutine is kicked off and cmd.Wait blocks in the wait system call waiting for the subprocess to exit.

The new goroutine writes stdinCloseTestString to stdin (using the underlying *os.File) and then closes stdin (using the underlying *os.File). Once stdin closes, the subprocess should see the EOF and exit. Then cmd.Wait unblocks - the wait system call returns - and now sees the subprocess exit and calls close on the underlying *os.File for stdin, just in case the calling code has not already closed the file. That second close should not do anything because the file is already closed. Also, the whole thing is sequenced: the goroutine's close of stdin is what makes the subprocess exit, and the subprocess exit is what makes the second half of cmd.Wait begin, and the second half of cmd.Wait is what is closing the file in the race report.

Having explained why this code is essentially single-threaded, I am confused about the race report. The race report seems to be saying that the second half of cmd.Wait has already executed and the copy loop in the goroutine is now running and racing with Wait. But assuming something is not completely broken in the test, the copy loop should always run before cmd.Wait.

Is the race report accurate about the time ordering here? Or does the race detector just pick one event to be 'previous' when it doesn't actually know which one came first in time order?

If the race report is accurate about the time ordering, then the test itself must be broken somehow, in which case there should be other failure output. If the race report has the time ordering backward, and in fact it is objecting to the write in goroutine 27 racing with a previous read in goroutine 28, that's just not a race: they are sequenced by the subprocess exiting.

@dvyukov
Copy link
Member Author

dvyukov commented Oct 28, 2016

@rsc Your reasoning regarding event ordering is correct, I missed that there is transitive relation. Race detector does not see transitive synchronization that involves other process. To fix that we could synchronize all fd Close's with all unblocks of cmd.Wait (Close will need to a release on a fake address, and Wait do acquire on the same address after return). However, the signal to exit can have other forms, e.g. one process could write some magic value to a pipe, or maybe create a special file or disk.

The ordering of events in the report is precise. Tsan observed that in this particular execution Close happened before Write.

How it happened is a good question. Fortunately I still have the log:

--- FAIL: TestStdPipe (8.17s)
    pipe_test.go:85: unexpected exit status signal: killed for descriptor 1s sig false
FAIL
FAIL    os  57.546s
==================
WARNING: DATA RACE
Read at 0x00c4200f8900 by goroutine 28:
  os.(*File).Write()
      /tmp/workdir/go/src/os/file.go:139 +0x70

So in this case the child process was killed (seems to be happening on race gomote machines regularly:
https://groups.google.com//forum/#!topic/golang-dev/y0GnG9vvttI).

Which suggests that: you can never be sure that the process won't exit before you signal it. A child process can be killed by user, by kernel, or just exit with an error.
Now consider that we are sending some sensitive data to a trusted subprocess. The process gets killed. Wait returns and closes stdin fd. Then the fd is reused for an external internet connection and we continue sending the sensitive data there as if nothing happened.

@dvyukov
Copy link
Member Author

dvyukov commented Oct 28, 2016

That was a wrong part of the log, but it was still killed:

==================
WARNING: DATA RACE
Read at 0x00c4200f8900 by goroutine 28:
  os.(*File).Write()
      /tmp/workdir/go/src/os/file.go:139 +0x70
  os.(*File).WriteString()
      /tmp/workdir/go/src/os/file.go:200 +0x7b
  os/exec.(*closeOnce).WriteString()
      <autogenerated>:43 +0x77
  io.WriteString()
      /tmp/workdir/go/src/io/io.go:289 +0xa5
  strings.(*Reader).WriteTo()
      /tmp/workdir/go/src/strings/reader.go:133 +0x172
  io.copyBuffer()
      /tmp/workdir/go/src/io/io.go:380 +0x432
  io.Copy()
      /tmp/workdir/go/src/io/io.go:360 +0x7e
  os/exec_test.TestStdinClose.func2()
      /tmp/workdir/go/src/os/exec/exec_test.go:241 +0xff

Previous write at 0x00c4200f8900 by goroutine 27:
  os.(*file).close()
      /tmp/workdir/go/src/os/file_unix.go:138 +0x10a
  os.(*File).Close()
      /tmp/workdir/go/src/os/file_unix.go:127 +0x55
  os/exec.(*closeOnce).close()
      /tmp/workdir/go/src/os/exec/exec.go:536 +0x50
  os/exec.(*closeOnce).(os/exec.close)-fm()
      /tmp/workdir/go/src/os/exec/exec.go:531 +0x41
  sync.(*Once).Do()
      /tmp/workdir/go/src/sync/once.go:44 +0xe1
  os/exec.(*closeOnce).Close()
      /tmp/workdir/go/src/os/exec/exec.go:531 +0x79
  os/exec.(*closeOnce).safeClose()
      /tmp/workdir/go/src/os/exec/exec.go:569 +0x55
  os/exec.(*closeOnce).(os/exec.safeClose)-fm()
      /tmp/workdir/go/src/os/exec/exec.go:518 +0x41
  os/exec.closerFunc.Close()
      /tmp/workdir/go/src/os/exec/exec.go:541 +0x34
  os/exec.(*Cmd).closeDescriptors()
      /tmp/workdir/go/src/os/exec/exec.go:262 +0x67
  os/exec.(*Cmd).Wait()
      /tmp/workdir/go/src/os/exec/exec.go:447 +0x2b8
  os/exec_test.TestStdinClose()
      /tmp/workdir/go/src/os/exec/exec_test.go:246 +0x21b
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd

Goroutine 28 (running) created at:
  os/exec_test.TestStdinClose()
      /tmp/workdir/go/src/os/exec/exec_test.go:245 +0x20d
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd

Goroutine 27 (running) created at:
  testing.(*T).Run()
      /tmp/workdir/go/src/testing/testing.go:676 +0x535
  testing.runTests.func1()
      /tmp/workdir/go/src/testing/testing.go:832 +0xaa
  testing.tRunner()
      /tmp/workdir/go/src/testing/testing.go:639 +0xcd
  testing.runTests()
      /tmp/workdir/go/src/testing/testing.go:838 +0x4a8
  testing.(*M).Run()
      /tmp/workdir/go/src/testing/testing.go:773 +0x12f
  main.main()
      os/exec/_test/_testmain.go:98 +0x1b8
==================
--- FAIL: TestStdinClose (36.33s)
    exec_test.go:227: Wait: signal: killed
    exec_test.go:227: Copy: write |1: file already closed
FAIL
FAIL    os/exec 65.409s

@yoni386
Copy link

yoni386 commented Oct 30, 2016

cmd.Process.Kill() can be a workaround? is there temporary solution?

@dvyukov
Copy link
Member Author

dvyukov commented Oct 30, 2016

The solution is to not call cmd.Wait() while working with stdin/stdout pipes.

@yoni386
Copy link

yoni386 commented Oct 31, 2016

Thanks the problem is still valid
file descriptors are still used. any idea when version 1.8 is scheduled to?

@ianlancetaylor
Copy link
Contributor

1.8 is scheduled to be released February 1, 2017.

@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 7, 2016
@ianlancetaylor ianlancetaylor assigned ianlancetaylor and unassigned rsc Nov 16, 2016
@gopherbot
Copy link

CL https://golang.org/cl/33298 mentions this issue.

@yoni386
Copy link

yoni386 commented Feb 1, 2017 via email

@ianlancetaylor
Copy link
Contributor

1.8 includes the fix. Release candidate 3 for Go 1.8 is currently available, and the final release will occur within a couple of weeks. See the golang-dev mailing list for details.

@yoni386
Copy link

yoni386 commented Feb 1, 2017 via email

@gopherbot
Copy link

Change https://golang.org/cl/65490 mentions this issue: os/exec: remove protection against simultaneous Wait/Write

gopherbot pushed a commit that referenced this issue Sep 22, 2017
CL 31148 added code to protect again simultaneous calls to Close and
Wait when using the standard input pipe, to fix the race condition
described in issue #9307. That issue is a special case of the race
between Close and Write described by issue #7970. Since issue #7970
was not fixed, CL 31148 fixed the problem specific to os/exec.

Since then, issue #7970 has been fixed, so the specific fix in os/exec
is no longer necessary. Remove it, effectively reverting CL 31148 and
followup CL 33298.

Updates #7970
Updates #9307
Updates #17647

Change-Id: Ic0b62569cb0aba44b32153cf5f9632bd1f1b411a
Reviewed-on: https://go-review.googlesource.com/65490
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Miguel Bernabeu <miguelbernadi@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Joe Tsai <joetsai@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Sep 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

8 participants