runtime/cgo: "pthread_create failed" during syscall.Exec on Darwin/OpenBSD/DragonFly #18146

Closed
bcmills opened this issue Dec 1, 2016 · 12 comments


@bcmills
Contributor

bcmills commented Dec 1, 2016

Twice this week I've helped people debug panics in Go programs caused by "runtime/cgo: pthread_create failed: Resource temporarily unavailable". Both times, it turned out that they were calling syscall.Exec, which happened to run concurrently with a pthread_create for a background goroutine, causing the pthread_create to fail with EAGAIN.

There have been a couple of previous threads on go-nuts involving similar issues.

We could consider modifying the runtime to stop the world and/or shut down threads during calls to syscall.Exec, but that seems like a fair amount of work, and the syscall package is frozen / deprecated anyway.

As a simpler step, I think we should have vet warn about any calls to syscall.Exec from outside the standard library.
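
To make the vet idea concrete, here is a rough sketch of what such a check might look like. It is not an existing vet analyzer: the function name findSyscallExec and the sample source are made up for illustration, and it matches on the identifier name only, so a renamed import or a local variable called "syscall" would fool it. A real check would presumably resolve the package through type information.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findSyscallExec returns the position of every call written as
// syscall.Exec(...) in src. Hypothetical helper: it compares identifier
// names only and does not resolve imports.
func findSyscallExec(filename, src string) ([]token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var found []token.Position
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // keep descending
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "syscall" && sel.Sel.Name == "Exec" {
			found = append(found, fset.Position(call.Pos()))
		}
		return true
	})
	return found, nil
}

// sample is an illustrative input, not code from any real program.
const sample = `package p

import "syscall"

func run() error {
	return syscall.Exec("/bin/true", []string{"/bin/true"}, nil)
}
`

func main() {
	hits, err := findSyscallExec("sample.go", sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, pos := range hits {
		fmt.Printf("%s: call to syscall.Exec\n", pos)
	}
}
```

Exempting the standard library, as proposed above, would need an extra filter on the package path, which this sketch leaves out.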

@bcmills bcmills changed the title cmd/vet: warn about calls to syscall.Exec cmd/vet: warn about calls to syscall.Exec Dec 1, 2016
@ianlancetaylor
Contributor

If you can write a test case for the problem, we will simply change runtime/cgo to retry the pthread_create if it fails with EAGAIN. The only reason I didn't do that years ago is that I wasn't able to write a test case.

@bcmills
Contributor Author

bcmills commented Dec 2, 2016

I'll try to write one tomorrow based on the failures I've been seeing.

I think the key elements are:

  • linking with C (to use pthread_create)
  • starting nontrivial background goroutines during package init (to produce a shortage of threads)
  • calling Exec very early (so that we haven't reached GOMAXPROCS yet when it executes)

@bcmills
Contributor Author

bcmills commented Dec 2, 2016

The following test gives me a flake rate of about 0.1% of attempts:

// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Issue 18146: pthread_create failure during syscall.Exec.

package cgotest

import "C"

import (
	"bytes"
	"crypto/md5"
	"os"
	"os/exec"
	"runtime"
	"syscall"
	"testing"
)

func test18146(t *testing.T) {
	const (
		attempts = 1000
		threads  = 4
	)

	if os.Getenv("test18146") == "exec" {
		runtime.GOMAXPROCS(1)
		for n := threads; n > 0; n-- {
			go func() {
				for {
					_ = md5.Sum([]byte("Hello, !"))
				}
			}()
		}
		runtime.GOMAXPROCS(threads)
		if err := syscall.Exec("/bin/true", []string{"/bin/true"}, nil); err != nil {
			t.Fatal(err)
		}
	}

	var cmds []*exec.Cmd
	defer func() {
		for _, cmd := range cmds {
			cmd.Process.Kill()
		}
	}()

	args := append(append([]string(nil), os.Args[1:]...), "-test.run=Test18146")
	for n := attempts; n > 0; n-- {
		cmd := exec.Command(os.Args[0], args...)
		cmd.Env = append(os.Environ(), "test18146=exec")
		buf := bytes.NewBuffer(nil)
		cmd.Stdout = buf
		cmd.Stderr = buf
		if err := cmd.Start(); err != nil {
			t.Error(err)
			return
		}
		cmds = append(cmds, cmd)
	}

	failures := 0
	for _, cmd := range cmds {
		err := cmd.Wait()
		if err == nil {
			continue
		}

		t.Errorf("syscall.Exec failed: %v\n%s", err, cmd.Stdout)
		failures++
	}

	if failures > 0 {
		t.Logf("Failed %v of %v attempts.", failures, len(cmds))
	}
}

@bcmills changed the title from 'cmd/vet: warn about calls to syscall.Exec' to 'runtime/cgo: "pthread_create failed" panics during syscall.Exec' on Dec 2, 2016
@bcmills changed the title from 'runtime/cgo: "pthread_create failed" panics during syscall.Exec' to 'runtime/cgo: "pthread_create failed" panic during syscall.Exec' on Dec 2, 2016
@bcmills changed the title from 'runtime/cgo: "pthread_create failed" panic during syscall.Exec' to 'runtime/cgo: "pthread_create failed" during syscall.Exec' on Dec 2, 2016
@ianlancetaylor
Contributor

@bcmills thanks for figuring out the test. Sent https://golang.org/cl/33894.

@gopherbot

CL https://golang.org/cl/33894 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 5, 2016
Update #18146.

Change-Id: Ib447aabae9f203a8b61fb8c984b57d8e2bfe69c2
Reviewed-on: https://go-review.googlesource.com/33894
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@ianlancetaylor changed the title from 'runtime/cgo: "pthread_create failed" during syscall.Exec' to 'runtime/cgo: "pthread_create failed" during syscall.Exec on Darwin & OpenBSD' on Dec 5, 2016
@ianlancetaylor
Contributor

I believe this is now fixed on systems other than OpenBSD (requires changes to runtime/cgo/gcc_libinit_openbsd.c) and Darwin (test fails for unknown reasons).

@ianlancetaylor
Contributor

Also fails on DragonFly according to #18198 (https://build.golang.org/log/06e9dfb416c79a70e4c2842b57c1f86d1beb6be0).

@ianlancetaylor changed the title from 'runtime/cgo: "pthread_create failed" during syscall.Exec on Darwin & OpenBSD' to 'runtime/cgo: "pthread_create failed" during syscall.Exec on Darwin/OpenBSD/DragonFly' on Dec 5, 2016
@gopherbot

CL https://golang.org/cl/33905 mentions this issue.

@gopherbot

CL https://golang.org/cl/33906 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 5, 2016
Fails on builder for unknown reasons.

Fixes #18198.
Update #18146.

Change-Id: Iaa85826655eee57d86e0c73d06c930ef3f4647ec
Reviewed-on: https://go-review.googlesource.com/33906
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

CL https://golang.org/cl/33907 mentions this issue.

gopherbot pushed a commit that referenced this issue Dec 6, 2016
Seen on the OpenBSD/AMD64 builder:
https://build.golang.org/log/fa34df1bcd3af12d4fc0fb0e60e3c6197a2a6f75

Update #18146.

Change-Id: I2646621488be84d50f47c312baa0817c72e3c058
Reviewed-on: https://go-review.googlesource.com/33907
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@mdempsky
Member

FYI, just saw a flake on linux-amd64: https://storage.googleapis.com/go-build-log/28b72716/linux-amd64_50de8e07.log

gopherbot pushed a commit that referenced this issue Jun 28, 2017
The test added for issue #18146 exposed a long-existing bug in the
Solaris port: syscall.Exec uses RawSyscall, which is (intentionally) not
functional on Solaris and exists only as a placeholder to satisfy build
requirements.

Call syscall.execve instead for Solaris.

Fixes #20832

Change-Id: I327d863f4bbbbbb6e5ecf66b82152c4030825d09
Reviewed-on: https://go-review.googlesource.com/47032
Run-TryBot: Shawn Walker-Salas <shawn.walker@oracle.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot

Change https://golang.org/cl/47032 mentions this issue: syscall: fix Exec on solaris

@golang golang locked and limited conversation to collaborators Jul 27, 2018