Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: preemption-related deadlock when calling Go from C #35294

Closed
mdempsky opened this issue Oct 31, 2019 · 11 comments
Closed

runtime: preemption-related deadlock when calling Go from C #35294

mdempsky opened this issue Oct 31, 2019 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@mdempsky
Copy link
Member

While comparing dvyukov/go-fuzz with mdempsky/go114-fuzz-build, I noticed that fuzzing k8s.io/kubernetes/test/fuzz/yaml.FuzzSigYaml with libFuzzer seems to periodically run into timeouts. I haven't been able to reproduce this with Go 1.13, so this seems like a regression.

Tentatively marking release blocker, since this seems like it could be a subtle runtime and/or cgo regression.

I'm going to try bisecting to see if I can figure out what commit caused the problem.

@mdempsky mdempsky added this to the Go1.14 milestone Oct 31, 2019
@mdempsky mdempsky added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 31, 2019
@mdempsky
Copy link
Member Author

mdempsky commented Oct 31, 2019

I ran a bunch of times at master to see how long it takes to hang. Here's a record of the number of fuzz iterations before hang:

118784
274821
97466
41172
705368
374579
773952
135061
4180
34428
520221
199902
241640
306002
25566
160724
359761
205918
26332
518383
99168
316188

I would guess it's exponentially distributed?

Running until 2M iterations seems like a safe threshold for bisection.

Edit: Using a 1.5M iteration threshold because I'm impatient, and estimating the exponential distribution based on the above samples suggests to me the chance of 1.5M iterations without failure is 0.25%. (But I'm no statistician.)

@mdempsky
Copy link
Member Author

/cc @aclements

Still bisecting, but it seems to be narrowing down on one of the CLs related to #10958 / #24543.

@ianlancetaylor
Copy link
Contributor

Honestly I wouldn't be surprised if the culprit CL turns out to be https://golang.org/cl/171883, which turns on the new timers. I suspect there may be some case we are failing to handle.

It would be nice (for me) to hear that it was a different CL, though.

@mdempsky
Copy link
Member Author

@ianlancetaylor You look safe for now. My current bisect interval is from 316fb95 (good) to 6058603 (bad), and that CL looks like it was merged outside of that window.

@mdempsky
Copy link
Member Author

Git bisect identified 3f83411.

I double checked to make sure it does hang at that CL; and I'm up to 2M iterations without hanging for double checking the previous commit (which assuming I didn't mess up "git bisect bad" / "git bisect good" earlier should be at least 3.5M iterations in total).

@ianlancetaylor
Copy link
Contributor

CC @aclements @cherrymui

@mdempsky
Copy link
Member Author

mdempsky commented Nov 1, 2019

I've been able to minimize the failure below, though it seems to reproduce the issue somewhat less reliably than actually using libfuzzer.

The program should run forever printing increasing numbers, but occasionally it hangs.

$ go build -buildmode=c-archive -o fuzz.a fuzz.go
$ cc -o driver driver.c fuzz.a -lpthread
$ ./driver

(Edit: See better repro below.)

$ cat fuzz.go
package main

// #include <stdint.h>
import "C"

import (
	"unsafe"
)

//export LLVMFuzzerTestOneInput
func LLVMFuzzerTestOneInput(data *C.uint8_t, size C.size_t) C.int {
	s := (*[1 << 30]byte)(unsafe.Pointer(data))[:size:size]

	for _, x := range s {
		_ = make([]byte, x)
	}
	return 0
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

uint8_t data[1] = { 64 };

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 4096) == 0) printf("#%d\n", i);
    LLVMFuzzerTestOneInput(data, sizeof(data));
  }
}

@mdempsky
Copy link
Member Author

mdempsky commented Nov 1, 2019

Simpler, much more reliable repro:

$ cat fuzz.go
package main

import "C"

var sink []byte

//export Fuzz
func Fuzz() {
	sink = make([]byte, 4096)
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 256) == 0) printf("#%d\n", i);
    Fuzz();
  }
}

@mdempsky
Copy link
Member Author

mdempsky commented Nov 1, 2019

I'm not able to make any more progress on this. The last repro is very reliable, but I don't sufficiently understand the runtime preemption logic, and my naive debugging skills aren't sufficient here.

@mdempsky mdempsky changed the title runtime: dvyukov/go-fuzz -libfuzzer timeouts at master, but not Go 1.13 runtime: preemption-related deadlock when calling Go from C Nov 1, 2019
@ianlancetaylor
Copy link
Contributor

I found the problem. Working on adding a test.

@gopherbot
Copy link

Change https://golang.org/cl/204957 mentions this issue: runtime: clear preemptStop in dropm

@golang golang locked and limited conversation to collaborators Nov 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants