Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: possible memory corruption caused by CL 304470 "cmd/compile, runtime: add metadata for argument printing in traceback" #49075

Closed
katiehockman opened this issue Oct 19, 2021 · 169 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@katiehockman
Copy link
Contributor

OSS-Fuzz reported an issue a few weeks ago that we suspect is memory corruption caused by the runtime. This started on August 16th, so is likely a Go 1.17 issue.

A slice bounds out of range issue is being reported from calls to regexp.MustCompile(,\s*).Split
However, this is not reproducible with the inputs provided by OSS-Fuzz, so we expect something else is going on.

Below are some of the panic logs:

panic: runtime error: slice bounds out of range [:18416820578376] with length 59413

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0001c801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x61c
github.com/google/gonids.(*Rule).option(0x10c000068000, {0x100c000096970, {0x10c0001c8016, 0x8}}, 0x10c00029a040)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c0001c8000, 0x630000350400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x630000350400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21

panic: runtime error: slice bounds out of range [628255583:13888]

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c00033601f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x617
github.com/google/gonids.(*Rule).option(0x10c00026cc00, {0x100c00026e190, {0x10c000336016, 0x7}}, 0x10c0001a4300)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c000336000, 0x62f00064a400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x62f00064a400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21
AddressSanitizer:DEADLYSIGNAL

panic: runtime error: slice bounds out of range [473357973:29412]

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0002a001f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x617
github.com/google/gonids.(*Rule).option(0x10c0001b0180, {0x100c000280100, {0x10c0002a0016, 0xb}}, 0x10c0001ae040)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c0002a0000, 0x632000930800}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x632000930800, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.1689543426.go:21

From rsc@:

The relevant code is processing the [][]int returned from regexp.(*Regexp).FindAllStringIndex.
That [][]int is prepared by repeated append:

func (re *Regexp) FindAllStringIndex(s string, n int) [][]int {
    if n < 0 {
        n = len(s) + 1
    }
    var result [][]int
    re.allMatches(s, nil, n, func(match []int) {
        if result == nil {
            result = make([][]int, 0, startSize)
        }
        result = append(result, match[0:2])
    })
    return result
}

Each of the match[0:2] being appended is prepared in regexp.(*Regexp).doExecute by:

dstCap = append(dstCap, m.matchcap...)

appending to a zero-length, non-nil slice to copy m.matchcap.

And each of the m.matchcap is associated with the *regexp.machine m, which is kept in a sync.Pool for reuse.

The specific corruption is that the integers in the [][]int are clear non-integers (like pointers),
which suggests that either one of the appends is losing the reference accidentally during GC
or something in sync.Pool is wonky.

This could also be something strange that OSS-Fuzz is doing, and doesn't necessarily represent a real-world use case.

/cc @golang/security

@katiehockman katiehockman added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 19, 2021
@randall77
Copy link
Contributor

How often does this happen? Is there any way we could reproduce?
If reproducible, we could turn off the sync.Pool usage and see if anything changes.

@cherrymui cherrymui added this to the Go1.18 milestone Oct 19, 2021
@rsc
Copy link
Contributor

rsc commented Oct 20, 2021

@catenacyber says it is still happening multiple times a day on OSS-Fuzz.

Philippe, do you have any hints about an easy way to get a reproduction case on our own machines?
I looked briefly at the instructions for running oss-fuzz itself and they were a bit daunting.

@catenacyber
Copy link
Contributor

How often does this happen?

Around 50 times a day on oss-fuzz

Is there any way we could reproduce?

I did not manage to reproduce it myself, did not try very hard though...

I looked briefly at the instructions for running oss-fuzz itself and they were a bit daunting.

Well, the hard thing is that this bug does not reproduce for a specific input.
But running it should be ok cf https://google.github.io/oss-fuzz/getting-started/new-project-guide/#testing-locally
That is

  • install docker
  • cd /path/to/oss-fuzz
  • python infra/helper.py build_image gonids
  • python infra/helper.py build_fuzzers gonids
  • python infra/helper.py run_fuzzer --corpus-dir=<path-to-temp-corpus-dir> gonids fuzz_parserule

Then, I guess you need to wait one hour, and relaunch the fuzzer if it did not trigger the bug, until it does

we could turn off the sync.Pool usage and see if anything changes.

Is there some environment variable to do so ?

@catenacyber
Copy link
Contributor

Maybe oss-fuzz uses -fork=2 as an extra argument to run_fuzzer

@randall77
Copy link
Contributor

we could turn off the sync.Pool usage and see if anything changes.

Is there some environment variable to do so ?

No, you'd have to edit the code to replace pool allocations with new or make.

@catenacyber
Copy link
Contributor

Is there some environment variable to do so ?

No, you'd have to edit the code to replace pool allocations with new or make.

so rebuild the standard library ?

@randall77
Copy link
Contributor

Just edit it, rebuild will be automatic.

@catenacyber
Copy link
Contributor

So, I did google/oss-fuzz#6623 with regex.go not using sync package

@rsc
Copy link
Contributor

rsc commented Oct 20, 2021

google/oss-fuzz#6623 looks worth a shot. Thanks.

@catenacyber
Copy link
Contributor

It looks like the bug is still happening but much less often.

One last stack trace is

panic: runtime error: slice bounds out of range [:107271103185152] with length 45246

goroutine 17 [running, locked to thread]:
regexp.(*Regexp).Split(0x10c0000b2640, {0x10c00010801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1260 +0x61c
github.com/google/gonids.(*Rule).option(0x10c00034a180, {0x100c0007c4350, {0x10c000108016, 0x6}}, 0x10c000334080)
	github.com/google/gonids/parser.go:675 +0x36cf
github.com/google/gonids.parseRuleAux({0x10c000108000, 0x62e0000d8400}, 0x0)
	github.com/google/gonids/parser.go:943 +0x6b3
github.com/google/gonids.ParseRule(...)
	github.com/google/gonids/parser.go:972
github.com/google/gonids.FuzzParseRule({0x62e0000d8400, 0x0, 0x10c000000601})
	github.com/google/gonids/fuzz.go:20 +0x54
main.LLVMFuzzerTestOneInput(...)
	./main.3230035416.go:21
AddressSanitizer:DEADLYSIGNAL

regexp.go:1260 seems to prove that this is the modified regex.go file without sync right ?

Any more clues ?
Any more debug assertions to insert in regexp.go ?

@josharian
Copy link
Contributor

This started on August 16th, so is likely a Go 1.17 issue.

Could you bisect to a particular commit that introduced the corruption?

@catenacyber
Copy link
Contributor

Could you bisect to a particular commit that introduced the corruption?

oss-fuzz uses latest Golang release, so they switched from 1.16.x to 1.17 on August 16th, but we do not know which commit exactly in this major release induced the buggy behavior...

@randall77
Copy link
Contributor

I have another possible theory:

regexp.(*Regexp).Split(0x10c0000b2640, {0x10c0001c801f, 0x76dbc0}, 0xffffffffffffffff)
	regexp/regexp.go:1266 +0x61c

Is there any way you can get at the regexp and the contents of the string? In this case, 0x10c0000b2640 is the address of a Regexp, the first field of which is a string I'd like to see. Also, we'd need the very long string at address 0x10c0001c801f for 0x76dbc0 bytes (7MB+!).
If you have any way for us to reproduce that execution state, we could grab those values ourselves.

My theory is that regexp is using the code in internal/bytealg pretty hard, and maybe it's tripping up on some weird corner case. The string pointers always end in 0x1f, which is maybe one of those corner cases. That might explain the intermittency - if the same test case gets allocated in a different place, it might not trigger the bug.
Not sure why we would hit it only for 1.17. https://go-review.googlesource.com/c/go/+/310184 seems to be the only CL of note in internal/bytealg for 1.17, and it looks reasonable to me.

@catenacyber
Copy link
Contributor

Put the hex encoded version of one string here :
https://catenacyber.fr/stringhex.txt

@dgryski
Copy link
Contributor

dgryski commented Oct 22, 2021

If it's an alignment issue, could we try to reproduce by mmap'ing a large block and creating a string with all the different possible alignments / ending pointer bytes?

@randall77
Copy link
Contributor

What is the regexp that is being used?

@catenacyber
Copy link
Contributor

regexp.MustCompile(,\s*).Split(fuzzedinput, -1)

@catenacyber
Copy link
Contributor

So, there are kinds of a lot of answers to split....

@randall77
Copy link
Contributor

Nothing obvious with the string+regexp you posted. Putting it at different alignments doesn't seem to trigger anything.
The string you posted doesn't seem long enough, though. It is only 227927 bytes, and all the failure traces above have a string that is 0x76dbc0 = 7789504 bytes.

@catenacyber
Copy link
Contributor

This string posted comes with stack trace

panic: runtime error: slice bounds out of range [699494522:64250]
--
  |  
  | goroutine 17 [running, locked to thread]:
  | regexp.(*Regexp).Split(0x10c0000b2640, {0x10c000ac201f, 0x962c00}, 0xffffffffffffffff)
  | regexp/regexp.go:1266 +0x617
  | github.com/google/gonids.(*Rule).option(0x10c000066000, {0x0, {0x10c000ac2016, 0x10c000044000}}, 0x10c0001dc000)
  | github.com/google/gonids/parser.go:675 +0x36c5
  | github.com/google/gonids.parseRuleAux({0x10c000ac2000, 0x37a78}, 0x0)
  | github.com/google/gonids/parser.go:942 +0x6b3
  | github.com/google/gonids.ParseRule(...)
  | github.com/google/gonids/parser.go:971
  | github.com/google/gonids.FuzzParseRule({0x7f369cb75800, 0x0, 0x10c000000601})
  | github.com/google/gonids/fuzz.go:20 +0x54
  | main.LLVMFuzzerTestOneInput(...)
  | ./main.3953748960.go:21

@catenacyber
Copy link
Contributor

When running manually, it seems my string is always aligned on 16 bytes.like 0x10c0001185a0

@randall77
Copy link
Contributor

That doesn't seem right. The string posted is only 227927 bytes, but the stack trace shows that is is 9841664 bytes.

To unalign a string, do:

s2 := (strings.Repeat("*", i) + s)[i:]

@catenacyber
Copy link
Contributor

So, the string must have been corrupted before the call to regexp.(*Regexp).Split

@randall77
Copy link
Contributor

I'm not sure I understand. How did you get the string, if not just dumping the 0x962c00 bytes starting at address 0x10c000ac201f ?

@catenacyber
Copy link
Contributor

I'm not sure I understand. How did you get the string, if not just dumping the 0x962c00 bytes starting at address 0x10c000ac201f ?

oss-fuzz provides the stack trace, and the input that triggered it.
The input that triggers it, when run over gonids.FuzzParseRule ends up once at gonids.(*Rule).option to call regexp.(*Regexp).Split and that is where I got the string

@randall77
Copy link
Contributor

Hm, then I guess the values in the stack traces are incorrect. Which can happen, especially with regabi introduced in 1.17.
I don't have any ideas on how to figure out the problem without bisecting through the 1.17 commits.
I am unable to reproduce on my laptop. It never crashes for me...
Speaking of which, it's always possible that it is a bad machine. Have you reproduced on different machines?

Another thing to try, run with GODEBUG=cpu.all=off. That will disable most of the complicated bytealg code.

@thepudds
Copy link
Contributor

thepudds commented Oct 24, 2021

I am unable to reproduce on my laptop. It never crashes for me...

FWIW, I was not able to reproduce either using the oss-fuzz local execution steps on a clean Linux VM (amd64, Ubuntu 20.04) following the directions outlined above in #49075 (comment), with 3 separate runs that totaled about 24 hours.

I also ran for about 24 hours using dvyukov/go-fuzz against the same gonids fuzz target, which also did not crash (although I don't think there is an enormous amount of signal from that result, given libFuzzer and go-fuzz have different process restart strategies, different mutations, different propensity for creating large inputs, and so on).

@catenacyber some questions for you:

  1. I am curious if you are able to reproduce in a clean environment using those steps you outlined for local execution?

  2. I wonder if some additional steps might be needed, such as perhaps downloading a snapshot of the oss-fuzz working corpus (vs. if I followed, the oss-fuzz local execution steps you outlined above results in only using a seed corpus downloaded from https://rules.emergingthreats.net?).

  3. I wonder if -max_len needs to be set, or alternatively, perhaps if the live oss-fuzz working corpus has larger inputs that push libFuzzer to use larger examples? Following the steps above, my local execution currently reports:

INFO: -max_len is not provided; libFuzzer will not generate inputs larger 
than 4096 bytes
INFO: seed corpus: files: 67174 min: 1b max: 3228b total: 19520296b rss: 57Mb

That said, it might be that size of inputs in the corpus is not meaningful if this turns out to be due to some corruption.

  1. If I followed, it looks like the live gonids Dockerfile on the oss-fuzz repo is using the attempted workaround of removing use of sync.Pool, which I think you said resulted in "It looks like the bug is still happening but much less often." If Keith or others are trying to reproduce, it might make sense to not use that attempted workaround when trying to reproduce locally in order to increase frequency?

In any event, sorry if anything I wrote is off base, but curious for your thoughts.

@randall77
Copy link
Contributor

Do I understand correctly that the libfuzzer signal handler overflows the golang stack and overflows unrelated nearby memory ?

Yes.

Would having a guard page after a golang stack help debug this ? (by making reproducing easier)

Yes. The hacking I described has a similar effect.

Or setting up another signal handler to backtrace the second SEGV ?

I'm not sure what the second SEGV is here. There is a SIGALRM, and handling that silently overwrites the heap. When the kinda-guard-page is hacked in, handling the SIGALRM instead triggers a SIGSEGV.

cf https://github.com/catenacyber/nallocfuzz/blob/main/nallocfuzz.c#L112 for an example to do this without recompiling all llvm

I'm not sure what that is showing me. I can change the fuzz sources no problem, I just don't know how to include those changes when building the reproducer. aka, how do I replace the built-in libFuzzer.a with my own libFuzzer.a? Just putting a -L early in the command line didn't seem to work.

I have LLVM built now, but for some reason it isn't built with the sanitizers (fuzzer, asan, ubsan, ...), so the repro complains that those libraries are missing. Trying to figure that out now. Currently trying this: https://stackoverflow.com/questions/56847345/how-to-compile-clang-with-sanitizers

@catenacyber
Copy link
Contributor

Would having a guard page after a golang stack help debug this ? (by making reproducing easier)

Yes. The hacking I described has a similar effect.

Having a guard page after stack allows earlier catching, right ?

Or setting up another signal handler to backtrace the second SEGV ?

I'm not sure what the second SEGV is here. There is a SIGALRM, and handling that silently overwrites the heap. When the kinda-guard-page is hacked in, handling the SIGALRM instead triggers a SIGSEGV.

The heap or the stack ?

cf https://github.com/catenacyber/nallocfuzz/blob/main/nallocfuzz.c#L112 for an example to do this without recompiling all llvm

I'm not sure what that is showing me. I can change the fuzz sources no problem, I just don't know how to include those changes when building the reproducer. aka, how do I replace the built-in libFuzzer.a with my own libFuzzer.a? Just putting a -L early in the command line didn't seem to work.

Change -fsanitize=fuzzer with your own libFuzzer.a ;-)

I have LLVM built now, but for some reason it isn't built with the sanitizers (fuzzer, asan, ubsan, ...), so the repro complains that those libraries are missing. Trying to figure that out now. Currently trying this: https://stackoverflow.com/questions/56847345/how-to-compile-clang-with-sanitizers

I use oss-fuzz Docker images to do this cf
https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-clang/checkout_build_install_llvm.sh

I would use a second signal handler to gather more debug information : this signal was triggered with this backtrace.

@randall77
Copy link
Contributor

So far so good, with this patch to libFuzzer:

--- a/compiler-rt/lib/fuzzer/FuzzerUtilPosix.cpp
+++ b/compiler-rt/lib/fuzzer/FuzzerUtilPosix.cpp
@@ -82,6 +82,7 @@ static void SetSigaction(int signum,
   // dedicated stack) in order to be able to detect stack overflows; keep the
   // flag if it's set.
   new_sigact.sa_flags = SA_SIGINFO | (sigact.sa_flags & SA_ONSTACK);
+  new_sigact.sa_flags |= SA_ONSTACK;
   new_sigact.sa_sigaction = callback;
   if (sigaction(signum, &new_sigact, nullptr)) {
     Printf("libFuzzer: sigaction failed with %d\n", errno);

I'm not sure that's a general fix, but for Go it's enough because Go does its own sigaltstack. Maybe everyone who cares does their own sigaltstack? In that case, this may be all that's required.

@randall77
Copy link
Contributor

Having a guard page after stack allows earlier catching, right ?

Yes, almost certainly. Having a mode where the Go runtime keeps guard pages at the end of goroutine stacks would be awesome.

The heap or the stack ?

The heap. It writes off the end of the stack into an earlier page, which is probably a general heap page. Could also be a different goroutine's stack, I suppose. All equally bad.

@catenacyber
Copy link
Contributor

Yes, almost certainly. Having a mode where the Go runtime keeps guard pages at the end of goroutine stacks would be awesome.

Does it not exist already ?
Looks like it would be useful...

So far so good, with this patch to libFuzzer:

It should then be :

--- a/compiler-rt/lib/fuzzer/FuzzerUtilPosix.cpp
+++ b/compiler-rt/lib/fuzzer/FuzzerUtilPosix.cpp
@@ -82,6 +82,7 @@ static void SetSigaction(int signum,
   // dedicated stack) in order to be able to detect stack overflows; keep the
   // flag if it's set.
-   new_sigact.sa_flags = SA_SIGINFO | (sigact.sa_flags & SA_ONSTACK);
+  new_sigact.sa_flags = SA_SIGINFO | SA_ONSTACK;
   new_sigact.sa_sigaction = callback;
   if (sigaction(signum, &new_sigact, nullptr)) {
     Printf("libFuzzer: sigaction failed with %d\n", errno);

But that looks dubious, like it will break another case.
Do you know why sigact.sa_flags & SA_ONSTACK is false ?

@randall77
Copy link
Contributor

Does it not exist already ?

It does not.

Do you know why sigact.sa_flags & SA_ONSTACK is false ?

It's the default. I think if we redesigned unix from scratch we'd always force signal handlers to have their own stack.

@ianlancetaylor
Copy link
Contributor

A bit of background: when running with the Go runtime, all signals must be installed with the SA_ONSTACK flag set. This is documented at https://pkg.go.dev/os/signal#hdr-Go_programs_that_use_cgo_or_SWIG . For a Go program (not -buildmode=c-archive) the Go runtime will normally install a signal handler for SIGALRM with SA_ONSTACK set. In which case there shouldn't be a problem here.

But if this program is using -buildmode=c-archive to build the Go program, then the Go runtime will not set a signal handler for SIGALRM. If there is no signal handler when the Go runtime is initialized, it will be left alone. If there is a signal handler, then the Go runtime will set the SA_ONSTACK flag for it.

So while I'm sure I'm missing something, right now the only way I can see that a problem can arise is if we are using -buildmode=c-archive, and the Go runtime is initialized, and then after that happens the fuzzer code installs a SIGALRM signal handler.

As a side note, setting SA_ONSTACK does no harm, as it only affects code if something calls sigaltstack for the thread. So the libfuzzer code may as well always set it.

@gopherbot
Copy link

Change https://go.dev/cl/494117 mentions this issue: runtime: fix misaligned SP for libfuzzer entry

@randall77
Copy link
Contributor

I'm somewhat sure my patch fixes the issue, although hard to be 100% sure with the low failure rate. If anyone else wants to run some overnight jobs, that would be great. Especially on 1.19/1.20, as I've been working at tip.

Patch for libFuzzer has been mailed: https://reviews.llvm.org/D150231

@randall77
Copy link
Contributor

So while I'm sure I'm missing something, right now the only way I can see that a problem can arise is if we are using -buildmode=c-archive, and the Go runtime is initialized, and then after that happens the fuzzer code installs a SIGALRM signal handler.

This is indeed the situation, Go initializes first.

gopherbot pushed a commit that referenced this issue May 10, 2023
libfuzzer is written in C and so requires by the C abi that SP be
aligned correctly mod 16. Normally CALLs need to have SP aligned to 0
mod 16, but because we're simulating a CALL (which pushes a return
address) with a JMP (which doesn't), we need to align to 8 mod 16
before JMPing.

This is not causing any current problems that I know of. All the
functions called from this callsite that I checked don't rely on
correct alignment.  So this CL is just futureproofing.

Update #49075

Change-Id: I13fcbe9aaf2853056a6d44dc3aa64b7db689e144
Reviewed-on: https://go-review.googlesource.com/c/go/+/494117
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
@catenacyber
Copy link
Contributor

Thanks for these inputs.

As I understand it, the problem lies rather in https://github.com/mdempsky/go114-fuzz-build ,the framework to link libFuzzer and golang: it should set a signaler handler that sets SA_ONSTACK
I am testing this : https://github.com/catenacyber/gonids/tree/repro/mainc

By the way, will the feature "guard page after a stack" be implemented ? It looks like ASAN for golang to me...
It would have been useful here, and may be in the future...

@randall77
Copy link
Contributor

As I understand it, the problem lies rather in https://github.com/mdempsky/go114-fuzz-build ,the framework to link libFuzzer and golang: it should set a signaler handler that sets SA_ONSTACK
I am testing this : https://github.com/catenacyber/gonids/tree/repro/mainc

I don't think that is the problem. libfuzzer would still be setting a SA_ONSTACK-less handler even if other handlers are registered correctly.

By the way, will the feature "guard page after a stack" be implemented ? It looks like ASAN for golang to me...
It would have been useful here, and may be in the future...

There is no plan for that at the moment. I'm happy to review the code if someone wants to take it on...

Normally the compiler+linker enforces that Go can't overflow the stack. Signals not on an alternate stack are the only way I know of that the Go stack can overflow.
Go has asan already. Not for Go stack overflows though, it is intended for the C parts of cgo.

@catenacyber
Copy link
Contributor

I don't think that is the problem. libfuzzer would still be setting a SA_ONSTACK-less handler even if other handlers are registered correctly.

I think the chaining of signal handlers solves the issue.
I will run some more tests, but setting a golang signal handler solves (or hides) the issue so far...

@randall77
Copy link
Contributor

I think the chaining of signal handlers solves the issue.

I can see that preventing the crash, if your signal.Notify call happens after the libFuzzer's sigaction call.

But won't that prevent delivery of the signals that libFuzzer wants? I presume signal.Notify eats the signal and libFuzzer just won't see that signal raised at all any more. So I guess the question is, does the timeout flag still work with your additional code? I think that's what libFuzzer is using signals for.

@randall77
Copy link
Contributor

I'm going to close this issue (and the backports). This probably needs a fix somewhere but it isn't in the Go repository.
We can reopen if we figure out some workaround that would require stdlib code.

@catenacyber
Copy link
Contributor

catenacyber commented May 11, 2023

your signal.Notify call happens after the libFuzzer's sigaction call.

I think it does

But won't that prevent delivery of the signals that libFuzzer wants?

Oh, is there a way for golang to pass signal to the next handler ?
Otherwise, I need to set up, instead of a golang signal handler, a new C signal handler with SA_ONSTACK and passing to the old signal handler (ie libFuzzer)

Cf https://github.com/catenacyber/gonids/tree/repro/pocsig
Build with

CC=clang go build -o fuzz.a -buildmode c-archive cfuzz.go
clang -fsanitize=address -c sigfuzz.c
clang++ -fsanitize=address sigfuzz.o fuzz.a -o fuzz_sig

Running ./fuzz_sig -> looks like C signal handlers reuse golang stack (0x10c000020090 versus 0x10c00006a680)

Running C_SA_ONSTACK=1 ./fuzz_sig -> looks like C signal handlers have a different stack than golang (0x10c00001a0a0 versus 0x7fb151cb2a00)

Running GO_SIGHANDLER=1 shows that golang does not relay the signal to the previous handler

TL;DR @randall77 should go114-fuzz-build be fixed by adding a C signal handler with SA_ONSTACK and relaying to the next signal handler ?

@ianlancetaylor
Copy link
Contributor

You don't need to install a new signal handler, you just need to reinstall the old signal handler with the SA_ONSTACK flag set. Call sigaction to get the old one, set SA_ONSTACK, and pass it back to sigaction.

@catenacyber
Copy link
Contributor

Indeed Ian, thanks.

So, you confirm that I cannot do that in pure Golang, and need to resort to C ?

And I cannot build a static library containing both C and go with only one go command, right ?
That is, in

CC=clang go build -o sigfuzz.a -buildmode c-archive sigfuzz.go
clang -fsanitize=address -c sigfuzz.c
clang++ -fsanitize=address sigfuzz.o sigfuzz.a -o fuzz_sig

I cannot compile sigfuzz.c and sigfuzz.go into sigfuzz.a in one command ?

@ianlancetaylor
Copy link
Contributor

So, you confirm that I cannot do that in pure Golang, and need to resort to C ?

You can do it in pure Go using syscall.Syscall and defining struct sigaction yourself.

Or, just to be clear, you can do it using cgo without any supporting C files, although I don't know if that meets your requirements.

And I cannot build a static library containing both C and go with only one go command, right ?

If you put your Go and C files into a single directory, then you can run go build -buildmode=c-archive in that directory and get a single Go archive that includes both the Go and the C files.

@catenacyber
Copy link
Contributor

defining struct sigaction yourself.

Not sure I see how to do that...

Thanks anyways :-)

@ianlancetaylor
Copy link
Contributor

To define struct sigaction yourself in Go, you could copy the various definitions of sigactiont in the runtime package.

catenacyber added a commit to catenacyber/oss-fuzz that referenced this issue May 16, 2023
libfuzzer needs to have its signal handler with SA_ONSTACK
because golang stacks are small
cf golang/go#49075
DonggeLiu pushed a commit to google/oss-fuzz that referenced this issue May 18, 2023
libfuzzer needs to have its signal handler with SA_ONSTACK because
golang stacks are small and not guarded against overflows

Waiting for the clang libfuzzer patch cf
https://reviews.llvm.org/D150231, we can fix the signal handlers with
the help of `LLVMFuzzerInitialize`

Long story : golang/go#49075

I hope it will improve the stability of all golang projects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
Development

No branches or pull requests