Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: musl libc setxid/setgroups signals clobber stacks / do not use SA_ONSTACK #39857

Open
nmeum opened this issue Jun 25, 2020 · 31 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nmeum
Copy link

nmeum commented Jun 25, 2020

This is a follow up to #39343 where I already briefly mentioned this problem. This issue is probably related to musl libc I can reliably reproduce it on Alpine Linux which uses musl.

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/soeren/.cache/go-build"
GOENV="/home/soeren/.config/go/env"
GOEXE=""
GOFLAGS="-buildmode=pie"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/soeren/src/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build802004994=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Started the TestCrossPackageTests from misc/cgo/test/pkg_test.go:

misc/cgo/test$ go test -run TestCrossPackageTests

What did you expect to see?

A successful test run.

What did you see instead?

An error message:

--- FAIL: TestCrossPackageTests (1.95s)
    pkg_test.go:67: go test: exit status 1
        --- FAIL: Test9400 (0.00s)
            issue9400_linux.go:55: entry 804 of test pattern is wrong; 0x7fc60b3b0cf4 != 0x123456789abcdef
        FAIL
        exit status 1
        FAIL	cgotest	0.005s
FAIL
exit status 1
FAIL	misc/cgo/test	1.958s
@cagedmantis cagedmantis changed the title TestCrossPackageTests fails on musl (Alpine Linux Edge) cmd/go: TestCrossPackageTests fails on musl (Alpine Linux Edge) Jun 29, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 29, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Jun 29, 2020
@jayconrod jayconrod changed the title cmd/go: TestCrossPackageTests fails on musl (Alpine Linux Edge) cmd/cgo: TestCrossPackageTests fails on musl (Alpine Linux Edge) Jul 24, 2020
@oflebbe
Copy link

oflebbe commented Aug 8, 2020

I didn't run any of the tests, but read a bunch of comments, since this seems an interesting problem...

As far as I understood: test9400 is checking if the handling of setxid (i.e. setuid, setgid, ...) class of system calls may smash the go stack, as setxid() is implemented by sending signals to all threads to fullfill POSIX requirements. (see linux man setgid)

In order to prevent stack overrun by signals one usually creates an alternate signal stack and providing the SA_ONSTACK while installing signal handlers, to use the alternate stack. This way one cannot overrun the original stack.

glibc doesn't set it, but it installs a signal handler for SIGSETXID at startup in nptl-init.c . In PR#9400 therefore it is possible to enumerate all signal handlers and add a missing SA_ONSTACK flag, fixing the issue on glibc.

musl doesn't implement an alternate stack and SA_ONSTACK for their internal signal implementation of setxid either. This is actually confirmed by @richfelker in a somewhat related issue #19938 (comment) . Unfortunately the fix of #9400 doesn't apply to musl, since the signal handler is dynamically installed when setxid is called by the __synccall() function in src/thread/synccall.c of musl .

I would vote for adding SA_ONSTACK to musl's __synccall implementation.

@richfelker
Copy link

There's a thread from 2019 on this topic: sigaltstack for implementation-internal signals? that never reached a conclusion. Basically I'm unclear whether it's arguably conforming for the implementation to use the alternate signal stack for implementation-internal signals, since it may have observable side effects on the application in the absence of any signals/signal-handlers setup to run on the alternate stack.

@ianlancetaylor
Copy link
Contributor

If musl doesn't use SA_ONSTACK for the signal handler that it installs, that won't work with any program that uses sigaltstack.

I don't see any way to fix this in Go. I don't see what change we could make that would make things work better.

@richfelker
Copy link

@ianlancetaylor I don't follow how it "doesn't work with any program that uses sigaltstack". It just doesn't work with any program that has a severely undersized stack, which is a known constraint. But it would be nice to be able to use the alt stack when it's available, assuming it's more likely to always have sufficient space for the signal handler to run.

@richfelker
Copy link

Note that I've reopened the topic on the musl list: https://www.openwall.com/lists/musl/2020/08/09/1

@oflebbe
Copy link

oflebbe commented Aug 10, 2020

Hi @richfelker , I tested: It is sufficient to have a patch like this on musl
PATCH.txt
to resolve. This will fix go, and will not harm musl.

If there is an alternate stack, it will use it. Go does create an alternate stack.
If there is no alternate stack, kernel will ignore SA_ONSTACK. That's it.

@ianlancetaylor
Copy link
Contributor

@richfelker I'm assuming that a program that calls sigaltstack does so for some good reason. I'm not sure why a program that calls sigaltstack would want to receive signals on the normal stack.

@richfelker
Copy link

Nobody said anything about wanting to receive signals on the normal stack. From the relevant perspective these aren't signals. They are asynchronous use of the alt signal stack by the implementation in a way the application isn't and can't be aware of.

@ianlancetaylor
Copy link
Contributor

That is a valid perspective.

But it also a valid perspective for a program to say "I am in control of my stack, and do not use my stack for any unexpected purpose. In particular, don't use it to catch signals."

In any case I'm not sure there is anything we can do here in the Go standard library. If musl decides not to change, then as far as I can see code like this can't work on musl. So perhaps we should close this issue.

@richfelker
Copy link

@ianlancetaylor: I have been wanting to change this for a while (see the 2019 thread), but I'm making sure we actually consider the consequences of such a change and whether they break anything that someone can reasonably expect to work. (My leaning is that they don't, but I like to explore this kind of thing thoroughly since making hasty decisions has bitten us in the past.) The point of my bringing these things up is not to argue against the change, but to make sure it's well-supported when (technically if, but most likely when) it's made.

@ianlancetaylor
Copy link
Contributor

Understood.

(I suppose musl could also change to act as glibc does. Is there an advantage to only installing the signal handler when a relevant libc function is called?)

@richfelker
Copy link

Yes, it avoids syscall spam (strace) and wasted time in processes (the vast, vast majority) that don't need the handler. And I don't see how the glibc behavior makes it any easier unless you're poking at implementation internals which are not a stable interface. The signal numbers used for these internal signals are not a public interface, and they're not even pokable via public interfaces (as far as the public interfaces are concerned, the reserved signal numbers simply are not existant signals). The only way you can poke at them is via directly making syscalls, and this will break if signal handling is ever wrapped (which has been considered at times, but turned out we could always get by without it).

@dmitshur dmitshur changed the title cmd/cgo: TestCrossPackageTests fails on musl (Alpine Linux Edge) misc/cgo/test: TestCrossPackageTests fails on musl (Alpine Linux Edge) Jan 26, 2021
jspc added a commit to vinyl-linux/vin-packages-stable that referenced this issue Feb 12, 2021
From: golang/go#39857

This test always fails on musl, because certain things behave differently to glibc. So... sod it, ignore the test
@gopherbot
Copy link

Change https://go.dev/cl/419995 mentions this issue: all: disable tests that fail on Alpine

gopherbot pushed a commit that referenced this issue Aug 2, 2022
These changes are enough to pass all.bash using the
disabled linux-amd64-alpine builder via debugnewvm.

For #19938.
For #39857.

Change-Id: I7d160612259c77764b70d429ad94f0864689cdce
Reviewed-on: https://go-review.googlesource.com/c/go/+/419995
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
These changes are enough to pass all.bash using the
disabled linux-amd64-alpine builder via debugnewvm.

For golang#19938.
For golang#39857.

Change-Id: I7d160612259c77764b70d429ad94f0864689cdce
Reviewed-on: https://go-review.googlesource.com/c/go/+/419995
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
@prattmic prattmic changed the title misc/cgo/test: TestCrossPackageTests fails on musl (Alpine Linux Edge) runtime: musl libc setxid/setgroups signals clobber stacks / do not use SA_ONSTACK Aug 19, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 19, 2022
@prattmic
Copy link
Member

I rediscovered this issue and posted a summary at #54306 (comment) before discovering this issue, where I came to more-or-less the same conclusions as the discussion in this thread.

@richfelker regarding #39857 (comment), did you ever make any progress on the 2019 thread? It is rather unfortunate to have to tell users that using setxid functions with musl+Go is broken and will cause random memory corruption.

@gopherbot
Copy link

Change https://go.dev/cl/425001 mentions this issue: misc/cgo/test: disable setgid tests with musl

@richfelker
Copy link

Thanks for the ping. I don't think we ever came up with a good reason we can't use the alt stack for this, so I'm inclined to go ahead and switch to using it.

@richfelker
Copy link

FYI https://git.musl-libc.org/cgit/musl/commit/?id=2e5fff43dd7fc808197744c67cca7908ac19bb4f switched musl to using the alt stack for internal signals and will be present in future releases starting with 1.2.4.

gopherbot pushed a commit that referenced this issue Nov 7, 2022
Alpine has a known issue where setgid clobbers the Go stack (#39857).
misc/cgo/test skips other tests that use setgid on Alpine, but not
this one. It's not clear to me why this test *used to* pass, but when
I refactored misc/cgo/test in CL 447355 it started failing.

Disable this test on Alpine, like the other setgid tests.

Change-Id: I2e646ef55e2201a4f0b377319d719a011ec847f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/448355
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@nmeum
Copy link
Author

nmeum commented Jun 7, 2023

FYI https://git.musl-libc.org/cgit/musl/commit/?id=2e5fff43dd7fc808197744c67cca7908ac19bb4f switched musl to using the alt stack for internal signals and will be present in future releases starting with 1.2.4.

On Alpine 3.18 (which ships musl 1.2.4) these tests seem to pass now on the architectures I have tested so far (tested by reverting 42105eb). Is there a change that these could be enabled again?

@richfelker
Copy link

On Alpine 3.18 (which ships musl 1.2.4) these tests seem to pass now on the architectures I have tested so far (tested by reverting 42105eb). Is there a change that these could be enabled again?

Excellent news. Glad to know the changes had the desired effect!

algitbot pushed a commit to alpinelinux/aports that referenced this issue Jun 7, 2023
The underlying issue has been fixed in musl 1.2.4.

See:

• golang/go#39857 (comment)
* golang/go@42105eb
@nbryant42
Copy link

Unfortunately I am getting other breakage on Alpine 3.18, and had to revert to 3.17 due to a SIGSEGV, so an update to 3.18 might not work for everyone just yet.

(This issue here doesn't affect me, so I'm fine, just thought I'd mention.)

bell-sw pushed a commit to bell-sw/alpaquita-aports that referenced this issue Jun 19, 2023
[ commit affd27c5e84301e444527a1b4a87107ed454c382 ]

The underlying issue has been fixed in musl 1.2.4.

See:

• golang/go#39857 (comment)
* golang/go@42105eb
@bcmills
Copy link
Contributor

bcmills commented Aug 16, 2023

@nmeum, @golang/release: would it make sense to upgrade the linux-x86-alpine builder to Alpine 3.18 or higher so that this test-skip can be removed?

It looks like the builder is currently on Alpine 3.16 and maintained by the Go team.

@bcmills
Copy link
Contributor

bcmills commented Aug 16, 2023

Alternately: @nmeum, is it feasible for the test to detect the Alpine or musl version in use and only skip on versions that predate the fix?

@nmeum
Copy link
Author

nmeum commented Aug 21, 2023

Alternately: @nmeum, is it feasible for the test to detect the Alpine or musl version in use and only skip on versions that predate the fix?

You can check the apk info musl output or the contents of /etc/os-release but that's obviously somewhat hacky…

@cagedmantis cagedmantis self-assigned this Aug 22, 2023
@gopherbot
Copy link

Change https://go.dev/cl/521975 mentions this issue: env/linux-x86-alpine: update version of alpine to 3.18

gopherbot pushed a commit to golang/build that referenced this issue Aug 22, 2023
Updates golang/go#39857

Change-Id: Ibb2f27947d9f5f5856862dc78465215964850e95
Reviewed-on: https://go-review.googlesource.com/c/build/+/521975
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@cagedmantis cagedmantis removed their assignment Sep 5, 2023
@cagedmantis
Copy link
Contributor

The builder was updated to Alpine 3.18 in go.dev/cl/521975.

@heschi heschi closed this as completed Sep 19, 2023
@bcmills
Copy link
Contributor

bcmills commented Sep 19, 2023

I don't think this is quite completed — there are still skips in the codebase referring to this issue:
https://cs.opensource.google/go/go/+/master:src/cmd/cgo/internal/test/cgo_linux_test.go;l=20;drc=a3f69c778fa6c082d38f9cdfacab27714f08a343

Those skips should be removed before the issue is closed to ensure that this does not regress.

@bcmills bcmills reopened this Sep 19, 2023
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 19, 2023
@dmitshur dmitshur modified the milestones: Backlog, Go1.22 Sep 19, 2023
algitbot pushed a commit to alpinelinux/aports that referenced this issue Sep 22, 2023
Apart from Test9400 there are additional tests which were disabled
on Alpine and should now just pass with recent musl libc.

See: golang/go#39857 (comment)
@nmeum
Copy link
Author

nmeum commented Sep 22, 2023

Thanks for pointing that out, I wasn't aware that additional tests were disabled because of this. Just FYI: I just enable these tests again for our Alpine Linux downstream package and they pass on all of our architectures.

bell-sw pushed a commit to bell-sw/alpaquita-aports that referenced this issue Sep 24, 2023
[ commit 68b45368dd53d570477e8c14cef83a33a2b2d886 ]

Apart from Test9400 there are additional tests which were disabled
on Alpine and should now just pass with recent musl libc.

See: golang/go#39857 (comment)
@gopherbot gopherbot modified the milestones: Go1.22, Go1.23 Feb 6, 2024
@nmeum
Copy link
Author

nmeum commented Feb 9, 2024

Just FYI: I just enable these tests again for our Alpine Linux downstream package and they pass on all of our architectures.

Ping. Any chances these tests could be enabled upstream again as well?

@gopherbot
Copy link

Change https://go.dev/cl/563096 mentions this issue: cmd/cgo/internal/test: don't skip some tests on musl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Status: In Progress
Archived in project
Development

No branches or pull requests