cmd/link: panic: operation not permitted #41356

Closed
fogfish opened this issue Sep 12, 2020 · 25 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

@fogfish

fogfish commented Sep 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

The issue exists across the 1.15.x releases; it most probably relates to the linker changes described at https://golang.org/doc/go1.15#linker

The previous 1.14.x releases are not affected by the issue.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/tmp"
GOENV=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/tmp/8c7ec6d123a7abf0769333196e0d26f470afdc59/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/tmp/8c7ec6d123a7abf0769333196e0d26f470afdc59"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/tmp/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/tmp/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build935346230=/tmp/go-build -gno-record-gcc-switches"

What did you do?

go run hw.go
// hw.go
package main

func main(){}

What did you expect to see?

No error

What did you see instead?

The command above fails only on Amazon Linux, and only when running in the cloud.

It fails with

# command-line-arguments
panic: operation not permitted
goroutine 1 [running]:
cmd/link/internal/ld.Main(0x870840, 0x20, 0x20, 0x1, 0x7, 0x10, 0x0, 0x0, 0x6da8ff, 0x1b, ...)
/usr/local/go/src/cmd/link/internal/ld/main.go:320 +0x21bd
main.main()
/usr/local/go/src/cmd/link/main.go:68 +0x1dc
...
go run -x hw.go


WORK=/tmp/go-build825787747
mkdir -p $WORK/b001/
cat >$WORK/b001/importcfg << 'EOF' # internal
# import config
packagefile runtime=/tmp/go/pkg/linux_amd64/runtime.a
EOF
cd /tmp
./go/pkg/tool/linux_amd64/compile -o ./go-build825787747/b001/_pkg_.a -trimpath "$WORK/b001=>" -p main -complete -buildid ehL3WBFzOeSpryH7yP9P/ehL3WBFzOeSpryH7yP9P -dwarf=false -goversion go1.15.2 -D _/tmp -importcfg ./go-build825787747/b001/importcfg -pack -c=2 ./test.go
/tmp/go/pkg/tool/linux_amd64/buildid -w $WORK/b001/_pkg_.a # internal
cp $WORK/b001/_pkg_.a /tmp/f8/f806946fd9e5455a99ccbdeaa0e34798d70aed6edb248123b8fd2df38d8bbda3-d # internal
cat >$WORK/b001/importcfg.link << 'EOF' # internal
packagefile command-line-arguments=$WORK/b001/_pkg_.a
packagefile runtime=/tmp/go/pkg/linux_amd64/runtime.a
packagefile internal/bytealg=/tmp/go/pkg/linux_amd64/internal/bytealg.a
packagefile internal/cpu=/tmp/go/pkg/linux_amd64/internal/cpu.a
packagefile runtime/internal/atomic=/tmp/go/pkg/linux_amd64/runtime/internal/atomic.a
packagefile runtime/internal/math=/tmp/go/pkg/linux_amd64/runtime/internal/math.a
packagefile runtime/internal/sys=/tmp/go/pkg/linux_amd64/runtime/internal/sys.a
EOF
mkdir -p $WORK/b001/exe/
cd .
/tmp/go/pkg/tool/linux_amd64/link -o $WORK/b001/exe/test -importcfg $WORK/b001/importcfg.link -s -w -buildmode=exe -buildid=Juw7E6kz4zUijtlWIqs0/ehL3WBFzOeSpryH7yP9P/3cKdpLiR9cWyq_AaprQT/Juw7E6kz4zUijtlWIqs0 -extld=gcc $WORK/b001/_pkg_.a
@cherrymui
Member

cherrymui commented Sep 12, 2020

This line is the error when mmap failed. Does the machine you are using not support mmap?

We dropped the fallback path for when mmap fails. Maybe we should add it back? @jeremyfaller

@cherrymui changed the title from "internal/ld/main.go panic: operation not permitted" to "cmd/link: panic: operation not permitted" on Sep 12, 2020
@jeremyfaller
Contributor

jeremyfaller commented Sep 14, 2020

It seems the OS is returning a different error than the ones we've seen before. We check for "operation not supported", not "operation not permitted". This error seems related to using CONFIG_STRICT_DEVMEM in the kernel. Although this could turn into a game of whack-a-mole with error codes, this is the only error reported so far, so I'll send a CL to also fail gracefully on it. If any more come up, I'll come up with a better pattern.

** Thinking about this more and digging further into the code, I think I will default to not failing. Reading about that flag makes me hesitant to assume I have all the failure cases in mind.

@cherrymui
Member

The old linker uses a fallback path if mmap fails for any reason. Maybe we want to add that back?

@jeremyfaller
Contributor

Yes, that's what I'm gonna do.

@cherrymui
Member

SGTM. Thanks.

@gopherbot

Change https://golang.org/cl/254777 mentions this issue: cmd/link: ignore mmap failures in the linker

@fogfish
Author

fogfish commented Sep 14, 2020

Thank you!

@aclements
Member

Wait, backing up a moment, I don't see at all what this would have to do with CONFIG_STRICT_DEVMEM. We're definitely not mmaping /dev/mem.

@aclements
Member

@fogfish, it would be helpful to see an strace of the failing link. Could you run go build -work -x hw.go (-work will let you rerun build commands by hand), then copy-paste the WORK=... line at the top into your shell, then finally run strace -f -e mmap,open,openat <link command> with the failing link command and paste the output?

@fogfish
Author

fogfish commented Sep 15, 2020

Thank you! I'll try. This is a CI/CD instance that is fully managed by AWS. There is no shell access to it.

@cherrymui
Member

Yeah, taking another look, it seems this fails with EPERM, which is rather weird. Looking at the manpages, I don't think any of the EPERM cases of mmap or fallocate can happen in our case. Maybe ftruncate?

@aclements
Member

@fogfish, thanks. You don't happen to know what file system it's running on, do you (or what your test instance from your initial report was running on)?

@fogfish
Author

fogfish commented Sep 15, 2020

It runs on top of Linux Kernel 4.14.171-105.231.amzn1.x86_64

@fogfish
Author

fogfish commented Sep 15, 2020

Let me try to reproduce it with the given kernel on EC2.

@prattmic
Member

If you can run mount, we can also see any special mount options that may be in use.

w.r.t. strace, if you can't trace the WORK= command directly, strace -f -e mmap,open,openat,fallocate,ftruncate go build hw.go should work too, it will just give us a much longer trace to comb through.

@fogfish
Author

fogfish commented Sep 15, 2020

The environment is hardened, so I cannot use sudo, etc. I need more time to find a workaround for these limits.

strace: exit status 1
strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++

@toothrot added the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided.) on Sep 15, 2020
@toothrot added this to the Backlog milestone on Sep 15, 2020
@toothrot added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Sep 15, 2020
@ianlancetaylor
Contributor

Various sandbox environments that restrict the set of permitted system calls can cause a result of EPERM for any system call that is not permitted. Those environments often don't bother to figure out the "right" error to return.

@aclements
Member

@ianlancetaylor, that's a good point. It wouldn't surprise me at all if fallocate in particular simply wasn't on the allow list.

@fogfish
Author

fogfish commented Sep 17, 2020

Indeed, fallocate fails with "Operation not permitted". It also fails in Docker on a mounted filesystem, but the linker works in Docker.

@fogfish
Author

fogfish commented Sep 18, 2020

Here is all the info I've managed to collect about the environment:

Linux version 4.14.177-104.253.amzn2.x86_64 (gcc version 7.3.1 20180712 (Red Hat 7.3.1-6) (GCC)) #1 SMP Fri May 1 02:01:13 UTC 2020

/dev/vdd /tmp ext4 rw,relatime,data=writeback 0 0

sh-4.2$ fallocate -l 1K a
fallocate: fallocate failed: Operation not permitted
sh-4.2$ fallocate -x -l 1K a
sh-4.2$


strace: exit status 1
strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++

I can mock up a simple program that uses some of the system calls, or build cmd/link with extra debug logging, if you can advise which direction to look in order to isolate and understand the linker issue.

@prattmic
Member

I tend to agree with @ianlancetaylor that this looks like a sandboxed environment. Is this AWS CodePipeline? I've been trying to find docs about its execution environment, though I haven't found much.

@cherrymui
Member

Maybe the path forward here is to add EPERM to the errors we tolerate from fallocate (alongside ENOTSUP/EOPNOTSUPP, which we already tolerate), but not to accept arbitrary errors: for one, we do want to catch the "not enough space" error, which is the whole purpose of calling fallocate. We also should not ignore errors from mmap.

@prattmic
Member

That seems fine; the fallocate is just a hint and if there really is a permission error it should come up again on mmap.

@fogfish
Author

fogfish commented Oct 19, 2020

The issue is still waiting for info. How can I help you dig up more info?

@cherrymui removed the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided.) on Oct 19, 2020
@cherrymui
Member

@fogfish thanks. I think we have enough information and a proposed solution (#41356 (comment)).

@aclements added the NeedsFix label (The path to resolution is known, but the work has not been done.) and removed the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Oct 19, 2020
@aclements changed the milestone from Backlog to Go1.16 on Oct 19, 2020
@golang golang locked and limited conversation to collaborators Oct 29, 2021