
cmd/link: compilation failure on armv6/armv7 due to truncated relocations #58428

Closed
tpaschalis opened this issue Feb 9, 2023 · 16 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@tpaschalis
Contributor

What version of Go are you using (go version)?

$ docker run grafana/agent-build-image:0.21.0 bash -c "go version"
go version go1.20 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

The following is for the build container, which cross-compiles to armv6/armv7 and encounters the failure:

go env Output
$ docker run grafana/agent-build-image:0.21.0 bash -c "go env"
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
AR="ar"
CC="viceroycc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1972461801=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. Cloned the grafana/agent repo and checked out the reproduce-relocation-truncated branch (or commit grafana/agent@7e4cae2)
  2. Ran USE_CONTAINER=1 GOOS=linux GOARCH=arm GOARM=6 make agentctl to cross-compile the agentctl binary for ARMv6
  3. Waited for the build to finish

What did you expect to see?

A successful compilation

What did you see instead?

# github.com/grafana/agent/cmd/grafana-agentctl
/usr/local/go/pkg/tool/linux_amd64/link: running viceroycc failed: exit status 1
/tmp/go-link-3047421554/go.o: in function `github.com/google/gnostic/openapiv3.NewResponse':
/go/pkg/mod/github.com/google/gnostic@v0.6.9/openapiv3/OpenAPIv3.go:3631:(.text+0x207d790): relocation truncated to fit: R_ARM_CALL against `runtime.duffzero'
/go/pkg/mod/github.com/google/gnostic@v0.6.9/openapiv3/OpenAPIv3.go:3635:(.text+0x207d894): relocation truncated to fit: R_ARM_CALL against `runtime.duffzero'
collect2: error: ld returned 1 exit status

make: *** [Makefile:195: agentctl] Error 1

Background

Hello team! I'm opening this issue on behalf of the Grafana Agent squad.

Recently, our ARMv6 and ARMv7 builds started failing with "relocation truncated to fit" errors.

We think this is related to the growing binary size and the number of dependencies the Agent brings in. Based on the GCC ARM-specific options, we added the -mlong-calls flag to our ARM builds, which hid the issue for a few more commits until it resurfaced.
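To put numbers on the binary-size theory: an ARM-state BL (the instruction behind R_ARM_CALL) encodes a signed 24-bit word offset measured from the call site plus 8, so it reaches roughly ±32 MiB. The minimal Go sketch below checks the two call sites from the linker error above (.text+0x207d790 and .text+0x207d894) against that range. It treats those .text offsets as rough addresses and assumes, which the log does not say, that runtime.duffzero sits at the very start of .text. The names blReaches, blMinDisp and blMaxDisp are hypothetical.

```go
package main

import "fmt"

// ARM-state BL/R_ARM_CALL encodes a signed 24-bit word offset, so the
// reachable byte displacement is [-32 MiB, +32 MiB - 4].
const (
	blMinDisp = -(1 << 25)
	blMaxDisp = (1 << 25) - 4
)

// blReaches reports whether a BL at callSite can reach target directly.
// The offset is measured from callSite+8 because the ARM-state PC reads
// two instructions ahead. Both addresses are assumed 4-byte aligned.
func blReaches(callSite, target int64) bool {
	disp := target - (callSite + 8)
	return disp >= blMinDisp && disp <= blMaxDisp
}

func main() {
	// Assumption: runtime.duffzero sits at the very start of .text.
	const duffzero int64 = 0
	// Call-site offsets taken from the linker error in this issue.
	for _, site := range []int64{0x207d790, 0x207d894} {
		fmt.Printf(".text+%#x reaches duffzero: %v\n", site, blReaches(site, duffzero))
	}
}
```

Both call sites sit about 500 KiB past the 0x2000000 (32 MiB) mark, so under this assumption both checks report false.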

We also found #15823, a similar build issue on a different architecture (ppc64le), which was fixed in the Go toolchain itself in CL 27790. Do you think this is similar enough?

Some other information that might be useful:

  • Our process uses a Docker container to perform the build and copies the resulting build back to the host
  • Whether CGO is enabled or disabled seems to affect the outcome; we need CGO_ENABLED=1 for our dependencies
  • Within the build container, CC is set to a shell script that looks up an appropriate gcc for the GOOS/GOARCH/GOARM tuple and calls it; in this case, that is arm-linux-gnueabi-gcc installed from Debian bullseye
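The CC wrapper from the last bullet boils down to a lookup from the target tuple to a cross compiler. The Go sketch below models that lookup. It is hypothetical: the real wrapper is a shell script whose logic is not shown here. Only the linux/arm/GOARM=6 to arm-linux-gnueabi-gcc pairing comes from this report, and the names target, crossCompilers and pickCC are illustrative.

```go
package main

import (
	"fmt"
	"os"
)

// target is a GOOS/GOARCH/GOARM tuple as seen by the CC wrapper.
type target struct{ goos, goarch, goarm string }

// crossCompilers maps tuples to C compilers. Only the ARMv6 entry comes
// from the report; other targets would need their own entries.
var crossCompilers = map[target]string{
	{"linux", "arm", "6"}: "arm-linux-gnueabi-gcc",
}

// pickCC returns the compiler configured for t, or an error if none is.
func pickCC(t target) (string, error) {
	if cc, ok := crossCompilers[t]; ok {
		return cc, nil
	}
	return "", fmt.Errorf("no C cross compiler for %s/%s (GOARM=%q)", t.goos, t.goarch, t.goarm)
}

func main() {
	// Read the tuple from the environment, as a wrapper invoked by cgo might.
	t := target{os.Getenv("GOOS"), os.Getenv("GOARCH"), os.Getenv("GOARM")}
	if cc, err := pickCC(t); err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println(cc)
	}
}
```

A real wrapper would then exec the selected compiler with the original arguments. A map keeps the supported tuples in one place and makes an unsupported target fail loudly.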
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Feb 9, 2023
tpaschalis added a commit to tpaschalis/go that referenced this issue Feb 10, 2023
Fixes golang#58428

Change-Id: I2e5c277cf609b98081f38da8de4716fd0cd8efdd
@tpaschalis
Contributor Author

tpaschalis commented Feb 10, 2023

The referenced CL is over 6 years old now, and things have moved around in the linker since.

I noticed the splitTextSections helper. Could it be that we need to add sys.ARM to the architectures that need their text sections split, as in the commit referenced above? I'd love feedback from anyone more knowledgeable about ARM and the linker here.

@dr2chase
Contributor

dr2chase commented Feb 10, 2023

@golang/compiler

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 10, 2023
@liggitt
Contributor

liggitt commented Feb 11, 2023

I'd be interested to know if the workaround in #58425 (comment) works here as well

@rfratto

rfratto commented Feb 11, 2023

👋 Reporting in for the Grafana Agent team while @tpaschalis is out:

I just tested the workaround from #58425 (comment). One of our binaries still gets linking errors for GOARCH=arm GOARM=7 even when using the workaround:

$ GOOS=linux GOARCH=arm GOARM=7 CGO_ENABLED=1 GOEXPERIMENT=nounified \
  go build \
    -ldflags "-X github.com/grafana/agent/pkg/build.Branch=main -X github.com/grafana/agent/pkg/build.Version=main-b82b26c -X github.com/grafana/agent/pkg/build.Revision=b82b26cb -X github.com/grafana/agent/pkg/build.BuildUser=root@bca20e00e921 -X github.com/grafana/agent/pkg/build.BuildDate=2023-02-11T20:15:03Z" \
    -tags "netgo " \
    -gcflags=all=-d=inlstaticinit=0 \
    -o build/grafana-agentctl \
    ./cmd/grafana-agentctl

...

/usr/local/go/pkg/tool/linux_amd64/link: running viceroycc failed: exit status 1
/tmp/go-link-201436594/go.o: in function `k8s.io/kube-openapi/pkg/internal/third_party/go-json-experiment/json.(*Encoder).WriteToken':
/go/pkg/mod/k8s.io/kube-openapi@v0.0.0-20221123214604-86e75ddd809a/pkg/internal/third_party/go-json-experiment/json/encode.go:395:(.text+0x207d648): relocation truncated to fit: R_ARM_CALL against `runtime.duffcopy'
/go/pkg/mod/k8s.io/kube-openapi@v0.0.0-20221123214604-86e75ddd809a/pkg/internal/third_party/go-json-experiment/json/encode.go:399:(.text+0x207d734): relocation truncated to fit: R_ARM_CALL against `runtime.duffcopy'
/go/pkg/mod/k8s.io/kube-openapi@v0.0.0-20221123214604-86e75ddd809a/pkg/internal/third_party/go-json-experiment/json/encode.go:434:(.text+0x207d8a4): relocation truncated to fit: R_ARM_CALL against `runtime.duffcopy'
collect2: error: ld returned 1 exit status

@liggitt
Contributor

liggitt commented Feb 11, 2023

thanks for checking... does the same issue reproduce with go1.19.5 (dropping the GOEXPERIMENT=nounified and -gcflags=all=-d=inlstaticinit=0 bits which don't apply to go1.19)?

@rfratto

rfratto commented Feb 11, 2023

Yes. We first observed this with go1.19.4, and I can reproduce the same issue locally with go1.19.5 as well.

@liggitt
Contributor

liggitt commented Feb 11, 2023

That's a helpful data point, thanks... sounds like this particular build / set of dependencies comes up "unlucky" as described in #57410 (comment), unrelated to changes in the go compiler to switch to the unified compiler frontend between go1.19 and go1.20

@thanm thanm self-assigned this Feb 13, 2023
@thanm
Contributor

thanm commented Feb 13, 2023

I am going to dup this bug against issue #58425 since it looks pretty much the same, and I've verified that the fix also eliminates the link failure here.

If you have any way to test the resulting ARM binary that would be great, thanks.

@thanm
Contributor

thanm commented Feb 13, 2023

dup of #58425

@thanm thanm closed this as completed Feb 13, 2023
@gopherbot

Change https://go.dev/cl/467715 mentions this issue: cmd/link/internal/ld: fix text section splitting for ARM

gopherbot pushed a commit that referenced this issue Feb 13, 2023
Fix a problem with trampoline generation for ARM that was causing link
failures when building selected k8s targets. Representative error
(this is coming from the external linker):

  go.go:(.text+...): relocation truncated to fit: R_ARM_CALL against `runtime.duffcopy'

The Go linker is supposed to be limiting text section size for ARM to
0x1c00000 bytes, however due to a problem in the tramp generation
phase this limit wasn't being enforced.

Updates #58428.
Fixes #58425.

Change-Id: I4e778bdcbebeab607a6e626b354ca5109e52a1aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/467715
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
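The commit message above says the Go linker is supposed to cap each ARM text section at 0x1c00000 bytes (28 MiB, below the roughly 32 MiB reach of an ARM BL). Splitting text into size-capped sections can be sketched as a greedy pass over functions in layout order. This is a simplified illustration under stated assumptions, not the cmd/link implementation: the fn type, splitText and the sample sizes are hypothetical, and a real linker also has to handle alignment and any trampolines it inserts. This approach (CL 467715) was later reverted in favor of the addend-based fix discussed further down the thread.

```go
package main

import "fmt"

// armTextSectionLimit is the per-section cap quoted in CL 467715.
const armTextSectionLimit int64 = 0x1c00000

// fn is a hypothetical stand-in for a function laid out in .text.
type fn struct {
	name string
	size int64
}

// splitText walks functions in layout order and starts a new section
// whenever adding the next function would push the current one past
// limit. A single function larger than limit still gets its own section.
func splitText(funcs []fn, limit int64) [][]fn {
	var sections [][]fn
	var cur []fn
	var curSize int64
	for _, f := range funcs {
		if len(cur) > 0 && curSize+f.size > limit {
			sections = append(sections, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, f)
		curSize += f.size
	}
	if len(cur) > 0 {
		sections = append(sections, cur)
	}
	return sections
}

func main() {
	// Made-up sizes chosen so the total (0x2001100) exceeds one section.
	funcs := []fn{
		{"pkgA.F", 0x100},
		{"pkgB.Big", 0x1800000},
		{"pkgC.Big", 0x800000},
		{"pkgD.G", 0x1000},
	}
	for i, sec := range splitText(funcs, armTextSectionLimit) {
		var total int64
		for _, f := range sec {
			total += f.size
		}
		fmt.Printf("section %d: %d funcs, %#x bytes\n", i, len(sec), total)
	}
}
```

With the sample sizes this prints two sections, of 0x1800100 and 0x801000 bytes, each under the cap.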
@gopherbot

Change https://go.dev/cl/468537 mentions this issue: cmd/link/internal/ld: fix text section splitting for ARM

@gopherbot

Change https://go.dev/cl/468538 mentions this issue: cmd/link/internal/ld: fix text section splitting for ARM

@gopherbot

Change https://go.dev/cl/469275 mentions this issue: cmd/link: revise fix for arm32 trampgen problem with duff routines

johanbrandhorst pushed a commit to Pryz/go that referenced this issue Feb 22, 2023
@gopherbot

Change https://go.dev/cl/471596 mentions this issue: cmd/link: revert CL 467715 in favor of better fix

gopherbot pushed a commit that referenced this issue Feb 27, 2023
This patch provides a fix for a problem linking large arm32 binaries
with external linking, specifically R_CALLARM relocations against
runtime.duff* routines being flagged by the external linker as not
reaching.

What appears to be happening in the bug in question is that the Go
linker and the external linker are using slightly different recipes to
decide whether a given R_CALLARM relocation will "fit" (i.e. will not
require a trampoline). The Go linker is taking into account the addend
on the call reloc (which for calls to runtime.duffcopy or
runtime.duffzero is nonzero), whereas the external linker appears to
be ignoring the addend.

Example to illustrate:

   Addr      Size   Func
   -----     -----  -----
   ...
   XYZ       1024   runtime.duffcopy
   ...
   ABC       ...    mypackage.MyFunc
     + R0: R_CALLARM  o=8 a=848 tgt=runtime.duffcopy<0>

Let's say that the distance between ABC (start address of
MyFunc) and XYZ (start of runtime.duffcopy) is just over the
architected 24-bit maximum displacement for an R_CALLARM (let's say
that ABC-XYZ is just over the architected limit by some small value,
say 36). Because we're calling into runtime.duffcopy at offset 848,
however, the relocation does in fact fit, but if the external linker
isn't taking into account the addend (assuming that all calls target
the first instruction of the called routine), then we'll get a
"doesn't fit" error from the linker.

To work around this problem, revise the ARM trampoline generation code
in the Go linker that computes the trampoline threshold to ignore the
addend on R_CALLARM relocations, so as to harmonize the two linkers.

Updates #58428.
Updates #58425.

Change-Id: I56e580c05b7b47bbe8edf5532a1770bbd700fbe5
Reviewed-on: https://go-review.googlesource.com/c/go/+/469275
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
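The mismatch described in this commit message can be reproduced numerically. The Go sketch below models the backward R_CALLARM from MyFunc into runtime.duffcopy, using o=8 and a=848 from the example table. The addresses are hypothetical, chosen so that the call lands 36 bytes beyond BL reach when measured to the start of runtime.duffcopy. The sketch measures from the call site plus 8 (the ARM-state PC bias). The commit message does not spell out how each linker accounts for that bias, so treat callFits and its constants as an illustrative model, not cmd/link code.

```go
package main

import "fmt"

// Signed 24-bit word displacement of an ARM BL, expressed in bytes.
const (
	blMinDisp = -(1 << 25)
	blMaxDisp = (1 << 25) - 4
)

// callFits reports whether an R_CALLARM at callSite can reach sym+addend
// when useAddend is true, or the start of sym when it is false (the
// behavior the commit message attributes to the external linker).
func callFits(callSite, sym, addend int64, useAddend bool) bool {
	dest := sym
	if useAddend {
		dest += addend
	}
	disp := dest - (callSite + 8) // ARM-state PC reads 8 bytes ahead
	return disp >= blMinDisp && disp <= blMaxDisp
}

func main() {
	const (
		duffcopy = 0x1000    // hypothetical address XYZ
		myFunc   = 0x2001014 // hypothetical address ABC
		off      = 8         // o=8: call-site offset within MyFunc
		addend   = 848       // a=848: entry point inside duffcopy
	)
	site := int64(myFunc + off)
	fmt.Println("with addend:    ", callFits(site, duffcopy, addend, true))
	fmt.Println("ignoring addend:", callFits(site, duffcopy, addend, false))
}
```

This prints true for the addend-aware check and false when the addend is ignored. After the fix, the Go linker applies the second, more conservative rule, so it presumably routes such a call through a trampoline instead of leaving a direct BL for the external linker to reject.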
gopherbot pushed a commit that referenced this issue Feb 27, 2023
This patch backs out CL 467715 (written to fix #58425), now that we
have a better fix for the "relocation doesn't fit" problem in the
trampoline generation phase (sent in a previous CL).

Updates #58428.
Updates #58425.

Change-Id: Ib0d966fed00bd04db7ed85aa4e9132382b979a44
Reviewed-on: https://go-review.googlesource.com/c/go/+/471596
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@gopherbot

Change https://go.dev/cl/471597 mentions this issue: cmd/link: better fix for arm32 trampgen problem with duff routines

@gopherbot

Change https://go.dev/cl/471598 mentions this issue: cmd/link: better fix for arm32 trampgen problem with duff routines

gopherbot pushed a commit that referenced this issue Feb 27, 2023
…em with duff routines

Fixes #58502.
Updates #58428.
Updates #58425.

Change-Id: I56e580c05b7b47bbe8edf5532a1770bbd700fbe5
Reviewed-on: https://go-review.googlesource.com/c/go/+/469275
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
(cherry picked from commit 0b5affb)
Reviewed-on: https://go-review.googlesource.com/c/go/+/471598
gopherbot pushed a commit that referenced this issue Feb 27, 2023
…em with duff routines

Fixes #58503.
Updates #58428.
Updates #58425.

Change-Id: I56e580c05b7b47bbe8edf5532a1770bbd700fbe5
Reviewed-on: https://go-review.googlesource.com/c/go/+/469275
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
(cherry picked from commit 0b5affb)
Reviewed-on: https://go-review.googlesource.com/c/go/+/471597
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Mar 3, 2023

6 participants