Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime,cmd/compile: build errors with "relocation truncated to fit" on linux-riscv-unmatched as of CL 345051 #48791

Closed
bcmills opened this issue Oct 5, 2021 · 6 comments
Labels
arch-riscv Issues solely affecting the riscv64 architecture. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Oct 5, 2021

# golang.org/x/text/unicode/norm.test
/tmp/workdir-host-linux-riscv64-unmatched/go/pkg/tool/linux_riscv64/link: running gcc failed: exit status 1
/tmp/workdir-host-linux-riscv64-unmatched/tmp/go-link-767494397/go.o: in function `.L0 ':
go.go:(.text+0x314324): relocation truncated to fit: R_RISCV_JAL against `runtime.typedmemclr-tramp3'
collect2: error: ld returned 1 exit status

greplogs --dashboard -md -l -e 'relocation truncated to fit' --since=2021-05-01

2021-10-05T07:32:41-5bd84dd-d55009c/linux-riscv64-unmatched
2021-10-05T01:30:56-5bd84dd-123393a/linux-riscv64-unmatched
2021-10-05T00:29:40-5bd84dd-81b7ec1/linux-riscv64-unmatched
2021-10-02T17:29:46-1b99300-3bbc823/linux-riscv64-unmatched

CC @cherrymui @4a6f656c

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker arch-riscv Issues solely affecting the riscv64 architecture. labels Oct 5, 2021
@bcmills bcmills added this to the Go1.18 milestone Oct 5, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Oct 5, 2021

This appears to have broken at CL 345051.

@4a6f656c
Copy link
Contributor

4a6f656c commented Oct 6, 2021

I can reproduce this and it will definitely be a result of the call trampolines change.

That said, there is some pretty strange code generation going on here - the unicode/norm test compiles to a 40MB text object, with the norm.init function being a single function with more than 2MB of machine code (that's more than 500,000 machine instructions). There also seems to be very strange pointless repetition in this code, which I'm investigating/attempting to understand:

  2943f8:       00000617                auipc   a2,0x0
  2943fc:       00060613                mv      a2,a2
  294400:       00000697                auipc   a3,0x0
  294404:       00068693                mv      a3,a3
  294408:       00000717                auipc   a4,0x0
  29440c:       00070713                mv      a4,a4
  294410:       00000797                auipc   a5,0x0
  294414:       00078793                mv      a5,a5
  294418:       00000817                auipc   a6,0x0
  29441c:       00080813                mv      a6,a6

The issue is caused by the fact that we now use a JAL instruction for calls, which is only capable of reaching +/-1MB of text. When we cannot reach a function directly the linker lays out a trampoline after the function, however in the case of a >1MB function, we're not able to even reach the trampoline via JAL. It's pretty strange (but clearly not impossible) to have a 1MB+ function...

From the riscv64 side there are potential a number of options if we want/need to continue to support functions of this size (unfortunately all have complexity/downsides):

  • Add code to the assembler and linker to switch from JAL to AUIPC/JALR where a symbol exceeds 1MB in size (this is probably the most optimal, but also the most complex).

  • Use AUIPC/JALR sequences and rewrite to JAL/NOOP where possible (this fixes the issue, but means we're going to increase binary sizes considerably again).

  • Split text symbols that exceed 1MB so that we can layout trampolines in the middle of the symbol (jumping from the first part to the second part, etc).

  • Layout trampolines both before and after the function text (that would get us to ~2MB for a single symbol, as you would jump backwards from the first 1MB and forwards from the last 1MB).

  • Revert the call trampolines all together and go back to using AUIPC/JALR for everything (slower and more bloated code).

@cherrymui
Copy link
Member

@4a6f656c thanks for the investigation.

Add code to the assembler and linker to switch from JAL to AUIPC/JALR where a symbol exceeds 1MB in size (this is probably the most optimal, but also the most complex).

This doesn't sound too bad to me. I think we can do it in the assembler without much complexity. At the PC assignment loop https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/riscv/obj.go;l=589 we'll know the size of a function. If it is too large we can just redo the PC assignment while rewriting JAL to AUIPC/JALR.

@4a6f656c
Copy link
Contributor

For the record, it turns out that the strange code are address calculations that have not yet had relocations applied. Overall, it looks like the main code bloat results from initialisation of sparse arrays - I would have expected this to be data driven with minimal code, where as instead it seems to be almost entirely code driven.

@4a6f656c
Copy link
Contributor

@4a6f656c thanks for the investigation.

Add code to the assembler and linker to switch from JAL to AUIPC/JALR where a symbol exceeds 1MB in size (this is probably the most optimal, but also the most complex).

This doesn't sound too bad to me. I think we can do it in the assembler without much complexity. At the PC assignment loop https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/riscv/obj.go;l=589 we'll know the size of a function. If it is too large we can just redo the PC assignment while rewriting JAL to AUIPC/JALR.

Right, we can do that - it adds some complexity and interferes with some of the next steps to further clean up the assembler and eliminate obj.Prog rewriting, but I'll deal with that in due course.

@gopherbot
Copy link

Change https://golang.org/cl/356250 mentions this issue: cmd/internal/obj/riscv: fix trampoline calls from large functions

@golang golang locked and limited conversation to collaborators Oct 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Issues solely affecting the riscv64 architecture. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants