cmd/compile: go1.21rc2: relocation R_X86_64_PC32 out of range error due to inliner changes #61218

Closed
r-hang opened this issue Jul 6, 2023 · 7 comments
Labels: compiler/runtime, NeedsInvestigation

@r-hang

r-hang commented Jul 6, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21rc2 linux/amd64

Does this issue reproduce with the latest release?

Builds successfully on go1.20.4 linux/amd64 and go1.19.x. Fails on go1.21rc2.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH='amd64'
GOBIN='/home/user/go-code/bin'
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go-code/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go-code'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/.cache/bazel/_bazel_rhang/b97476d719d716accead0f2d5b93104f/external/go_sdk_linux_amd64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/.cache/bazel/_bazel_rhang/b97476d719d716accead0f2d5b93104f/external/go_sdk_linux_amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21rc2'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD=''
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2749149267=/tmp/go-build -gno-record-gcc-switches"

What did you do?

In testing our code against go1.21rc2 we found that a large go_binary target (we use Bazel) fails to build only on go1.21rc2. git bisect points to this commit as the cause. The build time for this target also regresses significantly before the build exits compared to go1.20.4. Investigation details are posted in the last section of this ticket.

What did you expect to see?

Build pass. Program successfully builds.

What did you see instead?

external/go_sdk/pkg/tool/linux_amd64/link: running external/zig_sdk/tools/x86_64-linux-gnu.2.19/c++ failed: exit status 1
ld.lld: error: /tmp/go-link-3543165859/go.o:(function some.internal.code._List_IncentiveOpsSupportData_Read: .text+0x38f310f0): relocation R_X86_64_PC32 out of range: -2147553908 is not in [-2147483648, 2147483647]; references type:*
>>> referenced by go.go
>>> defined in /tmp/go-link-3543165859/go.o

// various forms of this error message are repeated until the end…

ld.lld: error: /tmp/go-link-3543165859/go.o:(function some.internal.code/base/xsync.(*TypedMap[go.shape.string,go.shape.*uint8]).Range.func1: .text+0x3b7ecbab): relocation R_X86_64_PC32 out of range: -2175929103 is not in [-2147483648, 2147483647]; references type:*
>>> referenced by go.go
>>> defined in /tmp/go-link-3543165859/go.o

ld.lld: error: /tmp/go-link-3543165859/go.o:(function main.main: .text+0x3b7ed315): relocation R_X86_64_PC32 out of range: -2180777081 is not in [-2147483648, 2147483647]; references type:*
>>> referenced by go.go
>>> defined in /tmp/go-link-3543165859/go.o

link: error running subcommand external/go_sdk/pkg/tool/linux_amd64/link: exit status 2
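
For readers unfamiliar with the error: the linker is saying that a PC-relative displacement does not fit in a signed 32-bit field. A minimal sketch of that range check, using the three displacements from the log above (the function name `fitsPC32` is made up for illustration and is not part of any toolchain):

```go
package main

import (
	"fmt"
	"math"
)

// fitsPC32 reports whether a PC-relative displacement can be encoded in the
// signed 32-bit field that an R_X86_64_PC32 relocation provides.
// (Hypothetical helper, for illustration only.)
func fitsPC32(disp int64) bool {
	return disp >= math.MinInt32 && disp <= math.MaxInt32
}

func main() {
	// Displacements copied from the ld.lld errors above.
	for _, d := range []int64{-2147553908, -2175929103, -2180777081} {
		// All three are negative, so the overshoot is measured from MinInt32.
		fmt.Printf("%d fits=%v overshoot=%d bytes\n", d, fitsPC32(d), math.MinInt32-d)
	}
}
```

The first reported displacement misses the lower bound by only 70260 bytes, which suggests the binary is just past the limit rather than far beyond it.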

We're not able to post a reproducible program here since the code is internal but we have investigation notes that we hope the Go team will find helpful.

To narrow down what code patterns might cause issues in go1.21rc2, we searched for build artifacts that got significantly larger between go1.20 vs go1.21rc2 and found some.

(go1.20)
 % du -ch $GOPATH/bazel-bin/src/some.internal.code/go_default_library.a
576M   
(go1.21)
% du -ch $GOPATH/bazel-bin/src/some.internal.code/go_default_library.a
709M   

The affected packages are enormous and consist of generated code that follows a particular pattern, which we believe does not interact well with go1.21rc2.

The generated code makes extensive use of generics and anonymous function closures. Internally, we were able to rewrite the code generator to drop the use of generics and anonymous function closures and build the application successfully with go1.21rc2.
Unfortunately, we can’t share the exact code of the application but we can share the output of the code generator on a toy input before and after changes that allow the application to build on go1.21rc2.

Current go1.21rc2 failing toy generator output
// generated generic function to be called in HandleFoo
func _cffFlowmagic_38_9[
	t1 any,
	t2 any,
	t3 any,
	t4 any,
	t5 any,
	t6 any,
	t7 any,
	t8 any,
](
	ctx context.Context,
	mmagic39_3 func() t1,
	mmagic40_3 func() *t2,
	mmagic41_3 func() int,
	mmagic42_3 func() cff.Emitter,
	mmagic43_3 func() cff.Emitter,
	mmagic44_3 func() string,
	mmagic46_3 func() func(t1) (t3, t4),
	mmagic54_3 func() (func(context.Context, t3) (t5, error), time.Duration),
	mmagic57_3 func() func(t6) (t7, error),
	mmagic58_3 func() func(t7) t2,
	mmagic67_3 func() (func(t4) (t8, error), func(t3) bool, t8, string),
	mmagic75_3 func() (func(t5, t8) t6, func(t3) bool, string),
	mmagic88_3 func() (func(t6) error, bool),
) (err error) {
	mmagic39_14 := mmagic39_3()
	_ = mmagic39_14 // possibly unused.
	mmagic40_15 := mmagic40_3()
	// leaving out the rest of the implementation …
}

// HandleFoo calls the generated generic function above with instantiated types.
func (h *fooHandler) HandleFoo(ctx context.Context, req *Request) (*Response, error) {
	var res *ResponseV2
	err := _cffFlowmagic_38_9(ctx,
		_cffParamsmagic_39_3(req),
		_cffResultsmagic_40_3(&res),
		_cffConcurrencymagic_2515107422(8),
		_cffWithEmittermagic_3224879298(cffemit.Metrics(h.scope)),
		_cffWithEmittermagic_3224879298(cffemit.Logs(h.logger)),
		_cffInstrumentFlowmagic_398550328("HandleFoo"),
		_cffTaskmagic_13613616[func(req *RequestV2) (*GetManagerRequestV2, *ListUsersRequestV2)](
			func(req *RequestV2) (*GetManagerRequestV2, *ListUsersRequestV2) {
				return &GetManagerRequestV2{
						LDAPGroup: req.LDAPGroup,
					}, &ListUsersRequestV2{
						LDAPGroup: req.LDAPGroup,
					}
			}),
		// leaving out the rest of HandleFoo ...
}

// the other _cff generated functions follow this form.
func _cffParamsmagic_39_3[t1 any](mmagic39_14 t1) func() t1 {
	return func() t1 { return mmagic39_14 }
}

func _cffResultsmagic_40_3[t9 any](mmagic40_15 t9) func() t9 {
	return func() t9 { return mmagic40_15 }
}

Updated go1.21rc2 passing toy generator output
// This generated HandleFoo is much simpler: it drops the generics and function closures that were used to "pass through" values
// in the scope of HandleFoo to generated implementation functions defined at the bottom of the file.
// For context, the more complicated code generation scheme was created to avoid shifting line positions between source code
// and generated code, which helps with things like Bazel-based code coverage and debugging in generated code.
func (h *fooHandler) HandleFoo(ctx context.Context, req *Request) (*Response, error) {
	var res *Response
	err := func() (err error) {
		/*line magic.go:37:18*/
		_37_18 := ctx
		/*line magic.go:38:14*/
		_38_14 := req
		/*line magic.go:39:15*/
		_39_15 := &res
		/*line magic.go:40:19*/
		_40_19 := 8
		/*line magic.go:41:19*/
		_41_19 := cffemit.Metrics(h.scope)
		/*line magic.go:42:19*/
		_42_19 := cffemit.Logs(h.logger)
		/*line magic.go:43:22*/
		_43_22 := "HandleFoo"
		/*line magic.go:46:4*/
		_46_4 := func(req *Request) (*GetManagerRequest, *ListUsersRequest) {
			return &GetManagerRequest{
					LDAPGroup: req.LDAPGroup,
				}, &ListUsersRequest{
					LDAPGroup: req.LDAPGroup,
				}
		}
		// leaving out the rest of the implementation …
}

For the notable packages in the application, the symbol table dump is larger in go1.21rc2 than in go1.20.

(go1.20)
$ go tool nm v1/go_default_library.a > /tmp/st1    
$ du -ch /tmp/st1 (204M)

(go1.21rc2)
$ go tool nm v2/go_default_library.a > /tmp/st2
$ du -ch /tmp/st2 (240M)

Trying to understand what new symbols might be generated in go1.21rc2, we picked a specific function name (e.g. _cffFallbackWithcff_100088_4) and searched for all instances in the symbol table of the archive in go1.20 and go1.21rc2.

(go1.20)
2251c4d5 R some.internal.code..dict._cffFallbackWithcff_100088_4[*generated/example.code.User]
212faac4 r some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8_0].arginfo1<1>
212faac7 r some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8_0].argliveinfo<1>
18546595 T some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8_0].func1

(go1.21rc2)
99be885 T some.internal.code._cffFallbackWithcff_100088_4[*generated/example.code.User]
1db76b1d T some.internal.code._cffFallbackWithcff_100088_4[*generated/example.code.User]._cffFallbackWithcff_100088_4[go.shape.*uint8].func1
292dc4fa r some.internal.code._cffFallbackWithcff_100088_4[*generated/example.code.User].arginfo1<1>
292dc4fb r some.internal.code._cffFallbackWithcff_100088_4[*generated/example.code.User].argliveinfo<1>
299be7ff T some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8]
292dc4f5 r some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8].arginfo1<1>
292dc4f8 r some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8].argliveinfo<1>
1db76b18 T some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8].func1
2a7543c5 ? go:info.some.internal.code._cffFallbackWithcff_100088_4[go.shape.*uint8]$abstract
1d5260a4 T some.internal.code.(*userContextController).userContextRecentContactsValueOrderOrderBillingDataBillingDataEntityRef._cffFallbackWithcff_100088_4[go.shape.*uint8].func5
2a618793 R some.internal.code..dict._cffFallbackWithcff_100088_4[*generated/example.code.User]
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 6, 2023
@ianlancetaylor ianlancetaylor changed the title cmd/link: go1.21rc2: relocation R_X86_64_PC32 out of range error cmd/compile: go1.21rc2: relocation R_X86_64_PC32 out of range error due to inliner changes Jul 7, 2023
@bcmills bcmills added this to the Go1.21 milestone Jul 10, 2023
@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 10, 2023
@thanm
Contributor

thanm commented Jul 12, 2023

Thanks for the report.

The specific CL that your bisection points to does increase the amount of inlining, so it is not too surprising that the code generated by the Go compiler is getting bigger, especially if you make heavy use of closures.

One thing that would be helpful here, I think, is some understanding of just how much bigger your code is getting. If you could link your program with some special flags (as below) and then capture the size of the intermediates (both with the old Go version and with 1.21rc2), that would be great. Here is an example of how to gather the data (this uses a build of "himom.go", but hopefully you can do something along these lines for your program):

$ cat script.sh
#!/bin/sh
set -x
set -e
rm -rf /tmp/xxx
mkdir /tmp/xxx
rm -f himom
go version
go build -ldflags="-linkmode=external -tmpdir=/tmp/xxx" himom.go
ls /tmp/xxx/go.o
size /tmp/xxx/go.o
$
$ // first build with 1.20
$ PATH=/ssd2/go1.20/bin:${PATH} sh script.sh
+ set -e
+ rm -rf /tmp/xxx
+ mkdir /tmp/xxx
+ rm -f himom
+ go version
go version go1.20.5 linux/amd64
+ go build '-ldflags=-linkmode=external -tmpdir=/tmp/xxx' himom.go
+ ls /tmp/xxx/go.o
/tmp/xxx/go.o
+ size /tmp/xxx/go.o
   text	   data	    bss	    dec	    hex	filename
1128573	  98096	 203032	1429701	 15d0c5	/tmp/xxx/go.o
$ 
$ // Then with 1.21rc2
$ PATH=/ssd2/go1.21/bin:${PATH} sh script.sh
+ set -e
+ rm -rf /tmp/xxx
+ mkdir /tmp/xxx
+ rm -f himom
+ go version
go version go1.21rc2 linux/amd64
+ go build '-ldflags=-linkmode=external -tmpdir=/tmp/xxx' himom.go
+ ls /tmp/xxx/go.o
/tmp/xxx/go.o
+ size /tmp/xxx/go.o
   text	   data	    bss	    dec	    hex	filename
1129686	  38416	 203824	1371926	 14ef16	/tmp/xxx/go.o
$

The "go.o" file is the output from the Go linker, prior to the point where it gets passed to the external linker.

@thanm thanm added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 12, 2023
@r-hang
Author

r-hang commented Jul 14, 2023

Hey @thanm, thank you for taking a look.

I'm having a bit of trouble generating an archive or object file that's readable with the 'size' command. I first tried wrangling with our Bazel setup, but then moved on to a simpler go program and was still unsuccessful.

I created a simple go program

--- project/main.go ---
package main

import "my.experiment/project/other"

func main() {
    other.Fn()
}

--- project/other/other.go ---
package other

func Fn() {}

$ cd project/other
$ go build -o outfile -ldflags="-linkmode=external -tmpdir=/tmp/xxx" ./other.go
$ ls
other.go outfile
$ size outfile
size: __.PKGDEF: file format not recognized
size: go.o: file format not recognized
$ go tool pack x outfile
go.o other.go outfile __.PKGDEF
% size go.o
size: go.o: file format not recognized
$ ls /tmp/xxx
ls: cannot access '/tmp/xxx': No such file or directory

However, go tool objdump is able to successfully read project/other/outfile and project/other/_go_.o

I tried searching the docs in the go toolchain and the closest thing I found was that go tool nm takes a -size option, but that seems to provide information per symbol. Is there an alternative command to run? I could try writing a go program to manually parse what size might give me? I hope I'm not missing anything.

@ianlancetaylor
Contributor

The size program will work on an executable built from a main package. It won't work on the object file created by compiling a non-main package.

If you follow @thanm 's script above you will get an object file in a temporary directory that size will work with. Note that he is building a main package.

@r-hang
Author

r-hang commented Jul 16, 2023

Thank you for the help!

With go1.20.5

 size /tmp/xxx/go.o
   text    data     bss     dec     hex filename
2130518528      27599283        2127160 2160244971      80c2b8eb        /tmp/xxx/go.o

With go1.21rc3

 % size /tmp/xxx/go.o
   text    data     bss     dec     hex filename
2429481255      27079443        2097744 2458658442      928c268a        /tmp/xxx/go.o

A small note for anyone in the future who may encounter something similar with Bazel. I added the -tmpdir option to gc_linkopts attr of the go_binary rule.

@thanm thanm removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 17, 2023
@thanm
Contributor

thanm commented Jul 17, 2023

Thanks for that. Looks like your text size is up considerably, 15% or so, but more importantly you are working with a binary that is already enormous and very close to being in the "danger zone" with respect to relocations (from text to data) not reaching.

I would suggest that you try to attack the problem from the angle of reducing the size of your generated code overall. In particular, in your generated code where you have functions that contain closures or return closures, e.g.

func _cffParamsmagic_39_3[t1 any](mmagic39_14 t1) func() t1 {
	return func() t1 { return mmagic39_14 }
}

I would suggest changing your code generator to prefix these sorts of functions with "//go:noinline". This should (at least from what I can tell without being able to look at the complete source code) help restore the previous inliner behavior.
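
As a minimal sketch of that suggestion, here is a self-contained program with the directive applied to a generic closure-returning helper. The name `valueProvider` is a hypothetical stand-in for the generated `_cff...` helpers, not the actual generator output:

```go
package main

import "fmt"

// valueProvider wraps a value in a closure, in the same shape as the
// generated helpers quoted above. (Hypothetical name, for illustration.)
// The directive below must sit directly above the func declaration, with no
// space after the slashes and no blank line in between.
//
//go:noinline
func valueProvider[T any](v T) func() T {
	return func() T { return v }
}

func main() {
	get := valueProvider("HandleFoo")
	fmt.Println(get()) // prints "HandleFoo"
}
```

Building with `go build -gcflags=-m` prints the compiler's inlining decisions, which is one way to check whether the directive is having the intended effect on a given function.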

Hope this helps.

@r-hang
Author

r-hang commented Jul 17, 2023

Thanks @thanm,

Understood. We just wanted to report this issue in case the increase in size was beyond what the Go team was expecting.

@gopherbot gopherbot modified the milestones: Go1.21, Go1.22 Aug 8, 2023
@thanm
Contributor

thanm commented Nov 22, 2023

Circling back on this bug: we don't have any plans to revert CL 492017, hence I think it makes sense to try to approach this problem via changes to your generated code. Closing this bug out, please reopen if needed. Thanks.

@thanm thanm closed this as completed Nov 22, 2023