cmd: support mapping symbols for ARM64 #47908

Open
vpachkov opened this issue Aug 23, 2021 · 14 comments · May be fixed by #47786
Labels
arch-arm64, FeatureRequest, NeedsDecision

@vpachkov
Contributor

What version of Go are you using (go version)?

$ go version
go version devel go1.18-8b471db71b Wed Aug 18 08:26:44 2021 +0000 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/slava/Library/Caches/go-build"
GOENV="/Users/slava/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/slava/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/Users/slava/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/slava/dev/mygo/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/slava/dev/mygo/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="devel go1.18-8b471db71b Wed Aug 18 08:26:44 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="/usr/bin/clang"
CXX="/usr/bin/clang++"
CGO_ENABLED="0"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ld/pkx3km8x53qc7qb6wwwvxcr80000gn/T/go-build3739783046=/tmp/go-build -gno-record-gcc-switches"

What did you do?

$ go build

What did you expect to see?

The special mapping symbols appear in the symbol table.
Readelf:

67: 0000000000018dc4     0 NOTYPE  LOCAL  DEFAULT    1 $d
68: 0000000000018df0     0 NOTYPE  LOCAL  DEFAULT    1 $x

Objdump:

   18db4:	f94007e0 	ldr	x0, [sp, #8]
   18db8:	f9400be1 	ldr	x1, [sp, #16]
   18dbc:	17fffe99 	b	18820 
   18dc0:	14000000 	b	18dc0 
   18dc4:	00010198 	.inst	0x00010198 ; undefined
   18dc8:	000101f8 	.inst	0x000101f8 ; undefined
   18dcc:	000101f0 	.inst	0x000101f0 ; undefined
   18dd0:	000101e8 	.inst	0x000101e8 ; undefined
   18dd4:	000101c0 	.inst	0x000101c0 ; undefined
   18dd8:	000169e8 	.inst	0x000169e8 ; undefined
   18ddc:	000169b8 	.inst	0x000169b8 ; undefined
   18de0:	000169d0 	.inst	0x000169d0 ; undefined
   18de4:	d503201f 	nop
   18de8:	d503201f 	nop
   18dec:	d503201f 	nop

0000000000018df0 runtime.sysReserveAligned:
   18df0:	f9400b90 	ldr	x16, [x28, #16]
   18df4:	910003f1 	mov	x17, sp
   18df8:	eb10023f 	cmp	x17, x16

What did you see instead?

The $x and $d ARM mapping symbols are missing from the symbol table, and the padding consists of plain zeros.
Objdump:

   18db4:	f94007e0 	ldr	x0, [sp, #8]
   18db8:	f9400be1 	ldr	x1, [sp, #16]
   18dbc:	17fffe99 	b	18820 
   18dc0:	14000000 	b	18dc0 
   18dc4:	00010198 	.inst	0x00010198 ; undefined
   18dc8:	000101f8 	.inst	0x000101f8 ; undefined
   18dcc:	000101f0 	.inst	0x000101f0 ; undefined
   18dd0:	000101e8 	.inst	0x000101e8 ; undefined
   18dd4:	000101c0 	.inst	0x000101c0 ; undefined
   18dd8:	000169e8 	.inst	0x000169e8 ; undefined
   18ddc:	000169b8 	.inst	0x000169b8 ; undefined
   18de0:	000169d0 	.inst	0x000169d0 ; undefined
	...

0000000000018df0 runtime.sysReserveAligned:
   18df0:	f9400b90 	ldr	x16, [x28, #16]
   18df4:	910003f1 	mov	x17, sp
   18df8:	eb10023f 	cmp	x17, x16

The "Mapping symbols" chapter of ELF for the Arm® 64-bit Architecture (AArch64)
requires that the following special symbols are inserted into object files:
$x - At the start of a region of code containing AArch64 instructions.
$d - At the start of a region of data.
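
To illustrate how a tool sees these symbols, here is a minimal Go sketch that lists them from an ELF file using the standard debug/elf package. The helper names isMappingSymbol and listMappingSymbols and the command-line handling are assumptions made for this example, not part of the Go toolchain. The suffixed forms $x.<any> and $d.<any> are accepted as well, since the AArch64 ELF document allows them.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// isMappingSymbol reports whether name is an AArch64 mapping symbol:
// "$x" or "$d", optionally followed by ".<any suffix>".
// Hypothetical helper for illustration only.
func isMappingSymbol(name string) bool {
	for _, prefix := range []string{"$x", "$d"} {
		if name == prefix || strings.HasPrefix(name, prefix+".") {
			return true
		}
	}
	return false
}

// listMappingSymbols prints the address and name of every mapping
// symbol found in the symbol table of the ELF file at path.
func listMappingSymbols(path string) error {
	f, err := elf.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	syms, err := f.Symbols()
	if err != nil {
		return err
	}
	for _, s := range syms {
		if isMappingSymbol(s.Name) {
			fmt.Printf("%016x %s\n", s.Value, s.Name)
		}
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mapsyms <elf-file>")
		return
	}
	if err := listMappingSymbols(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Run against a binary that carries mapping symbols, this should print lines comparable to the $d and $x entries in the readelf output above; for a binary without them it prints nothing.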

I propose adding this functionality, since it is part of the standard and is already supported by other languages.

Also I think it's reasonable to use NOPs for function aligning instead of zeroing. There was no reason to do so before, but now it is needed so that $x and $d are not generated for every function and are placed only at transitions. In other words, this is an optimization that minimizes the number of mapping symbols in the symbol table.
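
A rough sketch of the NOP padding idea in Go follows. It is not the code from the linked PR; the helper name padWithNOPs and the alignment value of 16 are assumptions for illustration. The NOP encoding d503201f is the one visible in the expected objdump listing above, stored little-endian as AArch64 instructions are.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nopA64 is the AArch64 NOP encoding (d503201f in the listing above).
const nopA64 uint32 = 0xd503201f

// padWithNOPs appends NOP instructions to code until its length is a
// multiple of align. Both align and len(code) must be multiples of 4,
// because every AArch64 instruction is 4 bytes long.
// Hypothetical helper for illustration only.
func padWithNOPs(code []byte, align int) ([]byte, error) {
	if align <= 0 || align%4 != 0 || len(code)%4 != 0 {
		return nil, fmt.Errorf("bad input: len=%d align=%d", len(code), align)
	}
	var nop [4]byte
	binary.LittleEndian.PutUint32(nop[:], nopA64)
	for len(code)%align != 0 {
		code = append(code, nop[:]...)
	}
	return code, nil
}

func main() {
	// A single 4-byte instruction (encoding 14000000 from the listing).
	code := []byte{0x00, 0x00, 0x00, 0x14}
	padded, err := padWithNOPs(code, 16)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% x\n", padded)
}
```

Because the padding then consists of valid instructions, the whole range stays one code region and no extra $d/$x pair is needed around it.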

@vpachkov
Contributor Author

Also please take a look at #47786 PR. It contains a possible implementation of mapping symbols functionality.

@ALTree added the NeedsDecision and FeatureRequest labels Aug 23, 2021
@ALTree
Member

ALTree commented Aug 23, 2021

cc @cherrymui @thanm since they requested OP to open an issue on the CL.

@gopherbot

Change https://golang.org/cl/343150 mentions this issue: cmd: support mapping symbols for ARM64

@cherrymui
Member

What are the benefits exactly? It seems the only difference is it makes objdump output nicer? And only for that three NOPs?

Also I think it's reasonable to use NOPs for function aligning instead of zeroing

I think that is fine (and can be done independently). Or maybe we should use a trap instruction.

@vpachkov
Contributor Author

vpachkov commented Oct 7, 2021

What are the benefits exactly? It seems the only difference is it makes objdump output nicer? And only for that three NOPs?

Also I think it's reasonable to use NOPs for function aligning instead of zeroing

I think that is fine (and can be done independently). Or maybe we should use a trap instruction.

The reason is that it reduces the number of mapping symbols generated in the symbol table. A "$d" symbol has to be created for every transition from code (actual instructions) to data (anything that is not an actual instruction, e.g. the padding zeros at the end of a function). If we used NOPs for padding, the additional "$d" wouldn't be required, since a NOP is a valid instruction.
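
The effect on the symbol count can be sketched in Go as below. The types region and mapSym, the function mappingSymbols, and the offsets in main are all made up for illustration; the only rule taken from this discussion is that a symbol is emitted where the content kind changes.

```go
package main

import "fmt"

// region describes a run of bytes in the text section that is either
// instructions or data. Hypothetical type for illustration only.
type region struct {
	off    uint64
	isData bool
}

// mapSym is a mapping symbol ("$x" or "$d") placed at an offset.
type mapSym struct {
	name string
	off  uint64
}

// mappingSymbols emits a symbol only where the content kind changes,
// plus one for the very first region. regions must be sorted by offset.
func mappingSymbols(regions []region) []mapSym {
	var out []mapSym
	for i, r := range regions {
		if i > 0 && regions[i-1].isData == r.isData {
			continue // same kind as the previous region: no new symbol
		}
		name := "$x"
		if r.isData {
			name = "$d"
		}
		out = append(out, mapSym{name: name, off: r.off})
	}
	return out
}

func main() {
	// Three functions with zero padding between them: the padding is data.
	zeroPadded := []region{
		{0x00, false}, {0x14, true},
		{0x20, false}, {0x34, true},
		{0x40, false},
	}
	// The same layout with NOP padding: the padding is code.
	nopPadded := []region{
		{0x00, false}, {0x14, false},
		{0x20, false}, {0x34, false},
		{0x40, false},
	}
	fmt.Println("zero padding:", len(mappingSymbols(zeroPadded)), "symbols")
	fmt.Println("NOP padding: ", len(mappingSymbols(nopPadded)), "symbols")
}
```

With zero padding every function boundary contributes a "$d" and a following "$x"; with NOP padding only the initial "$x" remains in this made-up layout.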

@cherrymui
Member

What are the benefits of those symbols in the first place? Why does it matter whether it is instruction or data?

@thanm
Contributor

thanm commented Oct 7, 2021

The rationale from the ARM document says "Linkers, file decoders and other tools need to map binaries correctly", for what that is worth.

It would be interesting to see what other tools out there besides objdump actually make use of the symbols. I thought maybe they might be used in something like dynamorio or BOLT, but I can't seem to find any code there that uses them.

@yota9

yota9 commented Oct 7, 2021

Hello @thanm @cherrymui. llvm-bolt project indeed uses mapping symbols, that's why we need this patch. For example, during the function disassembly stage we need to check whether a particular function offset lies in a constant island; otherwise we will try to disassemble it as the instruction. JFYI, the data offsets for functions are filled here

@thanm
Contributor

thanm commented Oct 7, 2021

Thanks @yota9, I stand corrected. My search wasn't very thorough apparently.

@cherrymui
Member

llvm-bolt project indeed uses mapping symbols, that's why we need this patch

Could you explain more? From "[it] uses mapping symbols" to "we need this patch" there are many steps in between. What happens if we don't have them?

try to disassemble it as the instruction

What is the problem for this? (FWIW, currently, we don't support and expect any tool post-editing a Go binary.)

@yota9

yota9 commented Oct 12, 2021

What is the problem for this? (FWIW, currently, we don't support and expect any tool post-editing a Go binary.)

If it is not an instruction, disassembly will fail. Since the data in a constant island is part of the function, we need to know exactly where the instructions are and where the data is in order to process it correctly.

As for the second part, I'm working on Go support for the llvm-bolt tool. I hope it will be open sourced soon.
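
The instruction/data check described in this comment can be sketched in Go as follows. This is not llvm-bolt code; the mapSym type and the inData function are hypothetical, and the two symbol addresses are the ones from the readelf output at the top of this issue. Treating addresses before the first mapping symbol as code is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// mapSym is a mapping symbol reduced to its address and kind.
// Hypothetical type for illustration only.
type mapSym struct {
	addr   uint64
	isData bool // true for "$d", false for "$x"
}

// inData reports whether addr falls in a region opened by a "$d"
// symbol. syms must be sorted by address. Addresses before the first
// symbol are treated as code (an assumption of this sketch).
func inData(syms []mapSym, addr uint64) bool {
	// Find the first symbol that starts after addr; the one before it
	// is the symbol governing addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].addr > addr })
	if i == 0 {
		return false
	}
	return syms[i-1].isData
}

func main() {
	syms := []mapSym{
		{addr: 0x18dc4, isData: true},  // $d
		{addr: 0x18df0, isData: false}, // $x
	}
	for _, a := range []uint64{0x18dc0, 0x18dc8, 0x18df0} {
		fmt.Printf("%#x in data: %v\n", a, inData(syms, a))
	}
}
```

A disassembler would skip decoding at any offset for which such a check returns true and treat those bytes as literal data instead.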

@cherrymui
Member

See #49031 (comment) about binary post-editing. Is there any other reason we want to do this? Thanks.

@vpachkov
Contributor Author

In my opinion, the main reason to do this is that it is part of the ARM64 ELF standard. Optimizers, linkers, debuggers, profilers, and disassemblers need to map images correctly, and they rely on that standard. So, to answer your question, binary post-editing isn't the only reason for doing this. For example, setting a breakpoint at a literal pool location can crash the debugging session, since without mapping symbols a debugger will treat that area as instructions.

@thanm
Contributor

thanm commented Jan 18, 2022

Related: elderly issue #9118.

@seankhliao added this to the Unplanned milestone Aug 20, 2022
7 participants