cmd/link: allow asking linker to not eliminate a "dead" function #35055

hugelgupf · 2019-10-21T20:30:48Z

We have some assembly code in our Go binary that is never directly called by any Go code in userspace. (It's intended to be handed to the kernel to execute during a kexec in ring 0. Kernel hands execution to this piece of Go assembly, Go assembly hands it to a kernel we're booting.)

Normally, this code would be optimized out of the final binary by the linker. So far, we can get around that with this: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.go#L39

Arguably a hack that will eventually be optimized away by the compiler.

Can we have a directive to not optimize away a piece of code? "//go:nodelete"? This is the assembly in question that shouldn't be optimized away: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.s#L29

The text was updated successfully, but these errors were encountered:

randall77 · 2019-10-21T20:51:03Z

How are you referencing start to give its address to the kernel? Groveling the symbol table or DWARF, perhaps?

You should probably use a reference to start that the linker can see, so you don't have to fake a cal to it.

We do this in the runtime a fair amount. See runtime/proc.go:funcPC. You would do (with your own copy of funcPC):

    pc := funcPC(start)
    callKernel(..., pc, ...)

hugelgupf · 2019-10-21T21:01:00Z

At the moment, we use a hack: strings placed at the beginning and the end of the assembly file, so the whole segment can be given to the kernel: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.go#L67

That's what the "begin" and "end" symbols in the .s file are.

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

given that, would you still recommend something like funcPC?

hugelgupf · 2019-10-21T21:06:17Z

I should add -- this is because we don't only want start to be passed, but we need start, boot, farjump{32,64}, and gdt.

Two other strings are in there to mark places where some values need to be inserted. (Those could probably be replaced by funcPC easily.)

randall77 · 2019-10-21T21:28:17Z

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

Yes, that's not something you should rely on. Although we've not done anything like this yet, we've been contemplating feedback-directed optimization which would (among other things) reorder function layout.

given that, would you still recommend something like funcPC?

Yes, you could make that work. It doesn't require any hacks except for funcPC itself. That hack is likely to keep working, as the runtime uses it itself.

Note that a //go:nodelete directive won't work. The Go code that this comment would precede is just a declaration, not a definition. You'd really want such a directive in the assembly, which is where the function is defined. That could be done, ala NOSPLIT, but as I said this probably isn't the way to solve your problem.

hugelgupf · 2019-10-21T21:32:49Z

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

Yes, that's not something you should rely on. Although we've not done anything like this yet, we've been contemplating feedback-directed optimization which would (among other things) reorder function layout.

When that happens, what do you recommend? What is a better way to solve this problem for us?

given that, would you still recommend something like funcPC?

Yes, you could make that work. It doesn't require any hacks except for funcPC itself. That hack is likely to keep working, as the runtime uses it itself.

I'll replace all the string hacks with funcPC. That should at least keep the linker from getting rid of all that stuff. That just leaves the other problem (contiguous addressing).

randall77 · 2019-10-21T21:43:10Z

When that happens, what do you recommend? What is a better way to solve this problem for us?

I'm not entirely sure what you are asking. Why do you need contiguous? You should pass the values of funcPC(start), funcPC(boot), funcPC(farjump64), etc. to whomever needs them. Then they don't need to be contiguous.

hugelgupf · 2019-10-21T21:51:17Z

When that happens, what do you recommend? What is a better way to solve this problem for us?

I'm not entirely sure what you are asking. Why do you need contiguous? You should pass the values of funcPC(start), funcPC(boot), funcPC(farjump64), etc. to whomever needs them. Then they don't need to be contiguous.

Contiguous is nice, because it allows us to allocate one segment of physical memory to the functions. They are currently written to be relocatable, so it'd be a bit of a bummer if start is at the beginning of a 10MB binary, and boot at the end, because then we have to waste 10MB of space just on those functions since they have to be relative to each other. This means we have to support a trampoline that's potentially more than just a page of memory.

In other words, it's desirable to keep the byte slice returned from here (which contains the entire relocatable trampoline) small.

randall77 · 2019-10-21T22:19:31Z

If they are relocatable you can just copy them out into a new slice.
You'd need to know their length, though, and that's not easily obtainable. But maybe you can invent a reasonable upper bound.

hugelgupf · 2019-10-21T22:56:02Z

If they are relocatable you can just copy them out into a new slice.
You'd need to know their length, though, and that's not easily obtainable. But maybe you can invent a reasonable upper bound.

I'm sorry, I don't fully understand. Copy them out into a new slice? Multiple slices are not the problem, but the fact that we have to take up more address space in a sparse fashion.

And yeah, to do what you suggest I also need to know the length of each function, rather than the length of the whole segment.

randall77 · 2019-10-21T23:32:30Z

I'm sorry, I don't fully understand. Copy them out into a new slice? Multiple slices are not the problem, but the fact that we have to take up more address space in a sparse fashion.

s := funcPC(start)
b := funcPC(boot)
n := 1024 // a reasonable length bound for the functions
x := make([]byte, 2*n)
... copy n bytes from s to &x[0]
... copy n bytes from b to &x[n]
pass x to the kernel

You're going to have your Go binary mapped into the address space regardless, so you're not wasting any address space. Or, you're wasting the same space whether start and boot are adjacent or not.

Maybe you'd have to allocate x with mmap so you can make it executable?

hugelgupf · 2019-10-22T00:18:09Z

Oh, the copying out and/or the userspace allocation don't matter. We already move it around / copy it like that. But the bootloader also allocates physical ring 0 address space to lay out the trampoline, the kernel being loaded, and other data structures to be passed to the kernel, within the constraints of the memory available on the physical system.

For the trampoline, right now, we only have to allocate 1-2 pages, and we copy that memory from userspace to kernel space in that 1-2 pages.

But if it's not all contiguous, and the code is all relative to each other, then I have to have several scattered allocations across the physical address space, exactly /as much apart/ as the userspace layout of those functions, which will make the kernel harder to lay out.

If the userspace binary is 20M, and start and boot are at opposite ends, that'd probably lead to us allocating 20M of physical address space just to the trampoline. In practice that'll probably not matter for a while, unless we end up running on really low memory systems or unless our bootloader binaries get really big.

hugelgupf · 2019-10-22T00:18:59Z

Still, we'd have to know the length of each function. For the purposes of this, we can probably just assume that they're less than a page and just call the length 4096. (It just results in unnecessary code copied into ring 0, but alas...)

Rather than opening /proc/self/exe and finding a string within the binary, we use the Go runtime to tell us where the trampoline functions are located. (Must already be mapped in address space, since it's part of our own executable.) Based on discussion in golang/go#35055 Signed-off-by: Chris Koch <chrisko@google.com>

meme · 2020-01-05T01:36:07Z

Any movement on this one, or am I still stuck with the hack suggested by hugelgupf?

advancedwebdeveloper · 2020-10-18T16:59:43Z

I can confirm I am experiencing the same bug, while using gollvm

hugelgupf changed the title ~~Allow asking linter to *not* eliminate a "dead" function~~ Allow asking linker to *not* eliminate a "dead" function Oct 22, 2019

ianlancetaylor changed the title ~~Allow asking linker to *not* eliminate a "dead" function~~ cmd/link: allow asking linker to *not* eliminate a "dead" function Oct 22, 2019

dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 22, 2019

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

seankhliao added this to the Unplanned milestone Aug 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/link: allow asking linker to not eliminate a "dead" function #35055

cmd/link: allow asking linker to not eliminate a "dead" function #35055

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 22, 2019

hugelgupf commented Oct 22, 2019

meme commented Jan 5, 2020

advancedwebdeveloper commented Oct 18, 2020

cmd/link: allow asking linker to *not* eliminate a "dead" function #35055

cmd/link: allow asking linker to *not* eliminate a "dead" function #35055

Comments

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 21, 2019

randall77 commented Oct 21, 2019

hugelgupf commented Oct 22, 2019

hugelgupf commented Oct 22, 2019

meme commented Jan 5, 2020

advancedwebdeveloper commented Oct 18, 2020

cmd/link: allow asking linker to not eliminate a "dead" function #35055

cmd/link: allow asking linker to not eliminate a "dead" function #35055