Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: allow asking linker to *not* eliminate a "dead" function #35055

Open
hugelgupf opened this issue Oct 21, 2019 · 14 comments
Open

cmd/link: allow asking linker to *not* eliminate a "dead" function #35055

hugelgupf opened this issue Oct 21, 2019 · 14 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@hugelgupf
Copy link
Contributor

We have some assembly code in our Go binary that is never directly called by any Go code in userspace. (It's intended to be handed to the kernel to execute during a kexec in ring 0. Kernel hands execution to this piece of Go assembly, Go assembly hands it to a kernel we're booting.)

Normally, this code would be optimized out of the final binary by the linker. So far, we can get around that with this: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.go#L39

Arguably a hack that will eventually be optimized away by the compiler.

Can we have a directive to not optimize away a piece of code? "//go:nodelete"? This is the assembly in question that shouldn't be optimized away: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.s#L29

@randall77
Copy link
Contributor

How are you referencing start to give its address to the kernel? Groveling the symbol table or DWARF, perhaps?

You should probably use a reference to start that the linker can see, so you don't have to fake a cal to it.

We do this in the runtime a fair amount. See runtime/proc.go:funcPC. You would do (with your own copy of funcPC):

    pc := funcPC(start)
    callKernel(..., pc, ...)

@hugelgupf
Copy link
Contributor Author

At the moment, we use a hack: strings placed at the beginning and the end of the assembly file, so the whole segment can be given to the kernel: https://github.com/u-root/u-root/blob/ac7ae682c648c26a7c90ac73a2a8b4597bdbeb78/pkg/multiboot/internal/trampoline/trampoline_linux_amd64.go#L67

That's what the "begin" and "end" symbols in the .s file are.

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

given that, would you still recommend something like funcPC?

@hugelgupf
Copy link
Contributor Author

I should add -- this is because we don't only want start to be passed, but we need start, boot, farjump{32,64}, and gdt.

Two other strings are in there to mark places where some values need to be inserted. (Those could probably be replaced by funcPC easily.)

@randall77
Copy link
Contributor

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

Yes, that's not something you should rely on. Although we've not done anything like this yet, we've been contemplating feedback-directed optimization which would (among other things) reorder function layout.

given that, would you still recommend something like funcPC?

Yes, you could make that work. It doesn't require any hacks except for funcPC itself. That hack is likely to keep working, as the runtime uses it itself.

Note that a //go:nodelete directive won't work. The Go code that this comment would precede is just a declaration, not a definition. You'd really want such a directive in the assembly, which is where the function is defined. That could be done, ala NOSPLIT, but as I said this probably isn't the way to solve your problem.

@hugelgupf
Copy link
Contributor Author

This makes another assumption that I'm not sure will remain true: that the assembly stuff will all be contiguously compiled together.

Yes, that's not something you should rely on. Although we've not done anything like this yet, we've been contemplating feedback-directed optimization which would (among other things) reorder function layout.

When that happens, what do you recommend? What is a better way to solve this problem for us?

given that, would you still recommend something like funcPC?

Yes, you could make that work. It doesn't require any hacks except for funcPC itself. That hack is likely to keep working, as the runtime uses it itself.

I'll replace all the string hacks with funcPC. That should at least keep the linker from getting rid of all that stuff. That just leaves the other problem (contiguous addressing).

@randall77
Copy link
Contributor

When that happens, what do you recommend? What is a better way to solve this problem for us?

I'm not entirely sure what you are asking. Why do you need contiguous? You should pass the values of funcPC(start), funcPC(boot), funcPC(farjump64), etc. to whomever needs them. Then they don't need to be contiguous.

@hugelgupf
Copy link
Contributor Author

When that happens, what do you recommend? What is a better way to solve this problem for us?

I'm not entirely sure what you are asking. Why do you need contiguous? You should pass the values of funcPC(start), funcPC(boot), funcPC(farjump64), etc. to whomever needs them. Then they don't need to be contiguous.

Contiguous is nice, because it allows us to allocate one segment of physical memory to the functions. They are currently written to be relocatable, so it'd be a bit of a bummer if start is at the beginning of a 10MB binary, and boot at the end, because then we have to waste 10MB of space just on those functions since they have to be relative to each other. This means we have to support a trampoline that's potentially more than just a page of memory.

In other words, it's desirable to keep the byte slice returned from here (which contains the entire relocatable trampoline) small.

@randall77
Copy link
Contributor

If they are relocatable you can just copy them out into a new slice.
You'd need to know their length, though, and that's not easily obtainable. But maybe you can invent a reasonable upper bound.

@hugelgupf
Copy link
Contributor Author

If they are relocatable you can just copy them out into a new slice.
You'd need to know their length, though, and that's not easily obtainable. But maybe you can invent a reasonable upper bound.

I'm sorry, I don't fully understand. Copy them out into a new slice? Multiple slices are not the problem, but the fact that we have to take up more address space in a sparse fashion.

And yeah, to do what you suggest I also need to know the length of each function, rather than the length of the whole segment.

@randall77
Copy link
Contributor

I'm sorry, I don't fully understand. Copy them out into a new slice? Multiple slices are not the problem, but the fact that we have to take up more address space in a sparse fashion.

s := funcPC(start)
b := funcPC(boot)
n := 1024 // a reasonable length bound for the functions
x := make([]byte, 2*n)
... copy n bytes from s to &x[0]
... copy n bytes from b to &x[n]
pass x to the kernel

You're going to have your Go binary mapped into the address space regardless, so you're not wasting any address space. Or, you're wasting the same space whether start and boot are adjacent or not.

Maybe you'd have to allocate x with mmap so you can make it executable?

@hugelgupf
Copy link
Contributor Author

Oh, the copying out and/or the userspace allocation don't matter. We already move it around / copy it like that. But the bootloader also allocates physical ring 0 address space to lay out the trampoline, the kernel being loaded, and other data structures to be passed to the kernel, within the constraints of the memory available on the physical system.

For the trampoline, right now, we only have to allocate 1-2 pages, and we copy that memory from userspace to kernel space in that 1-2 pages.

But if it's not all contiguous, and the code is all relative to each other, then I have to have several scattered allocations across the physical address space, exactly /as much apart/ as the userspace layout of those functions, which will make the kernel harder to lay out.

If the userspace binary is 20M, and start and boot are at opposite ends, that'd probably lead to us allocating 20M of physical address space just to the trampoline. In practice that'll probably not matter for a while, unless we end up running on really low memory systems or unless our bootloader binaries get really big.

@hugelgupf
Copy link
Contributor Author

Still, we'd have to know the length of each function. For the purposes of this, we can probably just assume that they're less than a page and just call the length 4096. (It just results in unnecessary code copied into ring 0, but alas...)

@hugelgupf hugelgupf changed the title Allow asking linter to *not* eliminate a "dead" function Allow asking linker to *not* eliminate a "dead" function Oct 22, 2019
hugelgupf added a commit to hugelgupf/u-root that referenced this issue Oct 22, 2019
Rather than opening /proc/self/exe and finding a string within the
binary, we use the Go runtime to tell us where the trampoline functions
are located. (Must already be mapped in address space, since it's part
of our own executable.)

Based on discussion in golang/go#35055

Signed-off-by: Chris Koch <chrisko@google.com>
@ianlancetaylor ianlancetaylor changed the title Allow asking linker to *not* eliminate a "dead" function cmd/link: allow asking linker to *not* eliminate a "dead" function Oct 22, 2019
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 22, 2019
hugelgupf added a commit to hugelgupf/u-root that referenced this issue Oct 23, 2019
Rather than opening /proc/self/exe and finding a string within the
binary, we use the Go runtime to tell us where the trampoline functions
are located. (Must already be mapped in address space, since it's part
of our own executable.)

Based on discussion in golang/go#35055

Signed-off-by: Chris Koch <chrisko@google.com>
hugelgupf added a commit to u-root/u-root that referenced this issue Oct 23, 2019
Rather than opening /proc/self/exe and finding a string within the
binary, we use the Go runtime to tell us where the trampoline functions
are located. (Must already be mapped in address space, since it's part
of our own executable.)

Based on discussion in golang/go#35055

Signed-off-by: Chris Koch <chrisko@google.com>
@meme
Copy link

meme commented Jan 5, 2020

Any movement on this one, or am I still stuck with the hack suggested by hugelgupf?

@advancedwebdeveloper
Copy link

I can confirm I am experiencing the same bug, while using gollvm

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Triage Backlog
Development

No branches or pull requests

7 participants