cmd/objdump: freezes computer on large executable #24725
Comments
If you have time, can you see if this is a regression since Go 1.9.x? That is, was Go 1.9.x better in this regard? |
It isn't really an option, because last time my computer became unresponsive and I had to force shutdown, so I wouldn't like to be doing this a second time. |
In the meantime, I tested this on a Linux/amd64 machine and it works as intended, producing the assembly code quickly. |
Interesting. Or maybe a bug in Windows linking. If you're willing to risk/experiment a little, you could try again on windows with:

package main

const big = 10

func main() {
	x := [big]int{1}
	_ = x
}

And see what happens as you increase the size of |
Sure. I can reproduce this. My computer gets pretty slow, but I kill objdump with ctrl+c and then everything goes back to normal. While frozen, my CPU is around 15%, but both my memory and disk usage max out. I have Windows 10 with 8G of memory and a flimsy hard disk.
The last successful run of this program was with big of 1000000. I used github.com/alexbrainman/time to measure the run, and it outputs:
I do not have time to debug this today or in the near future. But happy to try suggestions. Thank you. Alex |
I can go to ~2000000 (with memory usage at 80%) and it takes ~30s. |
Sounds like the next step might be to add a -memprofile flag to objdump so that we can find out where the allocs are coming from. I’m AFK for a bit, but can do so in a bit if no one else beats me to it. |
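A minimal sketch of what such a flag could look like in a command-line tool, using only the standard flag and runtime/pprof packages. The flag name matches the suggestion above, but the helper writeHeapProfile and the overall layout are assumptions for illustration, not the code that was added to cmd/objdump.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// memprofile is a hypothetical flag modeled on the suggestion in the thread.
var memprofile = flag.String("memprofile", "", "write a heap profile to this file")

// writeHeapProfile (hypothetical helper) writes the current heap profile to path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	runtime.GC() // collect garbage first so the profile reflects up-to-date statistics
	return pprof.WriteHeapProfile(f)
}

func main() {
	flag.Parse()

	// ... the tool's real work would run here ...

	if *memprofile != "" {
		if err := writeHeapProfile(*memprofile); err != nil {
			fmt.Fprintln(os.Stderr, "memprofile:", err)
			os.Exit(1)
		}
	}
}
```

The resulting file can be inspected with go tool pprof; passing -sample_index=alloc_space shows total allocated bytes rather than live memory, which is usually the more useful view when hunting for allocation-heavy code paths.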
I have a lead on optimizing cmd/objdump generally. I'll mail some CLs soon, but I have a lot on my plate and not much Go time. In the meantime, I noticed something windows-specific that perhaps Alex could take a look at.
Though the darwin and windows executables are similar in size, the windows objdump output is 4x larger. Basically all of the difference appears to be output for a |
Same here. I will try and debug this on the weekend. Alex |
Change https://golang.org/cl/106697 mentions this issue: |
As an example of why this might happen, consider this code from cmd/internal/objfile:

	// Expand literal "$GOROOT" rewritten by obj.AbsFile()
	filename = filepath.Clean(os.ExpandEnv(filename))

In this case, filename might not contain "$GOROOT", in which case we can skip the buffer entirely.

name               old time/op  new time/op  delta
Expand/noop-8      46.7ns ± 1%  12.9ns ± 1%  -72.47%  (p=0.000 n=9+9)
Expand/multiple-8  139ns ± 1%   137ns ± 1%   -1.36%   (p=0.001 n=10+10)

The Expand/multiple improvement is probably noise.

This speeds up cmd/objdump detectably, if not much. Using "benchcmd ObjdumpCompile go tool objdump `go tool -n compile`":

name            old time/op  new time/op  delta
ObjdumpCompile  9.35s ± 2%   9.07s ± 3%   -3.00%  (p=0.000 n=18+18)

Updates #24725

Change-Id: Id31ec6a9b8dfb3c0f1db58fe1f958e11c39e656c
Reviewed-on: https://go-review.googlesource.com/106697
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
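The idea in this commit message (skip the expansion work when the input has no "$") can be illustrated from the caller's side. The sketch below is an assumption-laden illustration, not the change made inside the os package: expandIfNeeded is a hypothetical wrapper, and the mapping values are made up.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandIfNeeded is a hypothetical wrapper around os.Expand. When s contains
// no '$' there is nothing to expand, so it returns s unchanged and avoids
// building a new string.
func expandIfNeeded(s string, mapping func(string) string) string {
	if strings.IndexByte(s, '$') < 0 {
		return s
	}
	return os.Expand(s, mapping)
}

func main() {
	// Illustrative mapping; the GOROOT value is made up for this example.
	mapping := func(name string) string {
		if name == "GOROOT" {
			return "/usr/local/go"
		}
		return ""
	}
	fmt.Println(expandIfNeeded("/home/user/main.go", mapping))
	fmt.Println(expandIfNeeded("$GOROOT/src/fmt/print.go", mapping))
}
```

Most filenames seen by objdump contain no "$" at all, which is why a cheap check on the common path can pay off even though each call saves only a few dozen nanoseconds.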
Change https://golang.org/cl/106798 mentions this issue: |
This cuts the allocated space while executing go tool objdump -S `go tool -n compile` by over 10%. It also speeds it up slightly:

name              old time/op  new time/op  delta
ObjdumpSCompiler  9.03s ± 1%   8.88s ± 1%   -1.59%  (p=0.000 n=20+20)

Updates #24725

Change-Id: Ic6ef8e273ede589334ab6e07099ac2e5bdf990c9
Reviewed-on: https://go-review.googlesource.com/106798
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Oops—forgot to tag this issue from https://go-review.googlesource.com/c/go/+/106978 |
@josharian I am not convinced that the problem is objdump or windows. I think the problem is how Go compiler generates the executable.
Note how, as I make The Alex |
That is #24724. But objdump only processes text symbols (types T and t), and main.statictmp_0 is a readonly data symbol (R). So it is not responsible for this particular issue. I have some as-yet-unmailed CLs to significantly cut objdump’s memory usage. They will help. But the reason windows is currently so much worse than Linux is because the |
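To make the T/t versus R distinction concrete, here is a small sketch of filtering a symbol table by its nm-style type letter. The sym type and textSymbols function are hypothetical stand-ins defined for this example; they are not the types objdump uses internally, and the sizes are made up.

```go
package main

import "fmt"

// sym is a hypothetical, simplified symbol record.
type sym struct {
	Name string
	Code rune  // nm-style type letter: 'T'/'t' text, 'R'/'r' read-only data, ...
	Size int64 // size in bytes
}

// textSymbols keeps only the symbols a disassembler should process.
func textSymbols(syms []sym) []sym {
	var out []sym
	for _, s := range syms {
		if s.Code == 'T' || s.Code == 't' {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	syms := []sym{
		{Name: "main.main", Code: 'T', Size: 96},
		{Name: "main.statictmp_0", Code: 'R', Size: 8000000},
	}
	for _, s := range textSymbols(syms) {
		fmt.Println(s.Name, s.Size)
	}
}
```

If a large read-only symbol is mislabeled as text, a filter like this lets it through and the disassembler ends up decoding megabytes of data as if it were instructions.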
Actually, if you would, can you try the same comparison I did, but running on windows: cross compile the code for both darwin and windows and objdump it, and compare the speeds? |
You are correct, objdump is much slower on the windows executable:
I will see if I can understand why. Alex |
I explained the difference between the two binaries here: #24725 (comment) As for why that 4x size difference impacts memory usage so much, that's partly the fault of text/tabwriter. I'm working on that. |
I can see these symbols
on windows. And they are not present on darwin. I will try to understand why linker is adding these symbols on windows. Alex |
Go linker can really put symbols into one of 2 sections:
Alex |
Change https://golang.org/cl/106979 mentions this issue: |
I believe so. |
.text section stores code and read-only data on windows. .data stores data that changes during program execution - initialized and zero-initialized. Why do you think objdump is slow because Alex |
The tabwriter tracks cells on a line-by-line basis. This can be memory-hungry when working with large input.

This change adds two optimizations.

First, when there's an existing cell slice for a line, don't overwrite it by appending. This helps when re-using a Writer, or when the output is broken into groups, e.g. by a blank line. We now re-use that existing cell slice.

Second, we predict that the number of cells in a line will probably match those of the previous line, since tabwriter is most often used to format tables.

This has a noticeable impact on cmd/objdump (#24725). It reduces allocated space by about 55%. It also speeds it up some. Using "benchcmd -n 10 Objdump go tool objdump `which go`":

name            old time/op  new time/op  delta
ObjdumpCompile  9.03s ± 1%   8.51s ± 1%   -5.81%  (p=0.000 n=10+10)

It might also imaginably speed up gofmt on some large machine-generated code.

name                      old time/op  new time/op  delta
Table/1x10/new-8          2.89µs ± 1%  2.39µs ± 1%  -17.39%  (p=0.000 n=13+14)
Table/1x10/reuse-8        2.13µs ± 1%  1.29µs ± 2%  -39.58%  (p=0.000 n=14+15)
Table/1x1000/new-8        203µs ± 0%   147µs ± 1%   -27.45%  (p=0.000 n=13+14)
Table/1x1000/reuse-8      194µs ± 1%   113µs ± 2%   -42.01%  (p=0.000 n=14+15)
Table/1x100000/new-8      33.1ms ± 1%  27.5ms ± 2%  -17.08%  (p=0.000 n=15+15)
Table/1x100000/reuse-8    22.0ms ± 3%  11.8ms ± 1%  -46.23%  (p=0.000 n=14+12)
Table/10x10/new-8         8.51µs ± 0%  6.52µs ± 1%  -23.48%  (p=0.000 n=13+15)
Table/10x10/reuse-8       7.41µs ± 0%  4.59µs ± 3%  -38.03%  (p=0.000 n=14+15)
Table/10x1000/new-8       749µs ± 0%   521µs ± 1%   -30.39%  (p=0.000 n=12+15)
Table/10x1000/reuse-8     732µs ± 1%   448µs ± 2%   -38.79%  (p=0.000 n=15+14)
Table/10x100000/new-8     102ms ± 2%   74ms ± 2%    -28.05%  (p=0.000 n=14+15)
Table/10x100000/reuse-8   96.2ms ± 4%  55.4ms ± 3%  -42.36%  (p=0.000 n=15+15)
Table/100x10/new-8        50.3µs ± 1%  43.3µs ± 1%  -13.87%  (p=0.000 n=14+15)
Table/100x10/reuse-8      47.6µs ± 1%  36.1µs ± 1%  -24.09%  (p=0.000 n=14+14)
Table/100x1000/new-8      5.17ms ± 1%  4.11ms ± 1%  -20.40%  (p=0.000 n=14+13)
Table/100x1000/reuse-8    5.00ms ± 1%  3.73ms ± 1%  -25.46%  (p=0.000 n=14+14)
Table/100x100000/new-8    654ms ± 2%   531ms ± 2%   -18.86%  (p=0.000 n=13+14)
Table/100x100000/reuse-8  709ms ± 1%   505ms ± 2%   -28.77%  (p=0.000 n=12+15)
Pyramid/10-8              4.22µs ± 1%  4.21µs ± 1%  ~        (p=0.067 n=14+14)
Pyramid/100-8             378µs ± 0%   378µs ± 0%   +0.17%   (p=0.022 n=13+13)
Pyramid/1000-8            133ms ± 3%   132ms ± 3%   ~        (p=0.148 n=15+15)
Ragged/10-8               6.10µs ± 0%  5.16µs ± 0%  -15.38%  (p=0.000 n=14+15)
Ragged/100-8              54.5µs ± 0%  43.8µs ± 0%  -19.59%  (p=0.000 n=14+15)
Ragged/1000-8             532µs ± 0%   424µs ± 0%   -20.25%  (p=0.000 n=14+14)

name                      old alloc/op  new alloc/op  delta
Table/1x10/new-8          1.76kB ± 0%   1.52kB ± 0%   -13.64%   (p=0.000 n=15+15)
Table/1x10/reuse-8        800B ± 0%     0B            -100.00%  (p=0.000 n=15+15)
Table/1x1000/new-8        131kB ± 0%    99kB ± 0%     -24.30%   (p=0.000 n=15+15)
Table/1x1000/reuse-8      80.0kB ± 0%   0.0kB ± 0%    -99.99%   (p=0.000 n=15+15)
Table/1x100000/new-8      23.1MB ± 0%   19.9MB ± 0%   -13.85%   (p=0.000 n=15+15)
Table/1x100000/reuse-8    8.30MB ± 0%   0.20MB ± 0%   -97.60%   (p=0.000 n=13+12)
Table/10x10/new-8         8.94kB ± 0%   5.06kB ± 0%   -43.47%   (p=0.000 n=15+15)
Table/10x10/reuse-8       7.52kB ± 0%   0.00kB        -100.00%  (p=0.000 n=15+15)
Table/10x1000/new-8       850kB ± 0%    387kB ± 0%    -54.50%   (p=0.000 n=13+15)
Table/10x1000/reuse-8     752kB ± 0%    0kB ± 0%      -99.98%   (p=0.000 n=13+15)
Table/10x100000/new-8     95.7MB ± 0%   49.3MB ± 0%   -48.50%   (p=0.000 n=14+15)
Table/10x100000/reuse-8   76.2MB ± 0%   2.5MB ± 0%    -96.77%   (p=0.000 n=13+15)
Table/100x10/new-8        66.3kB ± 0%   38.0kB ± 0%   -42.65%   (p=0.000 n=15+15)
Table/100x10/reuse-8      61.3kB ± 0%   0.0kB         -100.00%  (p=0.000 n=15+15)
Table/100x1000/new-8      6.69MB ± 0%   3.25MB ± 0%   -51.37%   (p=0.000 n=15+15)
Table/100x1000/reuse-8    6.13MB ± 0%   0.01MB ± 0%   -99.89%   (p=0.000 n=15+15)
Table/100x100000/new-8    684MB ± 0%    340MB ± 0%    -50.29%   (p=0.000 n=14+15)
Table/100x100000/reuse-8  648MB ± 0%    170MB ± 0%    -73.78%   (p=0.000 n=14+13)
Pyramid/10-8              4.40kB ± 0%   4.40kB ± 0%   ~         (all equal)
Pyramid/100-8             652kB ± 0%    652kB ± 0%    ~         (p=0.715 n=15+15)
Pyramid/1000-8            96.7MB ± 0%   96.7MB ± 0%   ~         (p=0.084 n=15+14)
Ragged/10-8               5.17kB ± 0%   4.51kB ± 0%   -12.69%   (p=0.000 n=15+15)
Ragged/100-8              50.2kB ± 0%   41.1kB ± 0%   -18.04%   (p=0.000 n=15+15)
Ragged/1000-8             492kB ± 0%    401kB ± 0%    -18.61%   (p=0.000 n=15+15)

name                      old allocs/op  new allocs/op  delta
Table/1x10/new-8          29.0 ± 0%      21.0 ± 0%      -27.59%   (p=0.000 n=15+15)
Table/1x10/reuse-8        20.0 ± 0%      0.0            -100.00%  (p=0.000 n=15+15)
Table/1x1000/new-8        2.02k ± 0%     1.02k ± 0%     -49.38%   (p=0.000 n=15+15)
Table/1x1000/reuse-8      2.00k ± 0%     0.00k          -100.00%  (p=0.000 n=15+15)
Table/1x100000/new-8      200k ± 0%      100k ± 0%      -49.98%   (p=0.000 n=15+15)
Table/1x100000/reuse-8    200k ± 0%      1k ± 0%        -99.50%   (p=0.000 n=14+15)
Table/10x10/new-8         66.0 ± 0%      31.0 ± 0%      -53.03%   (p=0.000 n=15+15)
Table/10x10/reuse-8       50.0 ± 0%      0.0            -100.00%  (p=0.000 n=15+15)
Table/10x1000/new-8       5.03k ± 0%     1.04k ± 0%     -79.36%   (p=0.000 n=15+15)
Table/10x1000/reuse-8     5.00k ± 0%     0.00k          -100.00%  (p=0.000 n=15+15)
Table/10x100000/new-8     500k ± 0%      100k ± 0%      -79.99%   (p=0.000 n=15+15)
Table/10x100000/reuse-8   500k ± 0%      5k ± 0%        -99.00%   (p=0.000 n=15+15)
Table/100x10/new-8        102 ± 0%       40 ± 0%        -60.78%   (p=0.000 n=15+15)
Table/100x10/reuse-8      80.0 ± 0%      0.0            -100.00%  (p=0.000 n=15+15)
Table/100x1000/new-8      8.04k ± 0%     1.05k ± 0%     -86.91%   (p=0.000 n=15+15)
Table/100x1000/reuse-8    8.00k ± 0%     0.00k ± 0%     -99.98%   (p=0.000 n=15+15)
Table/100x100000/new-8    800k ± 0%      100k ± 0%      -87.49%   (p=0.000 n=15+12)
Table/100x100000/reuse-8  800k ± 0%      50k ± 0%       -93.74%   (p=0.000 n=14+13)
Pyramid/10-8              20.0 ± 0%      20.0 ± 0%      ~         (all equal)
Pyramid/100-8             50.0 ± 0%      50.0 ± 0%      ~         (all equal)
Pyramid/1000-8            109 ± 0%       109 ± 0%       ~         (all equal)
Ragged/10-8               54.0 ± 0%      34.0 ± 0%      -37.04%   (p=0.000 n=15+15)
Ragged/100-8              422 ± 0%       188 ± 0%       -55.45%   (p=0.000 n=15+15)
Ragged/1000-8             4.03k ± 0%     1.66k ± 0%     -58.80%   (p=0.000 n=15+15)

Change-Id: I0c0a392b02d5148a0a4b8ad4eaf98fa343980962
Reviewed-on: https://go-review.googlesource.com/106979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
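The two optimizations described in this commit message can be sketched on a stripped-down buffer. The code below is an illustration of the technique under stated assumptions, not the text/tabwriter source: the cell and lineBuffer types and all method names are hypothetical.

```go
package main

import "fmt"

// cell is a hypothetical stand-in for per-cell bookkeeping.
type cell struct {
	size  int // cell size in bytes
	width int // cell width in runes
}

// lineBuffer is a hypothetical buffer holding the cells of each line.
type lineBuffer struct {
	lines [][]cell
}

// reset empties the buffer but keeps the backing arrays for later reuse.
func (b *lineBuffer) reset() {
	b.lines = b.lines[:0]
}

// addLine starts a new line.
func (b *lineBuffer) addLine() {
	if n := len(b.lines) + 1; n <= cap(b.lines) {
		// First optimization: grow by reslicing so the cell slice left in
		// this slot by an earlier use is kept, then truncate it to length 0.
		b.lines = b.lines[:n]
		b.lines[n-1] = b.lines[n-1][:0]
	} else {
		b.lines = append(b.lines, nil)
	}
	// Second optimization: predict that this line will have as many cells
	// as the previous one and preallocate accordingly.
	if n := len(b.lines); n >= 2 {
		if prev := len(b.lines[n-2]); prev > cap(b.lines[n-1]) {
			b.lines[n-1] = make([]cell, 0, prev)
		}
	}
}

// addCell appends a cell to the current (last) line.
func (b *lineBuffer) addCell(c cell) {
	last := len(b.lines) - 1
	b.lines[last] = append(b.lines[last], c)
}

// lastCap reports the capacity of the current line's cell slice.
func (b *lineBuffer) lastCap() int {
	return cap(b.lines[len(b.lines)-1])
}

func main() {
	var b lineBuffer
	b.addLine()
	for i := 0; i < 4; i++ {
		b.addCell(cell{size: 1, width: 1})
	}
	b.addLine() // preallocated from the previous line's cell count
	fmt.Println("capacity of second line:", b.lastCap())
}
```

For table-shaped input such as disassembly listings, where every line has the same number of columns, the prediction is almost always right, so each line costs one allocation of the right size instead of several growth steps.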
So windows doesn't have an |
I can see it myself now. And there are a couple of other symbols that are huge:
Obviously, they are not code, so objdump should not attempt disassembling them.
Like I said before, we only have .data and .text sections at the moment. If we put them into data, then they won't be read-only. So we would have to create a new section for these. But perhaps there is an alternative way to mark them as not code for objdump - I did not get to that part of the code yet.
No. Go executables never had read-only-data section. We never felt the need. Perhaps we should Alex |
It wouldn't be the end of the world to just put those symbols in |
Do you mean, for pe file writer in cmd/link to adjust both .text and .data sections to make .data section bigger to include read-only symbols? I am not sure that would be much easier than creating new read-only-data section between .text and .data sections. Another alternative is to adjust We would also need some test. Not sure about that, so suggestions are welcome. Alex |
Right, put the readonly symbols in the Your suggestion sounds fine as well. (With the modification that we use R for symbols greater than etext?) This assumes that we do correctly segregate code from readonly data in the |
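The "use R for symbols greater than etext" idea can be sketched as a tiny classifier. Everything here is hypothetical: the function name, the addresses, and the choice to treat an address equal to etext as data are assumptions for illustration, and the sketch presumes code and read-only data really are segregated around etext.

```go
package main

import "fmt"

// codeForTextSectionSymbol (hypothetical) picks an nm-style type letter for
// a symbol stored in the PE .text section. Symbols below etext are treated
// as code ('T'); symbols at or above etext are treated as read-only data ('R').
func codeForTextSectionSymbol(addr, etext uint64) rune {
	if addr < etext {
		return 'T'
	}
	return 'R'
}

func main() {
	const etext = 0x450000 // made-up end-of-code address
	fmt.Printf("%c\n", codeForTextSectionSymbol(0x401000, etext))
	fmt.Printf("%c\n", codeForTextSectionSymbol(0x460000, etext))
}
```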
Change https://golang.org/cl/108595 mentions this issue: |
@marigonzes and @josharian please try https://go-review.googlesource.com/#/c/go/+/108595 It helps a lot, but I can still see it is not as efficient as with ELF file reading. My CL does not make Go linker write smaller PE ".text" section, so there is some quite large overhead reading huge ".text" section generated by example from #24725 (comment) I don't see how we can remove that allocation, because debug/pe.Section.Data returns []byte. So the only way to overcome that problem is to change Go linker to stop writing non-text symbols into PE ".text" section. @randall77 and @josharian let me know if my CL is good enough. If not, we don't want my CL submitted, and I should try and change the linker. Thank you Alex |
My two cents is that, from the perspective of this issue, your CL suffices. It brings windows objdump performance more or less in line with linux/darwin. (I haven't looked at the code, though, just its effect.) From a long term perspective, I do think we should put readonly data in a readonly section. It looks like they exist. A bit of googling finds something like a PE spec, which contains this line:
which seems about like what we're looking for. |
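For context, a PE section's role is encoded in its characteristics bit field. The sketch below tests whether a given value describes read-only initialized data. The flag values are the section-flag values from the PE/COFF specification; the constant names and the isReadOnlyData helper are local to this example and are not taken from cmd/link.

```go
package main

import "fmt"

// Section characteristic flags (values from the PE/COFF specification).
const (
	scnCntCode            = 0x00000020 // section contains executable code
	scnCntInitializedData = 0x00000040 // section contains initialized data
	scnMemExecute         = 0x20000000 // section can be executed
	scnMemRead            = 0x40000000 // section can be read
	scnMemWrite           = 0x80000000 // section can be written
)

// isReadOnlyData (hypothetical helper) reports whether the characteristics
// describe initialized data that is readable but neither writable nor
// executable.
func isReadOnlyData(characteristics uint32) bool {
	return characteristics&scnCntInitializedData != 0 &&
		characteristics&scnMemRead != 0 &&
		characteristics&(scnMemWrite|scnMemExecute|scnCntCode) == 0
}

func main() {
	fmt.Println(isReadOnlyData(scnCntInitializedData | scnMemRead))
	fmt.Println(isReadOnlyData(scnCntCode | scnMemExecute | scnMemRead))
	fmt.Println(isReadOnlyData(scnCntInitializedData | scnMemRead | scnMemWrite))
}
```

A tool that reads the section table could use a check like this to decide which sections are worth disassembling, rather than relying on section names.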
Sure, it can be done. I am just worried it might affect / break more things. And I am not sure how long it will take me to make the change. But I will spend some time to see how complicated it gets. Alex |
I tried implementing a separate (from .text symbols) read-only data PE section. And everything seems to work (with some problems here and there), except the external linker. When I build an executable using the external linker, and then run the app, the app crashes because runtime.checkASM returns false. When I print But I do not see how they get aligned. Where is the code that tells the external linker to have them aligned? Mind you, I don't see what aligns them with the internal linker either. They are aligned with the internal linker, but maybe that is just a fluke. How do I align them with the external linker? What would be a good approach? Maybe @ianlancetaylor you could help me here. Thank you. Alex |
@ianlancetaylor do you have any suggestions for #24725 (comment) ? I still hope I could implement read-only data PE section for go1.11. Thank you. Alex |
@ianlancetaylor please reply to my #24725 (comment) Thank you. Alex |
Sorry for the slow reply. I'm well behind on mail. When using cmd/link, large variables like At least, that's how it works for ELF. When I look at the PE code, though, it looks like PE always sets the alignment of each output section to be 32 bytes, regardless of the input alignment requirements. I'm basing that on the use of So my first guess is that you aren't setting |
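The specific flag referred to in this comment is not preserved in the thread. As background only, and assuming the standard PE/COFF encoding, object-file sections carry their required alignment in bits 20 to 23 of the characteristics field, where a field value n means an alignment of 2^(n-1) bytes (so 0x00600000 means 32 bytes). The names below are local to this sketch.

```go
package main

import "fmt"

// scnAlignMask selects the alignment field of a COFF section's
// characteristics (bits 20-23, per the PE/COFF specification).
const scnAlignMask = 0x00F00000

// sectionAlignment (hypothetical helper) decodes the alignment in bytes.
// It returns 0 when no alignment is specified or the field is out of range.
func sectionAlignment(characteristics uint32) int {
	n := (characteristics & scnAlignMask) >> 20
	if n == 0 || n > 14 {
		return 0
	}
	return 1 << (n - 1)
}

func main() {
	fmt.Println(sectionAlignment(0x00600000)) // the 32-byte alignment flag
	fmt.Println(sectionAlignment(0x40000040)) // no alignment bits set
}
```

Printing the decoded alignment for each section of an object file handed to the external linker is one way to check whether a newly added section requests the alignment it needs.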
Same. :-)
So I should see
I read this as section with
Yes, that is what I see too. All go.o sections are tagged with
I am pretty sure I do mark my new section with Thank you. Alex |
Change https://golang.org/cl/115975 mentions this issue: |
What version of Go are you using (go version)?
go version go1.10.1 windows/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=...
What did you do and what happened?
I'm learning Go and I decided to make a few little programs to get comfortable with the language. I compiled the following program and it generated an executable of ~800MB (already opened an issue about the size of the executable in #24724). Then, I used 'go tool objdump ... > out.s' on it, the memory (8GB) usage started climbing (reaching 100%) and then my computer froze.
https://play.golang.org/p/EK8zDu5vVQN
What did you expect to see?
I expected objdump to give me the assembly code without freezing the computer.