Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/objdump: freezes computer on large executable #24725

Closed
mariecurried opened this issue Apr 6, 2018 · 38 comments
Closed

cmd/objdump: freezes computer on large executable #24725

mariecurried opened this issue Apr 6, 2018 · 38 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@mariecurried
Copy link

What version of Go are you using (go version)?

go version go1.10.1 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=...

What did you do and what happened?

I'm learning Go and I decided to make a few little programs to get comfortable with the language. I compiled the following program and it generated an executable of ~800MB (already opened an issue about the size of the executable in #24724). Then, I used 'go tool objdump ... > out.s' on it, the memory (8GB) usage started climbing (reaching 100%) and then my computer froze.
https://play.golang.org/p/EK8zDu5vVQN

What did you expect to see?

I expected objdump to give me the assembly code without freezing the computer.

@ALTree ALTree added this to the Unplanned milestone Apr 6, 2018
@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 6, 2018
@bradfitz
Copy link
Contributor

bradfitz commented Apr 6, 2018

If you have time, can you see if this is a regression since Go 1.9.x? That is, was Go 1.9.x better in this regard?

@mariecurried
Copy link
Author

It isn't really an option, because last time my computer became unresponsive and I had to force shutdown, so I wouldn't like to be doing this a second time.

@mariecurried
Copy link
Author

In the meantime, I tested this on a Linux/amd64 machine and it works as intended, producing the assembly code quickly.
This leads me into thinking that there is some sort of memory leak in the objdump code in Windows.

@josharian
Copy link
Contributor

Interesting. Or maybe a bug in Windows linking.

If you're willing to risk/experiment a little, you could try again on windows with:

package main

const big = 10

func main() {
	x := [big]int{1}
	_ = x
}

And see what happens as you increase the size of big little by little.

@josharian
Copy link
Contributor

cc @alexbrainman

@alexbrainman
Copy link
Member

Then, I used 'go tool objdump ... > out.s' on it, the memory (8GB) usage started climbing (reaching 100%) and then my computer froze.
https://play.golang.org/p/EK8zDu5vVQN

Sure. I can reproduce this. My computer gets pretty slow, but I kill objdump with ctrl+c and then everything goes back to normal. While frozen my CPU is around 15%, but my both memory and disk usage go full. I have Windows 10 with 8G of memory and flimsy hard disk.

If you're willing to risk/experiment a little, you could try again on windows with:

The last successful run of this program was with big of 1000000. I used github.com/alexbrainman/time to measure the run, and it outputs:

real	0m1.074s
user	0m0.016s
sys	0m0.000s

I do not have time to debug this today or in the near future. But happy to try suggestions.

Thank you.

Alex

@mariecurried
Copy link
Author

I can go to ~2000000 (with memory usage at 80%) and it takes ~30s.

@josharian
Copy link
Contributor

Sounds like the next step might be to add a -memprofile flag to objdump so that we can find out where the allocs are coming from. I’m AFK for a bit, but can do so in a bit if no one else beats me to it.

@josharian
Copy link
Contributor

I have a lead on optimizing cmd/objdump generally. I'll mail some CLs soon, but I have a lot on my plate and not much Go time.

In the meantime, I noticed something windows-specific that perhaps Alex could take a look at.

$ GOOS=darwin GOARCH=amd64 go build -o big_darwin bigobj.go
$ GOOS=windows GOARCH=amd64 go build -o big_windows bigobj.go
$ go tool objdump big_windows | wc -l
  315135
$ go tool objdump big_darwin | wc -l
   72496
$ ls -l big_darwin big_windows 
-rwxr-xr-x  1 josh  staff  1403040 Apr 11 12:57 big_darwin
-rwxr-xr-x  1 josh  staff  1371648 Apr 11 12:57 big_windows

Though the darwin and windows executables are similar in size, the windows objdump output is 4x larger. Basically all of the difference appears to be output for a TEXT type.*(SB) symbol, which is present on windows and not on darwin. Seems like that symbol should be DATA or RODATA, not TEXT; maybe the windows linker is getting this wrong?

@alexbrainman
Copy link
Member

I have a lot on my plate and not much Go time.

Same here. I will try and debug this on weekend.

Alex

@gopherbot
Copy link

Change https://golang.org/cl/106697 mentions this issue: os: allocate buffer lazily in Expand

gopherbot pushed a commit that referenced this issue Apr 12, 2018
As an example of why this might happen,
consider this code from cmd/internal/objfile:

// Expand literal "$GOROOT" rewritten by obj.AbsFile()
filename = filepath.Clean(os.ExpandEnv(filename))

In this case, filename might not contain "$GOROOT",
in which case we can skip the buffer entirely.

name               old time/op    new time/op    delta
Expand/noop-8        46.7ns ± 1%    12.9ns ± 1%   -72.47%  (p=0.000 n=9+9)
Expand/multiple-8     139ns ± 1%     137ns ± 1%    -1.36%  (p=0.001 n=10+10)

The Expand/multiple improvement is probably noise.

This speeds up cmd/objdump detectably, if not much.
Using "benchcmd ObjdumpCompile go tool objdump `go tool -n compile`":

name            old time/op       new time/op       delta
ObjdumpCompile        9.35s ± 2%        9.07s ± 3%  -3.00%  (p=0.000 n=18+18)

Updates #24725

Change-Id: Id31ec6a9b8dfb3c0f1db58fe1f958e11c39e656c
Reviewed-on: https://go-review.googlesource.com/106697
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/106798 mentions this issue: debug/gosym: intern LineTable strings

gopherbot pushed a commit that referenced this issue Apr 13, 2018
This cuts the allocated space while executing

go tool objdump -S `go tool -n compile`

by over 10%.

It also speeds it up slightly:

name              old time/op       new time/op       delta
ObjdumpSCompiler        9.03s ± 1%        8.88s ± 1%  -1.59%  (p=0.000 n=20+20)

Updates #24725

Change-Id: Ic6ef8e273ede589334ab6e07099ac2e5bdf990c9
Reviewed-on: https://go-review.googlesource.com/106798
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@josharian
Copy link
Contributor

Oops—forgot to tag this issue from https://go-review.googlesource.com/c/go/+/106978

@alexbrainman
Copy link
Member

@josharian I am not convinced that the problem is objdump or windows. I think the problem is how Go compiler generates the executable.

$ cat a.go
package main

const big = 1000000

func main() {
        x := [big]int{1}
        _ = x
}
$ go build -o a && go tool nm -size -sort=size a | head -n5
  478700     189114 r runtime.pclntab
  4b59a0      65744 D runtime.trace
  4b1ae0      16064 D runtime.semtable
  4ae460      13944 D runtime.mheap_
  4ac460       8192 D runtime.timers
go tool nm: signal: broken pipe
$ cat a.go
package main

const big = 10000000

func main() {
        x := [big]int{1}
        _ = x
}
$ go build -o a && go tool nm -size -sort=size a | head -n5
  477f00   80000000 R main.statictmp_0
 50c3b20     189122 r runtime.pclntab
 51009a0      65744 D runtime.trace
 50fcae0      16064 D runtime.semtable
 50f9460      13944 D runtime.mheap_
go tool nm: signal: broken pipe
$ cat a.go
package main

const big = 100000000

func main() {
        x := [big]int{1}
        _ = x
}
$ go build -o a && go tool nm -size -sort=size a | head -n5
  477f00  800000000 R main.statictmp_0
2ff68f20     189122 r runtime.pclntab
2ffa69a0      65744 D runtime.trace
2ffa2ae0      16064 D runtime.semtable
2ff9f460      13944 D runtime.mheap_
go tool nm: signal: broken pipe
$

Note how, as I make big const bigger by, we see growing in size main.statictmp_0 symbols appearing in executable. Is that expected? Any command will stop working, if we are trying to use the command to generate or parse file that is larger and larger.

The main.statictmp_0 symbols is generated by cmd/compile command - you can see it if you run go tool compile -o /dev/null -S a.go. I don't know much about cmd/compile. Perhaps you have some ideas why it is happening.

Alex

@josharian
Copy link
Contributor

That is #24724. But objdump only processes text symbols (types T and t), and main.statictmp_0 is a readonly data symbol (R). So it is not responsible for this particular issue.

I have some as-yet-mailed CLs to significantly cut objdump’s memory usage. They will help. But the reason windows is currently so much worse than Linux is because the type.* symbol is incorrectly marked as text; of that I am reasonably confident.

@josharian
Copy link
Contributor

Actually, if you would, can you try the same comparison I did, but running on windows: cross compile the code for both darwin and windows and objdump it, and compare the speeds?

@alexbrainman
Copy link
Member

Actually, if you would, can you try the same comparison I did, but running on windows: cross compile the code for both darwin and windows and objdump it, and compare the speeds?

You are correct, objdump is much slower on windows executable:

c:\Users\Alex\dev\src\issue\go\24725>set GOOS=darwin

c:\Users\Alex\dev\src\issue\go\24725>go build -o big_darwin a.go

c:\Users\Alex\dev\src\issue\go\24725>set GOOS=windows

c:\Users\Alex\dev\src\issue\go\24725>go build -o big_windows a.go

c:\Users\Alex\dev\src\issue\go\24725>go tool objdump big_darwin > NUL

c:\Users\Alex\dev\src\issue\go\24725>go tool objdump big_windows > NUL

c:\Users\Alex\dev\src\issue\go\24725>dir big_*
 Volume in drive C has no label.
 Volume Serial Number is 9012-A870

 Directory of c:\Users\Alex\dev\src\issue\go\24725

14/04/2018  03:53 PM        81,414,560 big_darwin
14/04/2018  03:54 PM        81,383,424 big_windows
               2 File(s)    162,797,984 bytes
               0 Dir(s)  398,134,525,952 bytes free

c:\Users\Alex\dev\src\issue\go\24725>

I will see if I can understand why.

Alex

@josharian
Copy link
Contributor

I will see if I can understand why.

I explained the difference between the two binaries here: #24725 (comment)

As for why that 4x size difference impacts memory usage so much, that's the partly fault of text/tabwriter. I'm working on that.

@alexbrainman
Copy link
Member

I explained the difference between the two binaries here: #24725 (comment)

I can see these symbols

44cb00     114848 T runtime.rodata
44cb00     114848 T type.*        
44cb00     114848 T runtime.types 

on windows. And they are not present on darwin. I will try to understand why linker is adding these symbols on windows.

Alex

@alexbrainman
Copy link
Member

Seems like that symbol should be DATA or RODATA, not TEXT; maybe the windows linker is getting this wrong?

Go linker can really put symbols into one of 2 sections: .text or .data

a:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         04ceaef7  0000000000401000  0000000000401000  00000600  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  1 .data         00002200  00000000050ec000  00000000050ec000  04ceb600  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  2 .debug_abbrev 000001b5  000000000510c000  000000000510c000  04ced800  2**0
                  CONTENTS, READONLY, DEBUGGING
  3 .debug_line   0000c262  000000000510d000  000000000510d000  04ceda00  2**0
                  CONTENTS, READONLY, DEBUGGING
  4 .debug_frame  00009b2c  000000000511a000  000000000511a000  04cf9e00  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_pubnames 000021ae  0000000005124000  0000000005124000  04d03a00  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_pubtypes 0000746e  0000000005127000  0000000005127000  04d05c00  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_gdb_scripts 00000027  000000000512f000  000000000512f000  04d0d200  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_info   00037e3a  0000000005130000  0000000005130000  04d0d400  2**0
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_loc    0003bfc2  0000000005168000  0000000005168000  04d45400  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_ranges 0000e110  00000000051a4000  00000000051a4000  04d81400  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .idata        000003c4  00000000051b3000  00000000051b3000  04d8f600  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .symtab       0000cfe7  00000000051b4000  00000000051b4000  04d8fa00  2**2
                  CONTENTS, READONLY

type.* is in .text section at this moment. Should we put it into .data ?

Alex

@gopherbot
Copy link

Change https://golang.org/cl/106979 mentions this issue: text/tabwriter: reduce allocations from tracking cells

@josharian
Copy link
Contributor

type.* is in .text section at this moment. Should we put it into .data ?

I believe so. .text is usually used for executable code; .data is used for, well, data. type.* is data.

@alexbrainman
Copy link
Member

I believe so. .text is usually used for executable code; .data is used for, well, data. type.* is data.

.text section stores code and read-only data on windows. .data stores data that changes during program execution - initialized and zero-initialized.

Why do you think objdump is slow because type.* symbol is in .text section? I still did not have time to actually see objdump performance profile.

Alex

gopherbot pushed a commit that referenced this issue Apr 14, 2018
The tabwriter tracks cells on a line-by-line basis.
This can be memory-hungry when working with large input.

This change adds two optimizations.

First, when there's an existing cell slice for a line,
don't overwrite it by appending.
This helps when re-using a Writer,
or when the output is broken into groups,
e.g. by a blank line.
We now re-use that existing cell slice.

Second, we predict that the number of cells in a line
will probably match those of the previous line,
since tabwriter is most often used to format tables.

This has a noticeable impact on cmd/objdump (#24725).
It reduces allocated space by about 55%.
It also speeds it up some.
Using "benchcmd -n 10 Objdump go tool objdump `which go`":

name            old time/op       new time/op       delta
ObjdumpCompile        9.03s ± 1%        8.51s ± 1%  -5.81%  (p=0.000 n=10+10)

It might also imaginably speed up gofmt on some
large machine-generated code.

name                old time/op    new time/op    delta
Table/1x10/new-8            2.89µs ± 1%    2.39µs ± 1%   -17.39%  (p=0.000 n=13+14)
Table/1x10/reuse-8          2.13µs ± 1%    1.29µs ± 2%   -39.58%  (p=0.000 n=14+15)
Table/1x1000/new-8           203µs ± 0%     147µs ± 1%   -27.45%  (p=0.000 n=13+14)
Table/1x1000/reuse-8         194µs ± 1%     113µs ± 2%   -42.01%  (p=0.000 n=14+15)
Table/1x100000/new-8        33.1ms ± 1%    27.5ms ± 2%   -17.08%  (p=0.000 n=15+15)
Table/1x100000/reuse-8      22.0ms ± 3%    11.8ms ± 1%   -46.23%  (p=0.000 n=14+12)
Table/10x10/new-8           8.51µs ± 0%    6.52µs ± 1%   -23.48%  (p=0.000 n=13+15)
Table/10x10/reuse-8         7.41µs ± 0%    4.59µs ± 3%   -38.03%  (p=0.000 n=14+15)
Table/10x1000/new-8          749µs ± 0%     521µs ± 1%   -30.39%  (p=0.000 n=12+15)
Table/10x1000/reuse-8        732µs ± 1%     448µs ± 2%   -38.79%  (p=0.000 n=15+14)
Table/10x100000/new-8        102ms ± 2%      74ms ± 2%   -28.05%  (p=0.000 n=14+15)
Table/10x100000/reuse-8     96.2ms ± 4%    55.4ms ± 3%   -42.36%  (p=0.000 n=15+15)
Table/100x10/new-8          50.3µs ± 1%    43.3µs ± 1%   -13.87%  (p=0.000 n=14+15)
Table/100x10/reuse-8        47.6µs ± 1%    36.1µs ± 1%   -24.09%  (p=0.000 n=14+14)
Table/100x1000/new-8        5.17ms ± 1%    4.11ms ± 1%   -20.40%  (p=0.000 n=14+13)
Table/100x1000/reuse-8      5.00ms ± 1%    3.73ms ± 1%   -25.46%  (p=0.000 n=14+14)
Table/100x100000/new-8       654ms ± 2%     531ms ± 2%   -18.86%  (p=0.000 n=13+14)
Table/100x100000/reuse-8     709ms ± 1%     505ms ± 2%   -28.77%  (p=0.000 n=12+15)
Pyramid/10-8                4.22µs ± 1%    4.21µs ± 1%      ~     (p=0.067 n=14+14)
Pyramid/100-8                378µs ± 0%     378µs ± 0%    +0.17%  (p=0.022 n=13+13)
Pyramid/1000-8               133ms ± 3%     132ms ± 3%      ~     (p=0.148 n=15+15)
Ragged/10-8                 6.10µs ± 0%    5.16µs ± 0%   -15.38%  (p=0.000 n=14+15)
Ragged/100-8                54.5µs ± 0%    43.8µs ± 0%   -19.59%  (p=0.000 n=14+15)
Ragged/1000-8                532µs ± 0%     424µs ± 0%   -20.25%  (p=0.000 n=14+14)

name                old alloc/op   new alloc/op   delta
Table/1x10/new-8            1.76kB ± 0%    1.52kB ± 0%   -13.64%  (p=0.000 n=15+15)
Table/1x10/reuse-8            800B ± 0%        0B       -100.00%  (p=0.000 n=15+15)
Table/1x1000/new-8           131kB ± 0%      99kB ± 0%   -24.30%  (p=0.000 n=15+15)
Table/1x1000/reuse-8        80.0kB ± 0%     0.0kB ± 0%   -99.99%  (p=0.000 n=15+15)
Table/1x100000/new-8        23.1MB ± 0%    19.9MB ± 0%   -13.85%  (p=0.000 n=15+15)
Table/1x100000/reuse-8      8.30MB ± 0%    0.20MB ± 0%   -97.60%  (p=0.000 n=13+12)
Table/10x10/new-8           8.94kB ± 0%    5.06kB ± 0%   -43.47%  (p=0.000 n=15+15)
Table/10x10/reuse-8         7.52kB ± 0%    0.00kB       -100.00%  (p=0.000 n=15+15)
Table/10x1000/new-8          850kB ± 0%     387kB ± 0%   -54.50%  (p=0.000 n=13+15)
Table/10x1000/reuse-8        752kB ± 0%       0kB ± 0%   -99.98%  (p=0.000 n=13+15)
Table/10x100000/new-8       95.7MB ± 0%    49.3MB ± 0%   -48.50%  (p=0.000 n=14+15)
Table/10x100000/reuse-8     76.2MB ± 0%     2.5MB ± 0%   -96.77%  (p=0.000 n=13+15)
Table/100x10/new-8          66.3kB ± 0%    38.0kB ± 0%   -42.65%  (p=0.000 n=15+15)
Table/100x10/reuse-8        61.3kB ± 0%     0.0kB       -100.00%  (p=0.000 n=15+15)
Table/100x1000/new-8        6.69MB ± 0%    3.25MB ± 0%   -51.37%  (p=0.000 n=15+15)
Table/100x1000/reuse-8      6.13MB ± 0%    0.01MB ± 0%   -99.89%  (p=0.000 n=15+15)
Table/100x100000/new-8       684MB ± 0%     340MB ± 0%   -50.29%  (p=0.000 n=14+15)
Table/100x100000/reuse-8     648MB ± 0%     170MB ± 0%   -73.78%  (p=0.000 n=14+13)
Pyramid/10-8                4.40kB ± 0%    4.40kB ± 0%      ~     (all equal)
Pyramid/100-8                652kB ± 0%     652kB ± 0%      ~     (p=0.715 n=15+15)
Pyramid/1000-8              96.7MB ± 0%    96.7MB ± 0%      ~     (p=0.084 n=15+14)
Ragged/10-8                 5.17kB ± 0%    4.51kB ± 0%   -12.69%  (p=0.000 n=15+15)
Ragged/100-8                50.2kB ± 0%    41.1kB ± 0%   -18.04%  (p=0.000 n=15+15)
Ragged/1000-8                492kB ± 0%     401kB ± 0%   -18.61%  (p=0.000 n=15+15)

name                old allocs/op  new allocs/op  delta
Table/1x10/new-8              29.0 ± 0%      21.0 ± 0%   -27.59%  (p=0.000 n=15+15)
Table/1x10/reuse-8            20.0 ± 0%       0.0       -100.00%  (p=0.000 n=15+15)
Table/1x1000/new-8           2.02k ± 0%     1.02k ± 0%   -49.38%  (p=0.000 n=15+15)
Table/1x1000/reuse-8         2.00k ± 0%     0.00k       -100.00%  (p=0.000 n=15+15)
Table/1x100000/new-8          200k ± 0%      100k ± 0%   -49.98%  (p=0.000 n=15+15)
Table/1x100000/reuse-8        200k ± 0%        1k ± 0%   -99.50%  (p=0.000 n=14+15)
Table/10x10/new-8             66.0 ± 0%      31.0 ± 0%   -53.03%  (p=0.000 n=15+15)
Table/10x10/reuse-8           50.0 ± 0%       0.0       -100.00%  (p=0.000 n=15+15)
Table/10x1000/new-8          5.03k ± 0%     1.04k ± 0%   -79.36%  (p=0.000 n=15+15)
Table/10x1000/reuse-8        5.00k ± 0%     0.00k       -100.00%  (p=0.000 n=15+15)
Table/10x100000/new-8         500k ± 0%      100k ± 0%   -79.99%  (p=0.000 n=15+15)
Table/10x100000/reuse-8       500k ± 0%        5k ± 0%   -99.00%  (p=0.000 n=15+15)
Table/100x10/new-8             102 ± 0%        40 ± 0%   -60.78%  (p=0.000 n=15+15)
Table/100x10/reuse-8          80.0 ± 0%       0.0       -100.00%  (p=0.000 n=15+15)
Table/100x1000/new-8         8.04k ± 0%     1.05k ± 0%   -86.91%  (p=0.000 n=15+15)
Table/100x1000/reuse-8       8.00k ± 0%     0.00k ± 0%   -99.98%  (p=0.000 n=15+15)
Table/100x100000/new-8        800k ± 0%      100k ± 0%   -87.49%  (p=0.000 n=15+12)
Table/100x100000/reuse-8      800k ± 0%       50k ± 0%   -93.74%  (p=0.000 n=14+13)
Pyramid/10-8                  20.0 ± 0%      20.0 ± 0%      ~     (all equal)
Pyramid/100-8                 50.0 ± 0%      50.0 ± 0%      ~     (all equal)
Pyramid/1000-8                 109 ± 0%       109 ± 0%      ~     (all equal)
Ragged/10-8                   54.0 ± 0%      34.0 ± 0%   -37.04%  (p=0.000 n=15+15)
Ragged/100-8                   422 ± 0%       188 ± 0%   -55.45%  (p=0.000 n=15+15)
Ragged/1000-8                4.03k ± 0%     1.66k ± 0%   -58.80%  (p=0.000 n=15+15)

Change-Id: I0c0a392b02d5148a0a4b8ad4eaf98fa343980962
Reviewed-on: https://go-review.googlesource.com/106979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@randall77
Copy link
Contributor

So windows doesn't have an rodata section?
That's where type.* goes on linux.

@alexbrainman
Copy link
Member

Basically all of the difference appears to be output for a TEXT type.*(SB) symbol, which is present on windows and not on darwin.

I can see it myself now. And there are of couple other symbols that are huge:

TEXT type.*(SB)
TEXT runtime.rodata(SB)
TEXT runtime.types(SB)

Obviously, they are not code, so objdump should not attempt disassembling them.

Seems like that symbol should be DATA or RODATA, not TEXT; maybe the windows linker is getting this wrong?

Like I said before, we only have .data and .text sections at the moment. If we put them into data, then they won't be read-only. So we would have create new section for these. But perhaps there is an alternative way to mark them as not code for objdump - I did not get to that part of the code yet.

So windows doesn't have an rodata section?
That's where type.* goes on linux.

No. Go executables never had read-only-data section. We never felt the need. Perhaps we should

Alex

@randall77
Copy link
Contributor

It wouldn't be the end of the world to just put those symbols in .data and call it a day.
Longer term it would be nice to introduce read-only data.

@alexbrainman
Copy link
Member

It wouldn't be the end of the world to just put those symbols in .data and call it a day.

Do you mean, for pe file writer in cmd/link to adjust both .text and .data sections to make .data section bigger to include read-only symbols? I am not sure that would be much easier than creating new read-only-data section between .text and .data sections.

Another alternative is to adjust cmd/internal/objfile/peFile.text function to return not complete .text section, but only the beginning up until runtime.etext symbol. That would be trivial enough to do. While we are there we could also adjust cmd/internal/objfile/peFile.symbols to use 'R' for symbols in .text section, if their address is less than address of runtime.etext. What do you think?

We would also need some test. Not sure about that, so suggestions are welcome.

Alex

@randall77
Copy link
Contributor

randall77 commented Apr 16, 2018

Do you mean, for pe file writer in cmd/link to adjust both .text and .data sections to make .data section bigger to include read-only symbols?

Right, put the readonly symbols in the .data section instead of the .text section.

Your suggestion sounds fine as well. (With the modfication that we use R for symbols greater than etext?) This assumes that we do correctly segregate code from readonly data in the .text section.

@gopherbot
Copy link

Change https://golang.org/cl/108595 mentions this issue: cmd/internal/objfile: stop counting text symbols at runtime.etext

@alexbrainman
Copy link
Member

@marigonzes and @josharian please try https://go-review.googlesource.com/#/c/go/+/108595

It helps a lot, but I can still see it is not as efficient as with ELF file reading. My CL does not make Go linker write smaller PE ".text" section, so there is some quite large overhead reading huge ".text" section generated by example from #24725 (comment) I don't see how we can remove that allocation, because debug/pe.Section.Data returns []byte. So the only way to overcome that problem is to change Go linker to stop writing non-text symbols into PE ".text" section.

@randall77 and @josharian let me know if my CL is good enough. If not, we don't want my CL submitted, and I should try and change the linker.

Thank you

Alex

@josharian
Copy link
Contributor

My two cents is that, from the perspective of this issue, your CL suffices. It brings windows objdump performance more or less in line with linux/darwin. (I haven't looked at the code, though, just its effect.)

From a long term perspective, I do think we should put readonly data in a readonly section. It looks like they exist. A bit of googling finds something like a PE spec, which contains this line:

.rdata , Read-only initialized data , IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ

which seems about like what we're looking for.

@alexbrainman
Copy link
Member

From a long term perspective, I do think we should put readonly data in a readonly section. It looks like they exist.

Sure, it can be done. I am just worried it might affect / break more things. And I am not sure how long it will take me to make the change. But I will spent some time to see how complicated it gets.

Alex

@alexbrainman
Copy link
Member

I tried implementing separate (from .text symbols) read-only data PE section. And everything seems to work (with some problems here and there), except external linker.

When I build executable using external linker, and then run the app, the app crashes because runtime.checkASM returns false. When I print masks and shifts symbol addresses, I can see that they are not aligned to 16 bytes.

But I do not see how they get aligned. Where is the code that tells external linker to have them aligned? Mind you I don't see who align them with internal linker either. They are aligned with internal linker, but maybe that is just a fluke. How do I align them with external linker? What would be a good approach?

Maybe @ianlancetaylor you could help me here. Thank you.

Alex

@alexbrainman
Copy link
Member

@ianlancetaylor do you have any suggestions for #24725 (comment) ? I still hope I could implement read-only data PE section for go1.11.

Thank you.

Alex

@alexbrainman
Copy link
Member

@ianlancetaylor please reply to my #24725 (comment)

Thank you.

Alex

@ianlancetaylor
Copy link
Contributor

Sorry for the slow reply. I'm well behind on mail.

When using cmd/link, large variables like masks are aligned to the maximum alignment of the architecture, as set by the Maxalign field of ld.Arch initliazed in cmd/link/internal/*/asm.go. For amd64 this is 32, so all variables larger than 32 bytes will be aligned to a 32 byte boundary. The code for this is the symalign function in cmd/link/internal/ld/data.go. When using external linking these symbols should be properly aligned within each section, and the section's alignment should be set to the maximum alignment of the symbols placed in that section. The external linker is then expected to honor that alignment.

At least, that's how it works for ELF. When I look at the PE code, though, it looks like PE always sets the alignment of each output section to be 32 bytes, regardless of the input alignment requirements. I'm basing that on the use of IMAGE_SCN_ALIGN_32BYTES and on the lack of any reference to sect.Align.

So my first guess is that you aren't setting IMAGE_SCN_ALIGN_32BYTES in the new section that you are creating.

@alexbrainman
Copy link
Member

Sorry for the slow reply. I'm well behind on mail.

Same. :-)

When using external linking these symbols should be properly aligned within each section

So I should see masks aligned to 32 inside whatever section of go.o it lives?

the section's alignment should be set to the maximum alignment of the symbols placed in that section.

I read this as section with masks symbol should be marked as IMAGE_SCN_ALIGN_32BYTES. Correct?

When I look at the PE code, though, it looks like PE always sets the alignment of each output section to be 32 bytes, regardless of the input alignment requirements. I'm basing that on the use of IMAGE_SCN_ALIGN_32BYTES and on the lack of any reference to sect.Align.

Yes, that is what I see too. All go.o sections are tagged with IMAGE_SCN_ALIGN_32BYTES.

So my first guess is that you aren't setting IMAGE_SCN_ALIGN_32BYTES in the new section that you are creating.

I am pretty sure I do mark my new session with IMAGE_SCN_ALIGN_32BYTES. But I will check my code and post my findings here when I have time. Unless you have other ideas.

Thank you.

Alex

@gopherbot
Copy link

Change https://golang.org/cl/115975 mentions this issue: cmd/link: split pe .text section into .text and .rdata

@golang golang locked and limited conversation to collaborators Jun 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

8 participants