proposal: cmd/link: redesign format of intermediate object files #11123

mwhudson · 2015-06-09T05:14:19Z

https://golang.org/cl/10833 is a simplistic change that changes the object files consumed by the linker to have a symbol table rather that writing the symbol names out each time they are referenced and it improves all.bash times by 30s on one laptop. We should investigate further when 1.6 opens.

mwhudson · 2015-06-18T22:59:54Z

Let's change this to follow the proposal process. I've done some more hacking and have a branch (https://github.com/mwhudson/go/tree/symbol-table-2) that uses an ad-hoc format and makes linking godoc about 30% faster, reduces linker memory consumption by about 10% and has negligible effect on the compiler runtime or memory usage.

This seems like enough to me to make the effort of changing format worthwhile. An open question is whether to use a more established format like ELF or macho or PE (the formats Go has general support for; I only really understand ELF of these) or something more tailored to Go's needs. The main advantage of using a standard format that I can see would be being able to use standard tools to inspect them, which would be nice but not super compelling. Maybe I'm missing something else though.

josharian · 2015-06-18T23:08:49Z

I'm curious whether this is a consequence of the object file format or of the mechanically translated linker.

Do you see comparable speed-ups using cmd/newlink? (It might be a fair amount of work to make similar changes to cmd/newlink. I hate to ask you do that unnecessarily, so if you see another good way to answer the original question, by all means do that instead.)

mwhudson · 2015-06-18T23:19:35Z

I think it's really the format. I guess it would be possible to transplant my code into cmd/internal/goobj and write some more focused benchmarks.

I've been a bit guilty of making a whole bunch of changes and not measuring the impact of each one, but I'm pretty sure the biggest change is using a symbol table, which means many fewer calls to expandpkg and many fewer map lookups and consequent string hashing. It also makes the .a files quite a bit (20%?) smaller, which obviously reduces the IO required to read them...

Having a string block so you are converting []byte to string just the once and allocating all the strings/data read from the file as slices of one big string/data block certainly helps a bit too, and those are dependent on changing the format too.

josharian · 2015-06-18T23:42:14Z

I should be clear that in all of this, I will defer to @rsc, @ianlancetaylor, @minux, and others who know the linker well.

I think it's really the format. I guess it would be possible to transplant my code into cmd/internal/goobj and write some more focused benchmarks.

This might be a good idea. Given that there is lots of value in object file format stability, it makes sense to make one big change rather than many small ones. The easier it is to experiment and generate good numbers, the easier it will be to have confidene that we've settled on a good design.

It is probably also worth surveying to see whether there are other changes to make at the same time. For example, the current object file format doesn't support jump tables. And there are several issues that might be helped by object file format changes (or might not--I haven't thought it through very far): #7384, #7599, #6993, #10170 (and thus #6643).

It also makes the .a files quite a bit (20%?) smaller, which obviously reduces the IO required to read them...

I'm surprised that there was so much headroom available, since IIRC, the current design was intended to be optimized for I/O. I also seem to recall (but can't seem to find a reference or details now) that there was a goal to keep some important info (what?) at the beginning of the file, to enable peeking. Does the symbol table impact that? (I guess I need to figure out the details to answer that.)

Is it safe to assume that you have looked at the Go 1.3 Linker Overhaul doc and the discussion of the file format there?

Having a string block so you are converting []byte to string just the once and allocating all the strings/data read from the file as slices of one big string/data block certainly helps a bit too, and those are dependent on changing the format too.

Good to know. When I played with this a while ago, I got the sense that the many []byte/string conversions were just a limitation of the code base, not the file format, but I am definitely prepared to believe that I was wrong. :)

Thanks for pushing on this. My comments are questions here stem not from resistance but from wanting the next generation (if there is one) to be awesome.

mwhudson · 2015-06-19T00:57:55Z

On 19 June 2015 at 11:42, Josh Bleecher Snyder notifications@github.com
wrote:

I should be clear that in all of this, I will defer to @rsc
https://github.com/rsc, @ianlancetaylor
https://github.com/ianlancetaylor, @minux https://github.com/minux,
and others who know the linker well.

Me too :-)

I think it's really the format. I guess it would be possible to
transplant my code into cmd/internal/goobj and write some more focused
benchmarks.

This might be a good idea.

I tried and failed (that's what I get for hacking instead of thinking, I
guess). Will try again in a bit.

Given that there is lots of value in object file format stability, it
makes sense to make one big change rather than many small ones. The easier
it is to experiment and generate good numbers, the easier it will be to
have confidene that we've settled on a good design.

There is a tension here of course, it's easier to see the benefit of each
change you make if you can make them independently. I guess I (or whoever
does this work) can prepare a series of CLs and they can be landed in one
hit. Or one could declare unstable season on .a files for a short part of
the 1.6 cycle, or something.

It is probably also worth surveying to see whether there are other changes
to make at the same time. For example, the current object file format
doesn't support jump tables. And there are several issues that might be
helped by object file format changes (or might not--I haven't thought it
through very far): #7384 #7384, #7599
#7599, #6993
#6993, #10170
#10170 (and thus #6643
https://github.com/gol%20ang/go/i%20ssues/6643).

I guess. Those are all more about the content of the files and I've been
mostly thinking about the container format really...

It also makes the .a files quite a bit (20%?) smaller, which obviously
reduces the IO required to read them...

I'm surprised that there was so much headroom available,

Yes, I was surprised how much difference it makes. We make multiple
references to symbols a lot in the object files, it seems.

since IIRC, the current design was intended to be optimized for I/O. I
also seem to recall (but can't seem to find a reference or details now)
that there was a goal to keep some important info (what?) at the beginning
of the file, to enable peeking. Does the symbol table impact that? (I guess
I need to figure out the details to answer that.)

I don't know how my proposed changes affect your extremely vague concern,
I'm sorry :-)

Is it safe to assume that you have looked at the Go 1.3 Linker Overhaul
doc
https://docs.google.com/document/d/1xN-g6qjjWflecSP08LNgh2uFsKjWb-rR9KA11ip_DIE/
and the discussion of the file format there?

It was sort of in the back of my mind, but really, I was just trying to
make the linker faster! Staring at pprof etc. I found these comments in
objfile.go after I'd started:

// TODO(rsc): The file format is good for a first pass but needs work.
// - There are SymID in the object file that should really just be strings.
// - The actual symbol memory images are interlaced with the symbol
// metadata. They should be separated, to reduce the I/O required to
// load just the metadata.
// - The symbol references should be shortened, either with a symbol
// table or by using a simple backward index to an earlier mentioned symbol.

and they must have been on my mind too, because basically I've addressed
the latter two.

Having a string block so you are converting []byte to string just the
once and allocating all the strings/data read from the file as slices of
one big string/data block certainly helps a bit too, and those are
dependent on changing the format too.

Good to know. When I played with this a while ago, I got the sense that
the many []byte/string conversions were just a limitation of the code base,
not the file format, but I am definitely prepared to believe that I was
wrong. :)

Well, unless there is a way to say to Go "I know this []byte is mutable,
but I promise I'm not going to touch it, so don't bother copying the memory
when I convert it to a string", I don't see any other way to reduce them.

There are lots of ways you could speed things up if you are willing to give
up memory safety :-)

Thanks for pushing on this. My comments are questions here stem not from
resistance but from wanting the next generation (if there is one) to be
awesome.

Understood.

mwhudson · 2015-06-19T02:59:27Z

I made my goobj changes work (see previously mentioned branch):

mwhudson@glamdring:src$ go test cmd/internal/goobj -test.bench . 
PASS
BenchmarkLoadRuntime-4        30      38951883 ns/op
ok      cmd/internal/goobj  1.226s
mwhudson@glamdring:src$ go test cmd/internal/goobj -test.bench . 
PASS
BenchmarkLoadRuntime-4       100      14326216 ns/op
ok      cmd/internal/goobj  1.460s

That's 2.7x faster :-)

rsc · 2015-06-29T17:06:33Z

We're not ready to process proposals under the new proposal process yet, but once we are this would be a good one. I never intended the current object file format to last this long. There are many things that can be fixed up to make processing them faster, and I'd love to see this happen.

FWIW, I don't believe it makes snse to ELF or some other standard object file format. Shoehorning Go's needs into that object file will, I expect, cause more problems than it would bring benefits. But if you can make the case otherwise, I'm certainly willing to listen.

mwhudson · 2015-08-21T04:07:00Z

So now we're in very early 1.6, I feel I should ask: should I polish up my somewhat ad-hoc format for submission, or is there someone who wants to do some serious design? I'm not sure I'm the right person to think about how to represent jump tables in the .a files, for example.

crawshaw · 2015-08-21T12:24:49Z

/cc @griesemer

griesemer · 2015-08-23T05:28:15Z

Just FYI: I'm currently experimenting with a redesigned export data format for the export data (found in .a files) for more compact size, and more efficient reading. Whatever the new format, it should be very efficient to find that export data section since it's read by the compiler when processing an import declaration.

mwhudson · 2015-08-23T08:20:23Z

I was only thinking about the .o files here I guess. Hadn't thought about changing the .a part at all, although I guess it shouldn't be completely off the table.

griesemer · 2015-08-23T17:02:34Z

@mwhudson I was just mentioning this since .a files were also brought up before.

(And please ignore the close-reopening of the issue. I always press the wrong button in this UI.)

tiborvass · 2015-09-15T22:36:33Z

Why doesn't the ELF format suit Go's requirements well? Just curious.

minux · 2015-09-16T01:02:03Z

Go uses its own object fornat because it's much simpler than ELF and also means newer Go versions could completely redesign the object file format without being limited by ELF's feature set.

adg · 2015-09-25T05:05:12Z

@mwhudson are you still interested in writing a proposal for this?

mwhudson · 2015-09-27T20:16:38Z

I'm interested but I don't know that I'll get to it any time soon. Is it OK to leave it open for now?

adg · 2015-09-27T22:36:29Z

Sure

On 28 September 2015 at 06:17, Michael Hudson-Doyle <
notifications@github.com> wrote:

I'm interested but I don't know that I'll get to it any time soon. Is it
OK to leave it open for now?

—
Reply to this email directly or view it on GitHub
#11123 (comment).

rsc · 2015-10-24T01:30:10Z

Please keep this proposal about the object part of the file. The container format is a separate issue. I intend to replace both the inner .o structure and the outer .a structure with standard zip.

gopherbot · 2016-03-22T09:00:14Z

CL https://golang.org/cl/21020 mentions this issue.

crawshaw · 2016-03-28T14:07:41Z

Several piecemeal changes that have been proposed to the object format. Some of them are already submitted, one, cl/21020, is effectively blocked on this. @mwhudson do you want to redesign the object format from some new principle, or should we stick to small changes with individually measurable performance improvement?

rsc · 2016-03-28T16:38:53Z

I objected to CL 21020 not because of this issue but because it ignores and
is not clearly compatible with my proposal in
#14386.

On Mon, Mar 28, 2016 at 10:08 AM David Crawshaw notifications@github.com
wrote:

Several piecemeal changes that have been proposed to the object format.
Some of them are already submitted, one, cl/21020, is effectively blocked
on this. @mwhudson https://github.com/mwhudson do you want to redesign
the object format from some new principle, or should we stick to small
changes with individually measurable performance improvement?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#11123 (comment)

crawshaw · 2016-03-28T16:43:01Z

I'm sorry, I made a typo which is adding to the confusion. I agree that CL 21020 should be on hold.

I meant CL 21026. It is compatible with your proposal, but Ian objected to piecemeal object format changes while this proposal for a redesign exists. I think CL 21026 is reasonable and would like to see it move forward, but how it fits with this should be sorted first.

mwhudson · 2016-03-28T22:36:28Z

I think most of the changes I had in mind have been done by now and there's not a lot of value in keeping this open, so I'm going to close it.

skohanim · 2016-03-29T09:17:19Z

Although this is closed now, @ianlancetaylor wanted me to also clear up the direction i'm taking with future changes to the format and this seems like the right place to do it.
Some directions i would like to explore are:

Split out a strings section (Michael's proposol includes this as well).
Define section boundries, it would be nice to do this with a standard format such as zip (uncompressed for starts).
Add some more sizing information in the spirit of the mentioned 21026.
Add some information that would make parsing in multiple goroutines possible, probably splitting some of the sections into chunks. The chunks would be read sequentially but parsed in parallel. I believe inserting symbols into the map is the only thing that would need to be done serially.
Add length prefixing to entire symbols so they can be discarded easier.
Explore if adding a type field to symbols and deduplicating the content part of the symbol name is worthwhile (Isn't too complex and gives a good space saving). As an example go.string."some string" and go.string.hdr."same string" would have a different type but the same string.

skohanim · 2016-03-29T09:33:47Z

@rsc would it work to have the object file kept uncompressed in the .a file? That way seeking could happen without extracting the whole file beforehand. Did you have any other incompatibilities in mind?

rsc · 2016-03-29T18:53:52Z

I need to look more carefully at the CL you prepared. I will try to find
time to do that in the next week or two.

On Tue, Mar 29, 2016 at 5:34 AM skohanim notifications@github.com wrote:

@rsc https://github.com/rsc would it work to have the object file kept
uncompressed in the .a file? That way seeking could happen without
extracting the whole file beforehand. Did you have any other
incompatibilities in mind?

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#11123 (comment)

skohanim · 2016-03-29T19:51:28Z

That would be great.

gopherbot · 2016-04-08T21:00:29Z

CL https://golang.org/cl/21752 mentions this issue.

skohanim · 2016-04-09T04:04:50Z

I've sent in CL https://golang.org/cl/21752 which splits up the object file sections into small independent units.
This would make it possible to parse an object file in parallel.
Does this sound like a good approach?

rsc · 2016-12-19T21:05:46Z

This is a good idea, but right now we don't have the bandwidth to help redesign the object format and support the resulting churn in the code base. We should revisit the object format at some point, but it won't be soon. Marking this declined for now, with the expectation that we may come back to this later.

ianlancetaylor added this to the Go1.6 milestone Jun 9, 2015

mwhudson changed the title ~~cmd/internal/obj, cmd/link: consider using symbol table in intermediate object files for go 1.6~~ proposal: redesign format of intermediate object files Jun 18, 2015

rsc modified the milestones: Go1.6Early, Go1.6 Jun 29, 2015

mikioh added the Proposal label Aug 13, 2015

griesemer closed this as completed Aug 23, 2015

griesemer reopened this Aug 23, 2015

rsc modified the milestones: Proposal, Go1.6Early Oct 24, 2015

rsc changed the title ~~proposal: redesign format of intermediate object files~~ proposal: cmd/link: redesign format of intermediate object files Oct 24, 2015

mwhudson mentioned this issue Feb 29, 2016

proposal: use zip format inside .a and .o files #14386

Closed

mwhudson closed this as completed Mar 28, 2016

mwhudson reopened this Mar 29, 2016

rsc added the Proposal-Declined label Dec 19, 2016

rsc closed this as completed Dec 19, 2016

golang locked and limited conversation to collaborators Dec 19, 2017

gopherbot added the FrozenDueToAge label Dec 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: cmd/link: redesign format of intermediate object files #11123

proposal: cmd/link: redesign format of intermediate object files #11123

mwhudson commented Jun 9, 2015

mwhudson commented Jun 18, 2015

josharian commented Jun 18, 2015

mwhudson commented Jun 18, 2015

josharian commented Jun 18, 2015

mwhudson commented Jun 19, 2015

mwhudson commented Jun 19, 2015

rsc commented Jun 29, 2015

mwhudson commented Aug 21, 2015

crawshaw commented Aug 21, 2015

griesemer commented Aug 23, 2015

mwhudson commented Aug 23, 2015

griesemer commented Aug 23, 2015

tiborvass commented Sep 15, 2015

minux commented Sep 16, 2015 via email

adg commented Sep 25, 2015

mwhudson commented Sep 27, 2015

adg commented Sep 27, 2015

rsc commented Oct 24, 2015

gopherbot commented Mar 22, 2016

crawshaw commented Mar 28, 2016

rsc commented Mar 28, 2016

crawshaw commented Mar 28, 2016

mwhudson commented Mar 28, 2016

skohanim commented Mar 29, 2016

skohanim commented Mar 29, 2016

rsc commented Mar 29, 2016

skohanim commented Mar 29, 2016

gopherbot commented Apr 8, 2016

skohanim commented Apr 9, 2016

rsc commented Dec 19, 2016

proposal: cmd/link: redesign format of intermediate object files #11123

proposal: cmd/link: redesign format of intermediate object files #11123

Comments

mwhudson commented Jun 9, 2015

mwhudson commented Jun 18, 2015

josharian commented Jun 18, 2015

mwhudson commented Jun 18, 2015

josharian commented Jun 18, 2015

mwhudson commented Jun 19, 2015

mwhudson commented Jun 19, 2015

rsc commented Jun 29, 2015

mwhudson commented Aug 21, 2015

crawshaw commented Aug 21, 2015

griesemer commented Aug 23, 2015

mwhudson commented Aug 23, 2015

griesemer commented Aug 23, 2015

tiborvass commented Sep 15, 2015

minux commented Sep 16, 2015 via email

adg commented Sep 25, 2015

mwhudson commented Sep 27, 2015

adg commented Sep 27, 2015

rsc commented Oct 24, 2015

gopherbot commented Mar 22, 2016

crawshaw commented Mar 28, 2016

rsc commented Mar 28, 2016

crawshaw commented Mar 28, 2016

mwhudson commented Mar 28, 2016

skohanim commented Mar 29, 2016

skohanim commented Mar 29, 2016

rsc commented Mar 29, 2016

skohanim commented Mar 29, 2016

gopherbot commented Apr 8, 2016

skohanim commented Apr 9, 2016

rsc commented Dec 19, 2016