cmd/link: don't put StringHeader into go.string.* #7384
You have 235886 strings, and the total string size (not including overhead) is 2257223 bytes. The bundledDictionary slice then has 235886 entries, each containing a StringHeader, which is 16 bytes on amd64, so you need to add at least 235886 * 16 bytes of overhead (3774176 bytes). So your bundledDictionary will actually use at least 7918487 bytes of space, far more than the string data itself.

I suggest you concatenate the strings into one bigger string and split it at runtime (or save the index of each string; or, even better, use a trie to store the strings, so that no initialization is needed at runtime).

PS: the reason the Go binary uses ~4MB more is that every string is also stored as a separate StringHeader in the Go string table, so each string adds a further 16 bytes of overhead (3774176 bytes in total). Taking that into account, the binary uses 11692663 bytes to store all the strings and the slice, and this number explains the 11MB increase in binary size.

I don't think we can do anything here. Don't use too many strings.

Status changed to WorkingAsIntended.
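The "concatenate and split" and "save the index of each string" suggestions can be sketched as follows. This is only an illustration under stated assumptions: the three-word `words` constant stands in for the real dictionary, and the helper names `splitWords`, `buildIndex`, and `wordAt` are hypothetical, not part of the original program.

```go
package main

import (
	"fmt"
	"strings"
)

// words is one newline-separated string constant standing in for the
// dictionary (hypothetical data; the real list has 235886 entries).
// The binary then holds a single string instead of one per word.
const words = "apple\nbanana\ncherry"

// splitWords rebuilds the []string at run time. The 16-byte headers
// still exist, but on the heap rather than in the binary.
func splitWords(blob string) []string {
	return strings.Split(blob, "\n")
}

// buildIndex records the start offset of every word. A uint32 offset
// costs 4 bytes per word instead of a 16-byte StringHeader.
func buildIndex(blob string) []uint32 {
	idx := []uint32{0}
	for i := 0; i < len(blob); i++ {
		if blob[i] == '\n' {
			idx = append(idx, uint32(i+1))
		}
	}
	return idx
}

// wordAt slices the i-th word out of blob without copying it.
func wordAt(blob string, idx []uint32, i int) string {
	start := int(idx[i])
	end := len(blob)
	if i+1 < len(idx) {
		end = int(idx[i+1]) - 1 // drop the '\n' separator
	}
	return blob[start:end]
}

func main() {
	fmt.Println(splitWords(words))

	idx := buildIndex(words)
	for i := range idx {
		fmt.Println(i, wordAt(words, idx, i))
	}

	// Per-entry overhead for the figures quoted in this thread.
	const n = 235886
	fmt.Println("StringHeader overhead:", n*16, "bytes")
	fmt.Println("uint32 offset overhead:", n*4, "bytes")
}
```

In a real program the offset table could itself be generated ahead of time and embedded as static data, so nothing would need to be scanned at startup; that variant is not shown here.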
OK. That means the linker must keep track of which StringHeader is referenced. It seems doable, but it will need big changes to the go.string.* section. But perhaps we should solve the 1.5GB RSS problem of 6g first. (Also, cmd/6g will generate a 33040637-byte rw.6!)

Also, it turns out our go.string.* section has a fairly big space-inefficiency problem. Each Go string is stored like this in the binary, in the go.string.* section:

    pointer *byte         // points to the payload; always this address + sizeof(uintptr)*2, so redundant
    len     uintptr       // not redundant, but the compiler could almost always inline this number
    payload byte[len+1]   // always has 0-termination. why??
    padding byte[padlen]  // pads to the next uintptr-aligned address for the next StringHeader

This means that each (8*n+k)-byte string takes (8-k) bytes more. (E.g. an 8-byte string takes 8 extra bytes of padding. Yes, you read this correctly.)

The solution seems to be simply not to store the StringHeader, and always either construct it at runtime or (in this case) store it in the temporary statictmp_0020. That means we can concatenate all strings in the go.string.* section, which also enables a substring-reuse optimization (e.g. if a Go program contains both "a" and "ab", we only store "ab" in the string section).

This is too big a change for the current linker; let's consider it when cmd/link takes over.

Labels changed: added release-go1.4, repo-main.

Status changed to Accepted.
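To make the (8-k) rule concrete, here is a minimal sketch that computes the space one string occupies under the layout described in this comment, assuming amd64 (8-byte uintptr). The names `wordSize`, `storedSize`, and `extraBytes` are made up for the illustration and do not correspond to anything in the linker.

```go
package main

import "fmt"

// wordSize is sizeof(uintptr) on amd64 (assumption for this sketch).
const wordSize = 8

// storedSize returns the bytes one string of length n occupies in the
// go.string.* layout described above: a two-word header (pointer and
// len), the payload plus a 0 terminator, padded to the next word.
func storedSize(n int) int {
	header := 2 * wordSize
	payload := n + 1 // payload plus 0 terminator
	padded := (payload + wordSize - 1) / wordSize * wordSize
	return header + padded
}

// extraBytes returns the terminator and padding bytes added on top of
// the header and the n payload bytes, i.e. 8 - n%8 on amd64.
func extraBytes(n int) int {
	return storedSize(n) - 2*wordSize - n
}

func main() {
	for _, n := range []int{0, 1, 7, 8, 9} {
		fmt.Printf("len=%d stored=%d extra=%d\n", n, storedSize(n), extraBytes(n))
	}
}
```

An 8-byte string therefore occupies 32 bytes in this scheme: 16 for the header, 8 for the payload, and 8 for the terminator plus padding.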
I don't know if this also belongs here: https://github.com/fiam/gounidecode has a big map in https://github.com/fiam/gounidecode/blob/master/unidecode/table.go: `var transliterations = map[rune]string`. An input file of over 750KB makes the output binary bigger by almost 5 MB. Feel free to remove this comment. Best regards, Dobrosław Żybort
CL https://golang.org/cl/11698 mentions this issue.
by tjarratt@pivotallabs.com: