-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/link, embed: store only one copy of an embedded []byte #52533
Comments
We can't do that for this example. When you use |
How about this? The first It's not a matter of doing it ourself in an Essentially this proposal is about reducing executable's file size in exchange for app start-up time - for cases where single files are duplicated. |
This doesn't sound like a very common problem, and seems it would be better served by the dependency exposing the embedded data, either there or as a separate package, so the data doesn't get out of sync. |
As a reply to Ian: perhaps the deduplication could happen when embedded as a string. There is another advantage with strings for embedding already: they can live in read-only memory. |
@mvdan I believe we already deduplicate when embedding strings. |
Just ran a quick test. Yeah, it does. Running Edit: Realized that I didn't test it with identical, but separate, files embedded in two separate packages, so I did that. Same result. |
I'm not sure if all the suggestions regarding deduplication already taking place for strings is a suggestion that the feature already exists if I choose to just embed as a string instead - OR if the suggestion is that it's already been done for strings and therefore for consistency, this feature should also be implemented for If it's the former, we have no control over what dependencies are doing internally. The proposal is about minimising executable file size. |
I believe that it's actually being used as evidence that it was deliberately not implemented for |
I have updated the proposal to treat As such, all |
It is not immediately clear to me that it is worth doing. You're basically trading off binary size for startup time and final memory usage. If you embed the same []byte twice, you now have to copy that data twice on startup, and then you've got 3 copies of it in memory, at least until/if the original string copy gets uncached/paged out/whatever. In the current scheme (embed it twice in the data section), then the binary is bigger but the startup time, and the ram use, is smaller. @pjebs It sounds like you are more concerned about binary size than the other two, but maybe that's not true for everyone. |
@randall77 That's correct. I am more concerned with file size. Perhaps it's not true for everyone. Also bear in mind that go is supporting WASM, where on the browser, file size download is probably the bigger concern. Also in the current scheme, it could be embedded multiple times > twice. If memory is the issue, under the proposal, the developer has control over clearing the []bytes if it's not truely necessary. In the current scheme, the developer has no control over minimising file size. |
That sounds like this should be a special optimization just for WASM, then. Or make it an option in the compiler with a different default for WASM output, perhaps. |
isn't wasm usually served compressed (eg gzip over http) in which case simple duplication is mostly eliminated? |
I've never heard anyone complaining about Go's start-up time, but I've heard plenty complaining about file sizes. |
As I understand the current proposal, there is on API change here, so this doesn't to go through the proposal process. The suggestion here is that when the same file is embedded more than once for a |
@ianlancetaylor The idea is for Go compiler to treat all |
Thanks, I think we are saying the same thing. |
I am using the
caire
package: https://github.com/esimov/caire/blob/master/process.go#L26It embeds a file called
facefinder
required for a common dependency calledpigo
.My application also uses the
pigo
package and independently needs to embed the same file.Currently the same file's data is being embedded twice. Perhaps as an optimisation, it can calculate the hashes of each embedded file (including inside all the dependencies), and store the data only once for duplicates.
This proposal is only for single file embeds and not directories.
The text was updated successfully, but these errors were encountered: