Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/idna: recent change adds ~200KB of machine code to any program using x/net/http2 #18582

Closed
tsuna opened this issue Jan 9, 2017 · 6 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@tsuna
Copy link
Contributor

tsuna commented Jan 9, 2017

Change #34770 (golang/net@67957fd) adds a bunch of stuff to x/net/idna and also adds a couple dependencies on golang.org/x/text/secure/bidirule and golang.org/x/text/unicode/norm, which in turn are pulling in golang.org/x/text/transform and golang.org/x/text/unicode/bidi.

This totals around 20k SLOC (excluding tests etc), which translates into an additional 230749 bytes of machine code (for linux/386, and 172896 bytes when compiled with -ldflags="-s -w"). For linux/amd64 it's an extra 269317 bytes (197664 bytes when compiled with -ldflags="-s -w").

I'm all for the IDNA code to be complete and correct, and I understand that this stuff is more complicated than it seems, but that seems like a hefty tax to add to all the binaries that transitively depend on this code, many of which are backend binaries that don't actually need this stuff.

I'm not sure what's the best way to trim the fat here but @bradfitz suggested to file a bug to continue the discussion, so here we are.

cc @mpvl @nigeltao

@bradfitz bradfitz added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 9, 2017
@bradfitz bradfitz added this to the Go1.9 milestone Jan 9, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Jan 9, 2017

We'll at least figure this out before Go 1.9 which was slated to depend on this (#17268).

The IDNA used by Go 1.8 doesn't involve tables and probably only covers like 99.9% of the use cases, which is probably sufficient. @mpvl?

Maybe we need some sort of opt-in mechanism for the tables.

@mpvl
Copy link
Contributor

mpvl commented Feb 2, 2017

It will be hard to reduce the size of these tables. Go 1.8 includes some of these tables, probably adding about 60-80k. I estimate the the full implementation adds about 90k of data plus code.
So the size increase is somewhat surprising.

There are two options to reduce this

  1. See why the addition is larger than expected and if it can be fixed. If successful, it will still add about 100k (although further rather complicated optimizations could probably reduce it by another 30k).
  2. Make some of the accuracy optional. I should be possible to significantly improve on the current implementation by adding 30-50k, while making the rest optional. The improvement would mean handling most languages properly, rather than handling mostly English properly. :)

I don't think keeping the old implementation and the new on as an option is a good idea. The old implementation differs in hard-to-document ways. Also, it pulls in different tables, meaning that someone using the new implementation will get an additional penalty making it even more expensive to "do the right thing".

I don't have a good idea on how to make the additional accuracy optional, though. Probably something along the lines of having precise and imprecise Profiles in idna and then making the profile an option in Transport. The idna package would then be designed such that tables only associated with profiles that are not used are dropped by the linker.

@mpvl
Copy link
Contributor

mpvl commented Feb 2, 2017

Correction, I was assuming you were comparing to the current implementation in core. In 1.8, core http wraps idna and adds a few tables to make it more correct. Compared to this, the new implementation adds about 50k.

I ran some tests. Very roughly, the breakdown of bulky stuff used in the new implementation is:
tables in idna: 40k
norm: 124k
bidi(rule): 31k

So norm seems to be the culprit. The bad news is that norm is rather important. The good news is that norm is already used in 1.8 so that effectively this implementation adds only about 50k (it eliminates the width table).

There does seem to be one possibility: the idna tables already contain some of the normalization information. It may be possible to interleave some of the norm information into these tables and then reimplement the normalization algorithm for this purpose. A very rough estimate is that this will save about 60-80k. This is a very complicated optimization, though. It also adds up to 60k of data to applications that are already using norm somewhere anyway. I'm skeptical this is worth it.

The bidirule could be made optional. As it only saves 15% of the total tables used here it seems not worth the trouble. But if people think so anyway it is worth considering.

@mpvl
Copy link
Contributor

mpvl commented Feb 8, 2017

Please take a look at https://go-review.googlesource.com/#/c/36512/.

I've now adopted the semantics that a zero Profile only does Punycode and that any validation is optional. A few pre-made profiles are provided for Lookup and Registration cases of the IDNA RFC. One could argue that at least a normalization check should at least always be in there, but that's also the largest table. If validation is not used, tables are not linked in.

The default ToUnicode and ToASCII are now wrappers for idna.Punycode.{ToUnicode|ToASCII}. A bit odd, but it is justifiable to take the zero profile as the default. This means the interpretation of these functions is virtually left unchanged with this elaborate update while still providing all the UTS 46 goodness.

@gopherbot
Copy link

CL https://golang.org/cl/36512 mentions this issue.

gopherbot pushed a commit to golang/text that referenced this issue Feb 9, 2017
ToASCII and ToUnicode now closely mimic the current
definitions in x/net/idna: doing a bare Punycode conversion.
This may be odd, but preserves backwards compatibility.
The changes also ensure that tables used by unused
validations are dropped.

The new semantics is that the zero Profile does a bare
Punycode transformation. The user can define new
Profiles with additional restrictions. Two high-level
options are provided to mimic the Registration and
Lookup use cases defined in the RFC.

validateAndMap is identical to the code that was
removed from process.

Updates golang/go#18567
Updates golang/go#18582

Change-Id: If9e1a1037f968758c1b3f150860773770d032f4e
Reviewed-on: https://go-review.googlesource.com/36512
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/37050 mentions this issue.

@golang golang locked and limited conversation to collaborators Mar 24, 2018
@rsc rsc unassigned mpvl Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants