-
Notifications
You must be signed in to change notification settings - Fork 18k
x/net/idna: backward compatibility issues with recent change #18567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
/cc @mpvl |
Additionally, this change causes ToASCII() to lowercase domains where it didn't before. Given: domain, err := idna.ToASCII("Example")
fmt.Println(domain) Before: After: This isn't necessarily a bad thing, but was an unexpected behavioral change. |
Looks like RFC 4343 states that the case folding for internationalized labels is not part of DNS, and PRECIS as defined in RFC 7564 uses only the keywords "RECOMMENDED" and "NOT RECOMMENDED" in the sections describing the case mapping/folding. I'm not sure whether the change is intentional, but feel like the clarification on ToASCII and ToUnicode is nice to avoid unnecessary confusion; for example, idna.ToASCII doesn't do case preservation and the returned sequence of labels is not suitable for DNS messages and sources (such as master files or cache for authoritative/recursive DNS servers) because of the violation of RFC 4343. |
@mpvl is on paternity leave, and I don't have many spare cycles at the moment. Perhaps the best course of action is to roll back https://go-review.googlesource.com/34770 (@bradfitz, others, WDYT?), and when @mpvl has some free time again (ha!), start a conversation on golang-dev about re-rolling it forward? |
See also #18582 "x/net/idna: recent change adds ~200KB of machine code to any program using x/net/http2". |
+1 for rolling back for now – although good luck waiting until @mpvl has "free time again", haha, I went through this recently and I can tell you that you can wait a long time :P. |
I mailed out a "git revert" CL at https://go-review.googlesource.com/#/c/35270/ |
CL https://golang.org/cl/35270 mentions this issue. |
This reverts commit 67957fd. Updates golang/go#18567 Change-Id: I4a9da509eb95949d2e3ab08763274abf6706f6f8 Reviewed-on: https://go-review.googlesource.com/35270 Run-TryBot: Nigel Tao <nigeltao@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Benoit Sigoure <tsunanet@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
CL https://golang.org/cl/35370 mentions this issue. |
This reverts commit 67957fd. Updates golang/go#18567. Change-Id: If4e07f09c9460fda4150201a11a8a3844067b824 Reviewed-on: https://go-review.googlesource.com/35370 Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Picking this up from my absence. The ToASCII documentation states that it will return an error when it encounters one (i.e. if the input is not valid IDNA label). *.uk is an invalid domain name by IDNA2003, IDNA2008, and UTS 46, so I would expect it to return an error. Am I missing something? As for the empty string, labels are required to be length 1-63. Arguable, the empty string is a sequence of zero labels, so this requirement should not apply. If there is good arguments in favor I could see we should adopt this interpretation. |
Thanks for the feedback @mpl. The main argument I had is that the code suddenly break, whereas the commit message was explicitly mentioning BC. If BC is not guaranteed, then I think this should be explicit. I know for sure a few implementations that will break with this change (one of them is code I directly have to maintain), hence we'll have to plan accordingly. The empty string is not a big deal, if that's not going to be valid, I can handle the case specifically. Although I would suggest to consider simply returning an empty string as it may be simplifying usage (specifically, avoid to wrap any single execution around an empty string check). I understand this is a kind of selfish request as a consumer, that may not fit into the POV of a provider. As for *.uk, what I can tell is that in pretty much all the cases I've seen around (in Go or different languages), passing an ASCII string into an UnicodeToPunycode converter, would simply return the string itself (as it's already ASCII hence Punycode). Conversely, if the string contains any Unicode char, this is converted into Punycode. It seems to me that that was also the original implementation (or the current) of the Go IDNA library. However, the change you make seems to be more inclined to consider the input as a domain name itself, rather than considering the chars that compose the string. This may pose interesting questions. Let's suppose I want to convert |
The backwards compatibility applies to the API, not which strings are accepted. The old implementation violated the rules of the IDNA spec. The latest change basically fixed these bugs. I understand what you're saying and see where the issue is coming from. There is a discrepancy in the old implementation between the documentation and the implementation. The package documentation states the packages implements the various IDNA RFCs (or at least is supposed to). The ToASCII comment makes clear it is operating on domain names, as does the package name. However, the implementation merely split on dots and then applies a punycode conversion. So indeed it (almost) behaved like a simple punycode converter if you pass it a label. But if you follow the package documentation this is a bug. So we can do two things here:
Unless there is strong objections, I think it should be 2. idna.ToASCII strongly suggests it implements the RFC spec mentioned before. If it is redefined to be a punycode converter, the actual idna API is likely going to be awkward and confusing. As for *.faß.de being valid, I've been going with both what browers do and what http://unicode.org/cldr/utility/idna.jsp spews out. The latter, for example, seems unhappy with such input. Note however that the new idna package would still do the conversion for you as suggested (returning *.fass.de and *.xn--ses554g), it would just also report the input is erroneous. This is the behavior prescribed by UTS 46. |
Can we back up a bit? I did the original x/net/idna implementation many years ago, but haven't followed the latest developments that closely. The original implementation was purely algorithmic, with no tables. Can you give some examples of IDNA names that that algorithm gets 'wrong', and we now need tables to understand? Is this one of those situations where a newer RFC deprecates an older one? On #18582, you also mention comparing the code size increase relative to $THIS_THING instead of $THAT_THING. So there are actually two sets of tables ('old' and 'new')?? Again, I'm not familiar with the context here. Can you give some example inputs where the two implementations give different output? |
I encountered panics following the merge of the now-reverted CL from within the library in some unit tests checking for edge cases. Prior to the revert, Couple of other examples that trigger panics in this example snippet below:
It's fine that these are invalid; I just would prefer the library not panic when we can return an error back instead. cc @cee-dub |
@mikioh: the package has always claims to implement RFC 5890, RFC 5891, RFC 5892, RFC 5893 and RFC 5894, not 4343 or 7564. The latter two are related but different. For example, putting the strings in NFC is mentioned as a MUST. And indeed not doing so can lead to very unexpected results or even security issues. @nigeltao: More context: I've not changed what the package claims to implement, I've merely implemented what it claimed to implement. Note that these RFCs prescribe not just punycode and the old implementation was quite incomplete. I assume, given the package name, that the intention was to really implement these RFCs and that the list of RFCs supported is not a documentation bug. |
@alextoombs that is simply a bug. See https://go-review.googlesource.com/#/c/36280/. It seems that such input is okay. With idna.VerifyDNSLength can enable stricter checking. |
CL https://golang.org/cl/36280 mentions this issue. |
I remain of the opinion that the new implementation, albeit a big change, fixes (many) bugs of the old one. It is unfortunate that people have relied on, IMHO, unintended and undocumented semantics. Exposing the raw Punycode algorithm seems one solution. However, from what I gather what people want is not to process raw labels, but rather still domain names but with special characters that are normally illegal. So a nicer API may be to allow an option in the idna package to permit a set of otherwise disallowed runes, like |
Thanks for the explanation and references, especially https://tools.ietf.org/html/rfc5894#section-4.4. For the case preservation on ToASCII with LDH/A-labels, I think it's fine as it is, or with a brief summary of https://tools.ietf.org/html/rfc5894#section-4.4 to notify people who work on DNS infrastructure applications. |
The previous code hardwired the NonTransitional profile for certain operations. This had the effect of erasing the verifyDNSLength flag meaning labels were almost never checked for proper length. Fixing this exposed various issues. Also fixes panic for empty strings. Updates golang/go#18567. Change-Id: I54763d913fbd63a50d5f7a93d3cf36ef6270abfd Reviewed-on: https://go-review.googlesource.com/36280 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
@mikioh: I seem you suggest to preserve case on ToASCII. Here is my proposal to accommodate that. In addition to the Lookup and Register cases of IDNA, we support a "Raw" or "Punycode" profile which preserves case and does minimal checking. The existing ToUnicode and ToASCII functions will use this profile. This may be a bit of an odd choice, but supports maximum compatibility and allows table growth to be kept at a minimum. Users can use the Lookup or other profiles, or create their own, for different purposes. |
Nope, sorry I wasn't clear. My comment |
@mikioh Ah okay. Please take a look at https://go-review.googlesource.com/#/c/36512/. (not sure why this issue wasn't updated). I've now adopted the semantics that a zero Profile only does Punycode and that any validation is optional. A few pre-made profiles are provided for Lookup and Registration cases of the idna. One could argue that at least a normalization check should at least always be in there, but that's also the largest table. So this semantics also addresses #18582. If validation is not used, tables are not linked. I think this is an acceptable API whilst having the highest backwards compatibility and still being within the spirit of IDNA. If one squints one's eyes a bit, this is what the RFC intended. :) |
CL https://golang.org/cl/36512 mentions this issue. |
ToASCII and ToUnicode now closely mimic the current definitions in x/net/idna: doing a bare Punycode conversion. This may be odd, but preserves backwards compatibility. The changes also ensure that tables used by unused validations are dropped. The new semantics is that the zero Profile does a bare Punycode transformation. The user can define new Profiles with additional restrictions. Two high-level options are provided to mimic the Registration and Lookup use cases defined in the RFC. validateAndMap is identical to the code that was removed from process. Updates golang/go#18567 Updates golang/go#18582 Change-Id: If9e1a1037f968758c1b3f150860773770d032f4e Reviewed-on: https://go-review.googlesource.com/36512 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
CL https://golang.org/cl/37050 mentions this issue. |
What version of Go are you using (
go version
)?What operating system and processor architecture are you using (
go env
)?What did you do?
It looks like golang/net@67957fd (change 34770) broke backward compatibility. Given the commit message says otherwise, I'm filing this as a bug.
There are at least 2 strings that are current causing failures:
*
I detailed the issue at weppos/publicsuffix-go#56, along with some code to reproduce it.
Here's the simplest example:
and here's the log
What did you expect to see?
I'd expect the following string conversion:
What did you see instead?
The empty string fails with error
idna: invalid label ""
.The string "*.uk" fails with error
idna: disallowed rune U+002E
/cc @nigeltao
The text was updated successfully, but these errors were encountered: