Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/idna: backward compatibility issues with recent change #18567

Closed
weppos opened this issue Jan 8, 2017 · 24 comments
Closed

x/net/idna: backward compatibility issues with recent change #18567

weppos opened this issue Jan 8, 2017 · 24 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@weppos
Copy link

weppos commented Jan 8, 2017

What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/weppos/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.7.3/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.7.3/libexec/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/rk/llwxtd9s6jn91p2ky13tr30c0000gn/T/go-build321981550=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

It looks like golang/net@67957fd (change 34770) broke backward compatibility. Given the commit message says otherwise, I'm filing this as a bug.

There are at least 2 strings that are current causing failures:

  • the empty string
  • a string starting with a leading *

I detailed the issue at weppos/publicsuffix-go#56, along with some code to reproduce it.

Here's the simplest example:

package main

import (
	"fmt"
	"os"

	"golang.org/x/net/idna"
)

func main() {
	s, err := idna.ToASCII("")
	if err != nil {
		fmt.Printf("Error converting empty string: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Empty string works!")
	fmt.Printf("Result: `%v`\n", s)
}

and here's the log

➜  publicsuffix-go git:(master) ✗ cd ~/go/src/golang.org/x/net/idna; git co 69d4b8; cd -; go run idna-test.go
Note: checking out '69d4b8'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 69d4b8a... http2: remove unnecessary TODO for trailer keys allocation
~/go/src/github.com/weppos/publicsuffix-go
Empty string works!
Result: ``

➜  publicsuffix-go git:(master) ✗ cd ~/go/src/golang.org/x/net/idna; git co 67957f; cd -; go run idna-test.go
Previous HEAD position was 69d4b8a... http2: remove unnecessary TODO for trailer keys allocation
HEAD is now at 67957fd... idna: use code generated by internal x/text package
~/go/src/github.com/weppos/publicsuffix-go
Error converting empty string: idna: invalid label ""
exit status 1

What did you expect to see?

I'd expect the following string conversion:

String "" converts into "" with no errors
String "*.uk" converts into "*uk" with no errors

What did you see instead?

The empty string fails with error idna: invalid label "".
The string "*.uk" fails with error idna: disallowed rune U+002E

/cc @nigeltao

@titanous titanous changed the title Recent changes in net/idna introduced backward compatibility issues x/net/idna: backward compatibility issues with recent change Jan 9, 2017
@titanous titanous added this to the Unreleased milestone Jan 9, 2017
@titanous titanous added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 9, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Jan 9, 2017

/cc @mpvl

@tabacco
Copy link

tabacco commented Jan 9, 2017

Additionally, this change causes ToASCII() to lowercase domains where it didn't before.

Given:

domain, err := idna.ToASCII("Example")
fmt.Println(domain)

Before:
Example

After:
example

This isn't necessarily a bad thing, but was an unexpected behavioral change.

@mikioh
Copy link
Contributor

mikioh commented Jan 12, 2017

Looks like RFC 4343 states that the case folding for internationalized labels is not part of DNS, and PRECIS as defined in RFC 7564 uses only the keywords "RECOMMENDED" and "NOT RECOMMENDED" in the sections describing the case mapping/folding.

I'm not sure whether the change is intentional, but feel like the clarification on ToASCII and ToUnicode is nice to avoid unnecessary confusion; for example, idna.ToASCII doesn't do case preservation and the returned sequence of labels is not suitable for DNS messages and sources (such as master files or cache for authoritative/recursive DNS servers) because of the violation of RFC 4343.

@nigeltao
Copy link
Contributor

@mpvl is on paternity leave, and I don't have many spare cycles at the moment.

Perhaps the best course of action is to roll back https://go-review.googlesource.com/34770 (@bradfitz, others, WDYT?), and when @mpvl has some free time again (ha!), start a conversation on golang-dev about re-rolling it forward?

@nigeltao
Copy link
Contributor

See also #18582 "x/net/idna: recent change adds ~200KB of machine code to any program using x/net/http2".

@tsuna
Copy link
Contributor

tsuna commented Jan 13, 2017

+1 for rolling back for now – although good luck waiting until @mpvl has "free time again", haha, I went through this recently and I can tell you that you can wait a long time :P.

@nigeltao
Copy link
Contributor

I mailed out a "git revert" CL at https://go-review.googlesource.com/#/c/35270/

@gopherbot
Copy link

CL https://golang.org/cl/35270 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue Jan 14, 2017
This reverts commit 67957fd.

Updates golang/go#18567

Change-Id: I4a9da509eb95949d2e3ab08763274abf6706f6f8
Reviewed-on: https://go-review.googlesource.com/35270
Run-TryBot: Nigel Tao <nigeltao@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Benoit Sigoure <tsunanet@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/35370 mentions this issue.

gopherbot pushed a commit to golang/net that referenced this issue Jan 18, 2017
This reverts commit 67957fd.

Updates golang/go#18567.

Change-Id: If4e07f09c9460fda4150201a11a8a3844067b824
Reviewed-on: https://go-review.googlesource.com/35370
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@mpvl
Copy link
Contributor

mpvl commented Feb 2, 2017

Picking this up from my absence.

The ToASCII documentation states that it will return an error when it encounters one (i.e. if the input is not valid IDNA label). *.uk is an invalid domain name by IDNA2003, IDNA2008, and UTS 46, so I would expect it to return an error. Am I missing something?

As for the empty string, labels are required to be length 1-63. Arguable, the empty string is a sequence of zero labels, so this requirement should not apply. If there is good arguments in favor I could see we should adopt this interpretation.

@weppos
Copy link
Author

weppos commented Feb 2, 2017

Thanks for the feedback @mpl. The main argument I had is that the code suddenly break, whereas the commit message was explicitly mentioning BC.

If BC is not guaranteed, then I think this should be explicit. I know for sure a few implementations that will break with this change (one of them is code I directly have to maintain), hence we'll have to plan accordingly.

The empty string is not a big deal, if that's not going to be valid, I can handle the case specifically. Although I would suggest to consider simply returning an empty string as it may be simplifying usage (specifically, avoid to wrap any single execution around an empty string check). I understand this is a kind of selfish request as a consumer, that may not fit into the POV of a provider.

As for *.uk, what I can tell is that in pretty much all the cases I've seen around (in Go or different languages), passing an ASCII string into an UnicodeToPunycode converter, would simply return the string itself (as it's already ASCII hence Punycode). Conversely, if the string contains any Unicode char, this is converted into Punycode.

It seems to me that that was also the original implementation (or the current) of the Go IDNA library. However, the change you make seems to be more inclined to consider the input as a domain name itself, rather than considering the chars that compose the string.

This may pose interesting questions. Let's suppose I want to convert *.faß.de to *.fass.de or *.网址 to *.xn--ses554g. Both are valid use cases for the Public Suffix, and any Unicode to Punycode library I've encountered so far is happy to perform the translation. If that's not a valid use case for this library, do you have any suggestion on the recommended approach? Again, the current code is perfectly happy to perform this translation.

@mpvl
Copy link
Contributor

mpvl commented Feb 2, 2017

The backwards compatibility applies to the API, not which strings are accepted. The old implementation violated the rules of the IDNA spec. The latest change basically fixed these bugs.

I understand what you're saying and see where the issue is coming from. There is a discrepancy in the old implementation between the documentation and the implementation. The package documentation states the packages implements the various IDNA RFCs (or at least is supposed to). The ToASCII comment makes clear it is operating on domain names, as does the package name.

However, the implementation merely split on dots and then applies a punycode conversion. So indeed it (almost) behaved like a simple punycode converter if you pass it a label. But if you follow the package documentation this is a bug.

So we can do two things here:

  1. Change the package documentation
  2. Expose the punycode algorithm under a different API within this or another package, and leave the semantics of the API as documented.

Unless there is strong objections, I think it should be 2. idna.ToASCII strongly suggests it implements the RFC spec mentioned before. If it is redefined to be a punycode converter, the actual idna API is likely going to be awkward and confusing.

As for *.faß.de being valid, I've been going with both what browers do and what http://unicode.org/cldr/utility/idna.jsp spews out. The latter, for example, seems unhappy with such input. Note however that the new idna package would still do the conversion for you as suggested (returning *.fass.de and *.xn--ses554g), it would just also report the input is erroneous. This is the behavior prescribed by UTS 46.

@nigeltao
Copy link
Contributor

nigeltao commented Feb 3, 2017

Can we back up a bit? I did the original x/net/idna implementation many years ago, but haven't followed the latest developments that closely.

The original implementation was purely algorithmic, with no tables. Can you give some examples of IDNA names that that algorithm gets 'wrong', and we now need tables to understand?

Is this one of those situations where a newer RFC deprecates an older one?

On #18582, you also mention comparing the code size increase relative to $THIS_THING instead of $THAT_THING. So there are actually two sets of tables ('old' and 'new')?? Again, I'm not familiar with the context here. Can you give some example inputs where the two implementations give different output?

@alextoombs
Copy link

I encountered panics following the merge of the now-reverted CL from within the library in some unit tests checking for edge cases. Prior to the revert, idna.ToUnicode("xn--") panics rather than returning an error, resulting from an index out of range here.

Couple of other examples that trigger panics in this example snippet below:

func TestToUnicode(t *testing.T) {
	testData := map[string]string{
		"xn--":     "",     // panics golang.org/x/net/idna
		"foo.xn--": "foo",  // panics golang.org/x/net/idna
		"xn--.foo": ".foo", // panics golang.org/x/net/idna
	}

	for input, expected := range testData {
		output, err := idna.ToUnicode(input)
		if err != nil {
			t.Errorf("Failed to convert %q to unicode: %s", input, err)
		}

		if output != expected {
			t.Errorf("Expected %q, got %q", expected, output)
		}
	}
}

It's fine that these are invalid; I just would prefer the library not panic when we can return an error back instead.

cc @cee-dub

@mpvl
Copy link
Contributor

mpvl commented Feb 3, 2017

@mikioh: the package has always claims to implement RFC 5890, RFC 5891, RFC 5892, RFC 5893 and RFC 5894, not 4343 or 7564. The latter two are related but different. For example, putting the strings in NFC is mentioned as a MUST. And indeed not doing so can lead to very unexpected results or even security issues.

@nigeltao: More context: I've not changed what the package claims to implement, I've merely implemented what it claimed to implement. Note that these RFCs prescribe not just punycode and the old implementation was quite incomplete. I assume, given the package name, that the intention was to really implement these RFCs and that the list of RFCs supported is not a documentation bug.
Assuming this is correct, there are two ways to implement this: add a bunch of transforms on top of the punycode. This is attempted by net/http in Go 1.8. This is very hard to get right, though, and also is less efficient than doing the alternative. The alternative is to implement UTS 46. Most implementations in browsers, for example, are based on this. It provides tables that removes the need for a lot of intricate code and is faster than applying a bunch of transforms in sequence.
UTS 46 is more permissive than then the RFCs but otherwise fully compliant in non-transitional mode.
A nice demo/playground to view the differences between IDNA2008, IDNA2003 and UTS46 can be found here: http://unicode.org/cldr/utility/idna.jsp.

@mpvl
Copy link
Contributor

mpvl commented Feb 3, 2017

@alextoombs that is simply a bug. See https://go-review.googlesource.com/#/c/36280/.
cc @cee-dub

It seems that such input is okay. With idna.VerifyDNSLength can enable stricter checking.

@gopherbot
Copy link

CL https://golang.org/cl/36280 mentions this issue.

@mpvl
Copy link
Contributor

mpvl commented Feb 3, 2017

I remain of the opinion that the new implementation, albeit a big change, fixes (many) bugs of the old one. It is unfortunate that people have relied on, IMHO, unintended and undocumented semantics.

Exposing the raw Punycode algorithm seems one solution. However, from what I gather what people want is not to process raw labels, but rather still domain names but with special characters that are normally illegal. So a nicer API may be to allow an option in the idna package to permit a set of otherwise disallowed runes, like "*". This will avoid the error in the case of "*". This way the usual error checking is retained while still permitting such cases as *.faß.de. Comments?

@mikioh
Copy link
Contributor

mikioh commented Feb 6, 2017

@mpvl,

Thanks for the explanation and references, especially https://tools.ietf.org/html/rfc5894#section-4.4. For the case preservation on ToASCII with LDH/A-labels, I think it's fine as it is, or with a brief summary of https://tools.ietf.org/html/rfc5894#section-4.4 to notify people who work on DNS infrastructure applications.

gopherbot pushed a commit to golang/text that referenced this issue Feb 7, 2017
The previous code hardwired the NonTransitional profile for
certain operations. This had the effect of erasing the
verifyDNSLength flag meaning labels were almost never checked
for proper length. Fixing this exposed various issues.

Also fixes panic for empty strings.

Updates golang/go#18567.

Change-Id: I54763d913fbd63a50d5f7a93d3cf36ef6270abfd
Reviewed-on: https://go-review.googlesource.com/36280
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Nigel Tao <nigeltao@golang.org>
@mpvl
Copy link
Contributor

mpvl commented Feb 7, 2017

@mikioh: I seem you suggest to preserve case on ToASCII.

Here is my proposal to accommodate that. In addition to the Lookup and Register cases of IDNA, we support a "Raw" or "Punycode" profile which preserves case and does minimal checking. The existing ToUnicode and ToASCII functions will use this profile. This may be a bit of an odd choice, but supports maximum compatibility and allows table growth to be kept at a minimum.

Users can use the Lookup or other profiles, or create their own, for different purposes.

@mikioh
Copy link
Contributor

mikioh commented Feb 8, 2017

@mpvl,

I seem you suggest to preserve case on ToASCII.

Nope, sorry I wasn't clear. My comment I think it's fine as it is is just for https://go-review.googlesource.com/34770; I wrote the comment while reading the CL. I agree that the package idna should focus on IDNA (and people who work on DNS infrastructure applications must be aware of the requirements for IDNA-aware applications are different from DNS servers, registrars and registries.)

@mpvl
Copy link
Contributor

mpvl commented Feb 8, 2017

@mikioh Ah okay. Please take a look at https://go-review.googlesource.com/#/c/36512/. (not sure why this issue wasn't updated).

I've now adopted the semantics that a zero Profile only does Punycode and that any validation is optional. A few pre-made profiles are provided for Lookup and Registration cases of the idna. One could argue that at least a normalization check should at least always be in there, but that's also the largest table. So this semantics also addresses #18582. If validation is not used, tables are not linked.

I think this is an acceptable API whilst having the highest backwards compatibility and still being within the spirit of IDNA. If one squints one's eyes a bit, this is what the RFC intended. :)

@gopherbot
Copy link

CL https://golang.org/cl/36512 mentions this issue.

gopherbot pushed a commit to golang/text that referenced this issue Feb 9, 2017
ToASCII and ToUnicode now closely mimic the current
definitions in x/net/idna: doing a bare Punycode conversion.
This may be odd, but preserves backwards compatibility.
The changes also ensure that tables used by unused
validations are dropped.

The new semantics is that the zero Profile does a bare
Punycode transformation. The user can define new
Profiles with additional restrictions. Two high-level
options are provided to mimic the Registration and
Lookup use cases defined in the RFC.

validateAndMap is identical to the code that was
removed from process.

Updates golang/go#18567
Updates golang/go#18582

Change-Id: If9e1a1037f968758c1b3f150860773770d032f4e
Reviewed-on: https://go-review.googlesource.com/36512
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot
Copy link

CL https://golang.org/cl/37050 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

10 participants