x/net/idna: backward compatibility issues with recent change #18567

weppos · 2017-01-08T11:59:41Z

What version of Go are you using (`go version`)?

go version go1.7.3 darwin/amd64

What operating system and processor architecture are you using (`go env`)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/weppos/go"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.7.3/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.7.3/libexec/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/rk/llwxtd9s6jn91p2ky13tr30c0000gn/T/go-build321981550=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

What did you do?

It looks like golang/net@67957fd (change 34770) broke backward compatibility. Given the commit message says otherwise, I'm filing this as a bug.

There are at least 2 strings that are current causing failures:

the empty string
a string starting with a leading *

I detailed the issue at weppos/publicsuffix-go#56, along with some code to reproduce it.

Here's the simplest example:

package main

import (
	"fmt"
	"os"

	"golang.org/x/net/idna"
)

func main() {
	s, err := idna.ToASCII("")
	if err != nil {
		fmt.Printf("Error converting empty string: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Empty string works!")
	fmt.Printf("Result: `%v`\n", s)
}

and here's the log

➜  publicsuffix-go git:(master) ✗ cd ~/go/src/golang.org/x/net/idna; git co 69d4b8; cd -; go run idna-test.go
Note: checking out '69d4b8'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 69d4b8a... http2: remove unnecessary TODO for trailer keys allocation
~/go/src/github.com/weppos/publicsuffix-go
Empty string works!
Result: ``

➜  publicsuffix-go git:(master) ✗ cd ~/go/src/golang.org/x/net/idna; git co 67957f; cd -; go run idna-test.go
Previous HEAD position was 69d4b8a... http2: remove unnecessary TODO for trailer keys allocation
HEAD is now at 67957fd... idna: use code generated by internal x/text package
~/go/src/github.com/weppos/publicsuffix-go
Error converting empty string: idna: invalid label ""
exit status 1

What did you expect to see?

I'd expect the following string conversion:

String "" converts into "" with no errors
String "*.uk" converts into "*uk" with no errors

What did you see instead?

The empty string fails with error idna: invalid label "".
The string "*.uk" fails with error idna: disallowed rune U+002E

/cc @nigeltao

The text was updated successfully, but these errors were encountered:

bradfitz · 2017-01-09T16:36:28Z

/cc @mpvl

tabacco · 2017-01-09T19:00:17Z

Additionally, this change causes ToASCII() to lowercase domains where it didn't before.

Given:

domain, err := idna.ToASCII("Example")
fmt.Println(domain)

Before:
Example

After:
example

This isn't necessarily a bad thing, but was an unexpected behavioral change.

mikioh · 2017-01-12T00:38:59Z

Looks like RFC 4343 states that the case folding for internationalized labels is not part of DNS, and PRECIS as defined in RFC 7564 uses only the keywords "RECOMMENDED" and "NOT RECOMMENDED" in the sections describing the case mapping/folding.

I'm not sure whether the change is intentional, but feel like the clarification on ToASCII and ToUnicode is nice to avoid unnecessary confusion; for example, idna.ToASCII doesn't do case preservation and the returned sequence of labels is not suitable for DNS messages and sources (such as master files or cache for authoritative/recursive DNS servers) because of the violation of RFC 4343.

nigeltao · 2017-01-12T03:40:38Z

@mpvl is on paternity leave, and I don't have many spare cycles at the moment.

Perhaps the best course of action is to roll back https://go-review.googlesource.com/34770 (@bradfitz, others, WDYT?), and when @mpvl has some free time again (ha!), start a conversation on golang-dev about re-rolling it forward?

nigeltao · 2017-01-12T03:41:20Z

See also #18582 "x/net/idna: recent change adds ~200KB of machine code to any program using x/net/http2".

tsuna · 2017-01-13T08:05:51Z

+1 for rolling back for now – although good luck waiting until @mpvl has "free time again", haha, I went through this recently and I can tell you that you can wait a long time :P.

nigeltao · 2017-01-14T04:24:59Z

I mailed out a "git revert" CL at https://go-review.googlesource.com/#/c/35270/

gopherbot · 2017-01-14T04:25:40Z

CL https://golang.org/cl/35270 mentions this issue.

This reverts commit 67957fd. Updates golang/go#18567 Change-Id: I4a9da509eb95949d2e3ab08763274abf6706f6f8 Reviewed-on: https://go-review.googlesource.com/35270 Run-TryBot: Nigel Tao <nigeltao@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Benoit Sigoure <tsunanet@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

gopherbot · 2017-01-18T00:30:56Z

CL https://golang.org/cl/35370 mentions this issue.

This reverts commit 67957fd. Updates golang/go#18567. Change-Id: If4e07f09c9460fda4150201a11a8a3844067b824 Reviewed-on: https://go-review.googlesource.com/35370 Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

mpvl · 2017-02-02T14:07:32Z

Picking this up from my absence.

The ToASCII documentation states that it will return an error when it encounters one (i.e. if the input is not valid IDNA label). *.uk is an invalid domain name by IDNA2003, IDNA2008, and UTS 46, so I would expect it to return an error. Am I missing something?

As for the empty string, labels are required to be length 1-63. Arguable, the empty string is a sequence of zero labels, so this requirement should not apply. If there is good arguments in favor I could see we should adopt this interpretation.

weppos · 2017-02-02T14:58:53Z

Thanks for the feedback @mpl. The main argument I had is that the code suddenly break, whereas the commit message was explicitly mentioning BC.

If BC is not guaranteed, then I think this should be explicit. I know for sure a few implementations that will break with this change (one of them is code I directly have to maintain), hence we'll have to plan accordingly.

The empty string is not a big deal, if that's not going to be valid, I can handle the case specifically. Although I would suggest to consider simply returning an empty string as it may be simplifying usage (specifically, avoid to wrap any single execution around an empty string check). I understand this is a kind of selfish request as a consumer, that may not fit into the POV of a provider.

As for *.uk, what I can tell is that in pretty much all the cases I've seen around (in Go or different languages), passing an ASCII string into an UnicodeToPunycode converter, would simply return the string itself (as it's already ASCII hence Punycode). Conversely, if the string contains any Unicode char, this is converted into Punycode.

It seems to me that that was also the original implementation (or the current) of the Go IDNA library. However, the change you make seems to be more inclined to consider the input as a domain name itself, rather than considering the chars that compose the string.

This may pose interesting questions. Let's suppose I want to convert *.faß.de to *.fass.de or *.网址 to *.xn--ses554g. Both are valid use cases for the Public Suffix, and any Unicode to Punycode library I've encountered so far is happy to perform the translation. If that's not a valid use case for this library, do you have any suggestion on the recommended approach? Again, the current code is perfectly happy to perform this translation.

mpvl · 2017-02-02T17:55:30Z

The backwards compatibility applies to the API, not which strings are accepted. The old implementation violated the rules of the IDNA spec. The latest change basically fixed these bugs.

I understand what you're saying and see where the issue is coming from. There is a discrepancy in the old implementation between the documentation and the implementation. The package documentation states the packages implements the various IDNA RFCs (or at least is supposed to). The ToASCII comment makes clear it is operating on domain names, as does the package name.

However, the implementation merely split on dots and then applies a punycode conversion. So indeed it (almost) behaved like a simple punycode converter if you pass it a label. But if you follow the package documentation this is a bug.

So we can do two things here:

Change the package documentation
Expose the punycode algorithm under a different API within this or another package, and leave the semantics of the API as documented.

Unless there is strong objections, I think it should be 2. idna.ToASCII strongly suggests it implements the RFC spec mentioned before. If it is redefined to be a punycode converter, the actual idna API is likely going to be awkward and confusing.

As for *.faß.de being valid, I've been going with both what browers do and what http://unicode.org/cldr/utility/idna.jsp spews out. The latter, for example, seems unhappy with such input. Note however that the new idna package would still do the conversion for you as suggested (returning *.fass.de and *.xn--ses554g), it would just also report the input is erroneous. This is the behavior prescribed by UTS 46.

nigeltao · 2017-02-03T00:50:42Z

Can we back up a bit? I did the original x/net/idna implementation many years ago, but haven't followed the latest developments that closely.

The original implementation was purely algorithmic, with no tables. Can you give some examples of IDNA names that that algorithm gets 'wrong', and we now need tables to understand?

Is this one of those situations where a newer RFC deprecates an older one?

On #18582, you also mention comparing the code size increase relative to $THIS_THING instead of $THAT_THING. So there are actually two sets of tables ('old' and 'new')?? Again, I'm not familiar with the context here. Can you give some example inputs where the two implementations give different output?

alextoombs · 2017-02-03T01:28:25Z

I encountered panics following the merge of the now-reverted CL from within the library in some unit tests checking for edge cases. Prior to the revert, idna.ToUnicode("xn--") panics rather than returning an error, resulting from an index out of range here.

Couple of other examples that trigger panics in this example snippet below:

func TestToUnicode(t *testing.T) {
	testData := map[string]string{
		"xn--":     "",     // panics golang.org/x/net/idna
		"foo.xn--": "foo",  // panics golang.org/x/net/idna
		"xn--.foo": ".foo", // panics golang.org/x/net/idna
	}

	for input, expected := range testData {
		output, err := idna.ToUnicode(input)
		if err != nil {
			t.Errorf("Failed to convert %q to unicode: %s", input, err)
		}

		if output != expected {
			t.Errorf("Expected %q, got %q", expected, output)
		}
	}
}

It's fine that these are invalid; I just would prefer the library not panic when we can return an error back instead.

cc @cee-dub

mpvl · 2017-02-03T09:51:54Z

@mikioh: the package has always claims to implement RFC 5890, RFC 5891, RFC 5892, RFC 5893 and RFC 5894, not 4343 or 7564. The latter two are related but different. For example, putting the strings in NFC is mentioned as a MUST. And indeed not doing so can lead to very unexpected results or even security issues.

@nigeltao: More context: I've not changed what the package claims to implement, I've merely implemented what it claimed to implement. Note that these RFCs prescribe not just punycode and the old implementation was quite incomplete. I assume, given the package name, that the intention was to really implement these RFCs and that the list of RFCs supported is not a documentation bug.
Assuming this is correct, there are two ways to implement this: add a bunch of transforms on top of the punycode. This is attempted by net/http in Go 1.8. This is very hard to get right, though, and also is less efficient than doing the alternative. The alternative is to implement UTS 46. Most implementations in browsers, for example, are based on this. It provides tables that removes the need for a lot of intricate code and is faster than applying a bunch of transforms in sequence.
UTS 46 is more permissive than then the RFCs but otherwise fully compliant in non-transitional mode.
A nice demo/playground to view the differences between IDNA2008, IDNA2003 and UTS46 can be found here: http://unicode.org/cldr/utility/idna.jsp.

mpvl · 2017-02-03T11:58:01Z

@alextoombs that is simply a bug. See https://go-review.googlesource.com/#/c/36280/.
cc @cee-dub

It seems that such input is okay. With idna.VerifyDNSLength can enable stricter checking.

gopherbot · 2017-02-03T12:01:14Z

CL https://golang.org/cl/36280 mentions this issue.

mpvl · 2017-02-03T12:05:37Z

I remain of the opinion that the new implementation, albeit a big change, fixes (many) bugs of the old one. It is unfortunate that people have relied on, IMHO, unintended and undocumented semantics.

Exposing the raw Punycode algorithm seems one solution. However, from what I gather what people want is not to process raw labels, but rather still domain names but with special characters that are normally illegal. So a nicer API may be to allow an option in the idna package to permit a set of otherwise disallowed runes, like "*". This will avoid the error in the case of "*". This way the usual error checking is retained while still permitting such cases as *.faß.de. Comments?

mikioh · 2017-02-06T07:51:39Z

@mpvl,

Thanks for the explanation and references, especially https://tools.ietf.org/html/rfc5894#section-4.4. For the case preservation on ToASCII with LDH/A-labels, I think it's fine as it is, or with a brief summary of https://tools.ietf.org/html/rfc5894#section-4.4 to notify people who work on DNS infrastructure applications.

The previous code hardwired the NonTransitional profile for certain operations. This had the effect of erasing the verifyDNSLength flag meaning labels were almost never checked for proper length. Fixing this exposed various issues. Also fixes panic for empty strings. Updates golang/go#18567. Change-Id: I54763d913fbd63a50d5f7a93d3cf36ef6270abfd Reviewed-on: https://go-review.googlesource.com/36280 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>

mpvl · 2017-02-07T11:00:58Z

@mikioh: I seem you suggest to preserve case on ToASCII.

Here is my proposal to accommodate that. In addition to the Lookup and Register cases of IDNA, we support a "Raw" or "Punycode" profile which preserves case and does minimal checking. The existing ToUnicode and ToASCII functions will use this profile. This may be a bit of an odd choice, but supports maximum compatibility and allows table growth to be kept at a minimum.

Users can use the Lookup or other profiles, or create their own, for different purposes.

mikioh · 2017-02-08T00:46:03Z

@mpvl,

I seem you suggest to preserve case on ToASCII.

Nope, sorry I wasn't clear. My comment I think it's fine as it is is just for https://go-review.googlesource.com/34770; I wrote the comment while reading the CL. I agree that the package idna should focus on IDNA (and people who work on DNS infrastructure applications must be aware of the requirements for IDNA-aware applications are different from DNS servers, registrars and registries.)

mpvl · 2017-02-08T10:08:24Z

@mikioh Ah okay. Please take a look at https://go-review.googlesource.com/#/c/36512/. (not sure why this issue wasn't updated).

I've now adopted the semantics that a zero Profile only does Punycode and that any validation is optional. A few pre-made profiles are provided for Lookup and Registration cases of the idna. One could argue that at least a normalization check should at least always be in there, but that's also the largest table. So this semantics also addresses #18582. If validation is not used, tables are not linked.

I think this is an acceptable API whilst having the highest backwards compatibility and still being within the spirit of IDNA. If one squints one's eyes a bit, this is what the RFC intended. :)

gopherbot · 2017-02-08T11:16:27Z

CL https://golang.org/cl/36512 mentions this issue.

ToASCII and ToUnicode now closely mimic the current definitions in x/net/idna: doing a bare Punycode conversion. This may be odd, but preserves backwards compatibility. The changes also ensure that tables used by unused validations are dropped. The new semantics is that the zero Profile does a bare Punycode transformation. The user can define new Profiles with additional restrictions. Two high-level options are provided to mimic the Registration and Lookup use cases defined in the RFC. validateAndMap is identical to the code that was removed from process. Updates golang/go#18567 Updates golang/go#18582 Change-Id: If9e1a1037f968758c1b3f150860773770d032f4e Reviewed-on: https://go-review.googlesource.com/36512 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

gopherbot · 2017-02-15T09:36:34Z

CL https://golang.org/cl/37050 mentions this issue.

titanous changed the title ~~Recent changes in net/idna introduced backward compatibility issues~~ x/net/idna: backward compatibility issues with recent change Jan 9, 2017

titanous added this to the Unreleased milestone Jan 9, 2017

titanous added the NeedsInvestigation label Jan 9, 2017

bradfitz assigned mpvl Jan 9, 2017

josharian mentioned this issue Jan 25, 2017

Broken permalinks when host not set gohugoio/hugo#2987

Closed

gopherbot closed this as completed in golang/net@78ebe5c Mar 24, 2017

weppos mentioned this issue Apr 2, 2017

x/net/idna: backward compatibility issues with change 2 #19821

Closed

golang locked and limited conversation to collaborators Mar 24, 2018

gopherbot added the FrozenDueToAge label Mar 24, 2018

rsc unassigned mpvl Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x/net/idna: backward compatibility issues with recent change #18567

x/net/idna: backward compatibility issues with recent change #18567

weppos commented Jan 8, 2017 •

edited

Loading

bradfitz commented Jan 9, 2017

tabacco commented Jan 9, 2017 •

edited

Loading

mikioh commented Jan 12, 2017 •

edited

Loading

nigeltao commented Jan 12, 2017

nigeltao commented Jan 12, 2017

tsuna commented Jan 13, 2017

nigeltao commented Jan 14, 2017

gopherbot commented Jan 14, 2017

gopherbot commented Jan 18, 2017

mpvl commented Feb 2, 2017

weppos commented Feb 2, 2017

mpvl commented Feb 2, 2017

nigeltao commented Feb 3, 2017 •

edited

Loading

alextoombs commented Feb 3, 2017

mpvl commented Feb 3, 2017

mpvl commented Feb 3, 2017

gopherbot commented Feb 3, 2017

mpvl commented Feb 3, 2017 •

edited by josharian

Loading

mikioh commented Feb 6, 2017

mpvl commented Feb 7, 2017

mikioh commented Feb 8, 2017

mpvl commented Feb 8, 2017

gopherbot commented Feb 8, 2017

gopherbot commented Feb 15, 2017

x/net/idna: backward compatibility issues with recent change #18567

x/net/idna: backward compatibility issues with recent change #18567

Comments

weppos commented Jan 8, 2017 • edited Loading

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

bradfitz commented Jan 9, 2017

tabacco commented Jan 9, 2017 • edited Loading

mikioh commented Jan 12, 2017 • edited Loading

nigeltao commented Jan 12, 2017

nigeltao commented Jan 12, 2017

tsuna commented Jan 13, 2017

nigeltao commented Jan 14, 2017

gopherbot commented Jan 14, 2017

gopherbot commented Jan 18, 2017

mpvl commented Feb 2, 2017

weppos commented Feb 2, 2017

mpvl commented Feb 2, 2017

nigeltao commented Feb 3, 2017 • edited Loading

alextoombs commented Feb 3, 2017

mpvl commented Feb 3, 2017

mpvl commented Feb 3, 2017

gopherbot commented Feb 3, 2017

mpvl commented Feb 3, 2017 • edited by josharian Loading

mikioh commented Feb 6, 2017

mpvl commented Feb 7, 2017

mikioh commented Feb 8, 2017

mpvl commented Feb 8, 2017

gopherbot commented Feb 8, 2017

gopherbot commented Feb 15, 2017

weppos commented Jan 8, 2017 •

edited

Loading

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

tabacco commented Jan 9, 2017 •

edited

Loading

mikioh commented Jan 12, 2017 •

edited

Loading

nigeltao commented Feb 3, 2017 •

edited

Loading

mpvl commented Feb 3, 2017 •

edited by josharian

Loading