x/net/idna: apply nameprep normalization algorithm #16501

Closed
mna opened this issue Jul 26, 2016 · 2 comments

mna commented Jul 26, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
go version go1.7rc1 darwin/amd64
  1. What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.

https://play.golang.org/p/zS-UR4WhIx

package main

import (
    "fmt"

    "github.com/DanielOaks/go-idn/idna2003"
    "golang.org/x/net/idna"
)

func main() {
    in1 := "www.\u00e9tat.com"  // e-acute in one rune
    in2 := "www.e\u0301tat.com" // e-acute in two runes, "e" + acute

    // golang.org/x/net/idna
    got1, err1 := idna.ToASCII(in1)
    got2, err2 := idna.ToASCII(in2)
    fmt.Println(got1, err1) // www.xn--tat-9la.com <nil>
    fmt.Println(got2, err2) // www.xn--etat-vvc.com <nil>

    // github.com/DanielOaks/go-idn/idna2003
    got1, err1 = idna2003.ToASCII(in1)
    got2, err2 = idna2003.ToASCII(in2)
    fmt.Println(got1, err1) // www.xn--tat-9la.com <nil>
    fmt.Println(got2, err2) // www.xn--tat-9la.com <nil>
}
  1. What did you expect to see?

idna.ToASCII should normalize its Unicode input before encoding it to Punycode (https://en.wikipedia.org/wiki/Internationalized_domain_name, section "ToASCII and ToUnicode": "ToASCII will apply the Nameprep algorithm, which converts the label to lowercase and performs other normalization, and will then translate the result to ASCII using Punycode").

The golang.org/x/net/idna package does not seem to perform that normalization step, while the third-party github.com/DanielOaks/go-idn package does.

So running idna.ToASCII on www.état.com and on www.e\u0301tat.com should (if I understand IDNA correctly) return the same Punycode form: www.xn--tat-9la.com.

  1. What did you see instead?

The third-party package correctly returns www.xn--tat-9la.com for both inputs, but x/net/idna returns "www.xn--tat-9la.com" for the first input and "www.xn--etat-vvc.com" for the second.

mna commented Jul 26, 2016

My bad, it seems that RFC 5891 ("Internationalized Domain Names in Applications (IDNA): Protocol") obsoletes the Nameprep specification, RFC 3491 ("Nameprep: A Stringprep Profile for IDN"), and states in "Appendix A. Summary of Major Changes from IDNA2003":

Remove the mapping and normalization steps from the protocol and have them, instead, done by the applications themselves, possibly in a local fashion, before invoking the protocol.

So I guess x/net/idna does the right thing and it is up to the caller to normalize or not. That does mean the caller has to know whether a domain in non-normalized form is equivalent to the same domain in normalized form, and I have no idea whether it is. Maybe it is inconsistent in the wild: are www.\u00e9tat.com and www.e\u0301tat.com registered as separate domains or not?

If anyone knows about that last part, I'd love to know (it would be very helpful for the purell normalization package that I maintain), but otherwise this is not an issue for the idna package, so I'll close it.

mna closed this as completed Jul 26, 2016

mna commented Jul 26, 2016

Re-nevermind that last part, RFC 5891 states that:

By the time a string enters the IDNA registration process as
described in this specification, it MUST be in Unicode and in
Normalization Form C (NFC)

golang locked and limited conversation to collaborators Jul 26, 2017