net/url: encoding inconsistency between 1.4 and 1.5 for unicode domain names #12719

Closed
VojtechVitek opened this issue Sep 22, 2015 · 5 comments

@VojtechVitek

Given this code (Go Playground):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    u, _ := url.Parse("http://www.žluťoučký-kůň.cz")
    fmt.Println(u)
}

I get different results in Go 1.4 and Go 1.5:

$ go version
go version go1.4.2 darwin/amd64

$ go run main.go 
http://www.žluťoučký-kůň.cz

$ go version
go version go1.5.1 darwin/amd64

$ go run main.go 
http://www.%C5%BElu%C5%A5ou%C4%8Dk%C3%BD-k%C5%AF%C5%88.cz

Is this intended (but undocumented) behavior - or is it a bug?

@rakyll
Contributor

rakyll commented Sep 22, 2015

I assume the conversion was done to maximize interoperability with legacy URI resolvers, but for that purpose RFC 3987 recommends the ToASCII (Punycode) conversion rather than percent-encoding.

From http://www.ietf.org/rfc/rfc3987.txt,

   The ToASCII operation may fail, but this would mean that the IRI
   cannot be resolved.  This conversion SHOULD be used when the goal is
   to maximize interoperability with legacy URI resolvers.  For example,
   the IRI

   "http://résumé.example.org"

   may be converted to

   "http://xn--rsum-bpad.example.org"

   instead of

   "http://r%C3%A9sum%C3%A9.example.org".
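
The percent-encoded form quoted above can be reproduced with the standard library alone. The following is a minimal sketch, assuming Go 1.5 or later (per this issue, Go 1.4 leaves the host unescaped); `percentEncodedForm` is a hypothetical helper name, not part of `net/url`. The Punycode form would need a separate IDNA package and is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
)

// percentEncodedForm is a hypothetical helper: it builds an http URL for
// the given host and returns whatever string net/url serializes it to.
func percentEncodedForm(host string) string {
	u := url.URL{Scheme: "http", Host: host}
	return u.String()
}

func main() {
	// On Go 1.5 and later the non-ASCII bytes of the host are expected
	// to come out percent-encoded as UTF-8.
	fmt.Println(percentEncodedForm("résumé.example.org"))
}
```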

/cc @rsc @bradfitz

@VojtechVitek
Author

Well, there's already http://godoc.org/golang.org/x/net/idna#ToASCII for Punycode conversions.

@rakyll
Contributor

rakyll commented Sep 23, 2015

In any case, this is a bug. We should never over-escape the host name, because url.Parse currently rejects percent-encoded host names.

u, _ := url.Parse("http://www.žluťoučký-kůň.cz")
if _, err := url.Parse(u.String()); err != nil {
  panic(err)
}

will panic with "panic: parse http://www.%C5%BElu%C5%A5ou%C4%8Dk%C3%BD-k%C5%AF%C5%88.cz: percent-encoded characters in host".

@rakyll added this to the Go1.6 milestone on Sep 26, 2015
@rsc changed the title from "net/url: Encoding inconsistency between 1.4 and 1.5 for unicode domain names" to "net/url: encoding inconsistency between 1.4 and 1.5 for unicode domain names" on Nov 5, 2015
@rsc
Contributor

rsc commented Dec 4, 2015

RFC 3986 is clear about the need for percent-encoding the host when creating the URL. The parser is a bit lax in accepting the UTF-8 to begin with, but it's probably a mistake to reject it at this point.

I sent a CL to accept the %-encoded form so that we can round-trip the URL. I'm not sure this is a great idea, but we'll see I guess. The main argument is for non-HTTP uses of URLs.
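
A round-trip check along these lines can be sketched as follows. This is only an illustration, assuming a Go release that includes the change described in this thread (Go 1.6 milestone); `roundTrip` is a hypothetical helper, and on Go 1.5 the second parse is expected to fail with the "percent-encoded characters in host" error reported earlier.

```go
package main

import (
	"fmt"
	"net/url"
)

// roundTrip is a hypothetical helper: it parses raw, serializes the result
// with String, and parses that serialized form again. It returns the
// serialized form plus the Host field seen after each parse.
func roundTrip(raw string) (serialized, host1, host2 string, err error) {
	u1, err := url.Parse(raw)
	if err != nil {
		return "", "", "", fmt.Errorf("first parse: %v", err)
	}
	serialized = u1.String()

	u2, err := url.Parse(serialized)
	if err != nil {
		// This is the failure mode reported for Go 1.5.
		return serialized, u1.Host, "", fmt.Errorf("second parse: %v", err)
	}
	return serialized, u1.Host, u2.Host, nil
}

func main() {
	s, h1, h2, err := roundTrip("http://www.žluťoučký-kůň.cz")
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("serialized:", s)
	fmt.Println("hosts match:", h1 == h2)
}
```

If the two hosts match, the serialized form can be parsed back without losing the original host name, which is the property the change is meant to restore.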

@gopherbot

CL https://golang.org/cl/17385 mentions this issue.

@rsc closed this as completed in a6869d1 on Dec 7, 2015
@golang locked and limited conversation to collaborators on Dec 14, 2016