net/url: URL.String() hashbang "#!" encodes as "#%21" #19917

dragonsinth · 2017-04-10T23:44:04Z

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (`go version`)?

1.7.4

What did you do?

https://play.golang.org/p/R-4KVNXa9Z

        u, _ := url.Parse("http://example.org/#!wut")
	fmt.Println(u.String())

What did you expect to see?

http://example.org/#!wut

What did you see instead?

http://example.org/#%21wut

Extra info

#6784
#5684

Looks like the first issue was marked a duplicate of the second, but the underlying problem was never fixed.

The text was updated successfully, but these errors were encountered:

odeke-em · 2017-05-10T23:26:46Z

/cc @rsc @bradfitz

@dragonsinth is there some RFC that we could quote/refer to that defines
that expectation on how to encode those values?

jhump · 2017-05-11T20:18:50Z

@odeke-em, the summary for #5684 has the details, from RFC 3986.

dragonsinth · 2017-05-11T20:19:11Z

Sure, in RFC 2396 (https://www.ietf.org/rfc/rfc2396.txt):

2.3. Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

   Unreserved characters can be escaped without changing the semantics
   of the URI, but this should not be done unless the URI is being used
   in a context that does not allow the unescaped character to appear.

RFC3986 (https://www.ietf.org/rfc/rfc3986.txt) then later redefines several of those characters (including !) as sub-delims in 2.2 and 2.3, which means reserved for using within indivdual components, such as the fragment.

If you read down to appendix A of RFC3986, "Collected ABNF for URI", you'll find the following rules:

fragment      = *( pchar / "/" / "?" )

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

As you can see, sub-delims is distinct from pct-encoded and they are allow in the fragment without encoding.

For further reference, Google Search build a whole spec on it a while ago (though it's now deprecated, but lots of app still use this): https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

fraenkel · 2017-05-12T01:53:29Z

The sub-delims isn't handled properly which affects more than just fragments.

gopherbot · 2017-09-05T14:42:51Z

Change https://golang.org/cl/61650 mentions this issue: net/url: don't escape sub-delims in fragment

tombergan · 2017-09-11T17:25:50Z

@fraenkel The sub-delims isn't handled properly which affects more than just fragments.

Can you provide a specific example?

fraenkel · 2017-09-11T22:17:08Z

Unless my reading of RFC 3986 is incorrect, sub-delims is part of the reserved set as well.
The shouldEscape method lists the set of reserved characters which does not follow rfc 3986.
I am a bit more concerned that some of the others may be different as well. I didn't do an exhaustive comparison but noticed that the rules did change.

tombergan · 2017-09-12T00:57:37Z

@fraenkel I should have been more clear: I meant, are there any examples where programs are broken because Go escapes sub-delims in the Path? You've observed that shouldEscape does not handle sub-delims according to RFC 3986, but see URL.EscapedPath and how that is used in URL.String. I believe Go is correctly handling sub-delims in Path when the caller uses those two methods. If that is wrong, it would help to see a repro.

This issue is about sub-delims in fragments. @dragonsinth has convinced me that we need to support unescaped ! in fragments, at the very least. There are two ways we could do this:

Update shouldEscape to return false for sub-delims when mode is escapeFragment.
Add URL.RawFragment and URL.EscapedFragment(), following the existing pattern for RawPath and EscapedPath. This makes my stomach churn, but at least it is symmetric with the approach we've already taken for Path.

The first approach avoids adding public APIs. Both proposals might possibly break user code, but the second approach breaks less code. The first approach changes the semantics of URL.Fragment, while the second approach keeps the current semantics of URL.Fragment -- a caller must switch to RawFragment and EscapedFragment to get the new escaping behavior. The only code potentially broken by the second approach is code like the following, where the caller is expecting that u.String() will never contain an unescaped sub-delim:

u, err := url.Parse(...)
foo(u.String())

I lean towards the second approach, despite my general dislike of the API, because it's the same pattern as RawPath and that change seemed to work out OK. Also, I'm paranoid of breaking user code. Thoughts?

/cc @rsc for the API question

dragonsinth · 2017-09-12T01:13:19Z

i strongly prefer the first approach; a user should have no reasonable expectation that u.String() won't contain unescaped sub-delims in the fragment portion, and in fact the current problematic u.String() rendering is exactly what we've had to code around in our system.

Here's how we're working around this today:

func fixUrlEscaping(ctx context.Context, urlString string) string {
	// Parse + String preserves the original encoding.
	u, err := url.Parse(urlString)
	if err != nil {
		log.Warningf(ctx, "url could not be parsed: %s", urlString)
		return urlString
	}
	// Special fixup for spaces in query string.
	// TODO: something more sophisticated?
	u.RawQuery = strings.Replace(u.RawQuery, " ", "+", -1)

	// Go's  built-in fragment escaping is overly ambitious, replace theirs with a minimalist escaping
	if u.Fragment != "" {
		frag := u.Fragment
		u.Fragment = ""
		return fmt.Sprintf("%s#%s", u.String(), fragmentEscape(frag))
	}

	return u.String()
}

// Scavanged from net/url/url.go
func fragmentEscape(f string) string {
	shouldEscape := func(c byte, curr int, s string) bool {
		// §2.3 Unreserved characters (alphanum)
		if 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9' {
			return false
		}
		switch c {
		case '-', '_', '.', '~': // §2.3 Unreserved characters (mark)
			return false
		case '$', '&', '+', ',', '/', ':', ';', '=', '?', '@', '#', '[', ']', '!', '\'', '(', ')', '*':
			return false
		}

		return true
	}

	escape := func(s string) string {
		spaceCount, hexCount := 0, 0
		for i := 0; i < len(s); i++ {
			c := s[i]
			if shouldEscape(c, i, s) {
				hexCount++
			}
		}

		if spaceCount == 0 && hexCount == 0 {
			return s
		}

		t := make([]byte, len(s)+2*hexCount)
		j := 0
		for i := 0; i < len(s); i++ {
			switch c := s[i]; {
			case shouldEscape(c, i, s):
				t[j] = '%'
				t[j+1] = "0123456789ABCDEF"[c>>4]
				t[j+2] = "0123456789ABCDEF"[c&15]
				j += 3
			default:
				t[j] = s[i]
				j++
			}
		}
		return string(t)
	}

	return escape(f)
}

tombergan · 2017-09-12T01:29:57Z

In case I wasn't clear, in both proposals, u.String() will contain unescaped fragment sub-delims when u is generated from url.Parse. Your munging of Fragment in fixUrlEscaping will be unnecessary with both proposals. The only difference is that, with the second proposal, you will need to access EscapedFragment or RawFragment to see the unescaped sub-delims. This is the same way this works today for Path. With the first proposal, you can use Fragment directly (at the potential cost of breaking other users who are not expecting this behavior).

Also, here is some code that would be broken by the first proposal:

var u url.URL
u.Fragment = `'x`
fmt.Sprintf("<a href='%s'>", u.String()) // prints <a href=''x'> because ' is a sub-delim

You could argue that no one should write code like this. I mention this because we actually found code similar to the above in net/http/fs.go. It's possible (likely, even) that code like the above exists in the wild today. This code happens to work today because u.String() will escape the single quote. My first proposal will break the above code. My second proposal will not.

dragonsinth · 2017-09-12T01:43:40Z

Ahh, that makes sense. Thanks! In that case, Fragment + RawFragment seems good to me; it mirrors Path and Query. It seems like a tiny break in behavior to make it net more sensible. But I'm not super strongly opinionated here as long as u.String() is great.

fraenkel · 2017-09-12T05:59:36Z

I can make something up.

 u, _ := url.Parse("http://a!:b!@example.org/#!wut")
fmt.Println(u.String())

userinfo should not escape ! since it too allows sub-delims. I haven't seen anyone complain.
I think the likelihood of a ! in a path is rare which is why we haven't seen anything break there.
And IPVFuture is another case where we won't see any issue.

tombergan · 2017-09-12T06:56:31Z

Sub-delims in Path work just fine. That was fixed in go1.5, via #5684, which added RawPath and EncodedPath, similarly to a solution proposed above for Fragment. I am not concerned about userinfo given that userinfo is essentially deprecated.

fraenkel · 2017-09-12T07:44:21Z

What would be the replacement for userinfo?

tombergan · 2017-09-12T23:51:48Z

I would wait for an actual report (rather than a hypothetical) before doing anything about userinfo, especially given that userinfo is essentially deprecated.

namusyaka · 2017-09-15T04:55:05Z

Vote to second approach. Looks reasonable.

rsc · 2017-11-22T20:31:41Z

I didn't get to thinking about this for Go 1.10, sorry.

To go down the RawFragment/EscapedFragment path I think we would need to have evidence that something cares about the distinction between #!wut and #%21wut. Does anything care? It really really really should not.

(This is in contrast to paths, where the RFCs explicitly allow caring about the distinction between %2F and /, even though it's impossible to tell them apart if you only record a single string.)

dragonsinth · 2017-11-22T20:52:54Z

I super care, because browsers / javascript see the two as different values.

https://www.google.com/#!wat
https://www.google.com/#%21wat

(Ultimately that means round tripping a url from a browser -> go URL -> browser changes the value seen by javascript code.)

crvv · 2017-11-23T02:23:58Z

caring about the distinction between %2F and /

There is an issue for this.
#21955

namusyaka · 2018-02-15T00:37:25Z

Any progress on this?

dragonsinth · 2018-07-13T05:10:54Z

Sweet!

In Go 1.11, the issue golang/go#19917 has been resolved, and "#!" is now encoded as "#!" rather than "#%21". Follows golang/go@8a33045.

bradfitz added this to the Go1.10 milestone May 10, 2017

rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

gopherbot closed this as completed in 8a33045 Jul 13, 2018

golang locked and limited conversation to collaborators Jul 13, 2019

gopherbot added the FrozenDueToAge label Jul 13, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

net/url: URL.String() hashbang "#!" encodes as "#%21" #19917

net/url: URL.String() hashbang "#!" encodes as "#%21" #19917

dragonsinth commented Apr 10, 2017 •

edited

odeke-em commented May 10, 2017

jhump commented May 11, 2017

dragonsinth commented May 11, 2017 •

edited

fraenkel commented May 12, 2017 •

edited

gopherbot commented Sep 5, 2017

tombergan commented Sep 11, 2017

fraenkel commented Sep 11, 2017

tombergan commented Sep 12, 2017 •

edited

dragonsinth commented Sep 12, 2017

tombergan commented Sep 12, 2017 •

edited

dragonsinth commented Sep 12, 2017 •

edited

fraenkel commented Sep 12, 2017

tombergan commented Sep 12, 2017

fraenkel commented Sep 12, 2017

tombergan commented Sep 12, 2017

namusyaka commented Sep 15, 2017

rsc commented Nov 22, 2017

dragonsinth commented Nov 22, 2017 •

edited

crvv commented Nov 23, 2017

namusyaka commented Feb 15, 2018

dragonsinth commented Jul 13, 2018

net/url: URL.String() hashbang "#!" encodes as "#%21" #19917

net/url: URL.String() hashbang "#!" encodes as "#%21" #19917

Comments

dragonsinth commented Apr 10, 2017 • edited

What version of Go are you using (go version)?

What did you do?

What did you expect to see?

What did you see instead?

Extra info

odeke-em commented May 10, 2017

jhump commented May 11, 2017

dragonsinth commented May 11, 2017 • edited

fraenkel commented May 12, 2017 • edited

gopherbot commented Sep 5, 2017

tombergan commented Sep 11, 2017

fraenkel commented Sep 11, 2017

tombergan commented Sep 12, 2017 • edited

dragonsinth commented Sep 12, 2017

tombergan commented Sep 12, 2017 • edited

dragonsinth commented Sep 12, 2017 • edited

fraenkel commented Sep 12, 2017

tombergan commented Sep 12, 2017

fraenkel commented Sep 12, 2017

tombergan commented Sep 12, 2017

namusyaka commented Sep 15, 2017

rsc commented Nov 22, 2017

dragonsinth commented Nov 22, 2017 • edited

crvv commented Nov 23, 2017

namusyaka commented Feb 15, 2018

dragonsinth commented Jul 13, 2018

dragonsinth commented Apr 10, 2017 •

edited

What version of Go are you using (`go version`)?

dragonsinth commented May 11, 2017 •

edited

fraenkel commented May 12, 2017 •

edited

tombergan commented Sep 12, 2017 •

edited

tombergan commented Sep 12, 2017 •

edited

dragonsinth commented Sep 12, 2017 •

edited

dragonsinth commented Nov 22, 2017 •

edited