Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: http://a,b,c/foo parses and reencodes as http://a%2Cb%2Cc/foo #12036

Closed
rsc opened this issue Aug 5, 2015 · 8 comments
Closed

net/url: http://a,b,c/foo parses and reencodes as http://a%2Cb%2Cc/foo #12036

rsc opened this issue Aug 5, 2015 · 8 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Aug 5, 2015

This affects mongodb, which uses net/url to parse URLs. See @liviosoares's comments on #12023.

@rsc
Copy link
Contributor Author

rsc commented Aug 5, 2015

@liviosoares, suppose that mongodb://ip1,ip2,ip3 parses and reencodes as mongodb://ip1%2Cip2%2Cip3. Why is that a problem? They mean the same thing. The only way I can see this being a problem is if mongodb is using raw string manipulation instead of url.Parse. In that case it should probably be using url.Parse.

@cespare
Copy link
Contributor

cespare commented Aug 5, 2015

@rsc minor note; I think this particular mongodb issue is unrelated to database/sql (unlike the original issue in #12023).

@rsc
Copy link
Contributor Author

rsc commented Aug 6, 2015

@cespare Thank you, I have edited the top comment to remove mention of database/sql. In that case, though, I'm even more confused. What's url.Parse'ing and reencoding the URLs and then handing them to something that doesn't url.Parse them?

@rsc
Copy link
Contributor Author

rsc commented Aug 6, 2015

@mikioh, I'm having a hard time understanding why shouldEscape behaves as it does for mode==escapeHost. Simplifying, it says:

// §2.3 Unreserved characters (alphanum)
if 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9' {
    return false
}

switch c {
case '-', '_', '.', '~': // §2.3 Unreserved characters (mark)
    return false

case '$', '&', '+', ',', '/', ':', ';', '=', '?', '@': // §2.2 Reserved characters (reserved)
    switch mode {
    case encodeHost: // §3.2.1
        // The RFC allows ':'.
        return c != ':'
    }

case '[', ']': // §2.2 Reserved characters (reserved)
    switch mode {
    case encodeHost: // §3.2.1
        // The RFC allows '[', ']'.
        return false
    }
}

// Everything else must be escaped.
return true

So:

shouldEscape('A', escapeHost) = false (first clause)
shouldEscape('-', escapeHost) = false (first branch of switch)
shouldEscape('+', escapeHost) = true (second branch of switch)
shouldEscape(':', escapeHost) = false (second branch of switch)
shouldEscape('[', escapeHost) = false (third branch of switch)

but the relevant parts of RFC 3986 say:

  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

If the host allows all of those sub-delims, why does shouldEscape return true for all of them? The only thing it returns false for is ":", which ironically isn't allowed (but it is part of IPv6 syntax and the :port suffix, which is why it's allowed). But why do we force escaping of the sub-delim case? It seems like we should allow them all to be used unencoded.

That is, return c != ':' should be return false. No?

@mikioh
Copy link
Contributor

mikioh commented Aug 6, 2015

But why do we force escaping of the sub-delim case?

My bad, because I didn't want to dive into the long standing issue #5684.

@mikioh
Copy link
Contributor

mikioh commented Aug 6, 2015

That is, return c != ':' should be return false. No?

Nope, the line

case '$', '&', '+', ',', '/', ':', ';', '=', '?', '@': // §2.2 Reserved characters (reserved)

includes both gen-delims and sub-delims partially.

@rsc
Copy link
Contributor Author

rsc commented Aug 6, 2015

@gopherbot
Copy link

CL https://golang.org/cl/13254 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants