Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: semicolons treated as invalid characters in query string #50034

Open
chrisguiney opened this issue Dec 7, 2021 · 10 comments
Open

net/url: semicolons treated as invalid characters in query string #50034

chrisguiney opened this issue Dec 7, 2021 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@chrisguiney
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
1.17.1

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
➜  ~ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/chrisg/.cache/go-build"
GOENV="/home/chrisg/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/chrisg/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/chrisg"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/chrisg/sdk/go1.17.1"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/chrisg/sdk/go1.17.1/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build237025076=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Parsed a url with a non url-encoded semicolon in a value of a query string parameter
https://go.dev/play/p/n68WBqiRmkt

	u, err := url.Parse("http://foo.com/?bar=;&baz=foobar&abc&xyz=&ikj=n;m")
	if err != nil {
		panic(err)
	}

	fmt.Printf("%+v\n", u)
	fmt.Printf("%+v\n", u.Query())
	fmt.Printf("%+v\n", u.RawQuery)

	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", q)

What did you expect to see?

The value parsed literally, and successfully - e.g.,

map[abc:[] baz:[foobar] xyz:[] bar:[;] ikj=n;m]

What did you see instead?

http://foo.com/?bar=;&baz=foobar&abc&xyz=&ikj=n;m
map[abc:[] baz:[foobar] xyz:[]]
bar=;&baz=foobar&abc&xyz=&ikj=n;m
panic: invalid semicolon separator in query

goroutine 1 [running]:
main.main()
	/tmp/sandbox883685007/prog.go:22 +0x1eb

Program exited.

There have been a number of issues related to the new semicolon handling (some of which raised by myself):

The original issue, and proposal to add AllowQuerySemicolons:

It's not clear why the current behavior has been chosen. The oringal issue only highlighted semicolons as an issue as a seperator. The current implementation ignores them entirely -- which is incredibly frustrating.

The fact of the matter is that urls containing raw semicolons do exist and those parameter values have meaning.

I'd really appreciate if we could have a dialog about potential options. I'm willing to invest time to do any implementation, but it's not clear to me what the powers that be will find acceptable.

As it stands, the issues introduced by the url parsing change in 1.17 has prevented upgrades at my org. I'd really like to help find a safe & viable way forward

@chrisguiney chrisguiney changed the title net/url: semicolons treated as invalid characters net/url: semicolons treated as invalid characters in query string Dec 7, 2021
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 7, 2021
@seankhliao
Copy link
Member

cc @FiloSottile

@chrisguiney
Copy link
Contributor Author

Adding from the rfc: https://datatracker.ietf.org/doc/html/rfc3986#section-2.2

  If a reserved character is found in a URI component and
   no delimiting role is known for that character, then it must be
   interpreted as representing the data octet corresponding to that
   character's encoding in US-ASCII.

where

   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

note ; in sub-delims

Given that ; as a delimiting role has been rejected, it reads that the only correct thing to do is parse ; as though it were any other non-reserved character

@ianlancetaylor
Copy link
Contributor

No version of Go ever produced what you expect to see:

map[abc:[] bar:[;] baz:[foobar] ikj=n;m xyz:[]]

Go 1.16 and earlier produces this:

map[abc:[] bar:[] baz:[foobar] ikj:[n] m:[] xyz:[]]

The change in semicolon behavior was chosen to prevent mishandling of invalid queries while diverging as little as possible from earlier Go behavior. See #25192 (comment).

@chrisguiney
Copy link
Contributor Author

@ianlancetaylor you are correct, but under no circumstance would I as a user expect the current behavior. That is, it was wrong, and it's still wrong....but worse, because now data is gone, and my logs are flooded.

@chrisguiney
Copy link
Contributor Author

@ianlancetaylor to your point though, I did mis-attribute the current situation with #25192 -- they're not strictly related. Please forgive the mistake, I'm a bit frustrated at the whole thing, and perhaps wrote up the issue in a bit of haste (and with doubt that anything would come of it)

They are somewhat tangentially related in that

  1. when merely upgrading without any more thought, my logs get flooded, because I deal with many urls with semicolons in query string parameter value. I don't immediately notice any behavior change though, because in my case, I'm forwarding using url.RawQuery to apache , which doesn't treat them as separators, but does parse them as described in the RFC quote above.

  2. when adding http.AllowQuerySemicolons, all ; get replaced with & on the url.RawQuery -- leaving the keys, but removing the values. Log messages go away, which is nice, but without the value being sent upstream, it's not viable for me.

when facing the two issues above, and seeing that go now either errors out (if using url.ParseQuery) or silently ignores (if using (*URL).Query()), it was clear to me that the current behavior was definitely not correct...but instead of being able to blissfully ignore it as I had in past releases, I have to do something about it.

@chrisguiney
Copy link
Contributor Author

In an effort to stay engaged and productive on this - I have a couple questions:

  • Is there any path forward for raw semicolons in the url? That is, is the go team even amenable to continuing the conversation, or is this considered a solved problem/settled issue? It seems like the current behavior was a deliberate choice. Is there room to revisit that?
  • Is this better formulated as a proposal? From my point of view, it's certainly a bug, but I can see an argument against that.

While informative, @ianlancetaylor 's response didn't really give any direction on the issue

@ianlancetaylor
Copy link
Contributor

I'm not closely involved with this issue. That said, Go pays a lot of attention to backward compatibility. If we completely change the handling of semicolons in the net/url and net/http packages, then it seems to me that we risk silently changing the behavior of existing Go programs. It seems to me that that might open up security holes, which would be bad.

Can you address your specific issue by taking the URL string and converting ; to %3B before passing it to url.ParseQuery? I'm not trying to claim that that is the ideal solution but perhaps it would help you move forward.

Whether that helps or not, please don't expect any quick resolutions here. It's hard to find paths that work for as many people as possible.

@FiloSottile
Copy link
Contributor

The current behavior was the least-bad option in a minefield, and it's motivated by trying to minimize the situations in which parser mis-alignment between a proxy and an application leads to security vulnerabilities.

As mentioned in #49399 we'll remove the log line now that it has done its job, which was alerting applications of potential breakage. That should solve the issue for anyone who's not parsing the query but just forwarding RawQuery.

Anyone who wants the pre-Go 1.17 behavior can opt-in with AllowQuerySemicolon.

The only thing that's not addressed is applications that wish to parse unencoded semicolons as non-separators. Are there a lot of clients that send unescaped semicolons expecting them to NOT be handled as separators?

If yes, maybe we could add a EscapeQuerySemicolon Handler?

func EscapeQuerySemicolons(h Handler) Handler {
	return HandlerFunc(func(w ResponseWriter, r *Request) {
		if strings.Contains(r.URL.RawQuery, ";") {
			r2 := new(Request)
			*r2 = *r
			r2.URL = new(url.URL)
			*r2.URL = *r.URL
			r2.URL.RawQuery = strings.ReplaceAll(r.URL.RawQuery, ";", "%3B")
			h.ServeHTTP(w, r2)
		} else {
			h.ServeHTTP(w, r)
		}
	})
}

@chrisguiney
Copy link
Contributor Author

The only thing that's not addressed is applications that wish to parse unencoded semicolons as non-separators. Are there a lot of clients that send unescaped semicolons expecting them to NOT be handled as separators?
If yes, maybe we could add a EscapeQuerySemicolon Handler?

@FiloSottile I work with a lot of really wacky urls. I'm certain that there are some clients that will send unescaped semicolons expecting them not to be separators.

I invite you to engage with my proposal for adding a query parsing interface (#56300). URL parsing has been shown to be very sensitive to change. It's also something that keeps cropping up in different areas, such as httputil.ReverseProxy, and http.Client.

In the http.Client case, there's currently no way to change how it behaves on parsing a Location header prior to following a redirect. (I just discovered this the other day, and haven't had a chance to note it on the proposal yet)

The challenges have lead me to believe the best way forward is to provide interfaces, and provide a safe and sane default implementation. Let users that have particular needs supply their own, so they can get consistent behavior across the various http libraries without worrying about the next go release subtly changing behavior.

@vmallet
Copy link

vmallet commented Oct 10, 2023

The only thing that's not addressed is applications that wish to parse unencoded semicolons as non-separators. Are there a lot of clients that send unescaped semicolons expecting them to NOT be handled as separators?

@FiloSottile just as an example of something probably pretty common:
Share directions from Google Maps on iOS, get a short URL:

https://maps.app.goo.gl/Pu8hswB8Tu9an9Gb8?g_st=ic

Follow the redirect to get the full URL, here's what's returned by Google:

https://maps.google.com/?geocode=Fd8FPgId1pq0-ClVVVVVjHePgDE_7c0aV1zypA%3D%3D;FXshOgIdFYO7-CmbxbP6w8uPgDE-fvZtP0T6vA%3D%3D&daddr=sjc&saddr=sfo&dirflg=d&ftid=0x808f778c55555555:0xa4f25c571acded3f;0x808fcbc3fab3c59b:0xbcfa443f6df67e3e&lucs=,47071704&g_ep=CAISBjYuODQuNBgAIJScASoJLDQ3MDcxNzA0QgJVUw%3D%3D&g_st=ic

(Found my way here after upgrading a service running some ancient Go to 1.21).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants