net/url: fails to decode %ya whereas browsers are more tolerant #29808

Open
Darkemon opened this issue Jan 18, 2019 · 6 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone
Backlog

Comments

@Darkemon

What version of Go are you using (go version)?

go version go1.11.4 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOOS="freebsd"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build246043989=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The Yandex web application (https://yandex.ru/search) periodically sends requests like this:

https://yandex.ru/clck/click/reqid=1545391593487252-912524167688176914537851-man1-1492/path=690.491.59/vars=-no=19,-blob=aYLIB2m%yAdp%sgHabbJBw__/*https://yandex.ru/search/?text=%D0%BF%D1%80%D0%BE%D0%BC%D0%BE%D0%BA%D0%BE%D0%B4%20%D0%BE%D1%81%D0%B5%D1%82%D0%B8%D0%BD%D1%81%D0%BA%D0%B8%D0%B5%20%D0%BF%D0%B8%D1%80%D0%BE%D0%B3%D0%B8%20ospirogi&lr=213

I parse this URL with url.ParseRequestURI() and it returns an error, although as far as I understand the URL is valid.

What did you expect to see?

Parsed URL.

What did you see instead?

The error:

 parse https://yandex.ru/clck/click/reqid=1545391593487252-912524167688176914537851-man1-1492/path=690.491.59/vars=-no=19,-blob=aYLIB2m%yAdp%sgHabbJBw__/*https://yandex.ru/search/?text=%D0%BF%D1%80%D0%BE%D0%BC%D0%BE%D0%BA%D0%BE%D0%B4%20%D0%BE%D1%81%D0%B5%D1%82%D0%B8%D0%BD%D1%81%D0%BA%D0%B8%D0%B5%20%D0%BF%D0%B8%D1%80%D0%BE%D0%B3%D0%B8%20ospirogi&lr=213: invalid URL escape "%yA"
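A minimal reproduction, assuming Go 1.13 or later (for `errors.As`). The URL keeps only the path from the Yandex link above (the query is dropped), and `invalidEscape` is a hypothetical helper name; the standard-library pieces involved are `url.ParseRequestURI`, `url.Error` and `url.EscapeError`. The exact wording of the printed error varies slightly between Go releases.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// invalidEscape (hypothetical helper) parses raw with url.ParseRequestURI and,
// if parsing failed because of a malformed percent escape, returns the
// offending sequence as reported by net/url.
func invalidEscape(raw string) (string, bool) {
	_, err := url.ParseRequestURI(raw)
	var esc url.EscapeError
	// *url.Error unwraps to the underlying url.EscapeError.
	if errors.As(err, &esc) {
		return string(esc), true
	}
	return "", false
}

func main() {
	// Path segment taken from the Yandex URL; the query part is omitted.
	raw := "https://yandex.ru/clck/click/vars=-no=19,-blob=aYLIB2m%yAdp%sgHabbJBw__"
	if _, err := url.ParseRequestURI(raw); err != nil {
		fmt.Println(err) // the message ends with: invalid URL escape "%yA"
	}
	seq, ok := invalidEscape(raw)
	fmt.Println(seq, ok) // %yA true
}
```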
@aslrousta

aslrousta commented Jan 18, 2019

According to RFC 3986, Section 2.1, a percent-encoded character must have the form:

pct-encoded = "%" HEXDIG HEXDIG

As for the percent sign (%) itself, the RFC says:

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.

So the sequence %yA is definitely an invalid percent-encoded character, although most URL parsers (especially web browsers) are more tolerant of such errors.
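The `pct-encoded` rule can be checked mechanically. A small sketch (the function names are hypothetical, not part of net/url) that reports the byte offset of the first `%` not followed by two hex digits, or -1 if every escape is well formed:

```go
package main

import "fmt"

// isHexDigit reports whether b is an RFC 3986 HEXDIG (0-9, A-F, a-f).
func isHexDigit(b byte) bool {
	return '0' <= b && b <= '9' || 'a' <= b && b <= 'f' || 'A' <= b && b <= 'F'
}

// firstInvalidEscape returns the offset of the first '%' in s that does not
// match pct-encoded = "%" HEXDIG HEXDIG, or -1 if there is none.
func firstInvalidEscape(s string) int {
	for i := 0; i < len(s); i++ {
		if s[i] != '%' {
			continue
		}
		if i+2 >= len(s) || !isHexDigit(s[i+1]) || !isHexDigit(s[i+2]) {
			return i
		}
		i += 2 // skip the two hex digits of a valid escape
	}
	return -1
}

func main() {
	fmt.Println(firstInvalidEscape("aYLIB2m%yAdp%sgHabbJBw__")) // 7
	fmt.Println(firstInvalidEscape("%D0%BF%D1%80"))             // -1
}
```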

@bradfitz bradfitz added this to the Go1.13 milestone Jan 18, 2019
@bradfitz bradfitz added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 18, 2019
@bradfitz bradfitz changed the title net/url: incorrectly unescapes path in URL net/url: fails to decode %ya whereas browsers are more tolerant Jan 18, 2019
@Darkemon
Author

Thanks, I'll wait for a decision.

@rsc
Contributor

rsc commented May 1, 2019

A shorter version is http://site/x%ya. The problem is in decoding the path. It is unclear what the URL.Path field could possibly be set to here: there's nothing that will round-trip back to x%ya. Is that URL meant to be viewed as equivalent to /x%25ya, that is, does it decode to "/x%ya" after unescaping?

What does Apache do, or Nginx?
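A hedged illustration of the current net/url behavior behind this question: the bare form is rejected, while the %25 spelling decodes to Path "/x%ya" and serializes back to itself. `decodedPath` is a hypothetical helper; only `url.Parse`, `URL.Path` and `URL.String` are library calls.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodedPath (hypothetical helper) parses raw and returns the decoded Path
// together with the re-serialized URL.
func decodedPath(raw string) (path, serialized string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.String(), nil
}

func main() {
	_, _, err := decodedPath("http://site/x%ya")
	fmt.Println(err != nil) // true: the bare "%" form is rejected

	path, serialized, err := decodedPath("http://site/x%25ya")
	if err != nil {
		panic(err)
	}
	fmt.Println(path)       // /x%ya
	fmt.Println(serialized) // http://site/x%25ya
}
```

In other words, within net/url the only spelling that yields Path "/x%ya" and round-trips is /x%25ya; accepting /x%ya would mean two different raw URLs map to the same Path.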

@rsc rsc modified the milestones: Go1.13, Go1.14 May 1, 2019
@Darkemon
Author

Darkemon commented May 6, 2019

Is that URL meant to be viewed as equivalent to as /x%25ya, that is, it decodes to "/x%ya" after unescaping?

I think it is. But I've tested this case with Nginx (in directory-listing mode and as a reverse proxy), and it accepts only URLs like http://site/x%25ya, not http://site/x%ya.

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@WGH-

WGH- commented Jan 11, 2022

I looked up what the WHATWG URL Standard says about this. I'm not implying that Go must follow it; it's just a case in point.

https://url.spec.whatwg.org/#path-state

Otherwise, run these steps:

  1. If c is not a URL code point and not U+0025 (%), validation error.
  2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
  3. UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

First, a "validation error" is not a hard error: the parser may continue and report the error separately.

Second, I think they don't want the parser to automatically decode percent-encoding when parsing the path portion of the URL. Step 3 means that certain characters, such as {, are converted to %7B during parsing.
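A rough Go sketch of only the percent-handling part of the steps quoted above. It assumes the input is just the path (query and fragment already split off), skips the "not a URL code point" check, and uses hypothetical helper names; it is not a WHATWG parser. The byte set below is an assumption based on the spec's path percent-encode set (C0 controls, non-ASCII, space, ", #, <, >, ?, `, {, }); the URL Standard is a living standard, so check the current definition.

```go
package main

import (
	"fmt"
	"strings"
)

func isASCIIHex(b byte) bool {
	return '0' <= b && b <= '9' || 'a' <= b && b <= 'f' || 'A' <= b && b <= 'F'
}

// inPathPercentEncodeSet reports whether byte b is percent-encoded in a path,
// using the assumed set described above. Bytes above '~' cover every UTF-8
// byte of a non-ASCII code point, so they are encoded byte by byte.
func inPathPercentEncodeSet(b byte) bool {
	if b < 0x20 || b > 0x7E {
		return true
	}
	return strings.IndexByte(" \"#<>?`{}", b) >= 0
}

// serializePath records a validation error for each '%' that does not start
// a valid escape, but keeps that '%' as-is (it is not rewritten to %25) and
// keeps going. Bytes in the encode set are written as uppercase %XX.
func serializePath(path string) (string, []int) {
	var buf strings.Builder
	var errs []int // byte offsets of malformed '%' escapes
	for i := 0; i < len(path); i++ {
		c := path[i]
		if c == '%' && (i+2 >= len(path) || !isASCIIHex(path[i+1]) || !isASCIIHex(path[i+2])) {
			errs = append(errs, i)
		}
		if inPathPercentEncodeSet(c) {
			fmt.Fprintf(&buf, "%%%02X", c)
		} else {
			buf.WriteByte(c)
		}
	}
	return buf.String(), errs
}

func main() {
	serialized, errs := serializePath("/x%ya{")
	fmt.Println(serialized, errs) // /x%ya%7B [2]
}
```

The bare `%` survives untouched with one validation error recorded, while `{` is encoded, which matches the reading of step 3 above.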

@WGH-

WGH- commented Jan 11, 2022

One could work around this by using a separate URL parser, such as https://github.com/nlnwa/whatwg-url, but unfortunately net/http insists on parsing the URL itself through net/url and doesn't allow overriding the Request-URI with a string.

WGH- added a commit to WGH-/colly that referenced this issue Jan 12, 2022
Go net/http cannot send HTTP requests containing
invalid URL encoding in path (e.g. bare percent)
at all[1]. Browsers send a bare percent in
such a scenario, and do not implicitly autoencode it.

Until the upstream issue is resolved somehow,
we have only two alternatives: either fail
to fetch such URLs, or at least attempt the
autoencoded variant. Lots of webservers
handle them the same way, so it's worth trying.

There aren't too many websites with invalid
URL encoding in path component, though.

[1] golang/go#29808
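The fallback described in this commit message could look roughly like the following. This is not colly's actual code: `escapeStrayPercents` and `parseLenient` are hypothetical names, and the sketch simply rewrites every `%` that does not start a valid `%XX` escape to `%25` before retrying `url.Parse`. Note that the resulting request carries `%25` where a browser would send a bare `%`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

func isHex(b byte) bool {
	return '0' <= b && b <= '9' || 'a' <= b && b <= 'f' || 'A' <= b && b <= 'F'
}

// escapeStrayPercents rewrites every '%' that does not begin a valid "%XX"
// escape as "%25", leaving well-formed escapes untouched.
func escapeStrayPercents(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		if raw[i] == '%' && (i+2 >= len(raw) || !isHex(raw[i+1]) || !isHex(raw[i+2])) {
			b.WriteString("%25")
			continue
		}
		b.WriteByte(raw[i])
	}
	return b.String()
}

// parseLenient tries url.Parse as-is and, only if that fails, retries with
// stray percent signs autoencoded.
func parseLenient(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err == nil {
		return u, nil
	}
	return url.Parse(escapeStrayPercents(raw))
}

func main() {
	u, err := parseLenient("http://site/x%ya")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Path)     // /x%ya
	fmt.Println(u.String()) // http://site/x%25ya
}
```

Well-formed escapes such as %D0%BF are left alone, so only the malformed parts of a URL change; for the Yandex blob, aYLIB2m%yAdp%sgHabbJBw__ becomes aYLIB2m%25yAdp%25sgHabbJBw__.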
rca03 pushed a commit to rca03/colly that referenced this issue Aug 2, 2023

5 participants