Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/url: misleading error message when url has a leading space #29261

Open
stefanb opened this issue Dec 14, 2018 · 15 comments
Open

net/url: misleading error message when url has a leading space #29261

stefanb opened this issue Dec 14, 2018 · 15 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@stefanb
Copy link
Contributor

stefanb commented Dec 14, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stefanb/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stefanb/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
	"net/http"
)

func main() {
	_, err := http.Get(" http://example.org")
	if err != nil {
		panic(err)
	}
}

https://play.golang.org/p/jkcYSD6ZRcO

What did you expect to see?

Either

  • a generic error message of invalid URL or
  • a detailed error of invalid URL scheme or
  • a detailed error of leading space in URL

What did you see instead?

A misleading detailed error message:

parse  http://example.org: first path segment in URL cannot contain colon

In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.

I propose to either:

  • adjust the error message to eg:
parse  http://example.org: URL scheme cannot contain spaces

or

  • quote the problematic URL in the error message, so that it is more obvious even if the message itself remains misleading:
parse " http://example.org": first path segment in URL cannot contain colon

or

  • ideally both adjust the error message + quote the URL:
parse " http://example.org": URL scheme cannot contain spaces

The cannot contain spaces in error message can be changed for cannot contain whitespace or even contains invalid characters, depending how the check is implemented.

Current implementation:

go/src/net/url/url.go

Lines 540 to 562 in b50210f

if !strings.HasPrefix(rest, "/") {
if url.Scheme != "" {
// We consider rootless paths per RFC 3986 as opaque.
url.Opaque = rest
return url, nil
}
if viaRequest {
return nil, errors.New("invalid URI for request")
}
// Avoid confusion with malformed schemes, like cache_object:foo/bar.
// See golang.org/issue/16822.
//
// RFC 3986, §3.3:
// In addition, a URI reference (Section 4.1) may be a relative-path reference,
// in which case the first path segment cannot contain a colon (":") character.
colon := strings.Index(rest, ":")
slash := strings.Index(rest, "/")
if colon >= 0 && (slash < 0 || colon < slash) {
// First path segment has colon. Not allowed in relative URL.
return nil, errors.New("first path segment in URL cannot contain colon")
}
}

@dsnet dsnet pinned this issue Dec 14, 2018
@dsnet dsnet unpinned this issue Dec 14, 2018
@bradfitz bradfitz added this to the Go1.13 milestone Dec 14, 2018
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 14, 2018
@bradfitz bradfitz changed the title url.parse - misleading error message when url has a leading space net/url: misleading error message when url has a leading space Dec 14, 2018
@cgopalan
Copy link

@bradfitz @dsnet Wouldnt a TrimSpace before getscheme on this line solve the issue? I can raise a PR for this if thats agreed upon as the fix. Would also write tests of course.

@agnivade
Copy link
Contributor

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").

So therefore, anything not starting with a letter should be rejected and that would be the right behavior. However, I had a look at the implementations out there in the wild.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse-

from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

@robaho
Copy link

robaho commented Dec 21, 2018 via email

@agnivade
Copy link
Contributor

agnivade commented Dec 21, 2018

That's right, sorry. Then we can simplify it to ".. does not begin with a letter".

@stefanb
Copy link
Contributor Author

stefanb commented Dec 21, 2018

simplify it to ".. does not begin with a letter".

Perhaps ".. scheme does not begin with a letter" would be more accurate as "//example.org" is valid to my understanding.

Interestingly, i found

	u, err := url.Parse("    //example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)

is NOT failing, but outputs:

%20%20%20%20//example.org

https://play.golang.org/p/jRk63PWsKG5

@bradfitz
Copy link
Contributor

Yes, we should return an error on leading bogus chars.

@cgopalan
Copy link

ok I will make a PR for this code change with associated tests.

stefanb added a commit to stefanb/go that referenced this issue Dec 21, 2018
Current implementation doesn't always make it obvious what the exact
problem with the URL is, so this makes it clearer by consistently quoting
the invalid URL, as is the norm in other parsing implementations, eg.:
strconv.Atoi(" 123") returns an error: parsing " 123": invalid syntax

Updates golang#29261
@stefanb
Copy link
Contributor Author

stefanb commented Dec 21, 2018

I have implemented the generic improvement of error messages by consistently quoting the URLs in #29384, but more detailed messages and stricter validation are welcome of course.

@cgopalan
Copy link

@bradfitz what does "bogus chars" translate to besides a space?

@agnivade
Copy link
Contributor

Any non-letter, according to the spec.

@cgopalan
Copy link

cgopalan commented Dec 23, 2018

On further analysis, it seems its not straightforward to implement this because of 2 reasons:

  • the characters that are invalid in a scheme could be valid in other parts of the url (like the period and plus characters) so we will need to remember what we processed before. Currently getscheme does not do that, and am wondering if adding that functionality would increase complexity.
  • the url package says it should be backward compatible.

To account for the space-specific error, we could add a separate case branch and raise the error message like so:

case c == ' ':   // Add other invalid prefix chars here
	if i == 0 {
		return "", "", errors.New("URL scheme has invalid characters")
	} 

However this seems like its a bandaid fix to me, but I am not sure, so would appreciate some input. Any thoughts on whether this is good, or if there's any other approach?

@stefanb @bradfitz @agnivade ?

@stefanb
Copy link
Contributor Author

stefanb commented Dec 28, 2018

I assume the proposed change would go just before default line 450 in the switch block here:

go/src/net/url/url.go

Lines 435 to 457 in b50210f

func getscheme(rawurl string) (scheme, path string, err error) {
for i := 0; i < len(rawurl); i++ {
c := rawurl[i]
switch {
case 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z':
// do nothing
case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
if i == 0 {
return "", rawurl, nil
}
case c == ':':
if i == 0 {
return "", "", errors.New("missing protocol scheme")
}
return rawurl[:i], rawurl[i+1:], nil
default:
// we have encountered an invalid character,
// so there is no valid scheme
return "", rawurl, nil
}
}
return "", rawurl, nil
}

This would make it inconsistent with other validation error cases in the same switch block:

  1. the first character validation:

    go/src/net/url/url.go

    Lines 441 to 444 in b50210f

    case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
    if i == 0 {
    return "", rawurl, nil
    }
  2. the invalid character validation

    go/src/net/url/url.go

    Lines 450 to 453 in b50210f

    default:
    // we have encountered an invalid character,
    // so there is no valid scheme
    return "", rawurl, nil

They both just returns scheme as "" and no error whatsoever. It should be consistent in a way so that all reasons for invalid scheme result either in error or "", but not mixed.

@agnivade
Copy link
Contributor

It's hard to discuss code changes here. Maybe just send a CL and we can continue it over there.

We probably need to simplify the switch-case a bit. Have an if condition first, which checks for i==0 and non-letter, and return immediately. And integrate the rest of the switch-case accordingly.

We can keep the special c == ':' and i == 0 check, which will preserve the "missing protocol scheme" error, for backward compatibility reasons. But I will let @bradfitz decide that.

@gopherbot
Copy link

Change https://golang.org/cl/155922 mentions this issue: net/url: give a proper error message on invalid character in scheme

@gopherbot
Copy link

Change https://golang.org/cl/185117 mentions this issue: net/url: improve url parsing error messages by quoting

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
gopherbot pushed a commit that referenced this issue Aug 28, 2019
Current implementation doesn't always make it obvious what the exact
problem with the URL is, so this makes it clearer by consistently quoting
the invalid URL, as is the norm in other parsing implementations, eg.:
strconv.Atoi(" 123") returns an error: parsing " 123": invalid syntax

Updates #29261

Change-Id: Icc6bff8b4a4584677c0f769992823e6e1e0d397d
GitHub-Last-Rev: 648b9d9
GitHub-Pull-Request: #29384
Reviewed-on: https://go-review.googlesource.com/c/go/+/185117
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
Current implementation doesn't always make it obvious what the exact
problem with the URL is, so this makes it clearer by consistently quoting
the invalid URL, as is the norm in other parsing implementations, eg.:
strconv.Atoi(" 123") returns an error: parsing " 123": invalid syntax

Updates golang#29261

Change-Id: Icc6bff8b4a4584677c0f769992823e6e1e0d397d
GitHub-Last-Rev: 648b9d9
GitHub-Pull-Request: golang#29384
Reviewed-on: https://go-review.googlesource.com/c/go/+/185117
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
Current implementation doesn't always make it obvious what the exact
problem with the URL is, so this makes it clearer by consistently quoting
the invalid URL, as is the norm in other parsing implementations, eg.:
strconv.Atoi(" 123") returns an error: parsing " 123": invalid syntax

Updates golang#29261

Change-Id: Icc6bff8b4a4584677c0f769992823e6e1e0d397d
GitHub-Last-Rev: 648b9d9
GitHub-Pull-Request: golang#29384
Reviewed-on: https://go-review.googlesource.com/c/go/+/185117
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
ssapra pushed a commit to cloudfoundry/bosh-utils that referenced this issue Mar 24, 2020
See golang/go#29261. go 1.14 added quotes around error messages for net/url
ssapra pushed a commit to cloudfoundry/bosh-agent that referenced this issue May 12, 2020
See golang/go#29261. go 1.14 added quotes around error messages for net/url
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants