Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: handle CONNECT in ReadRequest/WriteRequest #2755

Closed
andybalholm opened this issue Jan 22, 2012 · 10 comments
Closed

net/http: handle CONNECT in ReadRequest/WriteRequest #2755

andybalholm opened this issue Jan 22, 2012 · 10 comments
Milestone

Comments

@andybalholm
Copy link
Contributor

What steps will reproduce the problem?
    _, err := http.ReadRequest(bufio.NewReader(strings.NewReader("CONNECT 98.189.193.163:443 HTTP/1.1\r\n\r\n")))
    fmt.Println(err)

What is the expected output?
<nil>

What do you see instead?
parse 98.189.193.163:443: invalid URI for request

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Mac OS Snow Leopard

Which revision are you using?  (hg identify)
b372a927701e tip

Please provide any additional information below.
I'm not sure if these really qualify as URLs, since they don't have a scheme, but they
are used in valid HTTP requests, so we should have some way to deal with them.

If the host is a hostname rather than an IP address, the URL is parsed successfully, but
incorrectly. "www.google.com:443" is parsed as a scheme of
"www.google.com" and an opaque content of "443". But if the host is
just a numeric address, it isn't valid to parse it as a scheme, so it just gives an
error.
@rsc
Copy link
Contributor

rsc commented Jan 23, 2012

Comment 1:

I don't know what to do here.  The CONNECT pseudo-method does
not follow the HTTP RFC as far as its argument is concerned.
One possibility would be to detect CONNECT during the request
parsing and set url.Path = arg without any real URL parsing.
And the same in the request writer.

Labels changed: added priority-go1, removed priority-triage.

Owner changed to @bradfitz.

Status changed to Accepted.

@andybalholm
Copy link
Contributor Author

Comment 2:

I would be more inclined to set url.Host = arg, since it is a host. Parsing
http://www.google.com:443/ sets host to www.google.com:443; we might as well be
consistent.
It might be helpful to change the URL parsing code instead of bypassing it. "URL"s
without schemes are common enough that it would be nice if we could do something
sensible with them. (Assume HTTP for parsing purposes, but leave the scheme blank to
show it wasn't specified?) The difficult part would be coming up with a way to trigger
this behavior if and only if it's needed. IE's behavior of accepting host:port when host
is an IP address but complaining about an unknown scheme when you go to localhost:8080
is rather confusing. (Safari seems to do better, but I don't know what rules they use.)

@rsc
Copy link
Contributor

rsc commented Jan 23, 2012

Comment 3:

Sorry, but www.google.com:443 is not a URL.
Russ

@andybalholm
Copy link
Contributor Author

Comment 4:

That's why I put "URL" in quotes in my comment :-).
But if you don't want to go Postel in the URL package, I understand. Working on the HTML
package has showed me how crazy things can get when you try to follow Postel's principle.

@rsc
Copy link
Contributor

rsc commented Jan 23, 2012

Comment 5:

'www.google.com' does not mean 'http://www.google.com/'.It means the
path www.google.com.  It would be truly surprising
if adding :443 made it mean the host instead of the path.However, it
might be okay to treat something that has a dot before the first
colon as scheme-free, so that it was treated as a Path instead of
as a syntax error.  That would fix CONNECT and be consistent
with the colon-free behavior.

@andybalholm
Copy link
Contributor Author

Comment 6:

> 'www.google.com' does not mean 'http://www.google.com/'.
It does when a user types it in the address bar of a browser. In some circumstances it
would be nice to have incomplete URLs like this work. But I had forgotten about relative
URLs.
It is impossible to have one function that would correctly process relative URLs and
also have the do-what-I-mean capabilities users expect in the address bar. So there's
really no point moving the URL parser in the DWIM direction. If someone wants DWIM URL
processing, they can parse the URL and then if that doesn't give them the result they
want, they can prefix "http://" and try again.
So it's probably best to just treat CONNECT as a special case.

@rsc
Copy link
Contributor

rsc commented Jan 23, 2012

Comment 7:

Address bars are very special.
It is important that the URL parser be usable
when parsing real URLs; for example, if the
html parser used the URL parser and we made
this change, then it would mishandle
.
Russ

@andybalholm
Copy link
Contributor Author

Comment 8:

Right.
So we should fix it in net/http (Just for CONNECT), not in net/url.

@dsymonds
Copy link
Contributor

Comment 9:

Consensus is to treat CONNECT specially when parsing the first line of the request: if
the URL part does not start with "/", prefix it with "http://", parse it as a URL, and
then clear the Scheme field.

Status changed to Started.

@dsymonds
Copy link
Contributor

Comment 11:

This issue was closed by revision c3b9650.

Status changed to Fixed.

@rsc rsc added this to the Go1 milestone Apr 10, 2015
@rsc rsc removed the priority-go1 label Apr 10, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants