[parse http://zhidao.baidu.com/special/view?id=a9105a24626975510000&preview=1: first path segment in URL cannot contain colon]
Recommended fix
// net/url
// Maybe rawurl is of the form scheme:path.
// (Scheme must be [a-zA-Z][a-zA-Z0-9+-.]*)
// If so, return scheme, path; else return "", rawurl.
func getscheme(rawurl string) (scheme, path string, err error) {
	for i := 0; i < len(rawurl); i++ {
		c := rawurl[i]
		switch {
		case 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z':
			// do nothing
		case '0' <= c && c <= '9' || c == '+' || c == '-' || c == '.':
			if i == 0 {
				return "", rawurl, nil
			}
		case c == ':':
			if i == 0 {
				return "", "", errors.New("missing protocol scheme")
			}
			return rawurl[:i], rawurl[i+1:], nil
		default:
			// we have encountered an invalid character,
			// so there is no valid scheme
			return "", rawurl, nil
		}
	}
	return "", rawurl, nil
}
If the leading character is a space, getscheme could skip it and start checking at the first non-space character.
Alternatively, call TrimSpace on the input before invoking getscheme.
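The TrimSpace approach can also be applied on the caller side without touching net/url. A minimal sketch, assuming the crawled link carries a leading space; `parseTrimmed` is a hypothetical helper name, not a net/url function:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTrimmed is a hypothetical helper (not part of net/url): it strips
// leading and trailing white space before handing the string to url.Parse.
func parseTrimmed(raw string) (*url.URL, error) {
	return url.Parse(strings.TrimSpace(raw))
}

func main() {
	// Assumed input: the URL from the error message with a leading space.
	raw := " http://zhidao.baidu.com/special/view?id=a9105a24626975510000&preview=1"

	u, err := parseTrimmed(raw)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(u.Scheme, u.Host, u.Path)
}
```

Trimming at the call site keeps the decision about what counts as ignorable white space with the code that extracted the link, which may be preferable for a crawler.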
// parse parses a URL from a string in one of two contexts. If
// viaRequest is true, the URL is assumed to have arrived via an HTTP request,
// in which case only absolute URLs or path-absolute relative URLs are allowed.
// If viaRequest is false, all forms of relative URLs are allowed.
func parse(rawurl string, viaRequest bool) (*URL, error) {
	var rest string
	var err error
	if rawurl == "" && viaRequest {
		return nil, errors.New("empty url")
	}
	url := new(URL)
	if rawurl == "*" {
		url.Path = "*"
		return url, nil
	}
	// Split off possible leading "http:", "mailto:", etc.
	// Cannot contain escaped characters.
	if url.Scheme, rest, err = getscheme(rawurl); err != nil {
		return nil, err
	}
	url.Scheme = strings.ToLower(url.Scheme)
	if strings.HasSuffix(rest, "?") && strings.Count(rest, "?") == 1 {
		url.ForceQuery = true
		rest = rest[:len(rest)-1]
	} else {
		rest, url.RawQuery = split(rest, "?", true)
	}
	if !strings.HasPrefix(rest, "/") {
		if url.Scheme != "" {
			// We consider rootless paths per RFC 3986 as opaque.
			url.Opaque = rest
			return url, nil
		}
		if viaRequest {
			return nil, errors.New("invalid URI for request")
		}
		// Avoid confusion with malformed schemes, like cache_object:foo/bar.
		// See golang.org/issue/16822.
		//
		// RFC 3986, §3.3:
		// In addition, a URI reference (Section 4.1) may be a relative-path reference,
		// in which case the first path segment cannot contain a colon (":") character.
		colon := strings.Index(rest, ":")
		slash := strings.Index(rest, "/")
		if colon >= 0 && (slash < 0 || colon < slash) {
			// First path segment has colon. Not allowed in relative URL.
			return nil, errors.New("first path segment in URL cannot contain colon")
		}
	}
	if (url.Scheme != "" || !viaRequest && !strings.HasPrefix(rest, "///")) && strings.HasPrefix(rest, "//") {
		var authority string
		authority, rest = split(rest[2:], "/", false)
		url.User, url.Host, err = parseAuthority(authority)
		if err != nil {
			return nil, err
		}
	}
	// Set Path and, optionally, RawPath.
	// RawPath is a hint of the encoding of Path. We don't want to set it if
	// the default escaping of Path is equivalent, to help make sure that people
	// don't rely on it in general.
	if err := url.setPath(rest); err != nil {
		return nil, err
	}
	return url, nil
}
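To see why a leading space ends in this particular error: a space is not a valid scheme character, so getscheme returns an empty scheme and the whole string becomes `rest`; the colon check quoted above then finds the colon of `http:` before the first slash. The sketch below isolates only that comparison. `firstSegmentHasColon` is a hypothetical name used for illustration, not a function in net/url:

```go
package main

import (
	"fmt"
	"strings"
)

// firstSegmentHasColon mirrors the colon/slash comparison quoted from parse:
// it reports whether a colon appears before the first slash (or whether a
// colon appears at all when there is no slash).
func firstSegmentHasColon(rest string) bool {
	colon := strings.Index(rest, ":")
	slash := strings.Index(rest, "/")
	return colon >= 0 && (slash < 0 || colon < slash)
}

func main() {
	inputs := []string{
		" http://zhidao.baidu.com/special/view", // leading space: no scheme was split off
		"a/b:c",                                 // colon only in a later segment
		"a:b",                                   // colon in the first (and only) segment
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %v\n", in, firstSegmentHasColon(in))
	}
}
```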
In short: add strings.TrimSpace(rawurl) at line 477 of the net/url source file.
bradfitz changed the title from "[net/url] parse url error with leading space in url" to "net/url: parse url error with leading space in url" on Mar 5, 2018
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.9.2 darwin/amd64
Does this issue reproduce with the latest release?
YES
What operating system and processor architecture are you using (go env)?
darwin/amd64
What did you do?
Used goquery to crawl web pages; a link extracted from a page had a leading space.
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
What did you expect to see?
No error.
What did you see instead?
[parse http://zhidao.baidu.com/special/view?id=a9105a24626975510000&preview=1: first path segment in URL cannot contain colon]
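The issue does not include a runnable recipe. A minimal reproduction sketch follows, assuming the failing link was extracted with a leading space (the exact input string is not shown in the report). `hasColonError` is a hypothetical helper for this sketch only:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hasColonError is a hypothetical helper: it reports whether url.Parse
// rejects raw with the "first path segment" error from the report.
func hasColonError(raw string) bool {
	_, err := url.Parse(raw)
	return err != nil &&
		strings.Contains(err.Error(), "first path segment in URL cannot contain colon")
}

func main() {
	// Assumed input: the reported URL with one leading space.
	spaced := " http://zhidao.baidu.com/special/view?id=a9105a24626975510000&preview=1"

	fmt.Println("with leading space:", hasColonError(spaced))
	fmt.Println("after TrimSpace:   ", hasColonError(strings.TrimSpace(spaced)))
}
```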