proposal: net/url: url manipulation after creation/parsing #40239

Closed
CommoDor64 opened this issue Jul 16, 2020 · 8 comments

CommoDor64 commented Jul 16, 2020

What version of Go are you using (go version)?

go1.14.2 darwin/amd64

Does this issue reproduce with the latest release?

It is reproducible also on Playground, go1.14.4

What operating system and processor architecture are you using (go env)?

GO111MODULE="on"
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"

What did you do?

net/url behaves unexpectedly in some situations.

The issue I spotted is that there is no way to re-parse the same struct after the URL's structure changes.
All changes to the url.URL struct are made by direct field manipulation rather than through methods on the struct.

package main

import (
	"fmt"
	"net/url"
)

func parseURLWithoutScheme() *url.URL {
	u, err := url.Parse("somedomain.com/firstpath")
	if err != nil {
		return nil
	}
	return u
}

func parseURLWithScheme() *url.URL {
	u, err := url.Parse("https://somedomain.com/firstpath")
	if err != nil {
		return nil
	}
	return u
}

func main() {

	fmt.Println("1 path WITH scheme initialized:", parseURLWithScheme().Path)
	// output: /firstpath

	u := parseURLWithoutScheme()
	fmt.Println("2 path WITH NO scheme initialized:", u.Path)
	// output: somedomain.com/firstpath

	u.Scheme = "https"
	// Print the path of the modified struct itself, not of a freshly parsed URL.
	fmt.Println("3 path WITH NO scheme initialized, scheme was added later:", u.Path)
	// output: somedomain.com/firstpath

	u, _ = url.Parse(u.String())
	fmt.Println("4 same path of url struct in example 3, but reparsed into url.URL:", u.Path)
	// output: /firstpath

}

Available here
https://play.golang.org/p/3SkxRVIgmYS

The inability to change the URL structure in such a way after creation is quirky.

Proposal

Add manipulation methods such as

func (u *URL) SetPath(path string) { /* re-parse and set properties respectively */ }

or allowing deep copy

func (u *URL) DeepCopy() *URL { /* re-parse and create a new URL struct */ }
@gopherbot gopherbot added this to the Proposal milestone Jul 16, 2020
@icholy

icholy commented Jul 23, 2020

Why not just add the scheme to your string?

func ParseURL(rawurl string) (*url.URL, error) {
	u, err := url.Parse(rawurl)
	if err != nil || u.Scheme != "" {
		return u, err
	}
	return url.Parse("https://" + rawurl)
}

@ianlancetaylor
Contributor

Why does this need to be in the standard library? How often does it come up?

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Aug 7, 2020
@ianlancetaylor ianlancetaylor changed the title proposal: url manipulation after creation/parsing proposal: net/url: url manipulation after creation/parsing Aug 7, 2020
@CommoDor64
Author

Why does this need to be in the standard library? How often does it come up?

It comes up every time a URL is manipulated in such a way. It may indeed be a rare use case, but it creates an inconsistency that I cannot explain. If someone can, I would love to hear why the behavior is as presented and why a re-parsing function would not be an acceptable solution.

Thanks for your comment

@icholy

icholy commented Aug 7, 2020

@CommoDor64 what does your proposed SetPath method do? Also why would you need to re-parse anything when making a copy?

@rsc
Contributor

rsc commented Aug 12, 2020

@CommoDor64, in your example, the problem is that, by the definition of URLs, "example.com/foo" has the same form as "foo/bar": both are path-only URLs, with no domain at all. Even though example.com looks like a domain, it is not one in this case. That is, it is not just a "URL without scheme". It is a "URL without scheme and without domain".

If you want a URL without scheme but with domain, that syntax is "//example.com/foo". Parsing that, setting u.Scheme, and reparsing u.String does work.
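The difference between the two spellings can be sketched as follows. The helper name withScheme is hypothetical and only illustrates the point; it is not part of net/url. Both inputs produce the same string once a scheme is set, but only the "//" form has a host before any reparsing.

```go
package main

import (
	"fmt"
	"net/url"
)

// withScheme (a hypothetical helper) parses raw, sets the scheme directly on
// the parsed struct, and reports the host, path, and string form that result.
func withScheme(raw, scheme string) (host, path, full string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	u.Scheme = scheme
	return u.Host, u.Path, u.String(), nil
}

func main() {
	// The first input is a path-only reference; the second is a
	// scheme-less reference that does carry a host.
	for _, raw := range []string{"example.com/foo", "//example.com/foo"} {
		host, path, full, err := withScheme(raw, "https")
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("input=%q host=%q path=%q string=%q\n", raw, host, path, full)
	}
}
```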

The problem here seems to be confusion about the URL syntax as defined by the RFC, not a bug in the Go library, which follows the RFC.

@CommoDor64
Author

@rsc Thanks for taking the time to reply.
After looking at the RFC https://tools.ietf.org/html/rfc3986 I have to agree completely.

Thanks everyone, I will close the issue

@martisch martisch removed this from Incoming in Proposals (old) Aug 14, 2020
@rsc rsc added this to Declined in Proposals (old) Aug 14, 2020
@rogpeppe
Contributor

ISTM that the odd thing here is that when there's a URL with a scheme but no host, the string representation makes the first element of the path into the host. Is there something better that it could do in that case? I suspect not.

@ainar-g
Contributor

ainar-g commented Aug 15, 2020

FWIW, I've had several instances where I needed a Clone() *url.URL method. It's not that hard to write, but I still think that it would be nice to have in the standard library.

@golang golang locked and limited conversation to collaborators Aug 15, 2021