proposal: net/url: add QueryParser and QueryEncoder interfaces #56300

Open

chrisguiney opened this issue Oct 18, 2022 · 2 comments


chrisguiney commented Oct 18, 2022

Summary

I would like to propose adding new interfaces to net/url to allow users to have control over url query string parsing and encoding implementations.

This can be done in a backwards-compatible manner by:

  1. introducing the interfaces
  2. providing a default implementation of the interfaces that preserves the current behavior
  3. exposing the default implementations as package-level variables
  4. updating call sites in the standard library to accept interface values, which take precedence over the default implementation

With DefaultQueryParser and DefaultQueryEncoder being package-level variables, a user can set the implementations once in main or init. This gives the user confidence that their chosen implementation will be used even if some call sites have not yet been updated to accept a custom implementation.

This approach allows users to gradually widen their APIs to accept a QueryParser or QueryEncoder, allowing the implementation work to be spread over multiple releases if necessary.

Proposed Interface Addition

var (
    DefaultQueryParser QueryParser = qsParser{}
    DefaultQueryEncoder QueryEncoder = qsParser{}
)

type QueryParser interface {
    Parse(string) (Values, error)
}

type QueryEncoder interface {
    Encode(Values) (string, error)
}

type qsParser struct{}

func (qsParser) Parse(s string) (Values, error) {
    // body would be the current body of ParseQuery
}

func (qsParser) Encode(v Values) (string, error) {
    // body would be the current body of Values.Encode
}


// Update existing APIs to use DefaultQueryParser
// and DefaultQueryEncoder.

func ParseQuery(s string) (Values, error) {
    return DefaultQueryParser.Parse(s)
}

func (v Values) Encode() string {
    s, _ := DefaultQueryEncoder.Encode(v)
    return s
}
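
For illustration, a program could then install its own implementation once at startup. This is a minimal sketch assuming the proposed API above were added to net/url; strictParser is a hypothetical user-defined type, not part of the proposal:

package main

import (
    "errors"
    "net/url"
    "strings"
)

// strictParser is a hypothetical parser that wraps an existing
// url.QueryParser and rejects semicolon-separated queries.
type strictParser struct {
    base url.QueryParser
}

func (p strictParser) Parse(s string) (url.Values, error) {
    if strings.Contains(s, ";") {
        return nil, errors.New("semicolons not allowed in query string")
    }
    return p.base.Parse(s)
}

func main() {
    // Set once in main; the right-hand side is evaluated first, so the
    // wrapper delegates to the original default implementation.
    url.DefaultQueryParser = strictParser{base: url.DefaultQueryParser}

    // From here on, url.ParseQuery and every call site that goes through
    // the package-level default uses strictParser.
}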

Rationale

Other systems have their own interpretations of the URL RFCs, which are often at odds with how Go interprets them. This creates situations where a behavior can be a security issue for one user but an intentional, relied-upon behavior for another. No single implementation can serve all users.

By allowing custom implementations to be supplied, we're able to provide a safe default that covers the majority of use cases, but still allow users with very specific needs the flexibility to provide their own implementations.

Users of the package include http.Server and httputil.ReverseProxy. Both are too large to reasonably expect users to fork them in order to call their own parsing and encoding implementations.

Historical Issues

There have been a number of issues surrounding query string parsing and encoding. Security issues have forced backwards-incompatible changes, which in turn motivated further changes to allow the prior behavior to be restored.

For users sensitive to these changes, some of the above issues caused operational disruption. In other cases, the backwards-compatibility breakages shipped in point releases.

Issues since proposal

edit: forgot interface keyword on QueryParser and QueryEncoder

2022/11/21 edit: added Issues since proposal

gopherbot added this to the Proposal milestone Oct 18, 2022

neild commented Oct 18, 2022

I don't think a global variable is a good idea: What happens if something parses a URL at init time? Or in a goroutine started at init time? Or if two places attempt to set the variable?

http.Request automatically calls ParseQuery to populate Request.Form on demand, but code that wants a different parser can always parse the query itself.

func httpHandler(w http.ResponseWriter, req *http.Request) {
  req.Form = myQueryParser(req.URL.RawQuery)
  // ...
}

ReverseProxy uses ParseQuery to sanitize forwarded URLs, but this behavior can be disabled:

// Go 1.20 Rewrite func:
proxy.Rewrite = func(r *httputil.ProxyRequest) {
  r.Out.URL.RawQuery = r.In.URL.RawQuery // preserve raw query string from inbound request
}
// Pre-1.20 Director func:
proxy.Director = func(r *http.Request) {
  // ...

  // Clear Request.Form before returning to disable query string sanitization.
  // Note that sanitization is not performed if Form is not set, so Director functions
  // that do not set Form (possibly implicitly by calling Request.FormValue) will never
  // sanitize the forwarded query.
  r.Form = nil
}

chrisguiney (issue author) commented

What happens if something parses a URL at init time? Or in a goroutine started at init time? Or if two places attempt to set the variable?

The simplest answer would be "don't do that". Globals haven't been an issue elsewhere in the standard library, such as:

  • encoding/binary: both BigEndian and LittleEndian
  • net/http: DefaultServeMux, DefaultClient
  • flag: CommandLine
  • all package-level error variables, such as io.EOF

It's not my first choice in design decisions, but given the Go 1 compatibility promise, I think it's the most failsafe way for a user to divorce themselves from the default implementation.

If concurrency is a concern, I think it'd be reasonable to keep the variables themselves at the package level and add a package-level mutex with SetDefaultParser(QueryParser) and SetDefaultEncoder(QueryEncoder) functions. This doesn't entirely cover the case where one init function spawns a goroutine while another is setting the variable, in which case it may be indeterminate which parser is used. I think this is a reasonable risk because no such code currently exists, so it can be warned against from the start.
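
A minimal sketch of that variant, assuming the types from the proposal; the names and locking details are illustrative rather than part of the proposal:

var (
    defaultsMu          sync.Mutex
    defaultQueryParser  QueryParser  = qsParser{}
    defaultQueryEncoder QueryEncoder = qsParser{}
)

// SetDefaultParser replaces the package-level parser. Intended to be
// called once, from main, before any queries are parsed.
func SetDefaultParser(p QueryParser) {
    defaultsMu.Lock()
    defer defaultsMu.Unlock()
    defaultQueryParser = p
}

// SetDefaultEncoder replaces the package-level encoder.
func SetDefaultEncoder(e QueryEncoder) {
    defaultsMu.Lock()
    defer defaultsMu.Unlock()
    defaultQueryEncoder = e
}

// ParseQuery reads the current default under the same lock; a read can
// still observe whichever value was set last if Set* is called late.
func ParseQuery(s string) (Values, error) {
    defaultsMu.Lock()
    p := defaultQueryParser
    defaultsMu.Unlock()
    return p.Parse(s)
}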

As a general rule, I would consider it a faux pas to set it anywhere other than main.


I should also note, and perhaps I wasn't clear in the proposal, that users of url.ParseQuery should be updated to accept a query parser/encoder so they're not reliant on the global default.

That is, http.Server should be extended to have a field:

type Server struct {
    // ... existing fields ...
    QueryParser url.QueryParser
}

which would be used when set, falling back to url.ParseQuery otherwise (see the sketch below).
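
A rough sketch of how the server could consult that field, falling back to the package default; the parseQuery helper is hypothetical and not an existing net/http function:

// Inside net/http (sketch): prefer the server's configured parser,
// otherwise fall back to url.ParseQuery.
func (srv *Server) parseQuery(rawQuery string) (url.Values, error) {
    if srv.QueryParser != nil {
        return srv.QueryParser.Parse(rawQuery)
    }
    return url.ParseQuery(rawQuery)
}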

This might actually give a means to disable the log message that http.Server prints when it receives a URL with a semicolon in it, without using http.AllowQuerySemicolons. If a user has a custom URL parser set, the message could be omitted.

(the log message was introduced in #25192)


http.Request automatically calls ParseQuery to populate Request.Form on demand, but code that wants a different parser can always parse the query itself.

This means that users need to maintain their own ParseForm, just to be able to use a different query string parser.
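
To make that cost concrete, here is a rough sketch of what such a replacement might look like; parseFormWith and customParse are hypothetical names, and a real replacement would also need to handle multipart bodies, size limits, and the value ordering that Request.ParseForm implements:

// (uses io, net/http, net/url, strings)
func parseFormWith(req *http.Request, customParse func(string) (url.Values, error)) error {
    // Parse the URL query with the custom parser.
    form, err := customParse(req.URL.RawQuery)
    if err != nil {
        return err
    }
    // Re-implement the urlencoded body handling that ParseForm does.
    if req.Method == http.MethodPost || req.Method == http.MethodPut || req.Method == http.MethodPatch {
        if strings.HasPrefix(req.Header.Get("Content-Type"), "application/x-www-form-urlencoded") {
            body, err := io.ReadAll(req.Body)
            if err != nil {
                return err
            }
            postForm, err := customParse(string(body))
            if err != nil {
                return err
            }
            req.PostForm = postForm
            for k, vs := range postForm {
                form[k] = append(form[k], vs...)
            }
        }
    }
    req.Form = form
    return nil
}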

ReverseProxy uses ParseQuery to sanitize forwarded URLs, but this behavior can be disabled

Honestly, it didn't occur to me that it could be avoided in 1.19.2; I was under the impression that there was no way to restore the prior behavior before 1.20. The release notes barely mention it, let alone that it's breaking, and no public documentation was updated for it. The only place it's noted is in a comment on the issue that initiated the change. I'm glad to know there's an upgrade path until 1.20 comes out.


The list of hacks is adding up though:

  • custom form parser
  • reverse proxy director/rewrite
  • storing and restoring the URL before and after http.AllowQuerySemicolons, or supplying a custom log writer that omits the message

There's also the overhead of having to document and teach all of this to anyone onboarding onto a codebase.

At the end of the day, my program should be able to have a single, consistent URL parser that obeys the parsing rules I need it to follow, and I shouldn't have to fork net/url and net/http to do so, nor any third-party library that may use url.ParseQuery.

Life would be so much easier if I could set a global parser and encoder once, and simply document what behavior is needed from it that the default parser and encoder don't provide.

After two backwards-incompatible changes in roughly a year, it would feel much safer knowing that the Go team can make whatever decisions they need to around the default parser without impacting code that's very sensitive to parser changes.
