url.parseQuery supporting & but not ";" as separator #2210

Closed
gopherbot opened this issue Aug 31, 2011 · 9 comments

Comments

@gopherbot

by mt4swm:

What steps will reproduce the problem?
1. run godoc -http :6060
2. in a browser, type
     http://127.0.0.1:6060/src/pkg/url/url.go?h=%22%26%22&s=14652:14657#L534
   This will show url.parseQuery, with "Split" and "&" highlighted.
3. Now, type 
     http://127.0.0.1:6060/src/pkg/url/url.go?h=%22%26%22;s=14652:14657#L534

What is the expected output?

I would expect ";" to be accepted as a query string value separator, just like
"&".

What do you see instead?

The current implementation of parseQuery seems not to recognize ";",
but it handles "&". Thus the query values `h' and `s' of the second
variant cannot be decoded.

Which revision are you using?  (hg identify)

tip:47d429aad39c

Please provide any additional information below.

Apparently some cgi's and html pages rely on ";" being accepted as a value
separator in query strings, besides "&". There is a recommendation
suggesting this behaviour at
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 .

Would a patch addressing this be welcome, or, on the contrary, has ";" been
left out by intention?
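
A minimal sketch of the behaviour asked for above, for illustration only. `parseQuerySketch` is a hypothetical helper, not the actual url.parseQuery code, and it assumes that both "&" and ";" should separate key=value pairs:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseQuerySketch is a hypothetical helper (not part of the url package).
// It treats both '&' and ';' as separators between key=value pairs.
func parseQuerySketch(query string) (url.Values, error) {
	values := make(url.Values)
	// FieldsFunc splits on either separator and drops empty pieces.
	pairs := strings.FieldsFunc(query, func(r rune) bool {
		return r == '&' || r == ';'
	})
	for _, pair := range pairs {
		// A pair without '=' is stored as a key with an empty value.
		rawKey, rawValue, _ := strings.Cut(pair, "=")
		key, err := url.QueryUnescape(rawKey)
		if err != nil {
			return nil, err
		}
		value, err := url.QueryUnescape(rawValue)
		if err != nil {
			return nil, err
		}
		values.Add(key, value)
	}
	return values, nil
}

func main() {
	// Query string of the second URL from the report, with ';' between h and s.
	values, err := parseQuerySketch("h=%22%26%22;s=14652:14657")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("h=%q s=%q\n", values.Get("h"), values.Get("s"))
}
```

With this sketch the `h` and `s` values of both URL variants decode to the same result.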
@rsc
Contributor

rsc commented Aug 31, 2011

Comment 1:

I am aware of the "recommendation".
However, I am unaware of any commonly used clients that
send ; instead of &, and given the lack of use in any
clients I don't see much point to supporting it on the server.
What web server libraries support ; ?
You pointed at HTML 4.01.  I would be more inclined if
the HTML 5 spec said something.
Russ

Owner changed to @rsc.

Status changed to WaitingForReply.

@gopherbot
Author

Comment 2 by mt4swm:

[seems replying to go@googlecode.com did not work]
> I am aware of the "recommendation".
> However, I am unaware of any commonly used clients that
> send ; instead of &, and given the lack of use in any
> clients I don't see much point to supporting it on the server.
> What web server libraries support ; ?
I've checked a few. Some do, some do not:
Python: yes
    http://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l379
        pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
Ruby (CGI): yes
    http://ruby-doc.org/stdlib/libdoc/cgi/rdoc/classes/CGI.html#M000108
        query.split(/[&;]/).each do |pairs|
Ruby (/usr/lib/ruby/1.8/webrick/httputils.rb): yes
    def parse_query(str)
    ...
        str.split(/[&;]/).each{|x|
Haskell (CGI): no
    http://hackage.haskell.org/packages/archive/cgi/3001.1.8.2/doc/html/src/Network-CGI-Protocol.html#CGIRequest
        where (nv,rs) = break (=='&') s
              (n,v) = break (=='=') nv
Inferno: no
    inferno/appl/svc/httpd/cgiparse.b
Android: no
    core/java/android/net/Uri.java
        getQueryParameters
Could not look into:
.NET Framework 4 (HttpUtility.ParseQueryString): ?
    http://msdn.microsoft.com/de-de/library/ms150046.aspx
As for myself I am using a set of proprietary C libraries (cgi,
html templating) I wrote >10 years ago, which form a web
application running in embedded systems. I'm trying to extend
this system, or replace parts of it, using Go, and tried to
examine how it would fit together.
I agree that there shouldn't be a problem with
web clients and form data, as they always use "&".
What I had in mind are hyperlinks containing query strings,
like `<a href="foo.cgi?sort=1;limit=20;columns=3">...</a>'.  Such links
are emitted by some of my cgi programs as part of html pages. In
such cases I used to use ";" as separator, as ascii-only
query strings were easy to construct even from shell scripts.
> You pointed at HTML 4.01.  I would be more inclined if
> the HTML 5 spec said something.
Apparently the HTML5 spec only says -- in the url-encoded form
data section -- that `&` has to be used ("append a single U+0026
AMPERSAND character"), and I found nothing about form-data-like
query strings as part of URIs used in href attributes.  So one
might suppose that such strings should also contain &, not ";".
I can't tell how common the use of ";" still is.  As one can
see from sites like the google search page, in many cases &
is properly escaped as &amp;, but not always:
  <a
href="/advanced_search?q=form+data+parse_query&amp;hl=de&amp;ie=UTF-8&amp;prmd=ivns"
class="gl nobr" id="sflas">Erweiterte Suche</a>
vs.
  <a class=gb1
href="http://www.google.de/search?q=form+data+parse_query&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&hl=de&tab=wi">Bilder</a>
Same for amazon.com, nytimes.com. One can get an idea why
";" was used alternatively, as in many generated pages both &amp;
and & get inserted as separators in href attributes, depending on
whether a programmer took care or not.
Perhaps it is best to forget about the semicolon for now,
and see if there will be a section in a new revision of the
HTML5 spec. Besides, it is probably better to fix broken
query-string generators in cgi programs (with increasing use
of utf-8 strings in query values there has to be proper
escaping anyway, so one should be able to insert the &amp;s
easily).
Michael

@rsc
Contributor

rsc commented Sep 1, 2011

Comment 3:

The evidence you've gathered makes me likely
to do it.  What about PHP and Perl?

@gopherbot
Author

Comment 4 by mt4swm:

For Perl there are multiple ways to parse a query string;
two of them are provided by CGI.pm from the Perl core
and Apache2::Request from libapreq. Both support "&" and ";":
CGI.pm:
    http://codesearch.google.com/#E4XixW5gvCc/pub/CPAN/src/latest.tar.bz2%7CNU9eyGOUCk8/perl-5.12.1/cpan/CGI/lib/CGI.pm&type=cs&l=792
    sub parse_params
        ...
        my(@pairs) = split(/[&;]/,$tosplit);
libapreq:
    http://svn.apache.org/viewvc/httpd/apreq/trunk/library/param.c?view=markup#l158
    APREQ_DECLARE(apr_status_t) apreq_parse_query_string ...
    {
        ...
        for (;;++qs) {
            switch (*qs) {
            ...
                case '&':
                case ';':
                ...
                        s = apreq_param_decode(...);
In PHP, query string parsing is done by the functions
ext/standard/string.c:parse_str() and
ext/mbstring/mb_gpc.c:mbstr_treat_data(), which rely on the
configurable parameter "arg_separator.input" from php.ini;
its default is "&". If the semicolon needs to be
supported too, one has to edit arg_separator.input
accordingly. php.ini says:
    ; List of separator(s) used by PHP to parse input URLs into variables.
    ; PHP's default setting is "&".
    ; NOTE: Every character in this directive is considered as separator!
    ; http://php.net/arg-separator.input
    ; Example:
    ;arg_separator.input = ";&"
Drupal, btw, is also using PHP's parse_str, in drupal_parse_url().

@rsc
Contributor

rsc commented Sep 5, 2011

Comment 5:

Status changed to Started.

@rsc
Contributor

rsc commented Sep 6, 2011

Comment 6:

This issue was closed by revision 686181e.

Status changed to Fixed.

@gopherbot
Author

Comment 7 by mr.jacob.good:

So the one issue I've found so far: as far as I can tell, the RFC does not require form
data posted in a body to be URL-encoded. I'm running into this because I have a form
value containing an unencoded ";".
parseQuery treats the body and the URL query as being similar, where one has
restrictions that the other doesn't.
I could be wrong.

@rsc
Contributor

rsc commented Mar 1, 2012

Comment 8:

Please raise this on the golang-nuts@ mailing list.  Let's see what other web people say.

@gopherbot
Author

Comment 9 by mr.jacob.good:

Never mind. After some more digging: the wording in RFC 1866 is that the default
encoding for forms is URL-encoded.
It actually turned out to be a bug in my http client library, which explicitly did not encode ";".
8.2.1. The form-urlencoded Media Type
   The default encoding for all forms is `application/x-www-form-
   urlencoded'. A form data set is represented in this media type as
   follows:
8.2.3. Forms with Side-Effects: METHOD=POST
   If the service associated with the processing of a form has side
   effects (for example, modification of a database or subscription to a
   service), the method should be `POST'.
   To process a form whose action URL is an HTTP URL and whose method is
   `POST', the user agent conducts an HTTP POST transaction using the
   action URI, and a message body of type `application/x-www-form-
   urlencoded' format as above. The user agent should display the
   response from the HTTP POST interaction just as it would display the
   response from an HTTP GET above.
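
For comparison, a minimal sketch of how a Go client could build such a body so that ";" inside a value gets escaped. `encodeForm` is a hypothetical helper and the field name is made up for illustration:

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeForm is a hypothetical helper that builds an
// application/x-www-form-urlencoded body from a single field.
func encodeForm(name, value string) string {
	form := url.Values{}
	form.Set(name, value)
	// Encode percent-escapes reserved characters in values, including ';'.
	return form.Encode()
}

func main() {
	fmt.Println(encodeForm("note", "a;b")) // prints note=a%3Bb
}
```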

@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.