Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: Marshal/Escape allows invalid characters #4235

Closed
anacrolix opened this issue Oct 11, 2012 · 9 comments
Closed

encoding/xml: Marshal/Escape allows invalid characters #4235

anacrolix opened this issue Oct 11, 2012 · 9 comments
Labels
FrozenDueToAge Suggested Issues that may be good for new contributors looking for work to do.
Milestone

Comments

@anacrolix
Copy link
Contributor

http://play.golang.org/p/ac4WDMTt5t

What is the expected output?
xml error: Invalid character

What do you see instead?
"<string>\0</string>"

Please provide any additional information below.

Invalid XML characters pass xml.Marshal. In the example I've given, the null byte is
passed through. This causes many XML decoders to barf.

http://en.wikipedia.org/wiki/Valid_characters_in_XML
@minux
Copy link
Member

minux commented Oct 11, 2012

Comment 1:

xml.Escape also has this problem.

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@robpike
Copy link
Contributor

robpike commented Oct 11, 2012

Comment 2:

This isn't easy. NUL bytes are forbidden, as are surrogates and some other code points.
It's easy to detect them; the problem is what to do?  Escape, for instance, has no error
return. The only solution I can see is to panic, which is heavy-handed.

@minux
Copy link
Member

minux commented Oct 12, 2012

Comment 3:

yeah, that's why i didn't fix it.
Marshal can return error, but the Escape case is difficult.
If we fix Escape by allowing it to panic, does it break the Go 1 compatibility promise?

@anacrolix
Copy link
Contributor Author

Comment 4:

I agree, it's heavy handed. However keep in mind this is actually a pretty serious bug,
as many heavy-hitting XML framework/super-libraries are pedantic about checking that
characters are valid. So the result of Marshal and Escape will cause errors in some
other process down the line. The case I have is in a SAX parser in some Java process I
don't control which abrubtly explodes when reading an attribute with a '\0' in it.
At least Marshal could return an error, and a CheckedEscape or something to that effect
could be used that allows returning an error.
Another idea is to add a function that will check a string for valid codepoints, which
could be used either before or after Escape, or whenever user code feels like it.

@rsc
Copy link
Contributor

rsc commented Dec 9, 2012

Comment 5:

1. Internally we will need the routine to know whether it is being called from something
that can return errors.
2. Bad characters result in an error if we're returning errors; otherwise they turn into
U+FFFD (best we can do).
3. Add EscapeText, a version of Escape that returns an error (issue #4112).

@rsc
Copy link
Contributor

rsc commented Dec 10, 2012

Comment 6:

Labels changed: added size-m.

@rsc
Copy link
Contributor

rsc commented Dec 10, 2012

Comment 7:

Labels changed: added suggested.

@gopherbot
Copy link

Comment 8 by osaingre:

Hi, I'm working on this.

@rsc
Copy link
Contributor

rsc commented Mar 14, 2013

Comment 9:

This issue was closed by revision f74eb6d.

Status changed to Fixed.

@anacrolix anacrolix added fixed Suggested Issues that may be good for new contributors looking for work to do. labels Mar 14, 2013
@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Suggested Issues that may be good for new contributors looking for work to do.
Projects
None yet
Development

No branches or pull requests

5 participants