encoding/xml: Decoder doesn't recognize valid characters in entity names #3813

Closed
gopherbot opened this issue Jul 10, 2012 · 3 comments

@gopherbot

by dmitri.m:

What steps will reproduce the problem?
The XML spec allows more characters in entity names than (*xml.Decoder).Decode
accepts. For example, the following entity references are all legal: &a-b;
&C.D; &e·321; (see http://www.w3.org/TR/REC-xml/#sec-entity-decl).
However, the following program fails with parsing errors:
http://play.golang.org/p/Y2-dIsoAXE
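
The playground program is not reproduced in this report. As a rough stand-in, here is a minimal sketch that feeds the three entity names above through a decoder with a custom Entity map. The input document, the replacement strings, and the helper name decodeText are assumptions made for illustration, not the original program.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// doc is a hypothetical input; only the entity names come from the report.
const doc = `<doc>&a-b; &C.D; &e·321;</doc>`

// decodeText (hypothetical helper) decodes a single element and returns its
// character data, resolving entities through the caller-supplied map.
func decodeText(input string, entities map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Entity = entities
	var v struct {
		Text string `xml:",chardata"`
	}
	if err := d.Decode(&v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	// Replacement values are made up for the example.
	entities := map[string]string{"a-b": "one", "C.D": "two", "e·321": "three"}
	text, err := decodeText(doc, entities)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

On a release affected by this issue, the call is expected to return a syntax error of the kind quoted below. On a release that includes the fix, it should print the three replacement strings separated by spaces.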

What is the expected output?
Valid entity names should be properly parsed without errors.

What do you see instead?
In Go 1.0.2, parsing fails with an "XML syntax error on line 7: invalid character
entity &a;" error.
In tip (695f65745351), parsing fails with an "XML syntax error on line 7: invalid
character entity &a (no semicolon)" error.
Note that the decoder stops reading the entity name at the first unrecognized
character and ignores the rest of the name. If the user-provided entity map contains
another entity whose name matches the initial characters, that entity is used
instead. For example, if xml.Decoder.Entity is set to map[string]string{"a": "first",
"a-ignored": "second"}, then &a-ignored; is parsed as if it
were &a;.
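
The prefix-matching case can be checked with a small token loop. This is a sketch only: the helper name resolve and the surrounding <doc> element are assumptions, while the Entity map is the one described above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// resolve (hypothetical helper) returns the character data of input with
// entities expanded through the custom map.
func resolve(input string, entities map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(input))
	d.Entity = entities
	var sb strings.Builder
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		// Only character data is collected; tags are skipped.
		if cd, ok := tok.(xml.CharData); ok {
			sb.Write(cd)
		}
	}
}

func main() {
	entities := map[string]string{"a": "first", "a-ignored": "second"}
	text, err := resolve(`<doc>&a-ignored;</doc>`, entities)
	fmt.Printf("%q %v\n", text, err)
}
```

A decoder that reads the whole name should yield "second". Any result involving the "a" entry, or an error mentioning &a, points to the truncation described above.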

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
OS X 10.7

Which version are you using?  (run 'go version')
go1.0.2 and tip:
parent: 13709:695f65745351 tip
branch: default

Please provide any additional information below.
The straightforward fix would be to update the character range conditions in
pkg/encoding/xml/xml.go:870 to include additional valid characters. But the XML spec
specifies different constraints for the initial character of the name and the rest of
the name (see http://www.w3.org/TR/REC-xml/#NT-Name) so more invasive changes will be
required for full compliance.
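
To illustrate the two-tier rule, here is a sketch of a name check that uses the NameStartChar and NameChar ranges as I read them in the current (fifth edition) XML 1.0 spec. The names runeRange, nameStart, nameExtra, and isXMLName are hypothetical, and this is not the code in encoding/xml or the change that eventually closed the issue.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type runeRange struct{ lo, hi rune }

// nameStart lists the characters allowed anywhere in a name (NameStartChar).
var nameStart = []runeRange{
	{':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
	{0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
	{0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F}, {0x2C00, 0x2FEF},
	{0x3001, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
}

// nameExtra lists the characters allowed only after the first position.
var nameExtra = []runeRange{
	{'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7}, {0x300, 0x36F}, {0x203F, 0x2040},
}

func inRanges(r rune, ranges []runeRange) bool {
	for _, rr := range ranges {
		if r >= rr.lo && r <= rr.hi {
			return true
		}
	}
	return false
}

// isXMLName reports whether s matches the Name production under the
// assumptions stated above.
func isXMLName(s string) bool {
	if s == "" || !utf8.ValidString(s) {
		return false
	}
	for i, r := range s {
		if inRanges(r, nameStart) {
			continue
		}
		if i > 0 && inRanges(r, nameExtra) {
			continue
		}
		return false
	}
	return true
}

func main() {
	for _, name := range []string{"a-b", "C.D", "e·321", "-a", "1a"} {
		fmt.Printf("%q valid: %v\n", name, isXMLName(name))
	}
}
```

The point of keeping two tables is that characters such as '-', '.', digits, and U+00B7 are accepted inside a name but rejected in the first position.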
@robpike
Contributor

robpike commented Jul 12, 2012

Comment 1:

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@rsc
Contributor

rsc commented Sep 12, 2012

Comment 2:

Labels changed: added go1.1.

@rsc
Contributor

rsc commented Oct 22, 2012

Comment 3:

This issue was closed by revision 2e67dd8.

Status changed to Fixed.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.