encoding/xml: Parser accepts invalid prolog, parser.text may accept disallowed characters #1259

Closed
gopherbot opened this issue Nov 6, 2010 · 2 comments

Comments

@gopherbot

by nigel.kerr:

Observed against revision 6725, and as early as the 2010-11-02 release.

What steps will reproduce the problem?
1. Create an xml.Parser on a document that has disallowed characters between the start
of the document and the root element, or between the start of the document and the XML
declaration, or between the start of the document and the DOCTYPE declaration.
2. Exhaust the parser with _, err := parser.Token() until err == os.EOF.
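
The loop in step 2 can be sketched as follows. This is a minimal sketch written against the current encoding/xml API (xml.NewDecoder and io.EOF), not the 2010-era xml.NewParser and os.EOF used in this report. The helper name drainTokens is hypothetical, and this is not the contents of the attached badprolog.go.

```go
package main

import (
	"encoding/xml"
	"io"
	"strings"
)

// drainTokens is a hypothetical helper: it reads tokens from doc until
// EOF or the first error. It returns the number of tokens read and the
// error that stopped the loop (nil when the document ended cleanly).
func drainTokens(doc string) (int, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	n := 0
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return n, nil // ran to the end without complaint
		}
		if err != nil {
			return n, err // parser rejected the input
		}
		n++
	}
}
```

Calling drainTokens("\x12<doc></doc>") exercises the single-byte 0x12 case from this report. A non-nil error with a count of 0 means the parser stopped at the first token instead of handing the bad byte back as CharData.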

What is the expected output?

The first call to Token() should return an error of some sort about the disallowed
characters, and parsing should not continue.

What do you see instead?

xml.Parser returns tokens all the way to EOF.  The disallowed characters come back as
the first token, of type xml.CharData, followed by the rest of the document.


Which compiler are you using (5g, 6g, 8g, gccgo)?

6g


Which operating system are you using?

darwin
( uname -a
Darwin host-elided.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT
2010; root:xnu-1504.7.4~1/RELEASE_I386 i386)


Which revision are you using?  (hg identify)

68aae563fd33+ tip


Please provide any additional information below.

I am attaching as small a go program as I can think of with examples inside it.

This comes from a conversation with Russ Cox on golang-nuts (
http://groups.google.com/group/golang-nuts/browse_thread/thread/ddabf01fdbe57c9f# )
about xml.Parser and the results of using it with the XML Conformance Suite.

I believe that there are two specific difficulties: 

First, the initial call to RawToken() does not know that we are at the beginning of the
document, before the prolog.  At line 430, we check whether the first byte read is
something other than '<', and if so we call p.text and create a CharData from the
result.  This if block would need to handle the before-prolog case with some state
(which would need to change once we detect that we are no longer before the
prolog...).  Some character sequences are permissible at this location, provided there
isn't an XML declaration (it seems the DOCTYPE and root element can have whitespace,
comments, and processing instructions before them).
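
To illustrate the kind of before-prolog state described above, here is a minimal sketch that layers the check on top of the token stream instead of changing RawToken() itself. It uses the current encoding/xml API (xml.NewDecoder, io.EOF). The helper name checkProlog and its error messages are hypothetical, and this is not the fix that was eventually committed.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"io"
	"strings"
)

// checkProlog is a hypothetical helper: it scans tokens up to the root
// element and rejects content that is not allowed before it.
func checkProlog(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	first := true // true until the first token has been consumed
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return errors.New("no root element found")
		}
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			return nil // reached the root element; the prolog is over
		case xml.CharData:
			// Only XML whitespace may appear as text before the root.
			if len(bytes.Trim(t, " \t\r\n")) > 0 {
				return errors.New("non-whitespace text before root element")
			}
		case xml.ProcInst:
			// An XML declaration, if present, must come first.
			if t.Target == "xml" && !first {
				return errors.New("XML declaration not at start of document")
			}
		}
		// Comments, other processing instructions, and directives such
		// as DOCTYPE are allowed before the root element.
		first = false
	}
}
```

The single first flag is the only state needed for the XML declaration rule. The whitespace rule needs no extra state, because the function returns as soon as the root element starts.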

Second, parser.text() accepts at least some byte sequences that we don't think it should
(my attached example has a single byte 0x12 at the beginning of the document, which
isn't in the XML Character Range).  I haven't analyzed this at all beyond noting it
here.
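
For reference, the XML Character Range mentioned above is the Char production of XML 1.0. The check can be sketched as below. The function names inXMLCharRange and firstDisallowed are hypothetical and are not taken from the xml package source.

```go
package main

// inXMLCharRange reports whether r matches the XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
func inXMLCharRange(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// firstDisallowed returns the byte offset of the first rune in s that is
// outside the Char production, or -1 if every rune is allowed.
// Note: ranging over a string decodes invalid UTF-8 bytes as U+FFFD,
// which is inside the range, so invalid UTF-8 needs a separate check.
func firstDisallowed(s string) int {
	for i, r := range s {
		if !inXMLCharRange(r) {
			return i
		}
	}
	return -1
}
```

With this sketch, firstDisallowed("\x12<doc/>") reports offset 0, the position of the 0x12 byte in the attached example.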

Respectfully submitted,
Nigel Kerr

Attachments:

  1. badprolog.go (471 bytes)
@robpike
Contributor

robpike commented Nov 6, 2010

Comment 1:

Owner changed to r...@golang.org.

Status changed to Accepted.

@rsc
Contributor

rsc commented Dec 9, 2010

Comment 2:

This issue was closed by revision 27f2d5c.

Status changed to Fixed.

@mikioh changed the title from "xml.Parser accepts invalid prolog, parser.text may accept disallowed characters" to "encoding/xml: Parser accepts invalid prolog, parser.text may accept disallowed characters" on Jan 9, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.
3 participants