encoding/xml: decoder fails to locate correct closing tag #56091

Closed
christopherhwood opened this issue Oct 7, 2022 · 4 comments
Labels: FrozenDueToAge, NeedsInvestigation

@christopherhwood

What version of Go are you using (go version)?

1.19, also reproducible on dev branch

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Linux amd64

What did you do?

Link to go playground with problematic code example: https://go.dev/play/p/sW7Wgl67OFi

The XML decoder fails to find the correct closing tag.

What did you expect to see?

The XML decoder should locate the correct closing tag.

What did you see instead?

@christopherhwood
Author

It seems the problem is that the CDATA section does not include a space before the closing ]]>. After adding a space, the XML decoder handles it properly.

@cagedmantis cagedmantis changed the title from "affected/package: xml" to "encoding/xml: decoder fails to locate correct closing tag" Oct 7, 2022
@cagedmantis cagedmantis added the NeedsInvestigation label Oct 7, 2022
@cagedmantis cagedmantis added this to the Backlog milestone Oct 7, 2022
@cagedmantis
Contributor

cc @rsc

@tenkoh

tenkoh commented Oct 18, 2022

I assume there is no issue in Go's code; rather, the reproduction code on the playground seems to have an encoding bug.

  • Go source code is required to be encoded in UTF-8.
  • The original reproduction code contains Chinese sentences. Although they are encoded in UTF-8, the XML declaration says the document is encoded in GBK, so the XML decoder fails to decode it properly.
  • In the reproduction code, the decoding failure destroyed the closing delimiter of the CDATA section (see below). As a result, the XML decoder never saw the end of the CDATA section and failed to parse the XML that follows it.
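s := "润]]>" // UTF-8 bytes, because this is a Go string literal
// Decode the UTF-8 bytes as if they were GBK (charset is golang.org/x/net/html/charset).
r, _ := charset.NewReaderLabel("gbk", strings.NewReader(s))
b, _ := io.ReadAll(r)
fmt.Println(string(b))
// Output: 娑�]>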

https://go.dev/play/p/TXhWsqyaSPY

What did the original problem look like? If the XML body is properly encoded in GBK, the XML decoder with a GBK reader works correctly, as shown below.

https://go.dev/play/p/JkyF8rOfWFU
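
For the opposite situation, where the declaration says GBK but the bytes are known to be UTF-8 already (as in a Go string literal), one possible workaround is to have Decoder.CharsetReader hand the input back unchanged. This is only a sketch under that assumption: the document, the root struct, and parseTrustingBytes are hypothetical names made up for illustration, and ignoring the declared charset would corrupt input that really is GBK.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// doc is a hypothetical document: the declaration claims GBK, but because
// it lives in a Go string literal its bytes are actually UTF-8.
const doc = `<?xml version="1.0" encoding="gbk"?><root><msg><![CDATA[润]]></msg><code>0</code></root>`

// root is a hypothetical target type for the document above.
type root struct {
	Msg  string `xml:"msg"`
	Code string `xml:"code"`
}

// parseTrustingBytes decodes s while ignoring the declared charset.
// Assumption: the caller knows the bytes are already UTF-8.
func parseTrustingBytes(s string) (root, error) {
	var v root
	d := xml.NewDecoder(strings.NewReader(s))
	d.CharsetReader = func(label string, input io.Reader) (io.Reader, error) {
		return input, nil // pass through: no transcoding
	}
	err := d.Decode(&v)
	return v, err
}

func main() {
	v, err := parseTrustingBytes(doc)
	fmt.Println(v.Msg, v.Code, err)
}
```

Because no transcoding happens, the ]]> delimiter reaches the decoder untouched and the element after the CDATA section is parsed as well.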

@christopherhwood
Author

Oh I see, that makes sense. Thanks for the explanation.

@golang golang locked and limited conversation to collaborators Oct 18, 2023