encoding/xml: line endings in data get replaced #24426

Closed
tehsphinx opened this issue Mar 16, 2018 · 3 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@tehsphinx

What version of Go are you using (go version)?

go version go1.9.3 darwin/amd64

Does this issue reproduce with the latest release?

I created a copy of the xml package and applied the code changes from https://go-review.googlesource.com/c/go/+/46433, which is supposed to fix issue #20614, but the issue persists.

What did you do?

I receive XML from another service, and its data contains \r\n line endings.
After parsing, all line endings in the data are normalized to \n.

I found the following section of the XML specification on the subject, but I wonder whether it is even supposed to apply to line endings inside CDATA. If it is, kindly let me know and I can move on to finding a workaround.
https://www.w3.org/TR/REC-xml/#sec-line-ends

Here is a reproducible sample with and without CDATA escaping:
https://play.golang.org/p/PdnIyRD6Qsv
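For readers who cannot open the playground link, the following is a minimal sketch of a comparable reproduction. It is not the original sample: the `payload` type, the `decode` helper, and the element names are hypothetical and chosen only for illustration. It feeds the same text to `xml.Unmarshal` once as plain character data and once wrapped in CDATA, then prints what comes back.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// payload is a hypothetical document type used only for this illustration.
type payload struct {
	Data string `xml:"data"`
}

// decode unmarshals doc and returns the character data of its <data> element.
func decode(doc string) (string, error) {
	var p payload
	if err := xml.Unmarshal([]byte(doc), &p); err != nil {
		return "", err
	}
	return p.Data, nil
}

func main() {
	// The same text, once as plain character data and once inside CDATA.
	plain := "<payload><data>line1\r\nline2</data></payload>"
	cdata := "<payload><data><![CDATA[line1\r\nline2]]></data></payload>"

	for _, doc := range []string{plain, cdata} {
		got, err := decode(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// %q makes the line-ending characters visible in the output.
		fmt.Printf("%q\n", got)
	}
}
```

Both lines print `"line1\nline2"`, so the carriage return is dropped whether or not the data is wrapped in CDATA.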

What did you expect to see?

My data with \r\n line endings intact.

What did you see instead?

My data with \n line endings.

@andybons andybons added this to the Unplanned milestone Mar 16, 2018
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 16, 2018
@andybons
Member

andybons commented Mar 16, 2018

Hm. The section on CDATA makes no note about normalizing line endings; however, CDATA sections are unparsed entities, and the section on line endings seems to apply only to parsed entities.

This could go either way. Do you know how other parsers handle it?

@ianlancetaylor @rsc?

@tehsphinx
Author

Did some more research on the topic:

On Stack Overflow, somebody argues that because line-ending normalization has to happen before parsing, the XML parser does not yet know whether a line ending is part of a CDATA section or not.

MSDN library states:

XML processors treat the character sequence Carriage Return-Line Feed (CRLF) like single CR or LF characters. All are reported as a single LF character. Applications can save documents using the appropriate line-ending convention.
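The rule quoted above can be sketched as a plain string transformation. This is only an illustration of the described behavior, not the code that encoding/xml uses internally; the function name `normalizeLineEndings` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLineEndings is a hypothetical helper that mimics the rule quoted
// above: every CRLF pair and every lone CR is reported as a single LF.
// The CRLF pair is listed first so it is matched before a lone CR.
func normalizeLineEndings(s string) string {
	return strings.NewReplacer("\r\n", "\n", "\r", "\n").Replace(s)
}

func main() {
	fmt.Printf("%q\n", normalizeLineEndings("a\r\nb\rc\nd"))
}
```

Because this step runs over the raw input, it has no notion of whether a given line ending sits inside a CDATA section.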

So I guess the Go XML parser is implemented correctly, and I should use base64 encoding to get line endings across.
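A minimal sketch of that base64 workaround, assuming both the sender and the receiver agree on the encoding. The `message` type and the `encodeMessage` and `decodeMessage` helpers are hypothetical names for illustration. Note that encoding/xml does not base64-encode fields on its own, so the conversion is done explicitly on both sides.

```go
package main

import (
	"encoding/base64"
	"encoding/xml"
	"fmt"
)

// message is a hypothetical wire type; Data carries base64 text, so the XML
// parser never sees the raw line endings of the original data.
type message struct {
	Data string `xml:"data"`
}

// encodeMessage wraps raw bytes in base64 before marshaling them to XML.
func encodeMessage(raw []byte) ([]byte, error) {
	return xml.Marshal(message{Data: base64.StdEncoding.EncodeToString(raw)})
}

// decodeMessage parses the XML and restores the original bytes.
func decodeMessage(doc []byte) ([]byte, error) {
	var m message
	if err := xml.Unmarshal(doc, &m); err != nil {
		return nil, err
	}
	return base64.StdEncoding.DecodeString(m.Data)
}

func main() {
	doc, err := encodeMessage([]byte("line1\r\nline2"))
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	raw, err := decodeMessage(doc)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// %q makes the restored \r\n visible in the output.
	fmt.Printf("%q\n", raw)
}
```

The round trip prints `"line1\r\nline2"`, at the cost of a larger payload and data that is no longer readable in the XML itself.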

@andybons
Member

OK. Closing for now. Let us know if you have any other concerns and feel free to re-open if you like.

@golang golang locked and limited conversation to collaborators Mar 18, 2019