encoding/xml: Decoder escapes tab characters #47464

Closed
chrisdoherty4 opened this issue Jul 29, 2021 · 4 comments
Labels
FrozenDueToAge
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.

Comments

chrisdoherty4 commented Jul 29, 2021

What version of Go are you using (go version)?

Go 1.16.6 (play.golang.org)

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

play.golang.org

What did you do?

Use the encoding/xml package's Encoder to output a tab character in xml.CharData.

https://play.golang.org/p/oFvH6HEn310

package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

func parse(doc []byte) []byte {
	var buf bytes.Buffer
	encoder := xml.NewEncoder(&buf)

	decoder := xml.NewDecoder(bytes.NewReader(doc))

	func() {
		for {
			token, err := decoder.Token()
			if err == io.EOF {
				return
			} else if err != nil {
				panic(err)
			}

			if _, ok := token.(xml.CharData); ok {
				fmt.Printf("%s\n", token)
			}

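			// Re-encode every token unchanged and surface any encoder error.
			if err := encoder.EncodeToken(token); err != nil {
				panic(err)
			}
		}
	}()

	if err := encoder.Flush(); err != nil {
		panic(err)
	}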

	return buf.Bytes()
}

func main() {
	// xml document with tabs in.
	doc := []byte(`<Document>Hello		World</Document>`)

	fmt.Printf("Original: %s\n", doc)
	doc = parse(doc)
	fmt.Printf("Process 1: %s\n", doc)
	doc = parse(doc)
	fmt.Printf("Process 2: %s\n", doc)
}

Output:

Original: <Document>Hello		World</Document>
Hello		World
Process 1: <Document>Hello&#x9;&#x9;World</Document>
Hello		World
Process 2: <Document>Hello&#x9;&#x9;World</Document>

What did you expect to see?

Unescaped tab bytes are output as unescaped tab bytes.

What did you see instead?

Unescaped tab bytes are output as escaped tab bytes.

chrisdoherty4 changed the title from "encoding/xml packages Decoder escapes tab characters" to "encoding/xml: Decoder escapes tab characters" on Jul 29, 2021
Author

chrisdoherty4 commented Jul 29, 2021

This could actually be worse than I thought. CDATA tags are totally removed.

https://play.golang.org/p/gPrYuFfRcRK

Edit: this looks like a misunderstanding on my part about CDATA. It's not a tag; it's just a section used to hold unescaped values that would otherwise break the document, such as <.

I'm still unsure as to why the Encoder is deciding to escape tabs though, that doesn't seem right?
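On the CDATA point, a minimal sketch of what happens in a token round trip may help. It assumes the same decode-then-re-encode loop as the report; `roundTrip` is a hypothetical helper name, not part of encoding/xml. The Decoder hands back the contents of a CDATA section as ordinary xml.CharData, so the Encoder has no way to know it came from CDATA and writes it out with normal escaping instead.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// roundTrip (hypothetical helper) decodes doc token by token and
// re-encodes every token unchanged, returning the resulting document.
func roundTrip(doc string) (string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		// A CDATA section arrives here as plain xml.CharData.
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := roundTrip(`<Document><![CDATA[1 < 2]]></Document>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <Document>1 &lt; 2</Document>
}
```

The CDATA wrapper is gone, but the text content is the same: a parser reading either form sees the characters `1 < 2`.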

Contributor

neild commented Jul 30, 2021

Simpler demonstration of encoding/xml marshaling tabs:

s, _ := xml.Marshal("\t")
fmt.Println(string(s))

Prints:

<string>&#x9;</string>

I'm not certain what's supposed to be not right here. &#x9; is a valid character reference representing a tab.
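To illustrate the equivalence, here is a small sketch that unmarshals both spellings of the tab and compares the results. `decodeText` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// decodeText (hypothetical helper) unmarshals a single element
// and returns its character data as a string.
func decodeText(doc string) (string, error) {
	var s string
	err := xml.Unmarshal([]byte(doc), &s)
	return s, err
}

func main() {
	escaped, err := decodeText("<string>&#x9;</string>")
	if err != nil {
		panic(err)
	}
	literal, err := decodeText("<string>\t</string>")
	if err != nil {
		panic(err)
	}
	// Both forms decode to the same one-character string.
	fmt.Printf("%q %q %t\n", escaped, literal, escaped == literal)
}
```

Both documents should decode to the same single tab character, so a consumer that parses the XML cannot tell which form was written.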

neild added the WaitingForInfo label on Jul 30, 2021
@chrisdoherty4
Author

It might not really be a bug, frankly. It just seems odd that the encoder doesn't respect what's input. The XML it outputs is totally valid though.

Contributor

neild commented Aug 3, 2021

It just seems odd that the encoder doesn't respect what's input.

I'm not certain what this means; the encoder produces valid XML representing the input. Possibly escaping a tab character is unnecessary, but it certainly isn't wrong.
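For anyone who still wants literal tabs in hand-written text content, one possible workaround is sketched below. It is an assumption of this example, not something encoding/xml offers: `escapeTextKeepTabs` is a hypothetical helper that calls xml.EscapeText and then turns the `&#x9;` references back into tab bytes. It is only meant for element text written directly to an output; passing the result to Encoder.EncodeToken would escape it again, and it should not be used for attribute values, where a literal tab is normalized to a space by parsers.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strings"
)

// escapeTextKeepTabs is a hypothetical helper, not part of encoding/xml.
// It escapes s for use as element text but leaves tab bytes unescaped.
func escapeTextKeepTabs(s string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return "", err
	}
	// EscapeText has already rewritten every literal '&' as "&amp;",
	// so any "&#x9;" left in the buffer was produced from a real tab.
	return strings.ReplaceAll(buf.String(), "&#x9;", "\t"), nil
}

func main() {
	out, err := escapeTextKeepTabs("Hello\t\tWorld & <more>")
	if err != nil {
		panic(err)
	}
	fmt.Printf("<Document>%s</Document>\n", out)
}
```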

golang locked and limited conversation to collaborators on Aug 5, 2022

3 participants