encoding/xml: XML CDATA section could be joined together with regular characters #12611

Open
pgundlach opened this issue Sep 14, 2015 · 6 comments

@pgundlach
Contributor

go version go1.5 darwin/amd64

One thing I stumbled across yesterday (not a real bug, but a minor nuisance from a user's perspective perhaps):

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

func main() {
    src := `<root>a<![CDATA[b]]>c</root>`
    r := strings.NewReader(src)

    dec := xml.NewDecoder(r)
    for {
        tok, err := dec.Token()
        if err != nil {
            fmt.Println(err)
            break
        }
        fmt.Printf("%#v\n", tok)
    }
}

gives

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61}
xml.CharData{0x62}
xml.CharData{0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

I would expect one xml.CharData{} token instead:

xml.StartElement{Name:xml.Name{Space:"", Local:"root"}, Attr:[]xml.Attr{}}
xml.CharData{0x61, 0x62, 0x63}
xml.EndElement{Name:xml.Name{Space:"", Local:"root"}}
EOF

While I understand the source of the three tokens, I would expect one as the user (= me) is unable to distinguish between a CDATA node and a regular text node.

@ianlancetaylor ianlancetaylor changed the title XML CDATA section could be joined together with regular characters encoding/xml: XML CDATA section could be joined together with regular characters Sep 14, 2015
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 14, 2015
@lly-c232733

lly-c232733 commented Dec 22, 2023

This is also a problem if the XML element contains indented children and a CDATA section.

Example:

<parent>
<![CDATA[Description of parent]]>
<child ID="1"></child>
<child ID="2"></child>
<child ID="3"></child>
</parent>

The ",cdata" field of parent ends up being:
\nDescription of parent\n\n\n\n

A workaround is to use ",innerxml" and implement custom MarshalXML/UnmarshalXML methods for your data type.

@pgundlach
Contributor Author

also is a problem if the xml element contains indented children and a cdata section
[...]
",cdata" of parent ends up being: \nDescription of parent\n\n\n\n

Two comments from me:

  1. I believe this is expected: the string value of parent is what you write
  2. This is unrelated to the report (joining adjacent CDATA sections)

@lly-c232733

lly-c232733 commented Dec 23, 2023

You said:

I would expect one as the user (= me) is unable to distinguish between a CDATA node and a regular text node.

I agree. However, I can't distinguish CDATA sections using Unmarshal, and I can't distinguish them using Token either.

This behavior does not follow the spec: when I unmarshal a document and then marshal what I read, these CDATA sections pick up a ton of extra newlines (in addition to the regular indentation ones), all wrapped in CDATA. In other words, crazy output.

This goes very much against section 2.11 of the spec:
https://www.w3.org/TR/xml/#sec-line-ends

@lly-c232733

Example crazy output using ",cdata":

Input of 'Unmarshal'

<parent>
<![CDATA[Description of parent]]>
<child ID="1"></child>
<child ID="2"></child>
<child ID="3"></child>
</parent>

Output of 'MarshalIndent'

<parent>
<![CDATA[
Description of parent



]]>
<child ID="1"></child>
<child ID="2"></child>
<child ID="3"></child>
</parent>

@pgundlach
Contributor Author

I still believe this is a different issue, related to but not the same as this one. I am not the author of the XML package, so I can't give an authoritative answer. Perhaps you should post code that shows the behaviour in a new bug report? That makes it easier to reproduce the problem. Your “crazy output” seems crazy to me, too. I think you have hit a bug, while my issue is just a nuisance.

@lly-c232733

Fair enough, thanks for your feedback, and have a Merry Christmas.
