encoding/xml: unmarshal only processes first XML element #20754
Comments
I think there may be two issues here; the other is that the call to
EDIT: To clarify, the "correct" way to write an array or slice type in XML, assuming we were marshaling it as a complete document and not as part of a stream, would be something like this (names don't matter; I just stuck with "int" and arbitrarily picked "slice" as the root):
<slice>
<int>1</int>
<int>2</int>
</slice> |
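For anyone who wants the rooted form today, a minimal sketch: wrap the slice in a struct so that Marshal emits a single root element and Unmarshal reads every item back. The intSlice type and roundTrip helper are names made up for this example, not part of encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// intSlice is a hypothetical wrapper type for this sketch. XMLName sets the
// root element name; each slice item becomes a nested <int> element.
type intSlice struct {
	XMLName xml.Name `xml:"slice"`
	Ints    []int    `xml:"int"`
}

// roundTrip marshals the slice inside the wrapper, then unmarshals the
// resulting document and returns both the decoded values and the XML text.
func roundTrip(in []int) ([]int, string, error) {
	data, err := xml.Marshal(intSlice{Ints: in})
	if err != nil {
		return nil, "", err
	}
	var out intSlice
	if err := xml.Unmarshal(data, &out); err != nil {
		return nil, "", err
	}
	return out.Ints, string(data), nil
}

func main() {
	ints, doc, err := roundTrip([]int{1, 2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc)  // <slice><int>1</int><int>2</int></slice>
	fmt.Println(ints) // [1 2]
}
```

Because the document has exactly one top-level element, the single Decode call inside Unmarshal sees all of the items.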
Sorry @SamWhited, but I disagree. Marshal clearly can be used to generate fragments of XML documents as well, for example if you pass it a slice of things to marshal. The docs explicitly contemplate this:
We can't change that now, and I think it would be a mistake to do so anyway, since it would take control of the wrapping away from the users. |
Fair enough; I did not take that statement in the docs to mean that it marshals each element and does not ensure validity or a root node, but as you said, there's probably no way to change that anyways even if we agreed. On an interesting, but unrelated note, in the paper you recommend ("The Essence of XML; Siméon, Wadler") it says that "lists" in XML are actually space-separated ( |
@SamWhited XSD is a rich specification language. With it, one could specify that the tag takes a space-delimited list of numbers, or a comma-delimited list of numbers, or whatever. The value inside the tag would be an xs:string with the further constraint of a regex pattern like ^[0-9]{1}[\s0-9]+[0-9]{1}$ (or something like that). A validating parser would check that the string matches the regex pattern, but that list of ints would just be the string "1 2 3". It would be unusual to do this. I think someone would only do this in their schema if they were trying to save space. I am saddened that XML has fallen out of favor, since the XSD specification language is so powerful (it's akin to strong static typing vs. what feels like dynamic typing in JSON). Unfortunately, there is no way that I know of to validate an XML file against an XSD in pure Go. The only way to do it that I know of is with libxml2 and cgo. In fact, today I was musing about what it would take to port libxml2 to Go. I think cloc put libxml2 at around 200k lines of C code. I imagine porting C to Go is somewhat 1-to-1. |
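As a rough illustration of the "the list is just the string 1 2 3" point: without any schema validation, encoding/xml can still turn such a string into a slice if the target type implements encoding.TextUnmarshaler. This is only a sketch. The intList type, the parseList helper, and the <ints> element name are assumptions made for the example, and nothing here validates against an XSD.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strconv"
	"strings"
)

// intList is a hypothetical type for this sketch: a slice of ints that is
// carried in XML as one whitespace-separated string.
type intList []int

// UnmarshalText implements encoding.TextUnmarshaler. It splits the element's
// character data on whitespace and parses each field as an int.
func (l *intList) UnmarshalText(text []byte) error {
	fields := strings.Fields(string(text))
	out := make(intList, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return fmt.Errorf("intList: %w", err)
		}
		out = append(out, n)
	}
	*l = out
	return nil
}

// parseList decodes a single element whose content is a space-separated list.
func parseList(doc string) ([]int, error) {
	var l intList
	if err := xml.Unmarshal([]byte(doc), &l); err != nil {
		return nil, err
	}
	return l, nil
}

func main() {
	ints, err := parseList("<ints>1 2 3</ints>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ints)
}
```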
Any news about this issue? Half a year later still happening over here. |
No news, sorry. Any fixes will be reported here. |
unmarshal only processes first XML element |
If you Marshal []int{1,2} you get
<int>1</int><int>2</int>
, but then if you Unmarshal it back into a new slice, you get just []int{1}. Unmarshal simply stops after the first top-most XML element, because it is implemented as NewDecoder(bytes.NewReader(data)).Decode(v).
When v is not a slice, this makes sense. But if v is a slice, there's an argument that maybe Unmarshal should repeat the Decode calls until it reaches the end of the data. That's maybe easier said than done, and also maybe not a compatible change, but we should at least consider it.
The Decoder itself is right to process only a single element, since it is processing an arbitrary input stream that might block if one reads too far. But Unmarshal, holding a []byte, has perfect knowledge of the remainder of the stream and might be able to do better. Or perhaps Unmarshal should return an error.
In contrast, the equivalent input given to json.Unmarshal produces an error:
This is more justified in the case of JSON, since Marshaling []int{1,2} does not produce
[1] [2]
.