encoding/xml: support for collecting all attributes #3633

Closed
rsc opened this issue May 17, 2012 · 29 comments

@rsc
Contributor

rsc commented May 17, 2012

Received via private mail.  Think about for Go 1.1.

---

I'm currently using "encoding/xml" to read some XML into some structs. All is
going well until I hit an XML type that could have n number of attributes with p number
of child nodes (and each child node can follow the same rules). I think I have the child
node thing solved, but what about collecting all of the attributes?

This is what I have at this point:

    type Extensions struct {
        XMLName xml.Name
        Attrs   []string     `xml:",attr"`     // Does not work. Need a suggestion here.
        Data    string       `xml:",chardata"`
        Nodes   []Extensions `xml:",any"`
    }

Thanks in advance for any help.
@niemeyer
Contributor

Comment 1:

We already have xml.Attr. This should be supported:
type Extensions struct {
        ...
        Attrs []xml.Attr
        ...
}

@rsc
Contributor Author

rsc commented May 17, 2012

Comment 2:

Sounds good.  What's the trigger?  Any []xml.Attr?  Does there need to
be a tag like ,any?

@niemeyer
Contributor

Comment 3:

The type seems enough of a hint in the described case, but we should probably enforce
the use of ",attr" with it, to sanitize the interaction with attributes in nested
elements.
We have this today, which is quite useful:
    Value  string    `xml:"sub>node"`
Several people asked to complement with this:
    Attr  string     `xml:"sub>node>attrname,attr"`
So this would be the counterpart:
    Attrs []xml.Attr `xml:"sub>node,attr"`
And in the simple case:
    Attrs []xml.Attr `xml:",attr"`

@rsc
Contributor Author

rsc commented May 17, 2012

Comment 4:

sgtm

@anacrolix
Contributor

Comment 5:

Is there any way to make use of xml.Attr this way in the xml package for Go 1.0? Do I
have to use the []string `xml:",attr"` for now?

@rsc
Contributor Author

rsc commented Sep 12, 2012

Comment 6:

Should probably start on this if it's for Go 1.1.

@rsc
Contributor Author

rsc commented Dec 10, 2012

Comment 7:

Labels changed: added size-m.

@rsc
Contributor Author

rsc commented Mar 12, 2013

Comment 8:

I am sad to say it, but I think we will have to postpone XML work until
after Go 1.1.
I regret that we didn't have more time to make encoding/xml better, but
given the tradeoff I think focusing on core performance and
implementation pieces for this final release push is probably the right
choice. Unlike most of the performance and other stuff we're trying to
shake out right now, functionality such as XML parsing can be provided
by go get-able libraries as a stopgap until Go 1.2.

Labels changed: added go1.2, removed go1.1.

Owner changed to ---.

@dominikh
Member

Comment 9:

Is this still being considered for Go 1.2?

@rsc
Contributor Author

rsc commented Jul 15, 2013

Comment 10:

I said sgtm in comment 4, but now I am not so sure. All the > confuse me.
Go 1.2 will likely have support for custom marshalers and unmarshalers. Perhaps that
will be good enough and we can postpone this specific thing until we have experience
using those.

@rsc
Contributor Author

rsc commented Jul 30, 2013

Comment 11:

Labels changed: added feature.

@mattetti
Contributor

mattetti commented Aug 1, 2013

Comment 12:

Let's say you have some XML like this:
<FileRef>
  <Name Value="my-doc.pdf" />
</FileRef>
To extract the value info I have to create 2 structure types:
type FooFile struct {
    Filename             AttrValue    `xml:"FileRef>Name"`
}
// Wrapper structure used to extract XML node value attributes (string).
type AttrValue struct {
    Value string `xml:",attr"`
}
And once I unmarshal my XML and get an object of type FooFile, I need to access
file.Filename.Value
Being able to use `xml:"FileRef>Name,Value"` would be nice for sure. I personally
don't find the > confusing. Not sure how the new custom unmarshalers will
work, though.
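
For reference, the two-type workaround described above can be sketched as a complete program. This is only an illustration: the `<File>` root element, the `sample` constant and the `parseFooFile` helper are assumptions added here so the snippet runs on its own, and the `Value` field is read as a plain struct field.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AttrValue extracts the "Value" attribute of whatever element it is mapped to.
// With `xml:",attr"` the attribute name defaults to the field name.
type AttrValue struct {
	Value string `xml:",attr"`
}

// FooFile reaches the <Name> element nested inside <FileRef>.
type FooFile struct {
	Filename AttrValue `xml:"FileRef>Name"`
}

// sample wraps the snippet from the comment in a hypothetical <File> root,
// because the "FileRef>Name" path is resolved relative to the root element.
const sample = `<File><FileRef><Name Value="my-doc.pdf" /></FileRef></File>`

// parseFooFile is a hypothetical helper that unmarshals one document.
func parseFooFile(data string) (FooFile, error) {
	var f FooFile
	err := xml.Unmarshal([]byte(data), &f)
	return f, err
}

func main() {
	f, err := parseFooFile(sample)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	// The attribute is a field on the wrapper type, not a method.
	fmt.Println(f.Filename.Value)
}
```

The wrapper type is what the proposed `sub>node>attrname,attr` tag would make unnecessary.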

@robpike
Contributor

robpike commented Aug 29, 2013

Comment 13:

Letting this soak until after 1.2 and the new marshaling code gets a chance.

Labels changed: removed go1.2.

@rsc
Contributor Author

rsc commented Nov 27, 2013

Comment 14:

Labels changed: added go1.3maybe.

@rsc
Contributor Author

rsc commented Nov 27, 2013

Comment 15:

Labels changed: removed feature.

@rsc
Contributor Author

rsc commented Dec 4, 2013

Comment 16:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor Author

rsc commented Dec 4, 2013

Comment 17:

Labels changed: added repo-main.

@gopherbot

Comment 18 by andrewjohnmigliore:

Another vote for supporting something like:
    Attr  string     `xml:"sub>node>attrname,attr"`
    Attrs []xml.Attr `xml:"sub>node,attr"`
    Attrs []xml.Attr `xml:",attr"`
Without this support, parsing XML that is attribute-laden and CDATA-light using
xml.Unmarshal() is just downright ugly and goes against the concise nature of
golang!
cheers

@andredasilvapinto

+1 for having support for unmarshalling attributes of a specific node without having to replicate the entire struct hierarchy.

http://stackoverflow.com/q/27404456/43046

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@grmartin

grmartin commented Jun 1, 2015

I still vote for this. Too bad it's not a priority for you folks, looks like I'm gonna roll my own.

@raitucarp

up.. I need this feature

@Zilog8
Copy link

Zilog8 commented Sep 25, 2015

A feature like this would make some xml encodings much more concise. For example, it would simplify item handling in MRSS from this:

type Thumbnail struct {
     Url string `xml:"url,attr"`
}

type Content struct {
     Url      string `xml:"url,attr"`
     Bitrate  string `xml:"bitrate,attr"`
     Duration string `xml:"duration,attr"`
     Height   string `xml:"height,attr"`
}

type Item struct {
     Title  string    `xml:"title"`
     Thumb  Thumbnail `xml:"media:thumbnail"`
     Media  Content   `xml:"media:content"`
}

Into this:

type Item struct {
     Title    string `xml:"title"`
     ThumbUrl string `xml:"media:thumbnail>url,attr"`
     MediaUrl string `xml:"media:content>url,attr"`
     Bitrate  string `xml:"media:content>bitrate,attr"`
     Duration string `xml:"media:content>duration,attr"`
     Height   string `xml:"media:content>height,attr"`
}
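
Until something like the tags above exists, one possible stopgap is a custom unmarshaler that decodes into the nested shape privately and copies the values into flat fields. The sketch below is an assumption-laden illustration, not the proposed feature: the `sample` feed item, the `parseItem` helper and the trimmed field set are invented here, and the struct tags match elements by local name (`thumbnail`, `content`) rather than by the `media:` prefix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Item exposes flat fields; the nesting is hidden inside UnmarshalXML.
type Item struct {
	Title    string
	ThumbURL string
	MediaURL string
	Bitrate  string
}

// UnmarshalXML decodes the element into an anonymous nested struct and then
// copies the attribute values into the flat fields of Item.
func (it *Item) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var raw struct {
		Title string `xml:"title"`
		Thumb struct {
			URL string `xml:"url,attr"`
		} `xml:"thumbnail"`
		Media struct {
			URL     string `xml:"url,attr"`
			Bitrate string `xml:"bitrate,attr"`
		} `xml:"content"`
	}
	// raw has no UnmarshalXML method, so this call does not recurse.
	if err := d.DecodeElement(&raw, &start); err != nil {
		return err
	}
	it.Title = raw.Title
	it.ThumbURL = raw.Thumb.URL
	it.MediaURL = raw.Media.URL
	it.Bitrate = raw.Media.Bitrate
	return nil
}

// sample is a hypothetical MRSS-style item used only for this sketch.
const sample = `<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Clip</title>
  <media:thumbnail url="http://example.com/t.jpg"/>
  <media:content url="http://example.com/v.mp4" bitrate="128"/>
</item>`

// parseItem is a hypothetical helper that unmarshals one item.
func parseItem(data string) (Item, error) {
	var it Item
	err := xml.Unmarshal([]byte(data), &it)
	return it, err
}

func main() {
	it, err := parseItem(sample)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Printf("%+v\n", it)
}
```

This keeps callers on a flat `Item`, at the cost of writing the copy step by hand for every such type.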

@mrcook

mrcook commented Oct 15, 2015

I've started to port my Ruby EPUB tool to Go and I'm having trouble with gathering up the book [OPF] Metadata due to there being an arbitrary number/type of nodes, with an arbitrary number/type of attributes. So being able to use Attrs []xml.Attr to collect them up would be a great feature. Is there any possibility of this feature being added?

Here's an example of the kind of data that needs parsing:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:identifier id="pub-identifier">_simple_book</dc:identifier>
  <meta refines="#pub-identifier" property="dcterms:identifier">_simple_book</meta>
  <dc:title id="pub-title">A Book</dc:title>
  <meta refines="#pub-title" property="dcterms:title">A Book: Subtitle</meta>
  <dc:date opf:event="original-publication">2015-10-10</dc:date>
  <dc:date opf:event="publication">2015-10-10</dc:date>
  <dc:language>en</dc:language>
  <dc:creator opf:role="aut" opf:file-as="Doe, Jon">Jon Doe</dc:creator>
  <dc:subject>Fiction</dc:subject>
  <dc:description>Some description</dc:description>
  <dc:publisher>A Publisher</dc:publisher>
  <dc:rights>Copyright</dc:rights>
  <meta content="cover-image" name="cover"/>
</metadata>

@ghost

ghost commented Oct 22, 2015

If anyone needs a generic solution to collecting an array of attributes, this is what I use currently:

type Node struct {
    XMLName    xml.Name
    Attributes []xml.Attr
    Data       string
    Nodes      []*Node
}

func (e *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    var nodes []*Node
    var done bool
    for !done {
        t, err := d.Token()
        if err != nil {
            return err
        }
        switch t := t.(type) {
        case xml.CharData:
            e.Data = strings.TrimSpace(string(t))
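        case xml.StartElement:
            // Decode the child recursively and propagate any error.
            child := &Node{}
            if err := child.UnmarshalXML(d, t); err != nil {
                return err
            }
            nodes = append(nodes, child)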
        case xml.EndElement:
            done = true
        }
    }
    e.XMLName = start.Name
    e.Attributes = start.Attr
    e.Nodes = nodes
    return nil
}

func (e *Node) MarshalXML(enc *xml.Encoder, start xml.StartElement) error {
    start.Name = e.XMLName
    start.Attr = e.Attributes
    return enc.EncodeElement(struct {
        Data  string `xml:",chardata"`
        Nodes []*Node
    }{
        Data:  e.Data,
        Nodes: e.Nodes,
    }, start)
}

It would be nice to have this support built in.

https://play.golang.org/p/o60LVVmpgq

@ghost

ghost commented Oct 22, 2015

Are you accepting contributions for:

type Node struct {
        ...
        Attrs []xml.Attr `xml:",attr"`
        ...
}

If so, I would happily make the change.

@gopherbot

CL https://golang.org/cl/16292 mentions this issue.

@ivankravchenko

I came up with some kind of working code for @Zilog8's example in
#3688 (comment)

Code diff gist: https://gist.github.com/ivankravchenko/036f68e671e33179b636bd58f6ebc9d0

@gopherbot

CL https://golang.org/cl/30946 mentions this issue.

@asafschers

https://golang.org/doc/go1.8 -
Unmarshal now has wildcard support for collecting all attributes using the new ",any,attr" struct tag.
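
Applied to the struct from the original report, the Go 1.8 tag might be used roughly as follows. This is a minimal sketch assuming Go 1.8 or later; the `sample` document and the `parseExtensions` helper are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Extensions is the struct from the original question, with the Attrs field
// switched to []xml.Attr and the ",any,attr" tag (requires Go 1.8+).
type Extensions struct {
	XMLName xml.Name
	Attrs   []xml.Attr   `xml:",any,attr"` // every attribute not claimed by another field
	Data    string       `xml:",chardata"`
	Nodes   []Extensions `xml:",any"` // child elements, following the same rules
}

// sample is a hypothetical document used only for this sketch.
const sample = `<root a="1" b="2">text<child c="3"/></root>`

// parseExtensions is a hypothetical helper that unmarshals one document.
func parseExtensions(data string) (Extensions, error) {
	var e Extensions
	err := xml.Unmarshal([]byte(data), &e)
	return e, err
}

func main() {
	e, err := parseExtensions(sample)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	for _, a := range e.Attrs {
		fmt.Printf("%s: %s=%q\n", e.XMLName.Local, a.Name.Local, a.Value)
	}
	for _, n := range e.Nodes {
		for _, a := range n.Attrs {
			fmt.Printf("%s: %s=%q\n", n.XMLName.Local, a.Name.Local, a.Value)
		}
	}
}
```

Note that the tag that shipped is `,any,attr`, not the bare `,attr` on `[]xml.Attr` discussed earlier in the thread.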
