Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: cannot marshal self-closing tag #21399

Open
halfcrazy opened this issue Aug 11, 2017 · 51 comments · May be fixed by nemith/netconf#27
Open

encoding/xml: cannot marshal self-closing tag #21399

halfcrazy opened this issue Aug 11, 2017 · 51 comments · May be fixed by nemith/netconf#27
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@halfcrazy
Copy link

halfcrazy commented Aug 11, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8.3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/xxx/.go:/Users/xxx/workspace/goproject"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.8.3/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.8.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/53/r61bh2fj1hg9n0w03fs__7kc0000gn/T/go-build440341166=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I want to marshal a xml struct which has a self-closing tag field. After looking up in golang doc, there isn't an official support. In google group they use string.Replace, it's ugly. It seems we could write marshal func for the custom struct, by i'm willing to see an official support like

	type Person struct {
		XMLName   xml.Name `xml:"person"`
		Id        int      `xml:"id,attr"`
		FirstName string   `xml:"name>first"`
		LastName  string   `xml:"name>last"`
		Age       int      `xml:"age"`
		Height    float32  `xml:"height,omitempty"`
		Married   bool
		Address
		Comment string `xml:",comment"`
		Balabala string `xml:"balabala,allowempty"`  // i want something like this
	}

In xml standard, a self-closing tag is permitted.
https://www.w3.org/TR/REC-xml/#dt-empty

And empty-element tag should be used, and should only be used, for elements which are declared EMPTY.
https://www.w3.org/TR/REC-xml/#d0e2480

@Manishearth
Copy link
Contributor

I like this idea; though i suspect the community will have to agree on it (and also the naming of the tag)

However, I whipped up a quick implementation at https://go-review.googlesource.com/59830 .

I'll polish it up and add tests if this proposal actually goes through.

@gopherbot
Copy link

Change https://golang.org/cl/59830 mentions this issue: encoding/xml: add 'allowempty' tag for self-closing XML tags

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Aug 29, 2017
@ianlancetaylor
Copy link
Contributor

CC @rsc XML suggestion.

@SamWhited
Copy link
Member

Change https://golang.org/cl/59830 mentions this issue: encoding/xml: add 'allowempty' tag for self-closing XML tags

The proposed patch only works for marshaling structs. What if we're using the EncodeToken method and write a StartElement immediately followed by its corresponding EndElement? If this ends up being desired and it were an option on the Encoder instead it would work in both places (although it would be global for all tags, not local to specific struct fields, but that seems more desriable to me).

@Manishearth
Copy link
Contributor

We still need the ability to output non-selfclosing pairs of tags. The idea is to give the user the option to make certain fields self closing, not to do that for all fields.

@Manishearth
Copy link
Contributor

Oh, I see, you're suggesting this be global. Hm. I don't really like the idea of making this a global change, but we could also make it an option on the marshaler.

@SamWhited
Copy link
Member

I don't really like the idea of making this a global change

Is there a benefit to only doing this for some elements and not for others?

@Manishearth
Copy link
Contributor

For the same reason this bug exists in the first place; not all XML parsers handle this correctly and I wanted maximum flexibility.

I'm fine with doing it the other way though.

@SamWhited
Copy link
Member

SamWhited commented Oct 2, 2017

not all XML parsers handle this correctly and I wanted maximum flexibility

If a parser did not handle this you couldn't mix the two anyways, you'd just have to turn it on or off entirely to make sure that parser would work.

In case there's confusion, I'm not suggesting that we always do self-closing tags or non-self-closing tags, but that an individual *Encoder should be configurable to do one or the other.

@Manishearth
Copy link
Contributor

fair. I'll change the patches. Should/can I tag you for review?

@SamWhited
Copy link
Member

SamWhited commented Oct 2, 2017

fair. I'll change the patches. Should/can I tag you for review?

I don't have +2 permissions, but I'm happy to run trybots and provide feedback (also, setting this on the Encoder is my suggestion, but I don't actually have any authority here, so grain of salt until rsc or someone replies :) )

@dnaeon
Copy link

dnaeon commented Nov 18, 2017

Can we expect this to be implemented in Go 1.10?

Currently we have a Go library, that needs to communicate with a remote API endpoint, which can only understand empty XML elements with self-closing tags.

With the current implementation of XML package we are not able to use Go in order to communicate with that API endpoint. I realize that this is an issue of the remote side by not supporting both ways of specifying empty elements, but having the option to define how empty elements are defined in Go structs would be very useful and would allow us to overcome such edge cases.

Thanks,
Marin

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@rsc
Copy link
Contributor

rsc commented Nov 22, 2017

Sorry, but we're not going to get to this for Go 1.10.

@iwdgo
Copy link
Contributor

iwdgo commented Mar 28, 2018

The current behaviour of documented tag "omitempty" allows the following workaround.
AllowEmpty string `xml:"allowempty>doesnotmatter,omitempty"'
will be marshalled as
<allowempty></allowempty>

@cfstras
Copy link

cfstras commented Apr 24, 2018

@iwdgo how does this solve the original question?

The desire is to produce either
<tag /> or <tag>notempty</tag>. Not <tag></tag>.

@iwdgo
Copy link
Contributor

iwdgo commented Apr 24, 2018

The answer was focused on the suggested additional tag. A medium fix is required to use self-closing tags. If implemented, it would not be optional without breaking Go 1 promise. #9519 and #8167 have the same constraint.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 13, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 12, 2018
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 12, 2018
@robzon
Copy link

robzon commented Jan 21, 2019

Any chances of this being addressed? Stumbled upon this issue while working with the XMPP protocol. Some clients will only accept <required/>, but not <required></required>.

@SamWhited
Copy link
Member

SamWhited commented Jan 21, 2019

@robzon see the milestone; this issue is marked for Go 1.13.

EDIT: I should have also said: "please file an issue against those clients when you find them; they are broken".

@robzon
Copy link

robzon commented Jan 21, 2019

It's been modified several times, so I'm not sure how reliable this is. Don't want to come across as complaining, I'm just curious on what can be realistically expected as I'm sure it's not a high priority thing.

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@niklaskorz

This comment was marked as spam.

@0syntrax0

This comment was marked as spam.

@Otoru

This comment was marked as spam.

1 similar comment
@lcd1232

This comment was marked as spam.

@alexei38

This comment was marked as spam.

@hellt
Copy link

hellt commented May 6, 2022

is anyone aware of a custom marshaller example that can achieve self-closing aka empty tags?

@dpisarewski

This comment was marked as off-topic.

@ptxmac

This comment was marked as off-topic.

@dpisarewski

This comment was marked as off-topic.

@Merovius
Copy link
Contributor

Merovius commented Sep 6, 2022

I'm just running into this problem myself. I'd be interested in a fix too.

Side-note: This should probably be NeedsInvestigation or NeedsDecision, not NeedsFix.

@Manishearth

A potential middle ground is to make the field only affect marshalling structs, and for EncodeToken add a new SelfClosing token variant so the caller can deal with it.

To be clear, this token would only be accepted by EncodeToken, not being created by (*Decoder).Token, correct? Because otherwise, this would probably break consumers using that API, who type-switch over the returned tokens. Though we could add an option to *Decoder to generate them.

An alternative might be to add an Attr []Attr field to EndElement, as that's what a self-closing tag syntactically is. The restriction would be the same, that the *Decoder continues to synthesize StartElement/EndElement pairs, but EncodeToken could accept them. This seems nicer than adding a new token type, but it's probably less compatible.

Personally, I consider it more important to support the low-level (i.e. EncodeToken) use-case, than support he high-level (i.e. EncodeElement) use-case. It is always possible to drop to a lower level, but not vice-versa. I'm not opposed to solve it for EncodeElement as well, though.

@Manishearth
Copy link
Contributor

To be clear, this token would only be accepted by EncodeToken, not being created by (*Decoder).Token, correct?

Yeah.

@jybp
Copy link

jybp commented Dec 8, 2022

I'm curious what "clean" solution / workaround people have come up with to write self-closing elements with encoding/xml in the meantime?
I've only found the following which is just a regex on the whole content:
https://groups.google.com/g/golang-nuts/c/guG6iOCRu08/m/uMFigsrpCwAJ

@hellt
Copy link

hellt commented Dec 8, 2022

@jybp we use this

func forceSelfClosingTags(b []byte) []byte {

	emptyTagIdxs := regexp.MustCompile(`<(\w+)></\w+>`).FindAllSubmatchIndex(b, -1)

	var nb []byte

	for _, idx := range emptyTagIdxs {
		// get everything in b up till the first of the submatch indexes (this is the start of an
		// "empty" <thing></thing> tag), then get the name of the tag and put it in a self-closing
		// tag.
		nb = append(b[0:idx[0]], fmt.Sprintf("<%s/>", b[idx[2]:idx[3]])...) //nolint: gocritic

		// finally, append everything *after* the submatch indexes
		nb = append(nb, b[len(b)-(len(b)-idx[1]):]...)
	}

	return nb
}

@nemith
Copy link
Contributor

nemith commented Feb 17, 2023

I have been looking for this since I started writing Go and my original issue #6896 and still running into this.

In my particular use case a self-closing is being used (probably improperly by the API) to be a bool. If it exists then we get a "true" if it's missing we get a "false". There is a second use case where self-closed tags are used as variables, but there are ok workarounds for that.

It seems a custom type would be needed to follow the proposals "allowempty" proposal to be used a extant bool, but there are at least workarounds.

Given a "RawXML" token type we could have a proper workaround. #26756

@gopherbot
Copy link

Change https://go.dev/cl/469495 mentions this issue: encoding/xml: Add EmptyElement token.

@StEvUgnIn
Copy link

Do this

import (
	"regexp"
)

func SelfClosing(xml []byte) ([]byte, error) {
	regex, err := regexp.Compile(`></[A-Za-z0-9_]+>`)
	return regex.ReplaceAll(xml, []byte("/>")), err
}

@nemith
Copy link
Contributor

nemith commented Apr 27, 2023

@StEvUgnIn Pretty much eliminates stream mode and has overhead with regexp matching.

@Merovius
Copy link
Contributor

It's also not correct - it would, for example, also apply the replacement in comments.

@ttys3
Copy link

ttys3 commented Jun 13, 2023

just created a mod with patch in https://go.dev/cl/469495 applied.

and fixed a indent issue. in case someone need a workaround, here is the repo:

https://github.com/ttys3/go-xml#usage

@chadlwm
Copy link

chadlwm commented Sep 13, 2023

Do this

import (
	"regexp"
)

func SelfClosing(xml []byte) ([]byte, error) {
	regex, err := regexp.Compile(`></[A-Za-z0-9_]+>`)
	return regex.ReplaceAll(xml, []byte("/>")), err
}

the following case not work:

<student><name>xxx</name></student>

the ></student> also match the regex rule, but it's wrong

@chadlwm
Copy link

chadlwm commented Sep 13, 2023

the repo seems fine: https://github.com/ECUST-XX/xml/tree/master
however, why the offical doesnot support this feature? what's the key point?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.