Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/xml: context-aware XML decoders #22290

Closed
MOZGIII opened this issue Oct 16, 2017 · 16 comments
Closed

proposal: encoding/xml: context-aware XML decoders #22290

MOZGIII opened this issue Oct 16, 2017 · 16 comments

Comments

@MOZGIII
Copy link

MOZGIII commented Oct 16, 2017

I think it would be a good idea to add context.Context to xml.Decoder.

Addition of context will allow passing various options to UnmarshalerXML implementations per decoding procedure, rather that by maintaining some state on the side.

There are xml.HTMLAutoClose and xml.HTMLEntity that could be also stored in the context.Context of xml.Decoder instead of having them as global values. To maintain backward compatibility those variables can remain to be consulted if no values is set on the context.

As a bonus, it will allow both internal and custom unmarshaling logic to abort parsing on deadline.

What do you think?

@gopherbot gopherbot added this to the Proposal milestone Oct 16, 2017
@MOZGIII MOZGIII changed the title Proposal: context-aware XML decoders Proposal: encoding/xml: context-aware XML decoders Oct 16, 2017
@ianlancetaylor ianlancetaylor changed the title Proposal: encoding/xml: context-aware XML decoders proposal: encoding/xml: context-aware XML decoders Oct 16, 2017
@ianlancetaylor
Copy link
Contributor

We don't normally use context.Context values to pass parameters to a function. A context.Context can hold values, but that is intended to hold things like a request ID for profiling, logging, and tracing.

Interrupting XML parsing based on a deadline doesn't sound like a very useful feature to me. It would be reasonable for the reader to implement some sort of deadline, but that doesn't require passing a context through the XML code.

@MOZGIII
Copy link
Author

MOZGIII commented Oct 16, 2017

I see. How about adding just an publicly accessible interface{} to xml.Decoder, i.e. xml.Decoder.Opts? This would effectively allow to provide configuration to UnmarshalXML configuration per call as well.

@ianlancetaylor
Copy link
Contributor

Can you give an example of where that would be used?

@MOZGIII
Copy link
Author

MOZGIII commented Oct 16, 2017

I'm implementing a custom XML parser, and I want to support "strict" and "non-strict" modes. Strict mode being disallowing cdata and unknown elements, and a few other tweaks.
I currently don't need to modify this state during the parsing, however I may need to parse some parts of xml document "strictly" and some "not strictly", and then I'd switch the strictness while inside the UnmarshalXML call (after processing a token).
I'm building this thing to work with some strange XML API I'm not in control of.

@ianlancetaylor
Copy link
Contributor

Seems like if you are implementing a custom XML parser, it can do whatever you like.

The question here is what do with the standard encoding/xml parser. encoding/xml doesn't take any options that an XML unmarshaler could reasonably change, so I don't see a reason to give the unmarshaling methods a mechanism for changing options.

@MOZGIII
Copy link
Author

MOZGIII commented Oct 16, 2017

Well, I have phrased incorrectly. I'm not building a custom parser, but custom UnmarshalXML implementations on to of standard library. Right now there's no elegant way to extend the xml.Decoder's state machine with custom logic, and I'd very much like to do that. What makes things worse is the fact that everywhere decoder is a concrete type, and not an interface.
Maybe there's a way to use a some kind of wrapper around standard xml.Decoder that would be accessible to xml.UnmarshalXML calls? Or any other way to carry on additional state...

@ianlancetaylor
Copy link
Contributor

It's likely that I still misunderstand, but the current XML parser is not designed to be extended, and that is not a feature we are going to add. We've had enough trouble with it. And of course one doesn't normally pass multiple types to an XML decoder, so I'm still having some trouble envisioning what you want.

@MOZGIII
Copy link
Author

MOZGIII commented Oct 17, 2017

This is a concrete example:

type Decoder struct {
    ...
    CustomState interface{}
}

However if it's been problematic with xml package and you don't want to touch it I can usnderstand why. I can always use a non-standard library, so no problem here.

@MOZGIII
Copy link
Author

MOZGIII commented Oct 17, 2017

More explaination:
That would be the only addition to the standard library. User would assign CustomState after creating a Decoder and be reading/reassigning it from an UnmarshalXML implementation.
Simple and useful.

@ianlancetaylor
Copy link
Contributor

I suppose you could do that by writing

type CustomUnmarshaler struct {
    State *MyState
    Val interface{}
}

c := CustomUnmarshaler {
    &state,
    nil
}
xml.Unmarshal(r, &c)

All the various custom unmarshal types would point to the same state. Afterward you would extract the values.

@MOZGIII
Copy link
Author

MOZGIII commented Oct 17, 2017

With that approach I kind of have to have a copy of the type hierarchy with every type wrapped with CustomUnmarshaller. It would work, it's true, but it is neither efficient nor elegant. This may be improved by doing the wrapping at runtime with reflection. However it's again not efficient. I thought go was idiomaticlly against stuff like that (this seems a lot like generics).

@ianlancetaylor
Copy link
Contributor

I agree it's not great. I wouldn't go so far as to say that Go is idiomatically against that kind of thing.

In any case I don't think the standard library encoding/xml package is going to provide the kind of flexibility you are looking for. It seems very special purpose to me, and will tend to be a maintenance burden on the standard library. I recommend just copying the standard library package and modifying it to serve your needs.

@davecheney
Copy link
Contributor

davecheney commented Oct 17, 2017 via email

@rsc
Copy link
Contributor

rsc commented Feb 5, 2018

[Repeating answer on very similar #23059]

Contexts are not for per-operation options. It's for timeouts and more global things like authorization credentials, not fine-grained per-call options. We pulled sql.TxOption out of context before Go 1.9 for the same reason. Think about "would this be appropriate to set and force into all operations during a given client RPC", even nested operations calling into unrelated different APIs in back ends (to gather information for the response). Unfortunately I don't know of a good post explaining this nuance with context.

More generally, the xml package is almost completely done. If you need significantly extended semantics (and it seems you do), then it would be fine to copy the current one and make your own modifications, creating a new package customized to your use case.

@rsc rsc closed this as completed Feb 5, 2018
@MOZGIII
Copy link
Author

MOZGIII commented Feb 5, 2018

Makes sense. I agree with your point @rsc.

By the way, is it a good practice to copy the parts of the standard library?

@ianlancetaylor
Copy link
Contributor

By the way, is it a good practice to copy the parts of the standard library?

You'll get a better answer to this question over on golang-nuts or some other forum (https://golang.org/wiki/Questions). Personally I see no problem with it but others may have different perspectives.

@golang golang locked and limited conversation to collaborators Feb 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants