Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json: iter.Seq marshal support #65873

Open
mrwonko opened this issue Feb 22, 2024 · 9 comments
Open

proposal: encoding/json: iter.Seq marshal support #65873

mrwonko opened this issue Feb 22, 2024 · 9 comments
Labels
Milestone

Comments

@mrwonko
Copy link

mrwonko commented Feb 22, 2024

Proposal Details

I propose that json.Marshal should treat iter.Seq[T] like []T, encoding it as an array. Then we can e.g. write this:

json.Marshal(xiter.Map(time.Duration.String, []time.Duration{5*time.Second, 3*time.Hour}))
// -> ["5s","3h0m0s"]

and avoid allocating a slice.

@gopherbot gopherbot added this to the Proposal milestone Feb 22, 2024
@gophun
Copy link

gophun commented Feb 22, 2024

How would this be implemented?

func Marshal(v any) ([]byte, error) {
	if seq, ok := v.(iter.Seq[T]); ok {
		// It's a sequence
	}
}

is obviously not possible.

@gophun
Copy link

gophun commented Feb 22, 2024

Ok, it's possible with reflection, see #61897 (comment)

@dsnet
Copy link
Member

dsnet commented Feb 22, 2024

I would love for there to be a general-purpose API to dynamically represent types that is semantically equivalent to a Go slice or map, thus container types like linked lists or b-trees don't need custom MarshalJSON and UnmarshalJSON methods.

For this to happen I would expect:

  • The API to support both Marshal and Unmarshal. At present, the proposal only works for Marshal, but I'd argue that we need a solution for Unmarshal as well.
  • A universal interface for describing ordered values:
    • (for marshal) iterate through values
    • (for unmarshal) ability to reset the sequence and to append to the sequence
  • A universal interface for describing unordered key-value pairs:
    • (for marshal) iterate through key-value pairs
    • (for unmarshal) ability to reset the mapping and to get or set arbitrary keys

@dsnet
Copy link
Member

dsnet commented Feb 22, 2024

@gophun, the suggested implementation using reflect.Value.Call would have poor performance. In addition to my concerns about asymmetry between marshal and unmarshal, we would need first-class support or optimizations in "reflect" to make iteration fast.

@empire
Copy link
Contributor

empire commented Feb 23, 2024

To encode and decode iter.Seq using JSON, one way is to implement the json.Marshaler and json.Unmarshaler interfaces for iter.Seq.

Here's an example implementation:

func (s Seq[V]) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('[')

	first := true
	for v := range s {
		if !first {
			buf.WriteByte(',')
		}
		first = false

		bytes, err := json.Marshal(v)
		if err != nil {
			return nil, err
		}
		buf.Write(bytes)
	}
	buf.WriteByte(']')

	return buf.Bytes(), nil
}

func (s *Seq[V]) UnmarshalJSON(bytes []byte) error {
	var vs []V
	if err := json.Unmarshal(bytes, &vs); err != nil {
		return err
	}
	*s = Iter(vs)
	return nil
}

This implementation allows iter.Seq to be marshaled into JSON format and unmarshaled from JSON format seamlessly.

https://go.dev/play/p/Jqg81JhEr9_f?v=gotip

@dsnet
Copy link
Member

dsnet commented Feb 23, 2024

To be an appropriate solution, I would expect the following properties:

  • Be streaming where no additional memory is needed other than the underlying storage of the backing store for the sequence. In this situation, we're allocating a new []V and ignoring whatever storage was backing the iterator.
  • To have no dependency on the "json" package. That is, the idea should work for XML, CBOR, YAML, etc. without forcing iter.Seq to have special knowledge of all of these.

@mrwonko
Copy link
Author

mrwonko commented Feb 24, 2024

Iterators as we currently (experimentally) have them (i.e. iter.Seq) are single-pass and read-only (what C++ would call an "input iterator"), which is why I only brought up Marshal and not Unmarshal.

To unmarshal, we'd need something equivalent to C++'s Output Iterator, then we could use an equivalent to a back_insert_iterator (we'd probably call it "append" instead of "back insert"). Once we have that, I can envision something similar to json.Decoder.UseNumber() for defining a custom way to unmarshal arrays. Interesting to think about, but out of scope of this ticket, I think.

@mrwonko
Copy link
Author

mrwonko commented Feb 24, 2024

Then again, making Marshal and Unmarshal asymmetric isn't great, either. For example let's look at the types generated by protoc for grpc/protobuf. When I make a request, I would like to be able to pass a seq.Iter for repeated fields. But when I receive a request, a slice is more convenient, as I can e.g. check its length before iterating through it. Besides, changing that would be a breaking change.

Does that mean protoc would need to generate two different versions of each message for sending and receiving? That does not seem desirable. If I just want to forward (part of) an incoming message to another service, I don't want to have to convert it first.

@apparentlymart
Copy link

apparentlymart commented Feb 27, 2024

It is possible in principle to return an iter.Seq that closes over a json.Decoder that consumes the array in a streaming fashion: https://go.dev/play/p/6G4GTrHMNo4?v=gotip

But of course the first round of iterators only supports infallible iterators, so any attempt to delay doing the JSON parsing creates the possibility of errors in a place where they cannot be handled.

In principle it could call json.Valid first to find a subset of possible errors before beginning iteration, at the expense of walking over the tokens twice. Even that isn't really sufficient though, because it's also possible that the syntactically-valid values inside are not compatible with type V. As far as I can tell the only way to deal with that is to a full parsing pass first, at which point buffering the result into memory to return during iteration is probably (but not necessarily) a win vs. parsing it over again during iteration. 🤷‍♂️

To me this just feels like a natural consequence of not supporting fallible iterators: it isn't appropriate to use the iterator abstraction for anything that can dynamically fail, and JSON parsing can dynamically fail.

(I acknowledge that this alternative is also JSON-specific and so fails the goal of being generalizable over multiple serialization formats, but I think the inability to handle errors disqualifies this approach even before considering any formats beyond JSON.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

6 participants