Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/json: new MarshalAppender interface to make custom marshallers more efficient #34701

Closed
philpearl opened this issue Oct 4, 2019 · 20 comments

Comments

@philpearl
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes. And with the head of the master branch.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/phil/Library/Caches/go-build"
GOENV="/Users/phil/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/phil/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0j/nhsrh_152tg44s03bvq7mrh80000gn/T/go-build757131500=/tmp/go-build -gno-record-gcc-switches -fno-common"

Problem

If you marshal a struct with time.Time field in it there are additional allocations per time field. There appear to be 3 reasons for this.

  1. the json.Marshaler interface forces you to allocate a []byte to return the result rather than following an 'append' or 'io.Writer' pattern
  2. the json.Marshaler interface isn't trusted to return good JSON, so there's additional overhead validating and compacting the returned data when copying it into the underlying buffer
  3. If the MarshalJSON function has a non-pointer receiver there's an allocation creating the interface value to call the function

These inefficiencies exist for any type that implements MarshalJSON, but time.Time is a big issue in code I've been involved with.

Proposal

I think we can remove the first 2 of these by creating a new interface that follows the append pattern and trusts implementations to create compact valid JSON.

// MarshalAppender is implemented by types that can marshal themselves into compact and
// valid JSON. Implementations should append their JSON to the data parameter and return 
// the resulting byte slice. MarshalAppender can be implemented more efficiently than
// Marshaler. If a type implements both Marshaler and MarshalAppender then 
// MarshalAppender is used in preference.
type MarshalAppender interface {
	MarshalAppendJSON(data []byte) ([]byte, error)
}

I think an "Append" model is preferable to using io.Writer as the standard library has many append methods, e.g. time.Time has AppendFormat, and there are many Append functions in strconv. I suspect that means an interface following the append pattern will be less surprising and easier to use efficiently.

I've knocked together a prototype implementation with the following results on a simple benchmark (interestingly I can't find a benchmark that covers using a custom marshaler in the existing codebase).

benchstat before.txt after.txt 
name                       old time/op    new time/op    delta
MarshalMarshaler-8            128ns ± 0%      94ns ± 8%  -26.25%  (p=0.000 n=7+8)
MarshalMarshalerEncoder-8     132ns ± 3%      90ns ± 1%  -32.12%  (p=0.001 n=7+6)

name                       old alloc/op   new alloc/op   delta
MarshalMarshaler-8            96.0B ± 0%     24.0B ± 0%  -75.00%  (p=0.000 n=8+8)
MarshalMarshalerEncoder-8     80.0B ± 0%      8.0B ± 0%  -90.00%  (p=0.000 n=8+8)

name                       old allocs/op  new allocs/op  delta
MarshalMarshaler-8             4.00 ± 0%      2.00 ± 0%  -50.00%  (p=0.000 n=8+8)
MarshalMarshalerEncoder-8      3.00 ± 0%      1.00 ± 0%  -66.67%  (p=0.000 n=8+8)

I've be very pleased to attempt to contribute this if the proposal is accepted.

Apologies if I've not followed the proposal process properly - I don't know what I'm doing!

@gopherbot gopherbot added this to the Proposal milestone Oct 4, 2019
@philpearl philpearl reopened this Oct 4, 2019
@networkimprov
Copy link

cc @mvdan

@wI2L
Copy link
Contributor

wI2L commented Oct 11, 2019

Regarding the validation and compaction that is applied to the JSON data returned by MashalJSON implementations, I think this should still be done, even for an Append model.
However, the ability to disable it may be addressed separately, perhaps with an encoder's option, just like we are already doing it for HTML characters escaping.

@philpearl
Copy link
Contributor Author

@wI2L could you expand on why it's necessary? Folk can always validate their JSON themselves using json.Valid. Is there a history of people producing bad marshallers that this was introduced to address?

@wI2L
Copy link
Contributor

wI2L commented Oct 11, 2019

@philpearl AFAIK, there is no mention of this behavior in the documentation, neither in Marshal or Marshaler comments. I presume this is not an oversight, but rather an implicit behavior that is inherent to the fact that Mashal is supposed/expected to return valid JSON, but don't quote me on that.
My thought is that not doing validation/compaction of the data returned by the implementations of the interface you propose would create a gap with the existing, which would require to document it and clearly point that the behavior is not the same.
With that said, I do however think that we should be able to deactivate this behavior when performance matter.

@lofcek
Copy link

lofcek commented Oct 12, 2019

When I read it for a first time I tough, that is a bright idea. However I changed my mind.

  • Similar pattern for Unmarshaler would not work, would it?
  • For parsing small json files is current json.Marshaler good enough. For bigger (let say kilobytes) is your approach better. If result don't have to be in continuous memory I could prefer to write [][]byte (because it does not need reallocations).
  • For even bigger I would prefer write to io.Writer (typically file). However these issue solve project like easyjson.
  • Find general solution for all this cases are probably impossible and this proposal solves only limited part of possible cases.

@mvdan
Copy link
Member

mvdan commented Oct 13, 2019

I'm surprised that this already has over thirty thumbs up after being posted just over a week ago. Have there been discussions about this proposal elsewhere?

While I agree that the current interface has some performance limitations, I'm not convinced that this is a change we want to make right now. It would make the json package significantly more complex, and there are other aspects to performance, such as the validation and compaction that @wI2L brings up.

There are a large number of proposals that want to significantly change the json package. If we accepted even a small portion of them, the package's API would quickly double in size and become unmanageable. This is why many of them are under the Proposal-Hold label, to all be considered at once when it's time to design a new iteration of the package.

I think this proposal falls under the same category. The design choices that make encoding/json a bit inefficient also make it much simpler to use than the faster json encoders and decoders out there. I think that's a good tradeoff to make for the standard library.

Until large design refactors happen at some point in the future, I'd personally recommend looking at third-party json packages with performance at the center of their design. For example, there are many that use code generation.

@creker
Copy link

creker commented Oct 13, 2019

@mvdan there's a blog post about this reposted on reddit https://www.reddit.com/r/golang/comments/ddqpbt/bad_go_adventures_with_json_marshalling/ That's how I personally found out about this proposal.

@philpearl
Copy link
Contributor Author

Oh god someone put in on reddit. Save me from the comments!

@philpearl
Copy link
Contributor Author

One reason I'm pursuing this is that I've tried to change a large codebase to use a code generation based JSON library (easyjson) and there are issues with doing so, including the following.

  • you have to remember to regenerate the code all the time (yes, we've written some randomised tests for this but it's still hard to keep everything in step)
  • in places we've embedded structs, which works great with reflection based marshalling/unmarshalling, but can go disastrously wrong with embedded interfaces
  • sometimes these libraries are just wrong - so we need to test everything marshals/unmarshals via different JSON implementations and make fixes.

In my head if there are easy wins that improve the standard JSON library performance (and I'm more concerned with garbage than straight-line speed) then that's a much nicer path for folk who run into performance issues than a lift-and-shift to a whole new JSON library.

I'm also struct by the irony that the main use for custom marshallers (other than time.Time & json.RawMessage) appears to be for building nullable values without using pointers - presumably to try to avoid allocations.

I liked the idea of this change because it enables improvements without impacting the normal user much if at all. Surely not many folk write custom marshallers, and those that do can I hope deal with a little extra complication?

@rsc
Copy link
Contributor

rsc commented Oct 21, 2019

At this point, new JSON features should probably be developed in 3rd-party packages; forking the current encoding/json to start is fine. The fact is that encoding/json simply cannot accommodate every last feature every user will want. We should treat the API as basically frozen, I think.

@philpearl
Copy link
Contributor Author

One small problem with doing this externally is that we can't add new marshalling methods for time.Time and (much weaker argument) json.RawMessage. That said, these could be special-cased in a fork.

@networkimprov
Copy link

Why not simply patch your stdlib, and avoid duplicating any code?

@rsc
Copy link
Contributor

rsc commented Nov 6, 2019

Given that this can be explored in 3rd-party packages easily, this seems like a likely decline.
Leaving open for a week for final comments.

@philpearl
Copy link
Contributor Author

Curious to know what a successful exploration would look like, and what the consequences of success would be. Would also be interested in any advice as to how to go about such an exploration - my instinct is that any exploration would be just me, and therefore the impact would be precisely zero (as I'm already convinced).

@rsc
Copy link
Contributor

rsc commented Nov 13, 2019

One consequence of writing your own package is that you could use it and would not need the change in the standard library. If others also found it useful (perhaps by searching https://pkg.go.dev/search?q=json), they could use it too. In general we can't add every possible new feature to the standard library, especially encoding/json and encoding/xml. Third-party alternatives are an important part of a healthy ecosystem.

@rsc
Copy link
Contributor

rsc commented Nov 13, 2019

No change in consensus, so declining.

@rsc rsc closed this as completed Nov 13, 2019
@philpearl
Copy link
Contributor Author

OK, so "can be explored in 3rd-party packages easily" does not imply a mechanism to eventually apply the change back to core if the "exploration" proved successful.

Ah well. Will contemplate creating my own special JSON library. Only I would love it, but perhaps that's enough?

@mvdan
Copy link
Member

mvdan commented Nov 13, 2019

@philpearl if a feature in an alternative json package works well and is very popular, we could always reconsider, but I imagine its chances are low if it trades API complexity for performance.

If you're thinking in the long term, what I think is more interesting would be a v2 redesign of encoding/json. Perhaps that way its design could be better for performance, without duplicating parts of the API. But that's just a thought experiment at this point; I don't think there are currently any plans for new major versions of standard library packages.

@philpearl
Copy link
Contributor Author

More thoughts on this. https://philpearl.github.io/post/json_own_way/

@wI2L
Copy link
Contributor

wI2L commented Feb 1, 2020

For those interested, jettison v0.5.0 features a AppendMarshaler interface. @philpearl

@golang golang locked and limited conversation to collaborators Jan 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants