proposal: encoding,encoding/json: common struct tag for field names #60791

Open
sparr opened this issue Jun 14, 2023 · 10 comments
@sparr

sparr commented Jun 14, 2023

Currently if a package wants to define a struct that can be saved to and loaded from files in different formats, with field names different from the struct field names (e.g. changing "FooBar" to "foo_bar" to match conventions, "Miscellaneous" to "misc" for brevity, etc), the package must add separate struct tags for json, toml, yaml, etc. Any encoding not specifically enumerated in the tags will either fall back to using the struct field names directly, or have to implement parsing of another encoding's tag. Any tag options supported by multiple encoders must be specified multiple times.

While these different encoding packages offer some unique functionality, such as go-yaml's inline, encoding/json's string, and go-toml's multiline, they all share common functionality of specifying the key name and the omitempty option. Since go-toml v2, they also all use the same structure for the contents of the tag, i.e. "name,option,option...". For use cases where that subset of functionality is sufficient, it would be convenient if most or all of the markup/encoder/serializer/marshaler/etc packages supported a common tag name.

My proposal is for a standard tag that looks and works like the existing tag syntax for toml, json, and yaml, but with a new name. Something like "markup", "marshal", "encoded", "serialized", etc. Preferably relatively short.

With this proposal, and support by the relevant packages, the following code:

type Platform struct {
	ArchitectureType string `toml:"arch_type,multiline,omitempty" json:"arch_type,string,omitempty" yaml:"arch_type,inline,omitempty"`
	Variant string `toml:"var,omitempty" json:"var,omitempty" yaml:"var,omitempty"`
	// ...
}

might be replaced with this:

type Platform struct {
	ArchitectureType string `marshal:"arch_type,omitempty" toml:",multiline" json:",string" yaml:",inline"`
	Variant string `marshal:"var,omitempty"`
	// ...
}

This new tag would specify the expected behavior of a small set of options, possibly only omitempty at first, which I believe behaves consistently across all three of the packages mentioned above, and across at least most of the other yaml packages.

Each of the packages could still read its own tag, for both unique and common functionality, with the following proposed conflict resolution behavior:

  • Specifying a field name in both the new common tag and the package tag would result in the package tag overriding the common tag.
  • Specifying an option (e.g. omitempty) in the common tag but not the package tag would result in the option still being applied; packages would need to provide an inverse option (e.g. keepempty) in their own tag to override this behavior.

Alternately, packages could read arbitrary options from the standard tag, which would simplify the struct definition even further but risks future collisions between options understood with different meanings by different packages.

The implementation of the functionality to decode this tag could be left to the individual packages, could go in a new part of the standard library (possibly near reflect.StructTag.Get, or elsewhere in encoding, perhaps the same place that #60770 ends up if tagOptions and parseTag move out of encoding/json), or could live in a third-party package like https://pkg.go.dev/github.com/fatih/structtag. Wherever it ends up, the conflict resolution described above could also be implemented generically and made available to all consuming packages.

@sparr sparr added the Proposal label Jun 14, 2023
@gopherbot gopherbot added this to the Proposal milestone Jun 14, 2023
@earthboundkid
Contributor

Why not just have the other packages fall back to json: if toml: isn’t set?

@sparr
Author

sparr commented Jun 15, 2023

@Nasfame Regarding collisions, I used GitHub code search to look for path:*.go StructTag AND "Get(\"json\")" and the equivalent for other tag names. The "code" category results are as follows:

json: 1.2k
yaml: 148
toml: 79
markup: 0
marshal: 0
encoded: 0
serialized: 1 (https://github.com/inklabs/rangedb and forks)
[empty string]: 6 (outdated forks of hashicorp/packer)

@sparr
Author

sparr commented Jun 15, 2023

@carlmjohnson I did also suggest that on one of those projects. I am taking a multi-pronged approach to this situation. pelletier/go-toml#880

@seankhliao
Member

cc @mvdan @dsnet

@sparr
Author

sparr commented Jul 17, 2023

@carlmjohnson The developer of go-toml has said he will use this proposal if it succeeds, but will not use the json struct tag in the main release of his package.

pelletier/go-toml#880 (comment)

@ianlancetaylor ianlancetaylor changed the title proposal: encoding/json: Support a generic struct tag for marshaled field name and common options proposal: encoding/json: support a generic struct tag for marshaled field name and common options Mar 15, 2024
@seankhliao seankhliao changed the title proposal: encoding/json: support a generic struct tag for marshaled field name and common options proposal: encoding,encoding/json: common struct tag for field names Jul 10, 2024
@seankhliao
Member

#68361 drew parallels to encoding.TextMarshaler, and suggested text as the name for the tag.

@dsnet
Member

dsnet commented Jul 10, 2024

One advantage of #68361 is that it simplifies the problem to just the textual name, while this proposal covers the name and other common-ish attributes like omitempty.

A name tag alone is easier to reason through, while attributes like omitempty are more challenging if they have different semantics across serialization libraries. For example, the v2 "json" package redefines omitempty so that a field is omitted if it is an empty JSON value, but adds omitzero, which omits a field if it is the zero Go value.
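As a small illustration of why omitempty is hard to standardize, the sketch below shows how the current (v1) encoding/json treats it. The Event type and encodeZero helper are made up for this example. In v1, omitempty drops false, 0, nil pointers and interfaces, and empty arrays, slices, maps, and strings, but never a struct, so a zero time.Time is still written.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Event is a hypothetical type whose fields are all left at their zero values.
type Event struct {
	When  time.Time `json:"when,omitempty"`  // struct: never "empty" in v1
	Tags  []string  `json:"tags,omitempty"`  // nil slice: omitted
	Count int       `json:"count,omitempty"` // 0: omitted
}

// encodeZero marshals a zero Event with the v1 encoding/json package.
func encodeZero() string {
	b, err := json.Marshal(Event{})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encodeZero()) // {"when":"0001-01-01T00:00:00Z"}
}
```

A common tag that carried omitempty would have to pick one of these definitions, or leave the meaning to each package, which is the ambiguity described above.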

@dsnet
Member

dsnet commented Jul 10, 2024

While textual names are more common, should there also be support for numeric field IDs? This is useful for formats like CBOR or protobuf that represent fields with numeric integers, rather than textual names.

@seankhliao
Member

#68361 still included omitempty:

The only allowed option is omitempty (although that should be a vet check). Values with - will be skipped as it works currently.

I'd agree it makes more sense to only support the field name and no options.
Protobuf seems to require additional info. If numeric IDs are always going to be used in formats that require additional metadata, then it may not make sense to try to create additional indirection.

@dsnet
Member

dsnet commented Jul 10, 2024

@adonovan and I once (many years back) tendered the idea of a package that can serialize Go structs as protobuf using just Go reflection (side-stepping the protobuf compiler). All you need is the numeric field ID as the other attributes of protobuf (e.g., whether a field is optional) can be inferred from the type of the field.
