Proposal: encoding/json: add ability to unmarshal with offset information #16433
It seems to me this will only be useful if you litter your data structure with fields of type Another possibility would be to define a new method, along the lines of Either way it's not clear to me this is worth the additional complexity in the standard library package.
My intention was to have this an interface analogous to unmarshalling to an Another possibility would be to have the
This would require every type to implement the interface, which would lead to a huge amount of boilerplate code. I'm also not sure I follow how this would be used? I'm imagining something like:
But the problem is that when
Over in #16426 minux suggested a I don't understand your suggestion of unmarshaling twice. If you do that at top level you will just get the top level offset. If you do that at every level then you can keep track of the offset yourself. How would your suggestion work?
When unmarshalling to a I think I understand your idea better now. It would only save relative offsets for each element and leave computing absolute offsets to the user. Correct? I'm still of the opinion this would need a lot of boilerplate code and would force the user to either embed the offset information in their types or use a global to record the offsets.
I like the
Offsets into the JSON string are potentially good debugging or recovery information.
Consider a bug: once every two weeks you receive a JSON document with a missing quote. With a large dataset and bad or no logging, this would be a nightmare to track down. Knowing the exact point where the JSON is malformed makes "hey, a client sent this huge bad data set, they have a bug" provable. The proposal is a superset of this, though: an arbitrary mapping from character indexes in the string to JSON data symbols, for processing beyond just signaling errors. I think what is being asked for is: "If you look at the original JSON string, each map and slice in the unmarshaled struct, and each of their elements and values, points to one of these character indexes." A solution could be a method that holds onto the work by returning an independent map:
Then you iterate through the map or directly lookup the JSON symbols you care about. The documentation would have to define the JSONSymbol encoding. Ordering could be part of the JSONSymbol encoding. This is a specialized use of JSON. I think it would not be too difficult to write a third party implementation to do the work a second time (which leaves the standard library as is):
The last character of an object can be determined by looking up the first character of the next object and subtracting one.
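To make the "do the work a second time" idea concrete, here is a minimal sketch of such a third-party pass. It is not the code from this comment (which is not preserved here) and it does not use the proposed API. It walks the input with `json.Decoder.Token` and reads `Decoder.InputOffset` (available since Go 1.14) to build a map from a path string to a byte range. The `Span` type, the `indexOffsets` and `skipSeparators` helpers, and the slash-separated path encoding are all assumptions made for illustration. Because the end offset is recorded directly, the "next first character minus one" lookup is not needed in this variant.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Span is a hypothetical half-open byte range [Start, End) into the input.
type Span struct{ Start, End int64 }

// skipSeparators advances past whitespace, ',' and ':' so that the result
// points at the first byte of the next token.
func skipSeparators(data []byte, i int64) int64 {
	for i < int64(len(data)) {
		switch data[i] {
		case ' ', '\t', '\r', '\n', ',', ':':
			i++
		default:
			return i
		}
	}
	return i
}

// indexOffsets maps a slash-separated path (an assumed encoding, "" for the
// root) to the span of the value at that path. Keys containing '/' and
// duplicate keys are not handled; later entries overwrite earlier ones.
func indexOffsets(data []byte) (map[string]Span, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	out := make(map[string]Span)

	var walk func(path string) error
	walk = func(path string) error {
		// InputOffset is the end of the previous token; the value starts
		// after any separators that follow it.
		start := skipSeparators(data, dec.InputOffset())
		tok, err := dec.Token()
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); ok {
			switch d {
			case '{':
				for dec.More() {
					keyTok, err := dec.Token()
					if err != nil {
						return err
					}
					key, ok := keyTok.(string)
					if !ok {
						return fmt.Errorf("unexpected object key %v", keyTok)
					}
					if err := walk(path + "/" + key); err != nil {
						return err
					}
				}
			case '[':
				for i := 0; dec.More(); i++ {
					if err := walk(fmt.Sprintf("%s/%d", path, i)); err != nil {
						return err
					}
				}
			}
			// Consume the closing '}' or ']'.
			if _, err := dec.Token(); err != nil {
				return err
			}
		}
		out[path] = Span{Start: start, End: dec.InputOffset()}
		return nil
	}

	if err := walk(""); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	data := []byte(`{"a": [10, {"b": "x"}], "c": true}`)
	spans, err := indexOffsets(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range []string{"/a", "/a/1/b", "/c"} {
		s := spans[p]
		fmt.Printf("%-7s bytes [%d,%d): %s\n", p, s.Start, s.End, data[s.Start:s.End])
	}
}
```

A validator could unmarshal the same input normally, then look up the path of any offending value in this map to report where it came from. The cost is a second pass over the input, which is the trade-off this comment describes.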
I experimented with a map solution as well, except it mapped from the addresses of the unmarshalled data to their offsets instead of mapping from strings to offsets. This avoids having to specify a special encoding for the JSONSymbol. I think this proposal is cleaner and more similar to the existing API, which is why I chose to propose it instead.
This seems largely about validation, which had a proposal that was rejected (#16426), so it seems there is little need for this. Also, as you say, errors already do deliver offset information.
While I don't disagree that validation might be out of scope for the core library, I do think that this mechanism (or something similar) is necessary. To take the example of validation - to create that external validation package would require either support from the core JSON package or an alternate implementation of the unmarshaller. It seems largely a waste of time (and almost certainly error-prone) to rewrite the JSON unmarshaller just for the purpose of exposing more information to the consumer. Modifying the core library to expose this information, on the other hand, is a relatively small change and enables external packages to greatly extend the functionality.
Problem
Currently there is no way to get offset information when unmarshalling json; once the json has been unmarshalled into an interface{}, the offsets which the values were derived from cannot be obtained.
This is needed for detailed validation error reporting, such as when JSON is used as a configuration language and the source of the error should be reported with the error. For simple cases, such as when the JSON is invalid due to syntax errors or mismatched types, json.SyntaxError and json.UnmarshalTypeError will report the offset with the error. However, for more complex cases, such as when the JSON is first completely unmarshalled and then validated, there is no mechanism for getting the offset from which the unmarshalled value came.
Proposed solution:
Add a new type to encoding/json which can be unmarshalled into like an interface{} but also includes offset information.
The Value field would be:
I have written an implementation here:
ajeddeloh@60514ff
This change does not affect unmarshalling of other types, since it only takes effect when unmarshalling to a json.Node.