Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log/slog: JSONHandler should format everything like json.Marshal #59345

Closed
jba opened this issue Mar 31, 2023 · 26 comments
Closed

log/slog: JSONHandler should format everything like json.Marshal #59345

jba opened this issue Mar 31, 2023 · 26 comments

Comments

@jba
Copy link
Contributor

jba commented Mar 31, 2023

[edit: special case for error values]

This issue now proposes that JSONHandler should format values just like a json.Encoder with SetEscapeHTML(false).

The only exceptions are:

  • Values of type error are formatted as the string obtained from calling the Error method.
  • When json.Encoder.Encode returns a non-nil error, that error is formatted as string, instead of being returned from Handle.

Original top post:


JSONHandler should format time.Duration values with time.Duration.String.

For the most part, JSONHandler conforms to encoding/json.Marshal, which formats a duration as an integer number of nanoseconds.

Using the output of time.Duration.String is better for two reasons:

  • No unit is associated with the integer: you have to know it's nanos. If you didn't already know it was generated by Go, you'd probably think it was seconds. If computer clocks ever get faster, we may achieve higher resolutions, but we won't be able to change the format (too many consumers would silently break) so it would always be an approximation. The string format's interpretation is clear and extensible.

  • The integer can overflow the maximum integer that JSON can represent exactly. json.Marshal will output that as a JSON number, which may cause JSON parsers to fail or return values that only approximate the original.

This change will slow down JSONHandler for durations, but favoring readability and correctness over speed is the right choice in this case. One can always write a ReplaceAttr function to return an int64 and get the fast behavior.

Original proposal: #56345.

@jba jba added the Proposal label Mar 31, 2023
@gopherbot gopherbot added this to the Proposal milestone Mar 31, 2023
@dsnet
Copy link
Member

dsnet commented Mar 31, 2023

Copying my comment on the CL over here for discussion purposes.

Here's an overview of possible representations of a duration and their strengths and weaknesses. There is no obviously right approach. All have flaws of some kind.

  • JSON integer in nanoseconds (e.g., 47683228428000)

    • ✔️ It's what "encoding/json" does today.
    • ❌ It lacks any context as to its meaning. Is it in seconds, milliseconds, microseconds, nanoseconds? Lacking context, seconds is often a reasonable first guess.
    • ❌ It can't scale to higher precisions as it arbitrary assumes only nanosecond resolution.
      • Two decades ago, most APIs used microseconds until CPUs got faster and then nanoseconds became more popular, leading to awkwardness where some APIs still operated on microseconds, and yet others on nanoseconds. CPU frequency has plateaued since the end of Dennard scaling around 2006, and it seems nanosecond resolution is here to stay for some time. However, it's still entirely possible that some breakthrough technology brings clock speeds higher and then we're all of a sudden talking about durations at picosecond resolution.
    • ❌ It exceeds the exact representation of float64 in ~104 days, which will reduce accuracy on JSON parsers without real 64-bit integer support.
  • JSON fractional number in seconds (e.g., 47683.228428)

    • ✔️ The presence of a fractional component is a strong signal that this is in seconds.
    • ✔️ It can scale to arbitrary precision by just adding more fractional digits.
    • ❌ Its representation in float64 is lossy. For example, time.Duration(strconv.ParseFloat("0.000000015", 64)*1e9) results in 14ns (rather than the expected 15ns).
    • ❌ Representation of small durations is verbose (e.g., 1ns would be 0.000000001 as opposed to "1ns" from time.Duration.String).
  • JSON string using time.Duration.String (e.g., "13h14m43.228428s")

    • ✔️ The meaning is fairly unambiguous to a human reader.
    • ✔️ The representation is exact (in that there's no accidental loss due to float64 handling of JSON parsers).
    • ✔️ It can scale to arbitrary precision by adjusting the SI prefix (e.g., "1ps" for 1 picosecond).
    • ✔️ There is built-in formatting and parsing of this in Go.
    • ❌ It's fairly Go specific. The use of "m" for minute may be considered non-standard, since the SI symbol for minute is actually "min", while "m" is usually the unit for meters.
  • JSON string using stop-watch representation (e.g., "13:14:43.228428")

    • ✔️ This is the reading you often see when reading a stop-watch or also defacto used by popular programs (e.g. ffmpeg). There's more precedence for this representation outside of Go.
    • ✔️ The meaning is decently unambiguous to a human reader.
      • ❌ Although, it is still ambiguous whether this represents a wall-clock reading or a duration.
    • ✔️ The representation can be exact (in that there's no accidental loss due to float64 handling of JSON parsers).
    • ✔️ It can scale to arbitrary precision by just adding more fractional digits.
    • ❌ To my knowledge, there's no standard specification for this.
    • ❌ There's no native formatter or parser for this in Go.
    • ❌ Representation of small durations is verbose (e.g., 1ns would be "0:00:00.000000001" as opposed to "1ns" from time.Duration.String).

@gopherbot
Copy link

Change https://go.dev/cl/480735 mentions this issue: log/slog: JSONHandler formats time.Duration as string

@jclark
Copy link

jclark commented Apr 4, 2023

Another possibility is a JSON string using the ISO 8601 representation of durations. I don't much like it, but, as far as I know, it's the only standardized string representation of a duration, so perhaps worth considering.

For the fractional number of seconds option, it would also be possible to use scientific notation: 1e-9 is a reasonably intuitive, concise representation of 1 nanosecond.

@dsnet
Copy link
Member

dsnet commented Apr 4, 2023

Another possibility is a JSON string using the ISO 8601 representation of durations. I don't much like it, but, as far as I know, it's the only standardized string representation of a duration, so perhaps worth considering.

Oh right, I forgot about ISO 8601's duration format.

I personally have never seen that format used in practice. My observation is that there's a general trend of moving away from ISO 8601 since it is overly complicated and esoteric. In fact RFC 3339 exists to document a particular subset of ISO 8601 so that the internet can more readily agree upon how a timestamp is represented.

@jclark
Copy link

jclark commented Apr 4, 2023

If you take the output of time.Duration.String, uppercase it and prepend PT, you have an ISO8601 duration (apart from the special use of millisecond, microsecond and nanosecond for durations less than a second).

I've seen ISO 8601 durations used in some contexts, notably https://www.w3.org/TR/xmlschema-2/#duration It also looks like it's coming to JavaScript: https://tc39.es/proposal-temporal/docs/duration.html

FWIW, I would choose a JSON string using the format of time.Duration.String. The Go standard library has (in time.Duration.String) chosen a string format for durations, and it ought to be consistent about that. I also think it's a tasteful, well-designed representation that is concise, understandable and unambiguous. Though it's not the same as ISO8601 durations, I think there's a good reason for all the ways in which it diverges, and it's not inherently Go-specific.

@08d2
Copy link

08d2 commented Apr 5, 2023

I have also never seen ISO 8601 used in practice. A duration string has a basic set of expectations which I guess are pretty unambiguous, and pretty well represented by the behavior of the standard library's time.ParseDuration function. For example 1ns should parse as one nanosecond, 2us as two microseconds, 3ms as three milliseconds, 4s as four seconds, 5h as five hours, 6d as six days, 1h3m as an hour and three minutes, 1.23s as one thousand two hundred and thirty milliseconds, and so on.

@rsc
Copy link
Contributor

rsc commented Apr 5, 2023

How is this going to be implemented? Is it just a special case for an Attr containing a Duration key or value? I assume we're not trying to handle durations nested inside other JSON objects?

@jba
Copy link
Contributor Author

jba commented Apr 5, 2023

Good point. Yes, it's only a special case for slog.Values of kind Duration. The same is true for the special float handling.

In light of that inconsistency, should we consider matching encoding/json's behavior exactly, and wait for encoding/json/v2 to deal with these issues? (@dsnet outlined one way for slog to backwards-compatibly work with v2.)

@rsc
Copy link
Contributor

rsc commented Apr 6, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@doggedOwl
Copy link

doggedOwl commented Apr 7, 2023

In light of that inconsistency, should we consider matching encoding/json's behavior exactly, and wait for encoding/json/v2 to deal with these issues?

There is no advantage of continuing to replicate the current encoding/json behaviour when it's clearly the worst of all the choices described in #59345 (comment).

It would be nice to start with a more readable format or there is risk of getting stuck with it. While on development side there is a fairly easy upgrade path to the potential json/v2 behaviour, there will be resistance to it because then we would have either two different log rapresentations depending on time and version used or having to go back and retroactivly modify the old logs to have a single consistent rapresentation. This second option would be a heavy burdon to most projects.

@dsnet
Copy link
Member

dsnet commented Apr 7, 2023

Fundamentally, "slog" will be released before any hypothetical v2 "json". If v2 "json" happens, there must be a way that users can switch from whether "slog" is using v1 or v2 "json" semantics. While the representation of time.Duration in v1 is not great, there is a benefit to consistency where users can toggle between "slog" operating purely on v1 or v2 "json" semantics, as opposed to some strange hybrid, where top-level Duration values use a different representation as those nested within a struct.

@hherman1
Copy link

hherman1 commented Apr 7, 2023

Why not just ship a json v2 handler when there is such a thing? Shouldn’t be too troublesome for folks to switch their handlers

@doggedOwl
Copy link

doggedOwl commented Apr 7, 2023

Would it be possible to "backport" the Duration format to jsonv1 (under an option) in time for it to be used by slog and have the same behaviour also for nested structs?
I understand the inconsistency issue but at the same time it doesn't seem right for the new logging library to start with an inferior option because nomatter how easy it is to switch the semantics of v1->v2 in the package slog later, the damage would already have been done on the log files that might have been distributed or agregated several times and just for that it will be a tough task to convince any team to change those semantics.

@dsnet
Copy link
Member

dsnet commented Apr 7, 2023

Would it be possible to "backport"

See #59365 (comment).

The tentative plan is for v1 and v2 to use the same implementation under the hood and for every divergence in behavior to be configurable through options. Thus, v1 and v2 are technically the same thing, but just that they start with different default options. You could make v1 more v2-like by specifying the appropriate options, or make v2 more v1-like with the appropriate set of options. The options would be shared between v1 and v2 and be composable with each other.

@jba
Copy link
Contributor Author

jba commented Apr 8, 2023

I agree that the number of nanoseconds is the worst format in the list, but maybe there's something worse.

  1. I start with slog.Info("msg", "since", time.Since(start)).
  2. I add a second duration:
    slog.Info("msg", "since1", time.Since(start1), "since2", time.Since(start2))
    
  3. I decide to use a group (assuming log/slog: change slog.Group signature to ...any #59204):
    slog.Info("msg", Group("since", "1", time.Since(start1), "2", time.Since(start2)))
    
  4. I decide to use a struct:
    type Since struct { Start1, Start2 time.Duration }
    slog.Info("msg", "since", Since{Start1: ..., Start2: ...})
    

For the first three, JSONHandler will write the durations as strings. For the fourth, a seemingly minor change, json.Marshal is called and the durations log as numbers. That change is probably going to be a surprise, and may not be caught until a log ingester fails in production, or even later when the logs are needed for debugging.

@neild
Copy link
Contributor

neild commented Apr 8, 2023

One option I don’t think I’ve seen suggested: Nanoseconds with a “ns” suffix. This is simpler for non-Go parsers to process than the full time.Duration.String output and leaves the option for changing the precision someday.

For what it’s worth, the protobuf JSON encoding represents durations as fractional seconds with a “s” suffix.

@dsnet
Copy link
Member

dsnet commented Apr 8, 2023

I find the use of a "ns" suffix kind of arbitrary. I do like the protobuf approach of using a JSON string with a "s" suffix.

Both representations are compatible with time.ParseDuration and neither are hard to format manually.

@jba
Copy link
Contributor Author

jba commented Apr 10, 2023

I've become convinced (see above) that the current JSONHandler should behave just like the current encoding/json. That goes for NaNs and infinities too. json.Marshal fails on those, but slog won't fail; it will output a JSON string with the error.

A simple ReplaceAttr function can restore the special behavior (and the inconsistency along with it).

@jba jba changed the title proposal: log/slog: JSONHandler should format durations as strings proposal: log/slog: JSONHandler should format everything like json.Marshal Apr 10, 2023
@08d2
Copy link

08d2 commented Apr 12, 2023

For the first three, JSONHandler will write the durations as strings. For the fourth, a seemingly minor change, json.Marshal is called and the durations log as numbers. That change is probably going to be a surprise, and may not be caught until a log ingester fails in production, or even later when the logs are needed for debugging.

Code which calls slog.Info has no way of knowing which handler will ultimately render the log events; nothing guarantees a JSONHandler. In any case, arbitrary structs passed as log values guarantee no specific output regardless of handler, and are therefore generally an error. Switching from explicit key-val fields to the example type Since struct is definitely not a minor change, and the resulting output is definitely not going to be a surprise 😉

@rsc
Copy link
Contributor

rsc commented Apr 19, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@jba
Copy link
Contributor Author

jba commented Apr 20, 2023

One small change: we will retain special handling of error values, by calling their Error method and outputting the string. If we didn't treat these specially, then any error created with errors.New will format as {}, throwing away the message. In this case, the inconsistency is worth it.

@gopherbot
Copy link

Change https://go.dev/cl/486855 mentions this issue: log/slog: remove special float handling from JSONHandler

@rsc
Copy link
Contributor

rsc commented May 3, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: log/slog: JSONHandler should format everything like json.Marshal log/slog: JSONHandler should format everything like json.Marshal May 3, 2023
@rsc rsc removed this from the Proposal milestone May 3, 2023
@seankhliao
Copy link
Member

@jba does this not include the special casing for source info?

@jba
Copy link
Contributor Author

jba commented May 9, 2023

@seankhliao, the implementation special-cases Source, that is true. But it behaves exactly as json.Marshal would, except the field names are lower-cased.

https://go.dev/cl/494035 adds json tags to the fields of Source, so that the behavior is truly identical and the implementation is just an optimization.

@dmitshur dmitshur modified the milestones: Backlog, Go1.21 Jun 4, 2023
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Remove the special-case handling of NaN and infinities from
appendJSONValue, making JSONHandler behave almost exactly like
a json.Encoder without HTML escaping.

The only differences are:

- Encoding errors are turned into strings, instead of causing the Handle method to fail.

- Values of type `error` are displayed as strings by calling their `Error` method.

Fixes golang#59345.

Change-Id: Id06bd952bbfef596e864bd5fd3f9f4f178f738c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/486855
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Remove the special-case handling of NaN and infinities from
appendJSONValue, making JSONHandler behave almost exactly like
a json.Encoder without HTML escaping.

The only differences are:

- Encoding errors are turned into strings, instead of causing the Handle method to fail.

- Values of type `error` are displayed as strings by calling their `Error` method.

Fixes golang#59345.

Change-Id: Id06bd952bbfef596e864bd5fd3f9f4f178f738c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/486855
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
@mitar
Copy link
Contributor

mitar commented Oct 7, 2023

Ah, I missed this discussion. I just wanted to add (but probably too late) that errors should by my experience be JSON formatted as {"error": "... error string ..."} and not just as string. The reason is that errors can be annotated with extra information (stack traces, structured details themselves, etc.) and it is easier to add them if JSON is already an object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests