Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: json.Number accepts quoted values by default #34472

Open
vearutop opened this issue Sep 23, 2019 · 38 comments
Open

encoding/json: json.Number accepts quoted values by default #34472

vearutop opened this issue Sep 23, 2019 · 38 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@vearutop
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/viacheslav.poturaev/Library/Caches/go-build"
GOENV="/Users/viacheslav.poturaev/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="github.com/hellofresh"
GONOSUMDB="github.com/hellofresh"
GOOS="darwin"
GOPATH="/Users/viacheslav.poturaev/go"
GOPRIVATE="github.com/hellofresh"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/h7/7qg3nbt91bb6mgk2xtqqpwc40000gp/T/go-build850653985=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I've decoded a JSON value of "123" into json.Number.
https://play.golang.org/p/9Nwjn3rBFxI

What did you expect to see?

I've expected to see an error.

What did you see instead?

I've seen a successful result and an int64 value of 123 accessible with .Int64().

I could not find any documentation that json.Number accepts quoted string values by default and was expecting the opposite (as per "being precise" motivation expressed in #22463 (comment)).

Not sure if this is a documentation or implementation issue.

@bcmills
Copy link
Contributor

bcmills commented Sep 23, 2019

Looks like the Number validation changed very recently (in CL 195045; see #14702).

Is this still reproducible using gotip?

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Sep 23, 2019
@vearutop
Copy link
Contributor Author

Same behavior with go version devel +361ab73 Mon Sep 23 05:57:54 2019 +0000 darwin/amd64.

@vearutop
Copy link
Contributor Author

I think such behavior is historical, at least in go1.12 it is still same. Maybe the documentation has to be more explicit.

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 23, 2019
@bcmills
Copy link
Contributor

bcmills commented Sep 23, 2019

CC @dsnet @bradfitz @mvdan

@bcmills
Copy link
Contributor

bcmills commented Sep 23, 2019

Thanks for checking.

@bcmills bcmills added this to the Unplanned milestone Sep 23, 2019
@bcmills bcmills added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 23, 2019
@mvdan
Copy link
Member

mvdan commented Sep 23, 2019

Some changes related to json.Number from @breml were merged recently, perhaps he has thoughts.

Without looking deeply into the issue, remember that json.Number is a string type underneath, so perhaps this was meant to be supported by design.

@kentquirk
Copy link

json.Number is a string; the Int64() and Float64() methods return an error if the conversion is invalid.

I can't think of why this would need to exist unless it were intended to work this way.

To force conformance to a specific type in JSON, you could use the type directly instead of json.Number.

I think this could be clarified in documentation but changing the behavior would cause previously-working code to break.

@vearutop
Copy link
Contributor Author

@kentquirk being a string for raw JSON value is ok, being a quoted string is less ok.

As you would not expect the following code to return valid integer:

strconv.Atoi(`"123"`)

JSON.org defines values as:

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array.

Which makes an explicit difference between double-quoted string and number.

The documentation says:

A Number represents a JSON number literal.

I also think this issue in likely about clarifying documentation given the potential impact of behavior change. Though maybe it would make sense to revisit behavior for Go 2.

@breml
Copy link
Contributor

breml commented Sep 23, 2019

I quickly checked the code and I have come to the following conclusions:

if v.Type() == numberType && (!fromQuoted || !isValidNumber(string(s))) {
	return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
}
  • I also quickly checked the test cases in decode_test.go and I could not find a test case testing this case, so I assume, it is not intended behavior.
  • Respective test cases could be easily added after the encoding/json: Unmarshal into struct field or a map does not validate invalid json.Number #14702 related test cases
    // #14702
    {
    in: `invalid`,
    ptr: new(Number),
    err: &SyntaxError{
    msg: "invalid character 'i' looking for beginning of value",
    Offset: 1,
    },
    },
    {
    in: `"invalid"`,
    ptr: new(Number),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(struct{ A Number }),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(struct {
    A Number `json:",string"`
    }),
    err: fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into json.Number", `invalid`),
    },
    {
    in: `{"A":"invalid"}`,
    ptr: new(map[string]Number),
    err: fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", `"invalid"`),
    },

@kentquirk
Copy link

kentquirk commented Sep 24, 2019

I feel like the entire point of json.Number is to deal with the complex mess of things that JS programs will do, because JS is way too casual about this stuff.

If you have JSON coming from JS that you don't control, you may see numbers encoded as strings. Worse, you may see them sometimes encoded as strings and sometimes as floats or integers, even for the same field. Your options to consume that in a Go API are:

  • treat it as json.RawMessage and then evaluate the byte stream and unmarshal it manually
  • treat is as a json.Number and use .Int64 or .Float64()

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

If you're going to insist that json.Number force the value to be not-a-string, the correct fix should be to remove it entirely, because it is otherwise useless.

@vearutop
Copy link
Contributor Author

Additional purpose of json.Number is dealing with float64 53-bit precision of int64.

@kentquirk
Copy link

I think that's a good idea but can you point me to any code that supports that? If you marshal an int64 to a string you may store a value that doesn't successfully convert to a JS Number but as far as I can tell, json.Number won't care and can't help you there.

@Gobd
Copy link

Gobd commented Sep 27, 2019

I don't think this behaviour should change because I rely on it as I'm sure many other people do.

@mvdan
Copy link
Member

mvdan commented Oct 8, 2019

@Gobd unless I'm missing something, this would be trivial to fix when upgrading to Go 1.14, if the change is done in that release. The json.Number fields in question would just need to get the ,string tag option added. Adding that option shouldn't break earlier Go versions either.

Non-API breaking changes should be kept at a minimum for sure, but given that the program in question is relying on the opposite of what the documentation says, and it would have a very simple fix, I don't see why we should be conservative. Particularly when the new behavior would be more consistent.

There's also the argument that json.Number is especially useful since it allows both non-strings and strings when decoding at the same time. That was never documented and I think it was clearly a mistake. I personally think that keeping that behavior is wrong; why should json.Number allow both at the same time, while other types like int and float64 do not? It would make encoding/json more inconsistent.

I would argue that the latter is preferable -- because if you knew for sure that it was it not a string, you could have simply unmarshaled it directly into an int or float directly.

No - the purpose of json.Number has nothing to do with decoding strings. Its sole purpose is to not lose precision and formatting when encoding/decoding. See https://codereview.appspot.com/6202068.

If it has been used by some to accept strings and non-strings quickly, that's simply using an unintended and undocumented side effect, which I'm arguing against at the start of my comment.

@mvdan
Copy link
Member

mvdan commented Oct 8, 2019

To support the idea that this was never an intended or documented use of json.Number - applying @breml's fix above doesn't break any of the existing tests or examples.

@vearutop
Copy link
Contributor Author

vearutop commented Oct 8, 2019

The json.Number fields in question would just need to get the ,string tag option added.

The upgrade would be more complicated in case of non-object, e.g. "123" or ["123", 456, 789.0].

@Gobd
Copy link

Gobd commented Oct 8, 2019

What benefit does @breml's fix have over documenting & adding tests to show that it supports strings & non-strings? I think that would be prefered vs changing current behaviour for what seems like no benefit.

@mvdan
Copy link
Member

mvdan commented Oct 10, 2019

@Gobd the benefit is consistency, and actually following what has been documented since the beginning, including the purpose of the string option. The more special cases the json package documents and implements, the harder it is to understand and maintain in the long run.

The upgrade would be more complicated in case of non-object

Please read my longer comment above; I mention this specific use case.

For example, here's a simple piece of code to accept both a string and a non-string for integer values (not using json.Number since that accepts both today):

https://play.golang.org/p/bQBzvSvMv1e

It requires ten more lines of code, but I think that's completely acceptable when your restriction is that you need to accept different kinds of JSON at once. And as said before, json.RawMessage or interface{} are always an option.

@puellanivis
Copy link

This behavior has been supporting quoted numbers since before go1.0. Changing it now (regardless of how good an argument could be made for it) seems like a bad idea. Just because an argument could be made that programs that are relying upon this behavior are incorrect, or buggy, does not change that people who are relying upon this behavior will feel like there was a violation of the go1 compat guarantee because their programs that worked before have stopped working now.

I will also note that the specification behavior in the proto3 JSON mapping is to always quote int64 values, because it is actually impossible by specification to make a guarantee of more than 53-bits of accuracy with JSON. That Go’s encoding/json permits higher than 53-bits of accuracy is irrelevant to the fact that proxies or any other handling of JSON is technically by spec permitted to drop anything more than 53-bits of accuracy.

So, it remains, the only way to guarantee full int64 range and accuracy is to quote the number in a string. Making json.Number restrictive to only unquoted numbers makes supporting this method of guaranteeing numeric precision more difficult.

@mvrhov
Copy link

mvrhov commented Feb 18, 2020

This change just broke the "field":"" which previously worked and the use of json.Number was on purpose. I don't have the option to change the json input I'm getting.

@mvdan
Copy link
Member

mvdan commented Feb 18, 2020

@mvrhov there has been no change yet. If there was, this issue would not be open with the NeedsDecision label.

@puellanivis it's indeed a tough choice. If we can't come to an agreement on changing it (which seems to be the case), then it's probably too late or painful to change it. I can have a look at a documentation change for 1.15, to at least have the documentation reflect the internal behavior that too many people have been relying on already.

@breml
Copy link
Contributor

breml commented Feb 18, 2020

@mvdan A documentation change would not be enough. If we decide to raise the status of the current behavior to "fully supported", then we need to add test cases for this as well.

In general, I am still in favor of this change, because in my opinion the ,string tag is there to support unmarshaling of quoted (numeric) values.

In regards to the go1 compat guarantee I do not agree with @puellanivis. I do not think, that this fix would be a violation, because this behavior is nowhere documented or specified. If we would treat this as a violation, we would end up in a situation, where we can never fix a bug in Go or the stdlib, because someone could potentially rely on the buggy behavior and it would break their code.

@breml
Copy link
Contributor

breml commented Feb 18, 2020

Additionally, I would like to mention, that the documentation of type Number in the encoding/json package states the following:

A Number represents a JSON number literal.

In the JSON specification, a number literal is defined as (link):

integer fraction exponent

So the type Number is clearly specified and a JSON number literal is never quoted.

@puellanivis
Copy link

I am not arguing that just because someone might depend upon a buggy behavior, that bugs cannot be fixed.

I am saying that changing unspecified behavior has to be done intelligently, and with respect to the users that might (accidentally) be relying upon that unspecified behavior.

Even if we “fix” all of this, or introduce a ,string tag, or some other way to continue supporting the current behavior, we should not just shove out a “your code now fails” patch all in one version. We should properly document that the behavior was unspecified, that it was never intended, and that it will stop working in the future.

And now after reading the two main standards ECMA-404, and RFC 8259, I don’t even think a change is necessary.

The ECMA-404 standard specifically says:

The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax

And RFC 8259 also only defines only a grammar, not semantics.

So, even with json.Number accepting quoted integers, this package accepts all text that conforms to the JSON grammars that both standards layout. And there are no standards pointing to any MUST semantics here, both of them only cover syntax. And a number encoded in a string is valid syntax.

@breml
Copy link
Contributor

breml commented Feb 20, 2020

@puellanivis I am not sure, if I get your point in your last comment.

The documentation of json.Number states: "A Number represents a JSON number literal.".
A JSON number literal has a clearly defined syntax and this syntax does not include any quotes.

Of course the encoding/json package accepts all text that conforms to the JSON grammars, but if you want to decode a quoted string (maybe containing a character sequence, that is a valid JSON number literal), json.Number is not the right target type, because the documentation limits the usage of json.Number to "JSON number literals".

@puellanivis
Copy link

Regardless of what the documentation says, it has, since before go1.0, accepted string-quoted numeric values as well. The documentation and the implementation are in disagreement. The solution is to either fix the code, or fix the documentation.

Because the JSON standard does not dictate semantics, there is no ground to stand on to say that we cannot have json.Number support quoted numeric values like it has for over 9 years.

Nothing breaks if we fix the documentation, it is a strict superset of functionality. However, many things will break if we force the code to fit the documentation.

@bcmills
Copy link
Contributor

bcmills commented Feb 20, 2020

To put it a different way: we can consider this an implementation bug (“json.Number accepts quoted values when its documentation says it should only accept number literals”), or a documentation bug (“json.Number's documentation says it represents a JSON number literal, when it actually represents any reasonable JSON encoding of a number”).

@puellanivis's point, as I understand it, is that the choice between those interpretations should be informed by existing usage.

@puellanivis
Copy link

Yes, that is way more clearly what I wanted to say.

Though, I wouldn’t say that I am firmly against changing the code to fit the documentation, but it should at least be carefully measured about what the effect either decision would lead to.

@dryairship
Copy link

* The decoding of a quoted numerical value into `json.Number` should (in my opinion) only work, if the option `,string` is set in the struct tags.

Can't @breml 's suggestion be implemented the opposite way? Like, instead of a ,string tag that ensures it accepts quoted numbers, we can use a ,strictlyNumeric (or something else) tag to ensure it accepts only numeric values and not strings. This would not break compatibility, and would practically solve the problem.

@puellanivis
Copy link

Such a change would still require a change of documentation to document the current behavior, and the new “strictlyNumeric” functionality.

@mvdan
Copy link
Member

mvdan commented Jul 17, 2020

What @puellanivis said. Adding a new option is not a good solution; if anything, the bar for new json options is pretty high at this point.

Personally, I've started to give up trying to fix the historical design issues with encoding/json; I've tried with a number of other undocumented or unintended behaviors (or even those that go directly against the documentation), but most of them end up getting reverted because they break existing code. So I'm pretty sure this would fall under the same category.

Whether or not the documentation should be updated is up to personal preference, I guess. I would personally not document these implementation quirks that were never meant to be.

@darnmason
Copy link

so this broke... that was fun

@dsnet
Copy link
Member

dsnet commented Jun 8, 2022

@darnmason: Nothing has changed on this issue in 2 years. Are you suggesting that something changed in the json behavior as it relates to this issue?

@vearutop
Copy link
Contributor Author

vearutop commented Jun 8, 2022

It seems the documentation is still misleading.

@darnmason
Copy link

@dsnet I was a bit late to the party and updated an old project from go 1.13 to 1.16

Turns out my code was relying on the "feature" of json.Number being able to decode both number literals and quoted numbers

@dsnet
Copy link
Member

dsnet commented Jun 8, 2022

@darnmason, I'm not sure how that matters since behavior related to this issue has not changed since at least go1.2 (which is the oldest version of Go on my machine).

@darnmason
Copy link

ah so it hasn't actually changed. Sorry for the confusion, my bad.

@Zeplar
Copy link

Zeplar commented Sep 28, 2022

I'm curious how the ",string" approach would have helped here. With the current implementation of the string tag, the field must be quoted. This doesn't solve the common problem of a JS client using both quoted and unquoted formats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests