Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: unexpected result when json.Unmarshaling #38126

Closed
AllenX2018 opened this issue Mar 28, 2020 · 6 comments
Closed

encoding/json: unexpected result when json.Unmarshaling #38126

AllenX2018 opened this issue Mar 28, 2020 · 6 comments

Comments

@AllenX2018
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.14.1 windows/amd64

Does this issue reproduce with the latest release?

Yes, it reproduces with the latest release.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows

What did you do?

My struct has a string field with a json:"string" field tag. When this field contains escaped character, I get unexcepted result when doing roundtrip test.

type T struct {
	F1 string `json:"F1,string"`
}

t := T {
	"aaa\tbbb",
}

b, _ := json.Marshal(t)
if err := json.Unmarshal(b, &t); err == nil {
	fmt.Printf("%+v\n", t)
}else {
	fmt.Println(err)
}

What did you expect to see?

{F1:aaa	bbb}

What did you see instead?

json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string
@mvdan
Copy link
Member

mvdan commented Mar 28, 2020

I've confirmed this is a duplicate of #38105. I'll close this issue to consolidate into the other one, but I'll add this as a test case.

@mvdan mvdan closed this as completed Mar 28, 2020
@gopherbot
Copy link

Change https://golang.org/cl/226218 mentions this issue: encoding/json: don't mangle strings in an edge case when decoding

@AllenX2018
Copy link
Author

It seems that https://golang.org/cl/226218 has fix this issue. However, I still have a question.

type T struct {
	F1 string `json:"F1,string"`
}

t := T {
	"aaa\tbbb",
}

b, _ := json.Marshal(t)
fmt.Println(string(b))

Before go 1.14, its output result is

{"F1":"\"aaa\\tbbb\""}

Add after go 1.14, it changes to

{"F1":"\"aaa\tbbb\""}

It seems this change is introduced by this commit but may I ask why you change the encode behavior for string type struct field with ",string" option like this? Which of them excatly match the godoc's description?

The "string" option signals that a field is stored as JSON inside a JSON-encoded string

@mvdan
Copy link
Member

mvdan commented Mar 30, 2020

You're right, @AllenX2018, something is up with this change in behavior. Can you please file a separate issue? I'm already investigating a fix, but it's good to have an issue to track the bug.

@AllenX2018
Copy link
Author

I've created #38173 for track

gopherbot pushed a commit that referenced this issue May 8, 2020
The added comment contains some context. The original optimization
assumed that each call to unquoteBytes (or unquote) followed its
corresponding call to rescanLiteral. Otherwise, unquoting a literal
might use d.safeUnquote from another re-scanned literal.

Unfortunately, this assumption is wrong. When decoding {"foo": "bar"}
into a map[T]string where T implements TextUnmarshaler, the sequence of
calls would be as follows:

	1) rescanLiteral "foo"
	2) unquoteBytes "foo"
	3) rescanLiteral "bar"
	4) unquoteBytes "foo" (for UnmarshalText)
	5) unquoteBytes "bar"

Note that the call to UnmarshalText happens in literalStore, which
repeats the work to unquote the input string literal. But, since that
happens after we've re-scanned "bar", we're using the wrong safeUnquote
field value.

In the added test case, the second string had a non-zero number of safe
bytes, and the first string had none since it was all non-ASCII. Thus,
"safely" unquoting a number of the first string's bytes could cut a rune
in half, and thus mangle the runes.

A rather simple fix, without a full revert, is to only allow one use of
safeUnquote per call to unquoteBytes. Each call to rescanLiteral when
we have a string is soon followed by a call to unquoteBytes, so it's no
longer possible for us to use the wrong index.

Also add a test case from #38126, which is the same underlying bug, but
affecting the ",string" option.

Before the fix, the test would fail, just like in the original two issues:

	--- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s)
	    decode_test.go:2443: Key "开源" does not exist in map: map[开���:12345开源]
	    decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string

Fixes #38105.
For #38126.

Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/226218
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
@gopherbot
Copy link

Change https://golang.org/cl/233057 mentions this issue: [release-branch.go1.14] encoding/json: don't mangle strings in an edge case when decoding

gopherbot pushed a commit that referenced this issue May 27, 2020
…e case when decoding

The added comment contains some context. The original optimization
assumed that each call to unquoteBytes (or unquote) followed its
corresponding call to rescanLiteral. Otherwise, unquoting a literal
might use d.safeUnquote from another re-scanned literal.

Unfortunately, this assumption is wrong. When decoding {"foo": "bar"}
into a map[T]string where T implements TextUnmarshaler, the sequence of
calls would be as follows:

	1) rescanLiteral "foo"
	2) unquoteBytes "foo"
	3) rescanLiteral "bar"
	4) unquoteBytes "foo" (for UnmarshalText)
	5) unquoteBytes "bar"

Note that the call to UnmarshalText happens in literalStore, which
repeats the work to unquote the input string literal. But, since that
happens after we've re-scanned "bar", we're using the wrong safeUnquote
field value.

In the added test case, the second string had a non-zero number of safe
bytes, and the first string had none since it was all non-ASCII. Thus,
"safely" unquoting a number of the first string's bytes could cut a rune
in half, and thus mangle the runes.

A rather simple fix, without a full revert, is to only allow one use of
safeUnquote per call to unquoteBytes. Each call to rescanLiteral when
we have a string is soon followed by a call to unquoteBytes, so it's no
longer possible for us to use the wrong index.

Also add a test case from #38126, which is the same underlying bug, but
affecting the ",string" option.

Before the fix, the test would fail, just like in the original two issues:

	--- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s)
	    decode_test.go:2443: Key "开源" does not exist in map: map[开���:12345开源]
	    decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string

Fixes #38106.
For #38105.
For #38126.

Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/226218
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
(cherry picked from commit 55361a2)
Reviewed-on: https://go-review.googlesource.com/c/go/+/233057
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
@golang golang locked and limited conversation to collaborators May 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants