Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: marshal result of string type struct field with ",string" option change in go1.14 #38173

Closed
AllenX2018 opened this issue Mar 31, 2020 · 4 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@AllenX2018
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.14.1 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows

What did you do?

This issue is actually a separate issue from discussions below #38126.

type T struct {
	F1 string `json:"F1,string"`
}

t := T {
	"aaa\tbbb",
}

b, _ := json.Marshal(t)
fmt.Println(string(b))

What did you expect to see?

Before go 1.14:

{"F1":"\"aaa\\tbbb\""}

Autually I'm not sure which of the output results is correct. I think it was introduced by 0e015e20 but I don't know why it has to be changed like that. I want to figure out which of them exactly match the godoc's description:

The "string" option signals that a field is stored as JSON inside a JSON-encoded string

What did you see instead?

After go1.14:

{"F1":"\"aaa\tbbb\""}
@mvdan
Copy link
Member

mvdan commented Mar 31, 2020

@gopherbot, please open a backport issue for 1.14.

I think this is a regression - https://go-review.googlesource.com/c/go/+/193604 was a good change, but we didn't properly think of all the edge cases. I have a fix nearly ready. CC @breml @dsnet

@gopherbot
Copy link

Backport issue(s) opened: #38178 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@mvdan mvdan added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 31, 2020
@gopherbot
Copy link

Change https://golang.org/cl/226498 mentions this issue: encoding/json: properly encode strings with ",string" again

@gopherbot
Copy link

Change https://golang.org/cl/233037 mentions this issue: [release-branch.go1.14] encoding/json: properly encode strings with ",string" again

gopherbot pushed a commit that referenced this issue May 27, 2020
…,string" again

golang.org/cl/193604 fixed one bug when one encodes a string with the
",string" option: if SetEscapeHTML(false) is used, we should not be
using HTML escaping for the inner string encoding. The CL correctly
fixed that.

The CL also tried to speed up this edge case. By avoiding an entire new
call to Marshal, the new Issue34127 benchmark reduced its time/op by
45%, and lowered the allocs/op from 3 to 2.

However, that last optimization wasn't correct:

	Since Go 1.2 every string can be marshaled to JSON without error
	even if it contains invalid UTF-8 byte sequences. Therefore
	there is no need to use Marshal again for the only reason of
	enclosing the string in double quotes.

JSON string encoding isn't just about adding quotes and taking care of
invalid UTF-8. We also need to escape some characters, like tabs and
newlines.

The new code failed to do that. The bug resulted in the added test case
failing to roundtrip properly; before our fix here, we'd see an error:

	invalid use of ,string struct tag, trying to unmarshal "\"\b\f\n\r\t\"\\\"" into string

If you pay close attention, you'll notice that the special characters
like tab and newline are only encoded once, not twice. When decoding
with the ",string" option, the outer string decode works, but the inner
string decode fails, as we are now decoding a JSON string with unescaped
special characters.

The fix we apply here isn't to go back to Marshal, as that would
re-introduce the bug with SetEscapeHTML(false). Instead, we can use a
new encode state from the pool - it results in minimal performance
impact, and even reduces allocs/op further. The performance impact seems
fair, given that we need to check the entire string for characters that
need to be escaped.

	name          old time/op    new time/op    delta
	Issue34127-8    89.7ns ± 2%   100.8ns ± 1%  +12.27%  (p=0.000 n=8+8)

	name          old alloc/op   new alloc/op   delta
	Issue34127-8     40.0B ± 0%     32.0B ± 0%  -20.00%  (p=0.000 n=8+8)

	name          old allocs/op  new allocs/op  delta
	Issue34127-8      2.00 ± 0%      1.00 ± 0%  -50.00%  (p=0.000 n=8+8)

Instead of adding another standalone test, we convert an existing
"string tag" test to be table-based, and add another test case there.

One test case from the original CL also had to be amended, due to the
same problem - when escaping '<' due to SetEscapeHTML(true), we need to
end up with double escaping, since we're using ",string".

Fixes #38178.
For #38173.

Change-Id: I2b0df9e4f1d3452fff74fe910e189c930dde4b5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/226498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
(cherry picked from commit b1a48af)
Reviewed-on: https://go-review.googlesource.com/c/go/+/233037
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
@golang golang locked and limited conversation to collaborators May 8, 2021
@rsc rsc unassigned mvdan Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants