Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: unmarshal into slice reuses element data between len and cap #21092

Open
trotha01 opened this issue Jul 19, 2017 · 21 comments
Open
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@trotha01
Copy link

What version of Go are you using (go version)?

go1.8

What operating system and processor architecture are you using (go env)?

GOOS=nacl
GOARCH=amd64p32

What did you do?

https://play.golang.org/p/lbYUhgOe--

What did you expect to see?

According to the json Unmarshal docs, Unmarshal resets the slice length to zero and then appends each element to the slice.

What did you see instead?

Instead Unmarshal resets the slice length to zero and then modifies the elements of the underlying slice.

@bradfitz
Copy link
Contributor

This is working as intended. See https://blog.golang.org/slices for how slices work.

"resets the slice length to zero and then appends each element to the slice" and "resets the slice length to zero and then modifies the elements of the underlying slice" are the same thing.

@trotha01
Copy link
Author

https://golang.org/pkg/encoding/json/#Unmarshal

To unmarshal a JSON array into a slice, Unmarshal resets the slice length to zero and then appends each element to the slice

@bradfitz This may be working as intended, but not as described in the docs. The docs describe the slice length being set to zero (which happens), then appending (which is not what is happening).

The builtin append will replace the struct in a slice wholly. Unmarshal does not. Since the builtin append does one thing, and Unmarshal does a second, we should not say that Unmarshal is appending.

@bradfitz bradfitz reopened this Jul 20, 2017
@bradfitz bradfitz changed the title json unmarshal into slice does not append encoding/json: unmarshal into slice does not append Jul 20, 2017
@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 20, 2017
@bradfitz bradfitz added this to the Go1.10 milestone Jul 20, 2017
@thoeni
Copy link
Contributor

thoeni commented Jul 20, 2017

TL;DR
Probably update documentation explaining the actual behaviour of Unmarshal as the current doc is misleading.

Extended version:
I had a look at this issue, and here are my findings:

On the encoding/json Unmarshal documentation this is mentioned:

If a JSON value is not appropriate for a given target type, or if a JSON number overflows the target type, Unmarshal skips that field and completes the unmarshaling as best it can.

Now, we could argue that in the use case that is described in this issue, a json object with just Name and no Age is a value which is not "appropriate" for the struct target type which expects both, and therefore the code is completing the unmarshaling "as best it can".

In the counterexample of the manual SliceAppend, the creation of a new Person initialises the age to the zero value, hence the append overrides everything, while during unmarshalling this doesn't happen.

An option is to set the element value to the target type "zero value" before updating it by assigning the decoded value from the json: in such a way everything would be assigned to its zero value (as the new Person{} example) and the unmarshal would do its job "as best it can" for the other data, for example here:

if i < v.Len() {
   // Calculate zero value for target type 
   z := reflect.Zero(v.Type().Elem())
   // Assign the zero value to the element
   v.Index(i).Set(z)
   // Decode into element.
   d.value(v.Index(i))
}

In general, I miss the use case of passing an initialised slice to unmarshal new values that override the old values, where I could just pass a new empty slice (and I agree that the description in this case is misleading).

I have a green test, but I'm not sure of the performance implications of adding those steps (i.e. setting the value twice, zero and then decoded one).
If it makes sense to proceed, I can write more tests to cover types with different zero values, as at the moment I'm using strings and ints).

At a first sight, I would probably suggest to update the documentation as I expect someone is relying on that behaviour.

[update]
If I run all tests with this solution, jsonrpc tests break, therefore the solution indicated might need further investigation should we decide to proceed in that direction.

@dvyukov
Copy link
Member

dvyukov commented Jul 25, 2017

This does not look like working as intended and more like data corruption. Appending must overwrite the old value. This must print 1/1, but it prints 1/3:
https://play.golang.org/p/myimTJdn42

@dvyukov
Copy link
Member

dvyukov commented Jul 25, 2017

The code does not seem to regard even slice len. So if one uses this to get some kind of merging, she can get unexpected garbage with whatever was there between len and cap.

dvyukov added a commit to google/syzkaller that referenced this issue Jul 25, 2017
json decoding behavior is somewhat surprising
(see // golang/go#21092).
This behavior is especially easy to hit in tests
that reuse reply objects.
To avoid any surprises, we zero the reply.
@rsc
Copy link
Contributor

rsc commented Nov 22, 2017

I agree that when appending to a slice, unmarshal should start with fresh zeroed slice elements and not the old slice elements that happen to sit between len and cap. Too late for Go 1.10 though.

Dmitry's example is clearly incorrect behavior and should be fixed. However, the fix should preserve the current behavior in this variant: https://play.golang.org/p/H0kRWEEiEW. (Resetting the length to 0 does not mean resetting the capacity to 0.)

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@rsc rsc added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 22, 2017
@rsc rsc changed the title encoding/json: unmarshal into slice does not append encoding/json: unmarshal into slice reuses element data between len and cap Nov 22, 2017
@bradfitz
Copy link
Contributor

Duplicate (closed) bug with other repros: #24155

@gopherbot
Copy link

Change https://golang.org/cl/98516 mentions this issue: encoding/json: Fix for #21092 - Zero out target before decoding

@OneOfOne
Copy link
Contributor

As someone who actually depends on the fact that Unmarshal does NOT zero out maps/slices, this would break a lot of my code.

@JeremyLoy
Copy link

@OneOfOne out of curiosity, what is your usecase that requires data to be retained for maps/slices?

@OneOfOne
Copy link
Contributor

@JeremyLoy in one case to override the default values in a map and in a few other, filling the map from multiple input files.

@OneOfOne
Copy link
Contributor

Maybe you can add an extra field to the json decoder to make this optional.

@robmccoll
Copy link

@OneOfOne Based on reading @rsc's comment, I don't think your case will be affected.

Please excuse my rambling explanation: Unmarshalling onto existing slices would continue to merge data for elements up to len() and only zero elements between len() and cap(). If you read in a file with 3 slice elements and then read a second file with 4 slice elements, the first 3 elements will be merged (in that values in fields contained in the JSON objects in the second file will overwrite values from the JSON objects in the first but leave any values contained in the objects in the first file that were not also in the second file). The fourth element would just be exactly what was contained in the fourth element of the second file.

If you are for some reason reading in the first file, then setting your slice header back a = a[:0] before reading in the second file, this will affect you. But also, I don't see why you would be doing that.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 13, 2018
@ianlancetaylor ianlancetaylor removed this from the Go1.12 milestone Dec 11, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.13 milestone Dec 11, 2018
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@gopherbot
Copy link

Change https://golang.org/cl/191783 mentions this issue: encoding/json: don't reuse slice/array elements when decoding

@mvdan
Copy link
Member

mvdan commented Aug 29, 2019

I wasn't happy with the approach in the previous CL, and it also lacked proper testing and benchmarking, so I've started from scratch. I think this should be merged well before the freeze, as some users might be depending on the opposite of what the documentation says, so the surprises should come sooner than later :)

@neild
Copy link
Contributor

neild commented Jun 25, 2020

This change has produced a small but significant number of test failures in Google's codebase. A reduced example of one case which was broken is:

type T struct {
  Index    int
  Elements []json.RawMessage
}

var message T
tmp := []interface{}{&message.Index, &message.Elements}
err = json.Unmarshal(raw[0], &tmp)
if err != nil {
  return message, err
}
return message, nil

https://play.golang.org/p/iNBD_-mhWTI

@rsc
Copy link
Contributor

rsc commented Jun 26, 2020

@neild, this issue is closed but #39427 has an active discussion.

@mvdan
Copy link
Member

mvdan commented Jul 1, 2020

The change is being reverted. Reopening to allow me to retry a smaller change in 1.16.

@mvdan mvdan reopened this Jul 1, 2020
@mvdan mvdan modified the milestones: Go1.15, Go1.16 Jul 1, 2020
@mvdan mvdan self-assigned this Jul 1, 2020
@mvdan
Copy link
Member

mvdan commented Oct 28, 2020

I didn't have time to try this again in 1.16; moving to 1.17.

@arvenil
Copy link

arvenil commented Jun 12, 2023

... and here I am, looking for a way to actually append new elements at the end of existing slice.

package main

import (
	"encoding/json"
	"fmt"
)

func unmarshal(v any) error {
	data1 := []byte(`[{"Name":"Tom"},{"Name":"Michael"}]`)
	if err := json.Unmarshal(data1, v); err != nil {
		return err
	}

	data2 := []byte(`[{"Name":"Alice"},{"Name":"Frank"}]`)
	if err := json.Unmarshal(data2, v); err != nil {
		return err
	}

	return nil
}

type Person struct {
	Name string
}

func main() {
	var people []Person
	if err := unmarshal(&people); err != nil {
		panic(err)
	}

	fmt.Println(people)
}

Would be nice to have a way to get [{Tom} {Michael} {Alice} {Frank}] instead of [{Alice} {Frank}].

@dvyukov
Copy link
Member

dvyukov commented Jun 13, 2023

What should it do with integer fields? Sum? Or calculate an average? ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.