Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: jsoniter stack corruption with 32-bit arches, potential GC issues with 64-bit arches #71408

Closed
bboreham opened this issue Jan 23, 2025 · 23 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. release-blocker
Milestone

Comments

@bboreham
Copy link
Contributor

Go version

go1.24rc2 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GO386='sse2'
GOARCH='386'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/bryan/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/bryan/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m32 -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1397494606=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/tmp/jsoniter/go.mod'
GOMODCACHE='/home/bryan/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/bryan'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/bryan/pkg/mod/golang.org/toolchain@v0.0.1-go1.24rc2.linux-amd64'
GOSUMDB='sum.golang.org'
GOTELEMETRY='on'
GOTELEMETRYDIR='/home/bryan/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/bryan/pkg/mod/golang.org/toolchain@v0.0.1-go1.24rc2.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24rc2'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

When GOARCH was set to 386, and trying out the release candidates of Go 1.24, a unit test failed in the project Prometheus: https://github.com/prometheus/prometheus/actions/runs/12932314085/job/36070245239?pr=15707

I have simplified to just one use of the 'jsoniter' library.
https://go.dev/play/p/gtfdrAhV48r?v=gotip

(That link gets a different error: could not parse netrc (GOAUTH=netrc): $HOME is not defined, so I'll supply files too.)

go.mod
go.sum
jsoniter_test.go

What did you see happen?

$ GOARCH=386 go test
--- FAIL: TestJsonIter (0.00s)
    jsoniter_test.go:21: Result too short (28): "{\"item1\":[{\"type\":\"gauge\"}]}"
FAIL
exit status 1
FAIL    v1      0.001s

What did you expect to see?

The resulting JSON should be about 57 bytes long, with two items in it.
The test passes with GOARCH=amd64 and Go 1.24rc2, also with GOARCH=386 on Go 1.23.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jan 23, 2025
@randall77
Copy link
Contributor

I can reproduce.
@prattmic Possibly a swiss map iteration bug on 32-bit.
I will look more later today.

@randall77
Copy link
Contributor

I suspect that https://github.com/modern-go/reflect2, which is imported by https://github.com/json-iterator/go, can not handle the switch to swissmaps. It reaches into internal details of the runtime with unsafe/linkname/etc. In fact I'm surprised it mostly works.
My immediate thought is that the hiter struct they use is now wrong. Maybe the new hiter is bigger?

@prattmic
Copy link
Member

I kept the key and value fields in the same place to keep casts OK (reflect2 only accesses those fields AFAICT), but the new maps.Iter type is indeed 8-bytes bigger (64 vs 56 bytes) on 386, so we'll corrupt the stack. maps.Iter is smaller on amd64 (96 vs 104 bytes). The different pointer layout is likely a problem for the GC, even on amd64.

@randall77
Copy link
Contributor

I opened a new issue against reflect2 here: modern-go/reflect2#32

Not sure what if anything the Go project should do about this. I'll leave this issue open in case anyone has some suggestions, but I'm inclined to say "tough luck". But reflect2 is quite popular, so if there was something simple we could do I would consider it.

@bboreham
Copy link
Contributor Author

Thanks for the investigation and quick response.

FWIW I was aware that jsoniter is unmaintained, but I didn’t realise it was this problematic.

A little research shows that Kubernetes decided years ago to move off (although work remains).

@prattmic
Copy link
Member

prattmic commented Jan 23, 2025

Thinking about potential for issues with GC, here are the overlaps on amd64:

type hiter struct {                       | type Iter struct {
        key         unsafe.Pointer        |     key         unsafe.Pointer
        value       unsafe.Pointer        |     elem        unsafe.Pointer
        t           unsafe.Pointer        |     typ         *abi.SwissMapType
        h           unsafe.Pointer        |     m           *Map
        buckets     unsafe.Pointer        |     entryOffset uint64
        bptr        unsafe.Pointer        |     dirOffset   uint64
        overflow    *[]unsafe.Pointer     |     clearSeq    uint64
        oldoverflow *[]unsafe.Pointer     |     globalDepth uint8
        startBucket uintptr               |     dirIdx      int
        offset      uint8                 |     tab         *table
        wrapped     bool                  |
        B           uint8                 |
        i           uint8                 |
        bucket      uintptr               |     group.data  unsafe.Pointer
        checkBucket uintptr               |     entryIdx    uint64
}                                         | }

(These structures are actually the same size. https://go.dev/cl/625275 added one more field to hiter, but that isn't reflected in reflect2, nor was it ever released.)

The pointer/non-pointer mismatches are:

Recorded as pointer, but not actually a pointer:

buckets     <-> entryOffset
bptr        <-> dirOffset
overflow    <-> clearSeq
oldoverflow <-> globalDepth

For these, the GC may throw if these look like pointers that aren't allocated. Also if they are non-zero but look too small to be a valid pointer [1]. The latter is more likely, all of these values will generally be quite small.

Recorded as non-pointer, but actually a pointer:

offset+wrapper+B+i <-> tab
bucket             <-> group.data

For these, if they aren't otherwise reachable then they could be freed early.

Both of these are normally reachable from the map itself, though I don't think reflect2 guarantees the map will still be reachable. (Edit: doh, the map is reachable from Iter). They will become reachable only from Iter if the table grows during iteration [2].

[1] I think? But now I can't find the code.
[2] I haven't double-checked, but I doubt the map can grow in json-iterator, as it wouldn't make sense for that package to mutate the map. But it may apply to other users of reflect2.


It looks like we could potentially work around this problem by rearranging the fields. Move tab and group up to align with a pointer.

Move entryOffset, dirOffset, clearSeq, globalDepth off of pointers. The only problem is that we have 5 non-pointer words, but hiter only has 4.

But globalDepth is small, so if we align it the top byte of a pointer, then that pointer will always look like a kernel address, which is probably OK.

@prattmic
Copy link
Member

Release blocker until we make a decision.

@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 23, 2025
@prattmic prattmic added this to the Go1.24 milestone Jan 23, 2025
@prattmic prattmic changed the title runtime: jsoniter fail with GOARCH=386 and 1.24rc{1,2} runtime: jsoniter stack corruption with 32-bit arches, potential GC issues with 64-bit arches Jan 23, 2025
@bboreham
Copy link
Contributor Author

we have 5 non-pointer words, but hiter only has 4.

I count 6 in Iter - entryOffset, dirOffset, clearSeq, globalDepth, dirIdx, entryIdx.

@prattmic
Copy link
Member

Here's what I've got so far:

type hiter struct {                       | type Iter struct {
        key         unsafe.Pointer        |     key         unsafe.Pointer
        value       unsafe.Pointer        |     elem        unsafe.Pointer
        t           unsafe.Pointer        |     typ         *abi.SwissMapType
        h           unsafe.Pointer        |     m           *Map
        buckets     unsafe.Pointer        |     tab         *table
        bptr        unsafe.Pointer        |     group.data  unsafe.Pointer
        overflow    *[]unsafe.Pointer     |     clearSeq    uint64
        oldoverflow *[]unsafe.Pointer     |     _unused     uint64 (64-bit) / struct{} (32-bit)
        startBucket uintptr               |     entryOffset uintptr
        offset      uint8                 |     dirOffset   uintptr
        wrapped     bool                  |
        B           uint8                 |
        i           uint8                 |
        bucket      uintptr               |     entryIdx    uint16
                                          |     globalDepth uint8
        checkBucket uintptr               |     dirIdx      int
}   
  • I shrank entryIdx to uint16 as it is always <1024. This lets it colocate with globalDepth.
  • entryOffset and dirOffset are now uintptr, as we don't need 64-bits of randomness on 32-bit platforms.
  • All of the pointers in Iter are now aligned with pointers in hiter.
  • All of the non-pointers in Iter are aligned with non-pointers in hiter except clearSeq.

clearSeq still overlaps with the pointer overflow (and oldoverflow on 32-bit). This is somewhat problematic. clearSeq is a count of calls to clear(). It will almost always be quite small, but can technically get arbitrarily large.

We could consider this acceptable. With all of the pointers in Iter covered, we won't get memory corruption. If clearSeq gets too high, applications might get spurious "found pointer to free object" throws, but they should be rare.

@prattmic
Copy link
Member

Radically different approach:

type hiter struct {                       | type Iter struct {
        key         unsafe.Pointer        |     key         unsafe.Pointer
        value       unsafe.Pointer        |     elem        unsafe.Pointer
        t           unsafe.Pointer        |     iter        *realIter
        h           unsafe.Pointer        |
        buckets     unsafe.Pointer        |
        bptr        unsafe.Pointer        |
        overflow    *[]unsafe.Pointer     |
        oldoverflow *[]unsafe.Pointer     |
        startBucket uintptr               |
        offset      uint8                 |
        wrapped     bool                  |
        B           uint8                 |
        i           uint8                 |
        bucket      uintptr               |
        checkBucket uintptr               |
}                                         | }

We could add an extra layer of indirection. This is unfortunate because it requires an extra allocation, and extra indirection in mapiternext, which will hurt performance. On the other hand, it is much simpler and arguably more future-proof [1], as we can freely modify realIter.

[1] On the other hand, if packages like reflect2 copy this new 3-word Iter type, then we can never grow this struct beyond 3 words, which would be very painful.

@prattmic
Copy link
Member

We could also mix these ideas, and make clearSeq a pointer to the sequence counter. It is rarely used, so wouldn't hurt performance.

@prattmic
Copy link
Member

I also propose we add a new function with an open linkname:

//go:linkname newmapiter
func runtime.newmapiter(t *abi.SwissMapType, m *maps.Map) *maps.Iter

This allocates (and initializes) a new iteration type. This would give projects using mapiterinit a migration path where they don't need to define their own iter type (they can just use it as unsafe.Pointer), even if they do keep using mapiternext.

To be clear, they should stop that as well, but this migration would be simpler and would avoid future issues with the iter type definition.

The changes to the internals of the map itself haven't been an issue because maps are allocated with make and users just cast them to unsafe.Pointer to pass to mapassign, etc. Thus the internal structure doesn't matter. This would make iteration similar.

@prattmic
Copy link
Member

(Sorry for using this issue as brainstorming board :P)

type Iter struct {
    key         unsafe.Pointer
    elem        unsafe.Pointer
    iter        *realIter
}

We could make mapiterinit, mapiternext, mapiterkey, mapiterelem operate on this wrapper type and introduce new functions used by compiler/std that operate on the real iteration type. Then only linkname users will take the indirection performance hit.

@randall77
Copy link
Contributor

The problem with the wrapper, or with newmapiter, is that part of the point of these unsafe shenanigans is to make iteration allocation-free. Both ways of solving this would introduce an allocation. (Unless we could inline newmapiter or mapiterinit somehow? Not sure that's possible when using linkname.)

@randall77
Copy link
Contributor

Actually, maybe reflect2 causes an allocation of an hiter anyway? It is a local variable but I think it escapes.

@prattmic
Copy link
Member

It does escape (source). I think we'd need quite clever escape analysis to avoid an escape there, even with inlining. It it turns out this function doesn't inline anyway, as it has cost 83...

Regardless, I am more concerned with correctness than performance of these unsafe packages.

@ianlancetaylor
Copy link
Member

In my opinion we should not introduce an extra allocation for all programs in order to support the reflect2 package. That is carrying compatibility too far.

@randall77
Copy link
Contributor

Regardless, I am more concerned with correctness than performance of these unsafe packages.

Sure, so am I. I just want to make sure that we don't add an extra allocation in any new solution that needs adopting, because with that new allocation they won't adopt it.

@prattmic
Copy link
Member

I am currently prototyping #71408 (comment).

In this approach, normal Go programs will use newly introduced mapiterinit2 and mapiternext2, which behave the same as mapiterinit and mapiternext today. No extra allocation for normal Go programs.

mapiterinit and mapiternext will remain for the sole use of //go:linkname users. mapiterinit will have an extra allocation vs today.

Personally, I think that is a good compromise. No impact on normal Go programs, maintains correctness for users of linkname, and doesn't require any migration.

It's true that linkname users will take a small performance hit. I think that is an acceptable trade-off for maintainability. It's true that there is a question of whether these users will try to work around this compatibility layer. It will be more difficult though, as mapiterinit2 and mapiternext2 won't allow linkname.

@thepudds
Copy link
Contributor

The wrapper in #71408 (comment) seems like a good solution that would not slow down normal code (though of course has the cost of needing to be implemented by Michael or whoever).

I think I recall at least one or maybe more of the linkname-related CLs that just redirected the unsafe linkname usage into the normal path, which meant it would function correctly but at "normal" speed (that is, no speed benefit from the linkname). I hunted for a couple of minutes just now to see if I could find an example, but didn't find one right away, but that's my recollection.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/643898 mentions this issue: runtime: rename mapiterinit and mapiternext

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/643899 mentions this issue: runtime: mapiter linkname compatibility layer

gopherbot pushed a commit that referenced this issue Jan 28, 2025
mapiterinit allows external linkname. These users must allocate their
own iter struct for initialization by mapiterinit. Since the type is
unexported, they also must define the struct themselves. As a result,
they of course define the struct matching the old hiter definition (in
map_noswiss.go).

The old definition is smaller on 32-bit platforms. On those platforms,
mapiternext will clobber memory outside of the caller's allocation.

On all platforms, the pointer layout between the old hiter and new
maps.Iter does not match. Thus the GC may miss pointers and free
reachable objects early, or it may see non-pointers that look like heap
pointers and throw due to invalid references to free objects.

To avoid these issues, we must keep mapiterinit and mapiternext with the
old hiter definition. The most straightforward way to do this is to use
mapiterinit and mapiternext as a compatibility layer between the old and
new iter types.

The first step to that is to move normal map use off of these functions,
which is what this CL does.

Introduce new mapIterStart and mapIterNext functions that replace the
former functions everywhere in the toolchain. These have the same
behavior as the old functions.

This CL temporarily makes the old functions throw to ensure we don't
have hidden dependencies on them. We cannot remove them entirely because
GOEXPERIMENT=noswissmap still uses the old names, and internal/goobj
requires all builtins to exist regardless of GOEXPERIMENT. The next CL
will introduce the compatibility layer.

I want to avoid using linkname between runtime and reflect, as that
would also allow external linknames. So mapIterStart and mapIterNext are
duplicated in reflect, which can be done trivially, as it imports
internal/runtime/maps.

For #71408.

Change-Id: I6a6a636c6d4bd1392618c67ca648d3f061afe669
Reviewed-on: https://go-review.googlesource.com/c/go/+/643898
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
@bboreham
Copy link
Contributor Author

bboreham commented Feb 6, 2025

FYI Prometheus unit tests pass on all architectures with 1.24RC3. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants