Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: potential deadlock in sync.WaitGroup usage within JSON encoder cache initialization #70753

Closed
lss25 opened this issue Dec 10, 2024 · 6 comments
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@lss25
Copy link

lss25 commented Dec 10, 2024

What version of Go are you using (go version)?

$ go version
go version go1.20.14 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/Users/bytedance/go/bin"
GOCACHE="/Users/bytedance/Library/Caches/go-build"
GOENV="/Users/bytedance/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/bytedance/go/pkg/mod"
GONOPROXY="*.byted.org,*.everphoto.cn,git.smartisan.com"
GONOSUMDB="*.byted.org,*.everphoto.cn,git.smartisan.com"
GOOS="darwin"
GOPATH="/Users/bytedance/go"
GOPRIVATE="*.byted.org,*.everphoto.cn,git.smartisan.com"
GOPROXY="https://go-mod-proxy.byted.org|https://goproxy.cn|https://proxy.golang.org|direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.google.cn"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.20.14"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="0"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/tb/h9lbg10x1930ds42n9wxkffr0000gp/T/go-build3470492578=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.20.14 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.20.14
uname -v: Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:10 PST 2023; root:xnu-10002.61.3~2/RELEASE_X86_64
ProductName:		macOS
ProductVersion:		14.2.1
BuildVersion:		23C71
lldb --version: lldb-1500.0.200.58
Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
gdb --version: GNU gdb (GDB) 11.2

What did you do?

I encountered an issue related to the concurrent initialization of type encoders in the JSON package while working with recursive types. The code problematic is located in:
File:
src/encoding/json/encode.go

What did you expect to see?

I expected the JSON package's typeEncoder function to handle recursive types reliably during concurrent initialization without any risk of deadlock, ensuring smooth and efficient operations.

What did you see instead?

Instead, I experienced a potential deadlock scenario caused by the use of sync.WaitGroup in the process of initializing these type encoders. Here's a Detailed Explanation of the problem:

The implementation uses a sync.WaitGroup to block execution for partially initialized encoders concerning recursive types. If two goroutines both attempt to initialize the encoder for the same reflect.Type simultaneously, a deadlock might occur. Both goroutines find the encoder uninitialized, add to the wait group, and attempt to store their respective partially constructed encoders. However, when one goroutine completes its initialization and calls wg.Done(), the second goroutine is left waiting indefinitely because the wait group's counter is incremented twice but decremented only once.

Impact: This deadlock may affect the performance and reliability of applications that use the encoding/json package for complex recursive types.

Suggested Solution: Ensure that every increment of the wait group counter is matched by its decrement across all execution paths. One approach is always to decrement the counter whenever a goroutine finds it is not the first to execute a store operation.

Proposed Code Snippet:

func typeEncoder(t reflect.Type) encoderFunc {
    if fi, ok := encoderCache.Load(t); ok {
        return fi.(encoderFunc)
    }

    var (
        wg sync.WaitGroup
        f  encoderFunc
    )
    wg.Add(1)
    fi, loaded := encoderCache.LoadOrStore(t, encoderFunc(func(e *encodeState, v reflect.Value, opts encOpts) {
        wg.Wait()
        f(e, v, opts)
    }))
    if loaded {
        wg.Done() // Ensure the WaitGroup is properly decremented.
        return fi.(encoderFunc)
    }

    f = newTypeEncoder(t, true)
    wg.Done()
    encoderCache.Store(t, f)
    return f
}

By improving the synchronization approach or redesigning the caching strategy, such as eliminating the need for WaitGroups in this context, we can safeguard against deadlocks in concurrent setups.

@cagedmantis cagedmantis changed the title src/encoding/json/encode.go: Potential Deadlock in sync.WaitGroup Usage within JSON Encoder Cache Initialization encoding/json: potential deadlock in sync.WaitGroup usage within JSON encoder cache initialization Dec 10, 2024
@cagedmantis
Copy link
Contributor

@ lss25 Is this still an issue with one of the currently supported versions of Go (1.22 or 1.23)? Can you provide a snippet of code that clearly demonstrates the bug?

@cagedmantis cagedmantis added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 10, 2024
This was referenced Dec 11, 2024
@lss25
Copy link
Author

lss25 commented Dec 11, 2024

@cagedmantis The issue still persists. In Go version 1.23, within the file src/encoding/json/encode.go at line 373, the typeEncoder function does not call wg.Done() before the statement return fi.(encoderFunc), which can still lead to a deadlock.
image

This PR resolves the aforementioned deadlock issue.
#70766

@ianlancetaylor
Copy link
Member

@lss25 Thanks for pointing to the code. Do you have a test case that demonstrates the bug? I ask because we certainly want to add a test when making any change of this sort, and you may already have a test that we can reuse. Thanks.

@lss25
Copy link
Author

lss25 commented Dec 11, 2024

@ianlancetaylor The deadlock mentioned above is not a bug; it was my misunderstanding. There will be only one sync.WaitGroup whose count doesn't return to zero, but it will not cause a deadlock.

@ianlancetaylor
Copy link
Member

Thanks. As far as I can tell the change you suggest will never make a difference. There is no particular rule that we have to call wg.Done in all cases. When loaded is true nothing will ever call wg.Wait, so it's fine to not call wg.Done in that case.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

4 participants