encoding/gob: gob.(*Decoder).Decode failed #69131

Closed
vvwo opened this issue Aug 29, 2024 · 10 comments
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@vvwo

vvwo commented Aug 29, 2024

Go version

go version 1.21.1 or newer

Output of go env in your module/workspace:

# go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/admin/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/admin/go'
GOPRIVATE=''
GOPROXY='https://goproxy.cn,direct'
GOROOT='/home/admin/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/admin/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2297454152=/tmp/go-build -gno-record-gcc-switches'

What did you do?

...
conn, _ := net.Dial("tcp", addr)
enc := gob.NewEncoder(conn)
dec := gob.NewDecoder(bufio.NewReader(conn))
...

Simply put, Encode serializes and Decode deserializes data for network transmission.

What did you see happen?

Data packets can vary in size. When Decode fails, the first error that occurs is 'read tcp 10.35.146.11:54754->10.35.146.7:9527: i/o timeout'.
After this, Decode may continuously produce a series of errors such as 'gob: duplicate type received' and 'gob: unknown type id or corrupted data'.

What did you expect to see?

I've tried to fix the bug and will attempt to submit a pull request.

@ianlancetaylor
Member

Can you show us a small, complete, standalone program that demonstrates the problem? Thanks.

ianlancetaylor added the WaitingForInfo label Aug 29, 2024
@vvwo
Author

vvwo commented Aug 30, 2024

I’m very sorry, but it's difficult to reproduce it as a small, complete, standalone program, so I can't provide a demo.
It only appears after running for some time in the production environment, and the likelihood of reproduction may depend on factors such as packet size and network conditions.

Stack:
https://github.com/golang/go/blob/master/src/syscall/zsyscall_linux_amd64.go#L736
https://github.com/golang/go/blob/master/src/syscall/syscall_unix.go#L183
https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L161
https://github.com/golang/go/blob/master/src/net/fd_posix.go#L55
https://github.com/golang/go/blob/master/src/net/net.go#L189
https://github.com/golang/go/blob/master/src/bufio/bufio.go#L227
https://github.com/golang/go/blob/master/src/io/io.go#L335
https://github.com/golang/go/blob/master/src/io/io.go#L354
https://github.com/golang/go/blob/master/src/internal/saferio/io.go#L37
https://github.com/golang/go/blob/master/src/encoding/gob/decoder.go#L103
https://github.com/golang/go/blob/master/src/encoding/gob/decoder.go#L91
https://github.com/golang/go/blob/master/src/encoding/gob/decoder.go#L148
https://github.com/golang/go/blob/master/src/encoding/gob/decoder.go#L227
https://github.com/golang/go/blob/master/src/encoding/gob/decoder.go#L204

My thoughts:
When reading data (https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L161), syscall.Read sometimes returns syscall.EAGAIN.
At this point, ReadAtLeast (https://github.com/golang/go/blob/master/src/io/io.go#L335) will return a network error 'i/o timeout'.
Therefore, ReadData (https://github.com/golang/go/blob/master/src/internal/saferio/io.go#L37) seems illogical.
When returning a network error 'i/o timeout', it does not handle the partially read data but instead discards it!!!
It seems that there are many similar usages throughout Go, such as (_, err := io.ReadFull(r, buf))...

What I did:
Packages like strings and net cannot be imported there, because doing so causes 'import cycle not allowed'.
So in ReadAtLeast, after the Read, I added:
    if err != nil && contains(err.Error(), "i/o timeout") {
        err = nil
        continue
    }
ps:
    // contains reports whether substr occurs within s
    // (a local stand-in for strings.Contains).
    func contains(s, substr string) bool {
        if len(substr) == 0 {
            return true
        }
        if len(s) < len(substr) {
            return false
        }
        for i := 0; i+len(substr) <= len(s); i++ {
            if s[i:i+len(substr)] == substr {
                return true
            }
        }
        return false
    }
It may look ugly, but it works.
If possible, please give me some more elegant suggestions.

@gopherbot
Contributor

Change https://go.dev/cl/609759 mentions this issue: encoding/gob: gob.(*Decoder).Decode failed #69131

@ianlancetaylor
Member

> Therefore, ReadData (https://github.com/golang/go/blob/master/src/internal/saferio/io.go#L37) seems illogical.
> When returning a network error 'i/o timeout', it does not handle the partially read data but instead discards it!!!
> It seems that there are many similar usages throughout Go, such as (_, err := io.ReadFull(r, buf))...

There are similar usages because there is nothing wrong with that usage. A call to io.ReadFull means that if we don't get all the data we expect then we have an error. In this context an I/O timeout error is an error. If you are in a situation where it's OK to receive some bytes and then an I/O timeout error, you should not call io.ReadFull.

The encoding/gob package does not set any timeouts, and it does not expect to operate over a connection that can have timeouts and then recover.

@seankhliao
Member

I think we can close as working as intended for encoding/gob

seankhliao closed this as not planned Sep 2, 2024
@ianlancetaylor
Member

I'll note that you could write a custom implementation of io.Reader that ignores timeout errors, and use that with encoding/gob.
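A rough sketch of that idea follows. The types retryReader and flakyReader and the function roundTrip are hypothetical names invented for this example, and the retry bound is an arbitrary choice. flakyReader only simulates a connection that times out on every other read.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
	"os"
)

// retryReader is a hypothetical wrapper that hides timeout errors from its
// caller by retrying the underlying Read, at most maxRetries times in a row.
type retryReader struct {
	r          io.Reader
	maxRetries int
}

func (rr *retryReader) Read(p []byte) (int, error) {
	for attempt := 0; ; attempt++ {
		n, err := rr.r.Read(p)
		var t interface{ Timeout() bool }
		timedOut := errors.As(err, &t) && t.Timeout()
		if n > 0 && timedOut {
			return n, nil // deliver the bytes; a later call can retry
		}
		if !timedOut || attempt >= rr.maxRetries {
			return n, err
		}
	}
}

// flakyReader simulates a connection: every other call times out, and
// successful calls return at most one byte.
type flakyReader struct {
	r     io.Reader
	calls int
}

func (f *flakyReader) Read(p []byte) (int, error) {
	f.calls++
	if f.calls%2 == 1 {
		return 0, os.ErrDeadlineExceeded
	}
	if len(p) > 1 {
		p = p[:1]
	}
	return f.r.Read(p)
}

// roundTrip gob-encodes s and decodes it back through the flaky source.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return "", err
	}
	src := &retryReader{r: &flakyReader{r: &buf}, maxRetries: 3}
	var out string
	err := gob.NewDecoder(src).Decode(&out)
	return out, err
}

func main() {
	fmt.Println(roundTrip("hello"))
}
```

The retry bound matters: a wrapper that retries forever turns a dead connection into a hang, which defeats the purpose of setting a deadline in the first place.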

@vvwo
Author

vvwo commented Sep 3, 2024

> There are similar usages because there is nothing wrong with that usage. A call to io.ReadFull means that if we don't get all the data we expect then we have an error. In this context an I/O timeout error is an error. If you are in a situation where it's OK to receive some bytes and then an I/O timeout error, you should not call io.ReadFull.

I understand that io.ReadFull works fine on its own; if an error occurs, it returns the data read up to that point. When there is an I/O timeout, it might be better to handle the partial data in readMessage rather than just using saferio.ReadData.

> I'll note that you could write a custom implementation of io.Reader that ignores timeout errors, and use that with encoding/gob.

In fact, I tried using a custom io.Reader set to ignore timeouts, but the problem still persists. What's puzzling is that the first syscall.Read that returns syscall.EAGAIN (I/O timeout) occurs within the same second.

@vvwo
Author

vvwo commented Sep 3, 2024

> I think we can close as working as intended for encoding/gob

Sorry for the late reply.

@ianlancetaylor
Member

It's entirely normal for syscall.Read to return EAGAIN. In Go, network connections are in non-blocking mode and are managed through epoll or kqueue. That works by calling syscall.Read until it returns EAGAIN, and then pausing the outer Go Read call until the network connection has data available.
