compress/flate: unrecognized failures #66383
Found new dashboard test flakes for:
2024-03-18 17:06 darwin-arm64-12 go@dda4b17e compress/flate (log)
This happens to us as well, since upgrading from Go 1.21 to 1.22.1.
$ go version
go version go1.22.1 linux/arm64
An example of such an error is attached.
Update: our code was using the built-in gzip package specifically in this use case, and for a specific tenant, to compress large amounts of data. Once we disabled it, the crash disappeared. This seems to match the original failure in this ticket, suggesting issues in the compression code.
This comment was marked as duplicate.
I missed the comments here from late March and early April. To be absolutely clear, the original failure happened on our old macOS infrastructure, which was known for memory corruption bugs. Those bugs have not appeared since we switched infrastructure (there were other issues, but they didn't look like this). For everyone commenting here, I think we need much more information to be able to do anything about this.
Thanks in advance.
(My only suspicion here is that there's a mishandling of "GC progs" which are used for the GC metadata for very large objects. I know for certain that
Hello @mknyszek, thank you for the response. We don't have a small reproducer, because it doesn't happen on small payloads. It only happens for a single tenant in our environment, but for them it happens on a daily basis. So we know it depends on the actual data being compressed.
Thanks. I looked at the failing line, and it looks like the type information is nil. arm64 is an interesting platform to fail on because of the weaker memory model -- perhaps GC prog objects are becoming visible to the GC sooner? (Assuming that is the codepath being taken here, which is still a bit unclear.) Go 1.23 actually already has a mitigation for this: if the type is nil, scanning is skipped (this is to support a different use-case). It's possible you might not see the issue with Go 1.23 at all (that is, if you try building Go from tip-of-tree).
#67255 may be related. It can produce the same failure and a fix was recently landed. The situation is an append of a 0-sized slice, which may be something
I tried reverting my CL for 67255 and adding this hack to the runtime:
And
... but I'm still betting that a pointer-past-the-end bug somewhere is ultimately responsible.
@aciduck Would it be possible for you to try out different things to help narrow down the root cause?
Since you don't have a small reproducer, I assume the only way to try this out is to just run with these modes for a while and see what happens? Meanwhile I'll try to reproduce myself.
Issue created automatically to collect these failures.
Example (log):
— watchflakes