compress/gzip: return different error for trailing garbage #61797

nikaiw · 2023-08-07T03:21:55Z

Hello,
Regarding #47809, would you consider returning a different error?
"gzip: invalid header" can be quite misleading when the actual cause is trailing garbage.

ianlancetaylor · 2023-08-07T05:01:03Z

CC @dsnet

dsnet · 2023-08-07T22:39:45Z

According to RFC 1952, section 2.2:

A gzip file consists of a series of "members" (compressed data sets).

Thus, when a gzip member terminates, the parser should correctly check whether there is a header for another gzip member. An "invalid header" error seems to be the right error given this grammar.

cespare · 2023-08-07T22:58:00Z

In addition to what @dsnet said, I think it's possible to use the current API to figure out whether there might be trailing data. If z is a gzip.Reader, you can call z.Multistream(false) to have the reader read a single member at a time. Then:

- If you know that your data always consists of a single data stream, you can read it to the end. If there's still more data available to read after that, you know that there is trailing junk.
- If your data may consist of multiple concatenated streams, then it isn't really possible to disambiguate corrupted (say, truncated) data from trailing junk. But you could still read as many data streams as possible and then, once you hit an error, decide whether you think you've got trailing junk or a corrupted chunk, perhaps based on what the decompressed data looks like or something.

(The general ambiguity described in the second bullet is just what @dsnet pointed out; it's why the compress/gzip package isn't in a position to know whether there is trailing junk or a different kind of corruption. But you may be able to tell based on what your data looks like.)

nikaiw · 2023-08-08T12:27:18Z

Thanks for the details. That makes sense, and I don't want to complicate the code for nothing. Still, I would love a distinct error that lets a developer quickly tell an error in the first gzip header apart from a failed attempt to read another gzip member in a multistream gzip.

I imagine the error value could be changed to reflect that once the first member has been read successfully.

go/src/compress/gzip/gunzip.go

Line 273 in 24f83ed

// File is ok; check if there is another.

What are your thoughts?

Edit: Alternatively, if multistream is set to true, the error could read something like "Invalid header in multi-stream gzip"

nikaiw added the pkgsite label Aug 7, 2023

ianlancetaylor changed the title ~~compress/gzip - trailing garbage~~ compress/gzip: return different error for trailing garbage Aug 7, 2023

ianlancetaylor added the NeedsInvestigation label and removed the pkgsite label Aug 7, 2023
