
compress/flate: implement partial flush #31514

Open
hanwen opened this issue Apr 17, 2019 · 18 comments
Labels: FeatureRequest, help wanted, NeedsFix (The path to resolution is known, but the work has not been done.)
Milestone: Unplanned

Comments

@hanwen
Contributor

hanwen commented Apr 17, 2019

What version of Go are you using (go version)?

go-1.11.5

While trying to implement zlib compression for SSH (by internal Google request,
#31369), I stumbled on the lack of support for partial flush.

Partial flush is described more precisely here: https://www.bolet.org/~pornin/deflate-flush.html (look for Z_PARTIAL_FLUSH). Although this functionality is deprecated, it has also made its way into the SSH RFC.

For the write side, it looks like I can't truncate the sync marker from a full flush to get the same effect.

For the read side, feeding packets from OpenSSH into zlib.Reader leads to errors ("unexpected EOF").

@bradfitz changed the title from "zlib: implement partial flush" to "compress/flate: implement partial flush" on Apr 17, 2019
@bradfitz
Contributor

I assume this is in the flate package where you need this? At least initially?

/cc @nigeltao

@bradfitz added the FeatureRequest, help wanted, and NeedsFix labels on Apr 17, 2019
@bradfitz added this to the Unplanned milestone on Apr 17, 2019
@hanwen
Contributor Author

hanwen commented Apr 17, 2019

From what I could tell, the real work is in the flate package, but the zlib package should also expose it, because SSH produces and consumes the zlib version header.

@dsnet
Member

dsnet commented Apr 17, 2019

What about the reader side of things? Wouldn't a partial flush on the writer only be useful if the reader had API to synchronize on these specialized flushing modes?

I should note that this is more difficult on the reader side for API design, since the reader lacks an exported concrete type.

@dsnet
Member

dsnet commented Apr 17, 2019

Also, my reading of partial flush suggests that partial flush is a subset of sync flush, which the writer already supports.

@hanwen
Contributor Author

hanwen commented Apr 18, 2019

I've seen that the bytes already get decompressed correctly (so it kind of works), but then the reader tries to read on and hits EOF. The reader should know to stop further reads when it encounters a partial flush.

For the API: when the reader encounters a partial flush, it could just return the bytes read so far without error. In SSH, the caller could then use a bytes.Buffer as input to the reader and stop requesting further zlib bytes once the buffer is empty.

@dsnet
Member

dsnet commented Apr 18, 2019

@hanwen, is there internal framing within the SSH protocol such that it can distinguish between a return from Read due to a partial flush, as opposed to a return due to other reasons (e.g., a full buffer)?

@hanwen
Contributor Author

hanwen commented Apr 18, 2019

The SSH protocol has packets that are up to 32 KB in size. The entire packet payload is compressed, so one would read until the entire packet was processed.

I think there could be chunking going on, so you may need to do multiple reads to process a single packet.

@dsnet
Member

dsnet commented Apr 19, 2019

The entire packet payload is compressed, so one would read until the entire packet was processed.

I'm not sure that answered my question. More specifically: how does SSH handle framing for the packet? For the uncompressed data, is there a length prefix or a delimiter marking the length or end of the packet? Or does it rely on a partial flush in DEFLATE to determine the end of a packet?

If the former, all that's required in compress/flate is to simply return from Read every time a block ends. If the latter, we need additional API in compress/flate to synchronize on the different ways a DEFLATE block can end.

@hanwen
Contributor Author

hanwen commented Apr 22, 2019

There is no size prefix or delimiter. On the read side, you just keep reading until there are no compressed bytes left to process. If the Read API were changed to be

// Read reads up to len(buf) bytes, or up to the next partial or full
// flush encountered in the compressed input stream.
Read(buf []byte) (int, error)

then you could do

  // init
  buf := bytes.NewBuffer(compressed)
  reader, err := zlib.NewReader(buf)
  // err handling
  ...

  // handle packet:
  buf.Write(compressedPacket)
  var uncompressed []byte
  for buf.Len() > 0 {
      var out [10240]byte
      n, err := reader.Read(out[:])
      // err handling
      uncompressed = append(uncompressed, out[:n]...)
  }

This might break current users that expect to be able to read a certain amount of data in a single Read call, though.

There is packet framing, but it is around the compressed payload: the compressed data is padded, encrypted, MAC'd, and prefixed with the total length of the encrypted packet.

@branelmoro

branelmoro commented Jul 19, 2019

@hanwen, if we put a limit on the number of bytes read from the underlying Reader and return an error when that limit is reached, I think this will solve the problem of reading until EOF.

We will need new interface for decompressor:

type Limiter interface {
    // SetReadLimit sets a limit on the number of bytes
    // read from the underlying Reader.
    SetReadLimit(n int)
}

Also, we need to add a new field to store the limit on the number of bytes read from the underlying Reader:

type decompressor struct {
    ...
    // l is the remaining read budget; -1 means no limit.
    l int
    ...
}

If l == -1 (the default), no read limit is set. If l >= 0, at most l more bytes can be read from the underlying Reader; once l reaches 0, further reads return a ReadLimitReached error.

Set f.l = -1 as the default value in the two constructors below:

func NewReader(r io.Reader) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

func NewReaderDict(r io.Reader, dict []byte) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

We will need two more functions for decompressor:

func (f *decompressor) SetReadLimit(n int) {
    f.l = n
}

func (f *decompressor) readByte() (byte, error) {
    if f.l == 0 {
        return 0, errors.New("ReadLimitReached")
    }
    b, e := f.r.ReadByte()
    if e == nil && f.l != -1 {
        f.l--
    }
    return b, e
}

Finally, we need to replace calls to f.r.ReadByte() with f.readByte().

Then in application, you can do

import "compress/flate"

decompressor := flate.NewReader(compressed_buffer)
size := 10 // or size := compressed_buffer.Len() or some known size of partial flush of compressed_buffer
decompressor.(flate.Limiter).SetReadLimit(size)
_, err = io.Copy(decompressed_buffer, decompressor)

The code above will return a ReadLimitReached error after reading 10 compressed bytes, rather than reading until EOF. You can then continue using the decompressor by setting the limit again or by resetting the underlying Reader.

This will not break current users.

@branelmoro

branelmoro commented Jul 19, 2019

@hanwen, I have created a pull request for this: #33218.
I hope this works...

@gopherbot

Change https://golang.org/cl/187018 mentions this issue: Fixes #31514 - implement partial flush for decompressor

@gopherbot

Change https://golang.org/cl/198757 mentions this issue: compress/flate : add a check to recognize partial flush while decompressing

@nhooyr
Contributor

nhooyr commented Dec 30, 2019

The docs on the synchronization example at https://golang.org/pkg/compress/flate/ need to be updated.

They state:

			// Read the message length.
			// This is guaranteed to return for every corresponding
			// Flush and Close on the transmitter side.

But that is not true for a Flush on a writer without pending data: compress/flate will write the 4-byte sync marker, but the read side will never return.

See the check `if len(f.toRead) > 0` in the decompressor's Read loop: the loop only returns if there are additional bytes in the buffer to be read.

@nhooyr
Contributor

nhooyr commented Dec 31, 2019

In order to implement context-takeover compression for WebSockets, I was able to get around the above issue by making the underlying io.Reader panic after the sync marker is read, and then recovering. The reader remains usable after this with the correct sliding window.

Janky but works for now.

@nigeltao
Contributor

I think it's @dsnet's call about what to do re @nhooyr's latest comments.

@diamondburned

Funnily enough, the hack of ignoring io.ErrUnexpectedEOF would have worked very well if (compress/flate).Reader didn't check the last error on later reads.

@hhrutter

There are also PDF files out there with what appear to be embedded, partially flushed flate-encoded streams. It's hard to tell the source of these because of the various PDF writers out there. What's the current state of this issue?

9 participants