
compress/flate: implement partial flush #31514

Open
hanwen opened this issue Apr 17, 2019 · 18 comments
Labels: FeatureRequest, help wanted, NeedsFix (The path to resolution is known, but the work has not been done.)
Milestone: Unplanned

Comments

@hanwen
Contributor

hanwen commented Apr 17, 2019

What version of Go are you using (go version)?

go-1.11.5

While trying to implement zlib compression for SSH (by internal Google request,
#31369), I stumbled on the lack of support for partial flush.

Partial flush is described more precisely here: https://www.bolet.org/~pornin/deflate-flush.html (look for Z_PARTIAL_FLUSH). Although this functionality is deprecated, it has also made its way into the SSH RFC.

For the write side, it looks like I can't truncate the sync marker from a full flush to get the same effect.

For the read side, feeding packets from OpenSSH into zlib.Reader leads to errors ("unexpected EOF").

@bradfitz changed the title from "zlib: implement partial flush" to "compress/flate: implement partial flush" on Apr 17, 2019
@bradfitz
Contributor

I assume this is in the flate package where you need this? At least initially?

/cc @nigeltao

@bradfitz added the FeatureRequest, help wanted, and NeedsFix labels on Apr 17, 2019
@bradfitz added this to the Unplanned milestone on Apr 17, 2019
@hanwen
Contributor Author

hanwen commented Apr 17, 2019

From what I could tell, the real work is in the flate package, but the zlib package should also expose it, because SSH produces and consumes the zlib version header.

@dsnet
Member

dsnet commented Apr 17, 2019

What about the reader side of things? Wouldn't a partial flush on the writer only be useful if the reader had API to synchronize on these specialized flushing modes?

I should note that this is more difficult on the reader side for API design, since the reader lacks an exported concrete type.

@dsnet
Member

dsnet commented Apr 17, 2019

Also, my reading of partial flush suggests that partial flush is a subset of sync flush, which the writer already supports.

@hanwen
Contributor Author

hanwen commented Apr 18, 2019

I've seen that the bytes already get decompressed correctly (so it kind of works), but then the reader tries to read on and hits EOF. The reader should know to stop further reads when it encounters a partial flush.

For the API: when the reader encounters a partial flush, it could just return the bytes read so far without error. In SSH, the caller could then use a bytes.Buffer as input to the reader and stop requesting further zlib bytes once the buffer is empty.

@dsnet
Member

dsnet commented Apr 18, 2019

@hanwen, is there internal framing within the SSH protocol such that it can distinguish between a return from Read due to a partial flush, as opposed to a return due to other reasons (e.g., a full buffer)?

@hanwen
Contributor Author

hanwen commented Apr 18, 2019

The SSH protocol has packets that are up to 32 KB in size. The entire packet payload is compressed, so one would read until the entire packet was processed.

I think there could be chunking going on, so you may need to do multiple reads to process a single packet.

@dsnet
Member

dsnet commented Apr 19, 2019

The entire packet payload is compressed, so one would read until the entire packet was processed.

I'm not sure that answered my question. More specifically: how does SSH handle framing for the packet? For the uncompressed data, is there a length prefix or a delimiter marking the length or end of the packet? Or does it rely on a partial flush in DEFLATE to determine the end of a packet?

If the former, all that's required in compress/flate is to simply return from Read every time a block ends. If the latter, we need additional API in compress/flate to synchronize on the different ways a DEFLATE block can end.

@hanwen
Contributor Author

hanwen commented Apr 22, 2019

There is no size prefix or delimiter. On the read side, you just keep reading until there are no compressed bytes left to process. If the Read API were changed to be

// Read reads up to len(buf) bytes, or up to the next partial or full
// flush encountered in the compressed input stream.
Read(buf []byte) (int, error)

then you could do

  // init
  buf := bytes.NewBuffer(compressed)
  reader, err := zlib.NewReader(buf)
  // err handling
  ...

  // handle packet:
  buf.Write(compressedPacket)
  var uncompressed []byte
  for buf.Len() > 0 {
      var out [10240]byte
      n, err := reader.Read(out[:])
      // err handling
      uncompressed = append(uncompressed, out[:n]...)
  }

This might break current users that expect to be able to read a certain amount of data in a single Read call, though.

There is packet framing, but it is around the compressed payload: the compressed data is padded, encrypted, MAC'd, and prefixed with the total length of the encrypted packet.

@branelmoro

branelmoro commented Jul 19, 2019

@hanwen, if we put a limit on the number of bytes read from the underlying Reader and return an error when that limit is reached, I think this will solve the problem of reading until EOF.

We will need new interface for decompressor:

type Limiter interface {
    // SetReadLimit sets a limit on the number of bytes
    // read from the underlying Reader.
    SetReadLimit(n int)
}

Also, we need to add a new field to store the limit on the number of bytes read from the underlying Reader:

type decompressor struct {
    ...
    // l is the remaining read budget; -1 means no limit.
    l int
    ...
}

If l == -1 (the default), no read limit is set. If l >= 0, at most l more bytes can be read from the underlying Reader; once l reaches 0, further reads return a ReadLimitReached error.

Set f.l = -1 as the default value in the two constructors below:

func NewReader(r io.Reader) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

func NewReaderDict(r io.Reader, dict []byte) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

We will need two more functions for decompressor:

func (f *decompressor) SetReadLimit(n int) {
    f.l = n
}

func (f *decompressor) readByte() (byte, error) {
    if f.l == 0 {
        return 0, errors.New("ReadLimitReached")
    }
    b, e := f.r.ReadByte()
    if e == nil && f.l != -1 {
        f.l--
    }
    return b, e
}

Finally, we need to replace calls to f.r.ReadByte() with f.readByte().

Then in application, you can do

import "compress/flate"

decompressor := flate.NewReader(compressed_buffer)
size := 10 // or size := compressed_buffer.Len() or some known size of partial flush of compressed_buffer
decompressor.(flate.Limiter).SetReadLimit(size)
_, err = io.Copy(decompressed_buffer, decompressor)

The code above will return a ReadLimitReached error after reading 10 compressed bytes, rather than reading until EOF. You can then continue using the decompressor by setting the limit again or by resetting the underlying Reader.

This will not break current users.

@branelmoro

branelmoro commented Jul 19, 2019

@hanwen, I have created a pull request for this: #33218.
I hope this works...

@gopherbot

Change https://golang.org/cl/187018 mentions this issue: Fixes #31514 - implement partial flush for decompressor

@gopherbot

Change https://golang.org/cl/198757 mentions this issue: compress/flate : add a check to recognize partial flush while decompressing

@nhooyr
Contributor

nhooyr commented Dec 30, 2019

The docs on the synchronization example at https://golang.org/pkg/compress/flate/ need to be updated.

They state:

			// Read the message length.
			// This is guaranteed to return for every corresponding
			// Flush and Close on the transmitter side.

But that is not true for a Flush on a writer without pending data: compress/flate will write the 4-byte sync marker, but the read side will never return.

See the check `if len(f.toRead) > 0` in the decompressor's Read loop: the loop only returns if there are additional bytes in the buffer to be read.

@nhooyr
Contributor

nhooyr commented Dec 31, 2019

In order to implement context-takeover compression for WebSockets, I was able to get around the above issue by making the underlying io.Reader panic after the sync marker is read, and then recovering. The reader remains usable after this with the correct sliding window.

Janky but works for now.

@nigeltao
Contributor

I think it's @dsnet's call about what to do re @nhooyr's latest comments.

@diamondburned

Funnily enough, the hack of ignoring io.ErrUnexpectedEOF would have worked very well if (compress/flate).Reader didn't check the last error on later reads.

@hhrutter

There are also PDF files out there with what appear to be embedded, partially flushed flate-encoded streams. It's hard to tell the source of these because of the various PDF writers out there. What's the current state of this issue?

9 participants