bytes: bytes.Reader returns EOF on zero-byte Read, which doesn't conform with io.Reader interface documentation #40385

metala · 2020-07-24T12:03:41Z

What version of Go are you using (`go version`)?

$ go version
go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

$ go env
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

I was deserialising binary data and I hit an unexpected io.EOF when reading zero bytes. Here is a minimal example that illustrates the problem.

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

func main() {
	r := bytes.NewReader([]byte{0, 0, 0, 0})

	// Read the 4-byte big-endian length prefix.
	var length uint32
	err := binary.Read(r, binary.BigEndian, &length)
	if err != nil {
		panic(err)
	}
	fmt.Printf("length = %d\n", length)

	// Read the payload. length is 0 and the reader is now at the end.
	rest := make([]byte, length)
	_, err = r.Read(rest)
	fmt.Printf("error = %s\n", err)
}

What did you expect to see?

length = 0
error = %!s(<nil>)

What did you see instead?

length = 0
error = EOF

On conformity with io.Reader and other standards

An excerpt from the documentation of io.Reader:

type Reader interface {
   Read(p []byte) (n int, err error)
}

Implementations of Read are discouraged from returning a zero byte count
with a nil error, except when len(p) == 0. Callers should treat a return of
0 and nil as indicating that nothing happened; in particular it does not
indicate EOF.

An excerpt from the man page of read(3):

This volume of IEEE Std 1003.1-2001 requires that no action be taken for read() or write() when nbyte is zero. This is not intended to take precedence over detection of errors (such as invalid buffer pointers or file descriptors). This is consistent with the rest of this volume of IEEE Std 1003.1-2001, but the phrasing here could be misread to require detection of the zero case before any other errors. A value of zero is to be considered a correct value, for which the semantics are a no-op.

Related issues

It looks like this issue is in stark contrast to Go2 proposal issue #27531.
There is also an interesting discussion (#5310) about returning (0, nil) from a Read(p []byte).


metala · 2020-07-24T12:58:26Z

@gopherbot please remove Documentation

ianlancetaylor · 2020-07-24T18:18:51Z

It seems to me that the behavior of bytes.Reader is permitted by the documentation of io.Reader. The data is at EOF, after all. I don't see anything in the documentation of io.Reader that prohibits returning 0, io.EOF in such a case.

You are asking for a special case in bytes.Reader.Read: if the caller asks for zero bytes, always return 0, nil. And if we want to make this the required behavior for io.Reader, you are asking for a special check for a read of zero bytes in many other readers as well. I think we would need a pretty convincing argument to require existing readers to change in that way.

metala · 2020-07-24T21:16:44Z

I am hoping for a discussion on whether this behaviour is expected or natural. Getting an EOF error on zero-byte read looks like a replacement for a missing EOF() method.

However, I agree that changing multiple Read methods is not a favourable option, because it could break a lot of stuff.
If you think there can't be any discussion on the topic, we can close this issue and leave it for future reference. It is a case where the result (0, nil) makes sense.

Edit:
The reason I posted the issue at all is that it felt wrong to have to change only the last if-statement in the deserialisation function, the one for the last field, by removing the err != nil || check and keeping only len(field) != fieldLen.

ianlancetaylor · 2020-07-24T21:46:17Z

I think there can be discussion on the topic. My comment above was my attempt at discussion, by pointing out the issues that I found relevant. Sorry if I seem to be preventing discussion.

metala · 2020-07-24T22:19:51Z

Is there a reason why there is no EOF() bool method on the Reader structs in the standard library?
I think that is why a zero-length Read returns io.EOF: it is the only way to check whether you are at the end, apart from something like pos, _ := r.Seek(0, io.SeekCurrent) followed by pos == r.Size(), which is probably not available for all Reader structs.

Addendum:
I will reluctantly agree that the documentation does not prohibit (0, io.EOF) in this case, but my reading of "except when len(p) == 0" made me think it was natural to get (0, nil) from a zero-byte Read().

ianlancetaylor · 2020-07-24T23:11:28Z

On something like a Unix pipe, an EOF method can only be implemented by calling read, in which case we need to have somewhere to store the data and return it on a subsequent Read. Simpler to not require the Reader to handle that, and push the issue onto the caller.

metala · 2020-07-24T23:19:09Z

As far as I remember, read(fd, buf, len) returns 0 when there is an EOF. errno is not set, and there is not even an EEOF errno.
But I see your point that, performance-wise, it's better to handle syscall-dependent streams this way.

metala · 2020-07-24T23:39:50Z

Read()-ing seems to be inconsistent in the standard library:

$ cat main.go 
package main

import (
	"os"
	"fmt"
)

func main() {
	buf := make([]byte, 0)
	n, err := os.Stdin.Read(buf)
	fmt.Printf("n=%d, err=%s\n", n, err)
}
$ go run main.go  < /dev/null
n=0, err=%!s(<nil>)

In bytes.Reader, being at the last position is reported as io.EOF, but os.Stdin.Read, which makes the syscall read(0, buf, 0) = 0, reports a nil error.

davecheney · 2020-07-24T23:46:07Z

The io.Reader interface is like no other in the Go ecosystem. Read is the only method where the caller must examine the other values returned from a function/method call before examining the error value.

metala · 2020-07-25T00:15:22Z

It is sad that I had to figure it out at runtime. When writing the unit tests, I was not expecting the last field to have zero length, but that's on me.

as · 2020-07-25T00:25:55Z

It seems incorrect to make the EOF condition directly a function of the input's length. Your example reads from a data source that contains no data. It reads 0 bytes from that data source and triggers the end of file condition because it knows that the next read will also return io.EOF.

To me it does not make sense for the first read to be successful on a data source that contains nothing. The same Read would not return io.UnexpectedEOF if your slice were 100 bytes. It would still return io.EOF. The length of the actual input slice provided to the reader has no effect on the returned error.

metala · 2020-07-25T00:59:20Z

> It seems incorrect to make the EOF condition directly a function of the input's length. Your example reads from a data source that contains no data. It reads 0 bytes from that data source and triggers the end of file condition because it knows that the next read will also return io.EOF.
>
> To me it does not make sense for the first read to be successful on a data source that contains nothing. The same Read would not return io.UnexpectedEOF if your slice was 100 bytes. It would still return io.EOF. The length of the actual input slice provided to the reader has no effect on the returned error.

I appreciate the way the discussion is going. Let's say we want the Reader to report io.EOF as soon as it is aware of it and let the caller handle it. But then we also care about consistency.
In Go, a bytes.Reader over a byte slice returns io.EOF at the last position, but a zero-byte read from an empty Linux file or from stdin returns (0, nil).

And then there is this:

$ cat main3.go 
package main
import "fmt"
func main() {
	data := []byte{0}
	s := data[1:1]
	fmt.Printf("%#v\n", s)
}

$ go run main3.go 
[]byte{}

Slicing zero bytes at the very end of a slice (data[1:1] when len(data) == 1) is valid and just yields an empty slice, which I find similar to reading zero bytes at the end of a bytes buffer. It's the kind of no-op I expect when making zero-length reads...

metala · 2020-07-30T14:27:32Z

I have decided to do some tests:

| Case | Setup | Operation(s) | Result (Linux 5.4.0 amd64) |
|---|---|---|---|
| os.Stdin | `r := os.Stdin` | `n, err := r.Read(make([]byte, 0))` | `n=0, err=%!s(<nil>)` |
| Empty file | `f, _ := os.Open("./empty")` | `n, err := f.Read(make([]byte, 0))` | `n=0, err=%!s(<nil>)` |
| Empty file, trigger EOF first | `f, _ := os.Open("./empty")` | `n, err := f.Read(make([]byte, 1))`, then `n, err = f.Read(make([]byte, 0))` | `n=0, err=EOF`, then `n=0, err=%!s(<nil>)` |
| bytes.Buffer | `r := bytes.NewBuffer([]byte{})` | `n, err := r.Read(make([]byte, 0))` | `n=0, err=%!s(<nil>)` |
| bytes.Buffer, trigger EOF first | `r := bytes.NewBuffer([]byte{})` | `n, err := r.Read(make([]byte, 1))`, then `n, err = r.Read(make([]byte, 0))` | `n=0, err=EOF`, then `n=0, err=%!s(<nil>)` |
| bytes.Reader | `r := bytes.NewReader([]byte{})` | `n, err := r.Read(make([]byte, 0))` | `n=0, err=EOF` |
| byte slice | `data := []byte{0x00}` | `slice := data[1:1]` | `[]byte{}` |

Each Read is followed by `fmt.Printf("n=%d, err=%s\n", n, err)`; the slice case is printed with `fmt.Printf("%#v\n", slice)`.

Contrary to my belief, there were no discrepancies between OSes.
However, I found out that bytes.Reader behaviour is different compared to all other cases.

ianlancetaylor · 2020-08-06T19:12:53Z

There are many different kinds of readers. The io.Reader type defines a contract for all different kinds of readers.

Currently that contract permits returning io.EOF on a read of zero bytes. This is not discussed explicitly, but nothing prohibits a reader from doing that.

The options I see here are:

1. Change io.Reader to prohibit returning io.EOF for a read of zero bytes. Require it to always return 0, nil in such a case. This would break an unknown number of existing readers, including bytes.Reader. We would have to identify and fix all broken readers. In the standard library this would be straightforward, but of course any type defined by any package can implement a Read method.
2. Explicitly document that on a read of zero bytes a Read method is permitted, but not required, to return 0, io.EOF if the input stream is at the end of the file.
3. Do nothing.

Does anybody see any other options? Thanks.

davecheney · 2020-08-06T23:54:31Z

I vote for 2. /cc @minux who spent a lot of time arguing for this a few years back.

metala · 2020-08-18T09:20:08Z

It would be nice to have a warning in the bytes.Reader documentation that, unlike bytes.Buffer, it returns io.EOF on a zero-byte Read(). This way people can pick which type to use as an io.Reader depending on their needs.

davecheney · 2020-08-18T09:59:57Z

Why would this need a warning? It seems like the correct behaviour.

metala · 2020-08-18T12:45:56Z

For me at least, bytes.Buffer behaves correctly and bytes.Reader does not. I have switched from bytes.Reader to bytes.Buffer to avoid running into bugs. When I wrote the code, I expected bytes.Reader to behave like bytes.Buffer.

hrissan · 2020-10-01T17:45:24Z

I have code that reads strings in the format #bytes, [bytes].

So I read #bytes, create a byte slice of that size, then read into it.

Now, when a string of length 0 appears in the middle of the reader, it is read successfully, but when a string of length 0 is at the end of the reader, it is not.

IMHO this indicates a design problem: reading 0 bytes is not special and should always succeed, independent of the read position.

ianlancetaylor · 2020-10-01T17:57:39Z

@hrissan As far as I can see the choices are as listed at #40385 (comment). What do you recommend?

hrissan · 2020-10-02T12:10:30Z

@ianlancetaylor The format some_encoding(#bytes), [bytes] is common, and any well-tested parser (and every fuzz-tested parser) already has an "if #bytes != 0" guard around reading the [bytes] part, so they already behave as if approach 2) were implemented.

Any code which has no "if #bytes != 0" protection around reading of potentially zero bytes will behave differently at the middle and at the end of the bytes.Reader and all other readers which have this bug. It seems to me, most if not all code without such protection is already incorrect due to this.

So approach 1) will break already incorrect code, which might be actually good for that code.

BTW all similar code which uses readers which return error on every attempt to read 0 bytes, independent of stream position, already has "if #bytes != 0" protection and will not break from approach 1).

ianlancetaylor · 2020-10-02T19:31:16Z

Right now, today, the behavior of bytes.Reader is permitted according to the docs. If we change the docs as suggested in option 1 of #40385 (comment), then bytes.Reader will be broken. This is not a matter of code that checks #bytes != 0. This is a matter of bytes.Reader itself. So I don't agree that approach 1 will break already incorrect code. It will break already correct code. If that code is not changed to conform to the new requirements, then it will break future code that assumes that the documented requirements are implemented by existing readers.

metala · 2020-10-05T23:22:21Z

I wasn't expecting another user to have exactly the same issue as mine so soon. I will invoke the timeless mantra WE DO NOT BREAK USERSPACE! and say that adding a warning to bytes.Reader should be enough. This way we can defer any discussion on whether io.Reader should be able to return (0, nil) or not.

davecheney · 2020-10-05T23:32:34Z

@metala Ian explained that bytes.Reader is not broken as described by the io.Reader contract. What warning do you think should be added?

metala · 2020-10-05T23:42:05Z

@davecheney Yes, indeed. I am not arguing that bytes.Reader is broken according to io.Reader contract, but it feels inconsistent.

About the warning / notice, I am thinking of something like:
The bytes.Reader returns io.EOF as soon as it reaches the end of the byte slice, after which a read of zero bytes will yield an error. This may pose an issue in cases where you are deserialising data and the last field is length-prefixed with a length of zero.

davecheney · 2020-10-05T23:50:47Z

@metala I don't understand how bytes.Reader returning n = 0 could be mistaken for a n = 1 where the buffer contains []byte{ 0 }. Could you perhaps provide a code sample that illustrates the problem?

metala · 2020-10-06T00:49:28Z

@davecheney I am not sure what you want, but let's take these two cases:

// Deserialisation using bytes.Reader
https://play.golang.org/p/3qRu_LlDa6h

// Deserialisation using bytes.Buffer... the same code, single line changed.
https://play.golang.org/p/0Lr0_9rFi1O

The deserialisation of each field follows the same structure: read the field length, then, if necessary, read the content. If there is an error or the length differs, return an error.
The first example fails when the last field is length-prefixed with zero length, and the second example succeeds just because it uses bytes.Buffer instead of bytes.Reader.

davecheney · 2020-10-06T03:28:49Z

There is an error in your code

	n, err := r.Read(header)
	if err != nil || n != int(headerLen)  {
		return fmt.Errorf("failed to read header: %w", err)
	}

The io.Reader contract states the caller must process n before inspecting the error value.

https://godoc.org/io#ReadFull might be a better choice for your application.

metala · 2020-10-06T07:26:44Z

You are probably referring to this paragraph, which is either ambiguous or it doesn't cover the case n = 0.

> Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.

Since the input is a byte slice, and not a stream, it doesn't make much sense to waste CPU cycles on io.ReadFull(), if bytes.Reader.Read() would do the same job.
However, it seems that io.ReadFull actually fixes the issue.

davecheney · 2020-10-06T09:16:39Z

> Since the input is a byte slice, and not a stream, it doesn't make much sense to waste CPU cycles on io.ReadFull(), if bytes.Reader.Read() would do the same job.
> However, it seems that io.ReadFull actually fixes the issue.

This argument is specious; if you're worried about the overhead of io.ReadFull when you have a []byte slice then you're probably also worried about the overhead of an interface call over just scrobbling in the []byte slice directly.
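For comparison, a sketch of parsing the []byte directly with no io.Reader involved, assuming the same 4-byte big-endian length prefix; nextField and errShort are hypothetical names. An empty field at the end is just an empty subslice, but the caller has to thread the remaining input through by hand, which is the trade-off raised in the reply.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShort is returned when data ends before a field is complete.
var errShort = errors.New("input too short")

// nextField parses one length-prefixed field from the front of data and
// returns the field and the rest of the input.
func nextField(data []byte) (field, rest []byte, err error) {
	if len(data) < 4 {
		return nil, data, errShort
	}
	n := binary.BigEndian.Uint32(data)
	data = data[4:]
	if uint64(len(data)) < uint64(n) {
		return nil, data, errShort
	}
	return data[:n], data[n:], nil
}

func main() {
	data := []byte{0, 0, 0, 2, 'h', 'i', 0, 0, 0, 0}
	for len(data) > 0 {
		field, rest, err := nextField(data)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%q\n", field)
		data = rest
	}
}
```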

metala · 2020-10-06T09:25:48Z

You are right, but I did not think about that when I wrote the code. However, it's a bit cleaner to use a reader instead of incrementing and passing indices.

dashjay · 2023-05-25T05:29:32Z

Honestly, whether a read into an empty slice returns immediately hardly matters to me or to most developers.
But I found that two kinds of reader, gzip.Reader and bytes.Reader, behave differently when reading the last bytes.

For the last byte, gzip.Reader returns 1 and EOF, but bytes.Reader returns 1 and a nil error. I wrote this code: https://go.dev/play/p/En4DOWYnJXO

a function like this

func compareReader(r, b io.Reader) error {
	var bufa [1]byte
	var bufb [1]byte
	for {

		na, erra := r.Read(bufa[:])
		nb, errb := b.Read(bufb[:])

		if erra == nil && errb == nil && na == nb && bufa[0] == bufb[0] {
			continue
		}
		if erra == errb && erra == io.EOF {
			return nil
		}
		if erra != nil {
			if erra == io.EOF && errb != io.EOF {
				return fmt.Errorf("reader b has more data than a")
			}
			return fmt.Errorf("read on a error: %s", erra)
		}
		if errb != nil {
			if errb == io.EOF && erra != io.EOF {
				return fmt.Errorf("reader a has more data than b")
			}
			return fmt.Errorf("read on b error: %s", errb)
		}
		return nil
	}
}

I don't care about POSIX, the Linux man page, or anything else; I just think the whole standard library should follow uniform standards.

And because the caller only knows that this is an io.Reader, it shouldn't have to care about the underlying implementation of the reader.

ianlancetaylor · 2023-05-25T18:28:35Z

@dashjay There is no expectation that all Readers behave the same way, even all Readers in the standard library. Different Readers are free to employ different buffering and error handling strategies.

gopherbot · 2023-05-25T18:33:05Z

Change https://go.dev/cl/498355 mentions this issue: io: clarify that Read(nil) can return 0, EOF

Fixes golang#40385
Change-Id: I965b5db985fd4418a992e883073cbc8309b2cb88
Reviewed-on: https://go-review.googlesource.com/c/go/+/498355
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>

gopherbot added the Documentation label Jul 24, 2020

gopherbot removed the Documentation label Jul 24, 2020

cagedmantis added this to the Backlog milestone Jul 27, 2020

cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 27, 2020

thierolm mentioned this issue Nov 14, 2021

TPLink linter error evcc-io/evcc#1878

Closed

dmitshur added Documentation NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 26, 2023

dmitshur modified the milestones: Backlog, Go1.21 May 26, 2023

gopherbot closed this as completed in 1ff8900 May 26, 2023
