Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: io.MultiReader should read until the buffer is full or there are no Readers left #57750

Closed
838239178 opened this issue Jan 12, 2023 · 8 comments

Comments

@838239178
Copy link

838239178 commented Jan 12, 2023

It's an example

func (mr *multiReader) Read(p []byte) (n int, err error) {
	bufSize := len(p)
	for len(mr.readers) > 0 {
		// Optimization to flatten nested multiReaders (Issue 13558).
		if len(mr.readers) == 1 {
			if r, ok := mr.readers[0].(*multiReader); ok {
				mr.readers = r.readers
				continue
			}
		}
		var nr int
		nr, err = mr.readers[0].Read(p[n:])
		n += nr
		if err == io.EOF {
			// Use eofReader instead of nil to avoid nil panic
			// after performing flatten (Issue 18232).
			mr.readers[0] = eofReader{} // permit earlier GC
			mr.readers = mr.readers[1:]
		}
		if n >= bufSize {
			if err == io.EOF && len(mr.readers) > 0 {
				// Don't return EOF yet. More readers remain.
				err = nil
			}
			return
		}
	}
	return n, io.EOF
}
@gopherbot gopherbot added this to the Proposal milestone Jan 12, 2023
@ianlancetaylor
Copy link
Contributor

The current code seems OK and implements the documented semantics for io.Reader. Why should it be changed?

@838239178
Copy link
Author

838239178 commented Jan 12, 2023

@ianlancetaylor Thank you for your reply. When using io.Copy or io.CopyBuffer, it is often desirable to fill the buffer as much as possible before writing (except for the last write), the reason being to align memory and disk (probably when using direct-io) for more efficient write performance.

When using direct-io, in order to align 4KB, it may be necessary to first read a portion of the file to form a multiple of 4KB with the newly added data. In this scenario, the two parts can form a MultiReader, and io.Copy does not care if the buffer is full when writing.

@seankhliao
Copy link
Member

sounds like a case where should wrap an io.MultiReader in a bufio.Reader?

@838239178
Copy link
Author

838239178 commented Jan 12, 2023

This is a part of bufio source code. @seankhliao

// Read reads data into p.
// It returns the number of bytes read into p.
// The bytes are taken from at most one Read on the underlying Reader,
// hence n may be less than len(p).
// To read exactly len(p) bytes, use io.ReadFull(b, p).
// If the underlying Reader can return a non-zero count with io.EOF,
// then this Read method can do so as well; see the [io.Reader] docs.
func (b *Reader) Read(p []byte) (n int, err error) {
	n = len(p)
	if n == 0 {
		if b.Buffered() > 0 {
			return 0, nil
		}
		return 0, b.readErr()
	}
	if b.r == b.w {
		if b.err != nil {
			return 0, b.readErr()
		}
		if len(p) >= len(b.buf) {
			// Large read, empty buffer.
			// Read directly into p to avoid copy.
			n, b.err = b.rd.Read(p)
			if n < 0 {
				panic(errNegativeRead)
			}
			if n > 0 {
				b.lastByte = int(p[n-1])
				b.lastRuneSize = -1
			}
			return n, b.readErr()
		}
		// One read.
		// Do not use b.fill, which will loop.
		b.r = 0
		b.w = 0
		n, b.err = b.rd.Read(b.buf)
		if n < 0 {
			panic(errNegativeRead)
		}
		if n == 0 {
			return 0, b.readErr()
		}
		b.w += n
	}

	// copy as much as we can
	// Note: if the slice panics here, it is probably because
	// the underlying reader returned a bad count. See issue 49795.
	n = copy(p, b.buf[b.r:b.w])
	b.r += n
	b.lastByte = int(b.buf[b.r-1])
	b.lastRuneSize = -1
	return n, nil
}

As I understand the code, bufio.Reader only reads from the underlying reader once at a time, which is not enough to ensure that the buffer is filled.

I think that io.MultiReader should make multi files as one file to the caller. If a file can be read again to fill the buffer, then it should also do it when it is divided into multiple parts and combined by the io.MultiReader.

@polomsky
Copy link

MultiReader can be used even for TCP streams and many more. If you call Read on anything it must block until it returns at least a single byte (or error).
If you need just this feature for disk operations, you need to make your own buffer. MultiReader is not good enough fix, in case you would use a single Reader (without MultiReader) you would have the same problems.

@838239178
Copy link
Author

@polomsky maybe you are right

@838239178 838239178 closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2023
@rsc
Copy link
Contributor

rsc commented Mar 29, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Apr 6, 2023

This proposal has been declined as infeasible.
— rsc for the proposal review group

@golang golang locked and limited conversation to collaborators Apr 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants