Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: add NewReaderAt #58679

Closed
sding3 opened this issue Feb 24, 2023 · 13 comments
Closed

proposal: io: add NewReaderAt #58679

sding3 opened this issue Feb 24, 2023 · 13 comments

Comments

@sding3
Copy link
Contributor

sding3 commented Feb 24, 2023

io.ReadSeeker can be adapted to io.ReaderAt. Once you have io.ReaderAt, you can do useful things with it like constructing io.SectionReader.

I propose adding such an adapter to the io package, for example:

// NewReaderAt returns a ReaderAt that uses the provided ReadSeeker as the
// underlying data source.
//
// The ReaderAt performs reading at the specified offset by getting the current
// offset, seeking, performing the read, and restoring the offset. The attempt to
// restore the offset occurs if the read has advanced the offset. Any errors
// encountered will be returned in a form testable via errors.Is and errors.As.
//
// If the n = len(p) bytes returned by ReadAt are at the end of the input
// source r, the ReadAt returns EOF only if r returns EOF.
func NewReaderAt(r io.ReadSeeker) io.ReaderAt {
	return &seekReaderAt{
		data: r,
	}
}

type seekReaderAt struct {
	l    sync.Mutex
	data io.ReadSeeker
}

func (r *seekReaderAt) ReadAt(p []byte, off int64) (int, error) {
	r.l.Lock()
	defer r.l.Unlock()

	// get current offset

	// seek to specified offset

	// perform read

	// restore offset back
}

Update: restore offset after read
Update: add more detail to explain the returned errors
Update: tweaked the godoc a little to increase clarity

@gopherbot gopherbot added this to the Proposal milestone Feb 24, 2023
@DeedleFake
Copy link

The docs for io.ReaderAt state

Clients of ReadAt can execute parallel ReadAt calls on the same input source.

I don't think that there's any way to preserve that property using an implementation that's calling Seek() behind the scenes. If a user wants to do that manually for something they know is safe, that's one thing, but I don't know if a general adapter that is essentially guaranteed to violate the contract of the thing it's adapting to is a good idea.

@rittneje
Copy link

io.ReaderAt also states:

If ReadAt is reading from an input source with a seek offset, ReadAt should not affect nor be affected by the underlying seek offset.

That constraint is clearly violated.

@DeedleFake
Copy link

I think that one's slightly less clear, actually, since the returned io.ReaderAt does not technically have a Seek() method. That constraint reads more to me like it's intended to deal with the case of a type that has both available publicly, such as os.File.

@sding3
Copy link
Contributor Author

sding3 commented Feb 24, 2023

Clients of ReadAt can execute parallel ReadAt calls on the same input source.

Since we have the lock on it, and ReadAt takes the offset as an input, I think we're okay on this one?

If ReadAt is reading from an input source with a seek offset, ReadAt should not affect nor be affected by the underlying seek offset.

Would this requirement be met if the original offset is put back? It could error, but we can bubble the error up.

@sding3
Copy link
Contributor Author

sding3 commented Feb 24, 2023

It appears that os.File.ReadAt calls pread. And looks like GNU pread perform a seek-read-reset maneuver.

@rittneje
Copy link

It appears that os.File.ReadAt calls pread. And looks like GNU pread perform a seek-read-reset maneuver.

That is libc's pread, which is distinct from the pread syscall. See https://man7.org/linux/man-pages/man2/pread.2.html

Would this requirement be met if the original offset is put back? It could error, but we can bubble the error up.

Technically no. Even though your ReadAt implementation has a lock to prevent concurrent use, nothing stops another goroutine from calling Seek on the underlying io.ReadSeeker concurrently. This includes indirect calls, like what happens if you call Read on the underlying io.ReadSeeker.

@sding3
Copy link
Contributor Author

sding3 commented Feb 24, 2023

nothing stops another goroutine from calling Seek on the underlying io.ReadSeeker concurrently

Well, I'm not too concerned about this case, because it would be the caller's responsibility to ensure the appropriate concurrency control. The similar sort of expectations exists in say io.MultiReader where the caller could be having multiple goroutines calling Read on io.Readers that were passed into the io.MultiReader and Read on the io.MultiReader concurrently.

@rsc
Copy link
Contributor

rsc commented Mar 15, 2023

I don't believe this is a great idea. This implementation is suboptimal at best, in that it does not allow concurrent reads. If we make it io.NewReaderAt, people will use it without realizing they are getting a subpar implementation.

@rsc
Copy link
Contributor

rsc commented Mar 29, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@cpuguy83
Copy link

If it is up to the caller to not call read or seek on the underlying ReadSeeker then the implementation probably doesn't need to reset the position either and call this out in the docs like it does for io.MultiReader

How many people would really need this implementation and is it worth it to have it upstreamed to go vs having people implement this as needed (eschewing concurrency and/or seeking concerns).

@sding3
Copy link
Contributor Author

sding3 commented Apr 4, 2023

This does allow for concurrent ReadAt calls, although all the competing seek-read-seek calls do effectively get serialized due to the mutex lock. The godoc on the function will be making it explicit that the implementation does seek-read-seek and the users can decide whether that's acceptable or not for performance. This is an adapter type intended to enable using io.ReadSeeker types at APIs where an io.ReaderAt is the input. So I think it's fine, and expected, that it will be sub-optimal when compared to true io.ReaderAt implementations. We could do a type assertion to see if the input io.ReadSeeker already satisfies io.ReaderAt and return that if so - this will be convenient to users of this API who might be working different io.ReadSeeker implementations where a subset of those also happen to satisfy io.ReaderAt.

@rsc
Copy link
Contributor

rsc commented Apr 6, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Apr 12, 2023

No change in consensus, so declined.
— rsc for the proposal review group

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants