Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: add OffsetWriter, NewOffsetWriter #45899

Closed
nilsocket opened this issue May 1, 2021 · 24 comments
Closed

io: add OffsetWriter, NewOffsetWriter #45899

nilsocket opened this issue May 1, 2021 · 24 comments

Comments

@nilsocket
Copy link
Contributor

CopyAt(dst WriterAt, src Reader, off int64) (written int64, err error)
CopyAtBuffer(dst WriterAt, src Reader, off int64, buf []byte) written int64, err error)

CopyFileRange(dst WriterAt, woff int64, src ReaderAt, roff int64) (written int64, err error)

or

func NewSectionWriter(r WriterAt, off int64, n int64) *SectionWriter
func (s *SectionWriter) Write(p []byte) (n int, err error)
func (s *SectionWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *SectionWriter) Seek(offset int64, whence int) (int64, error)
func (s *SectionWriter) Size() int64

Helpful for concurrent writes.

@gopherbot gopherbot added this to the Proposal milestone May 1, 2021
@nilsocket nilsocket changed the title proposal: io: Provide concurrent helper functions io: Provide concurrent helper functions May 1, 2021
@seankhliao seankhliao changed the title io: Provide concurrent helper functions proposal: io: Provide concurrent helper functions May 1, 2021
@seankhliao
Copy link
Member

Where would this be used? does it come up often enough to be in the stdlib?

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 1, 2021
@nilsocket
Copy link
Contributor Author

@seankhliao

  • When downloading file concurrently using http ranges.
  • Merging multiple files into one.
  • Any use case where multiple go routines try to write concurrently at specific locations of a file.

Inconvenience with current implementations:

  • WriteAt() needs to be wrapped in for loop with temporary buffer and need to keep track to check if everything is written.
  • WriteAt() with buffer length equal to data being read, uses too much of memory.
  • Seek(), using seek with multiple go routines isn't a valid option.

Better Performance for concurrent writes.

  • Zero Copy functions like ReadFrom(), and WriteTo() can be used for concurrent writes, which is not possible now.

or another option would be to povide ReadFromAt() and WriteToAt()

In-case of unix copyfilerange is a system call, which exactly does this.

Thank you.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 2, 2021
@ianlancetaylor ianlancetaylor removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 2, 2021
@ianlancetaylor
Copy link
Contributor

Note that we use the copy_file_range system call with io.Copy or more generally with os.File.ReadFrom. But it's true that we currently always pass the offsets arguments as nil.

In CopyAt it's not immediately clear whether the off argument applies to the reader or the writer. I can infer that it's the writer because the type is WriterAt. But in general I think if we pass an offset for one we should pass an offset for both, which just gives us the CopyFileRange function.

I think in general having both io.Copy and io.CopyBuffer was a mistake, and I wouldn't want to repeat that mistake.

We already have io.WriterAt with a WriteAt method, and os.File already has a WriteAt method. And you can already call that method concurrently from multiple goroutines. So I don't have a clear handle on what new functionality we get from these new functions.

@nilsocket
Copy link
Contributor Author

We already have WriteAt

Yes, But some book-keeping is needed. WriteAt() is similar to write().

I don't have a clear handle on what new functionality we get from these new functions.

Like io.Copy() for Write(), io.CopyAt() for WriteAt()

// CopyBufferAt , copies `src` to `dst` at `off` using `buf`
//
// copied from io.copyBuffer
func CopyBufferAt(dst io.WriterAt, src io.Reader, off int64, buf []byte) (written int64, err error) {
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.WriteAt(buf[0:nr], off+written)
			if nw < 0 || nr < nw {
				nw = 0
				if ew == nil {
					ew = errors.New("invalid write result")
				}
			}
			written += int64(nw)
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = io.ErrShortWrite
				break
			}
		}
		if er != nil {
			if er != io.EOF {
				err = er
			}
			break
		}
	}
	return written, err
}

As for my limited understanding goes,
Currently it's not possible to do zero copy operations at specific locations.
If this feature is added, it would be good.

Thank you.

@rsc
Copy link
Contributor

rsc commented May 5, 2021

Adding more variants of Copy seems like a mistake, as others have noted.
Adding SectionWriter to match SectionReader seems plausible.
Does anyone object to adding SectionWriter?

@rsc
Copy link
Contributor

rsc commented May 5, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) May 5, 2021
@ianlancetaylor
Copy link
Contributor

Does a call to NewSectionWriter(w, off, n) imply that the size of w will always become at least off + n? Or does it depend on subsequent writes? Or is it an error if the size is not already at least off + n?

@nilsocket
Copy link
Contributor Author

nilsocket commented May 6, 2021

@ianlancetaylor

I don't think n is necessary here.

I just copied definitions from SectionReader and changed it to SectionWriter.

Edit:

Taking cue from NewSectionReader():

  • For NewSectionReader(), off represents the starting point and n ending point, on trying to read further than n results in io.EOF. Thus representing a section from off to n.
  • For NewSectionWriter(), off represents the starting point and n ending point, on trying to write further than n results in error x. Thus representing a section from off to n.

If n is to stay, then what error should NewSectionWriter() return, should a new error be defined?

If n is removed from NewSectionWriter(), and allows writes to go as further as user needs it.
Should it still be called SectionWriter, as it doesn't represent section anymore.

I think n should stay, and a new error be defined.

@ianlancetaylor
Copy link
Contributor

If n should stay, then I think you need to answer my earlier questions:

Does a call to NewSectionWriter(w, off, n) imply that the size of w will always become at least off + n? Or does it depend on subsequent writes? Or is it an error if the size is not already at least off + n?

Thanks.

@nilsocket
Copy link
Contributor Author

It depends on subsequent writes.

@ghost
Copy link

ghost commented May 7, 2021

It depends on subsequent writes.

How would that be helpful for concurrent writes, then?

@nilsocket
Copy link
Contributor Author

@opennota

Isn't this going to work?

func main() {
	var length, ps, i int64 = 1000000, 100000, 0
	f, _ := os.Create("file")

	for ; i < length; i += ps {

		go func(from, to int64) {
			req, _ := http.NewRequest(http.MethodGet, "https://someurl.com/file", nil)

			setRange(req, from, to)
			resp, _ := http.DefaultClient.Do(req)

			nw := io.NewSectionWriter(f, from, to)
			io.Copy(nw, resp.Body)
			resp.Body.Close()
		}(i, i+ps)
	}
}

func setRange(req *http.Request, start, end int64) {
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end-1))
}

@rsc
Copy link
Contributor

rsc commented May 12, 2021

It's true, you don't seem to need n (the section length). But that makes it not really a SectionWriter but more like an OffsetWriter. That's fine of course.

@rsc rsc changed the title proposal: io: Provide concurrent helper functions proposal: io: add OffsetWriter, NewOffsetWriter May 12, 2021
@rsc
Copy link
Contributor

rsc commented May 12, 2021

OK so it sounds like the API is:

// An OffsetWriter maps writes at offset o to offset o+off in the underlying writer.
type OffsetWriter struct { }

func NewOffsetWriter(r WriterAt, off int64) *OffsetWriter
func (s *OffsetWriter) Write(p []byte) (n int, err error)
func (s *OffsetWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *OffsetWriter) Seek(offset int64, whence int) (int64, error)

Do I have that right? Does anyone object to this?

@nilsocket
Copy link
Contributor Author

Does anyone object to this?

Personally no.

@rsc
Copy link
Contributor

rsc commented May 19, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) May 19, 2021
@kortschak
Copy link
Contributor

Just clairfying, the offsets in these methods end up being on top of the original offset used to create the *OffsetWriter, correct?

func (s *OffsetWriter) WriteAt(p []byte, off int64) (n int, err error)
func (s *OffsetWriter) Seek(offset int64, whence int) (int64, error)

@nilsocket
Copy link
Contributor Author

nilsocket commented May 19, 2021

@kortschak

on top of the original offset?

I'm not sure what do mean by on top of but if you mean relatively, yes.

NewOffsetWriter(x, 10)

WriteAt(y, 10) // y is written at offset 20 of x.

@kortschak
Copy link
Contributor

but if you mean relatively, yes.

Yes. Thanks.

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) May 26, 2021
@rsc
Copy link
Contributor

rsc commented May 26, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: io: add OffsetWriter, NewOffsetWriter io: add OffsetWriter, NewOffsetWriter May 26, 2021
@rsc rsc modified the milestones: Proposal, Backlog May 26, 2021
@gopherbot
Copy link

Change https://go.dev/cl/406776 mentions this issue: io: add OffsetWriter, NewOffsetWriter

@nightlyone
Copy link
Contributor

Sadly I didn't see this proposal while it was still in the comments period. This API is supposed to help with concurrent writes in the multi section download. Since the upper bound of the file write is not limited anymore this looks very dangerous to use.

I fear it will lead to concurrent writes to overlapping file sections, probably resulting in file corruptions.

@rsc / @ianlancetaylor could you please reconsider this API addition before it gets implemented or do I overlook something? The original API with sections seemed much safer to me.

@ianlancetaylor
Copy link
Contributor

@nightlyone Perhaps we should separate OffsetWriter from LimitedWriter.

@nilsocket
Copy link
Contributor Author

Since the upper bound of the file write is not limited anymore this looks very dangerous to use.
I fear it will lead to concurrent writes to overlapping file sections, probably resulting in file corruptions.

But, it is similar to other api's like WriteAt(), user can make a mistake anywhere.
I think it is the responsibility of user to be careful.

Another possible solution, is to wrap your source (reader) with LimitReader.

nw := io.NewOffsetWriter(f, from)
io.Copy(nw, io.LimitReader(resp.Body, n))

@golang golang locked and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

7 participants