Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/csv: add Buffered #51539

Closed
veezhang opened this issue Mar 8, 2022 · 13 comments
Closed

proposal: encoding/csv: add Buffered #51539

veezhang opened this issue Mar 8, 2022 · 13 comments
Labels
FrozenDueToAge Proposal WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@veezhang
Copy link
Contributor

veezhang commented Mar 8, 2022

When calculating the progress of reading, since *bufio.Reader have buffer, it cannot be calculated correctly.

package csv

// Buffered returns the number of bytes that can be read from the current buffer of Reader.
func (r *Reader) Buffered() int {
	return r.r.Buffered()
}
@gopherbot gopherbot added this to the Proposal milestone Mar 8, 2022
@mengzhuo
Copy link
Contributor

mengzhuo commented Mar 8, 2022

Please follow the instruction/template of proposal.
https://github.com/golang/proposal#readme

@mengzhuo mengzhuo added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 8, 2022
@ianlancetaylor
Copy link
Contributor

Is FieldPos not sufficient (see https://pkg.go.dev/encoding/csv#Reader.FieldPos)? If not, what do you want instead?

I note that encoding/json.Decoder and encoding/xml.Decoder both have a InputOffset method.

@veezhang
Copy link
Contributor Author

veezhang commented Mar 8, 2022

@ianlancetaylor
If use FieldPos, it's need to count the number of rows first, which will affect performance.

Here is an examples

package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"reflect"
	"unsafe"
)

type ProgressReader struct {
	r          io.Reader
	totalBytes int64
	bytes      int64
}

func NewProgressReader(totalBytes int64, r io.Reader) *ProgressReader {
	if totalBytes <= 0 {
		panic("the totalBytes must greater than 0")
	}
	return &ProgressReader{
		r:          r,
		totalBytes: totalBytes,
	}
}

func (r *ProgressReader) Read(p []byte) (n int, err error) {
	n, err = r.r.Read(p)
	r.bytes += int64(n)
	return
}

func (r *ProgressReader) Percentage() float64 {
	return 100 * float64(r.bytes) / float64(r.totalBytes)
}

func (r *ProgressReader) PercentageOffset(n int64) float64 {
	return 100 * float64(r.bytes+n) / float64(r.totalBytes)
}

func (r *ProgressReader) Bytes() int64 {
	return r.bytes
}

func main() {
	f, err := os.Open("x.csv")
	if err != nil {
		panic(err)
	}
	stat, err := f.Stat()
	if err != nil {
		panic(err)
	}

	totalBytes := stat.Size()
	pr := NewProgressReader(totalBytes, f)
	r := csv.NewReader(pr)

	rf := reflect.ValueOf(r).Elem().FieldByName("r")
	rf = reflect.NewAt(rf.Type(), unsafe.Pointer(rf.UnsafeAddr())).Elem()
	br := rf.Interface().(*bufio.Reader)
	// Can *bufio.Reader be exported?

	for {
		_, err := r.Read()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		fmt.Printf(
			"%.3f %.3f %d\n",
			pr.Percentage(),
			pr.PercentageOffset(int64(-br.Buffered())), // or exported the Buffered
			-br.Buffered(),
		)
	}
}

@veezhang veezhang changed the title proposal: encoding/csv: support calculate the progress proposal: encoding/csv: add Buffered Mar 8, 2022
@ianlancetaylor
Copy link
Contributor

OK, if you don't want FieldPos, then what do you want instead?

If performance is an issue, then consider that anything we do is going to cost a little bit of performance, only it will cost that performance for everybody. But I don't see why performance would be a significant issue here.

@veezhang
Copy link
Contributor Author

veezhang commented Mar 8, 2022

@ianlancetaylor
This will read twice. When the file is relatively large, we need to consider performance.

Is it possible to add the following functions?

// Buffered returns the number of bytes that can be read from the current buffer of Reader.
func (r *Reader) Buffered() int {
	return r.r.Buffered()
}

Therefore, it is possible to know the number of bytes that were actually read.

@veezhang
Copy link
Contributor Author

veezhang commented Mar 8, 2022

Please follow the instruction/template of proposal. https://github.com/golang/proposal#readme
@mengzhuo
Is this okay?

@ianlancetaylor
Copy link
Contributor

Adding a Buffered method is easy but seems awkward to use. More importantly, it ties the encoding/csv API closely to bufio.Reader, which is currently an implementation detail.

Can we take a step back? What is the problem that you are trying to address?

@veezhang
Copy link
Contributor Author

veezhang commented Mar 9, 2022

OK, it's make sense.

I want to calculate the percentage of bytes that have been read from the file. Due to the buffer of *bufio.Reader, the percentage cannot be accurately calculated.

How about check the type of io.Reader in NewReader, use it directly when io.Reader is already *bufio.Reader?

Or others.

@ianlancetaylor
Copy link
Contributor

How about check the type of io.Reader in NewReader, use it directly when io.Reader is already *bufio.Reader?

That already happens. See the implementation of bufio.NewReader.

@veezhang
Copy link
Contributor Author

Thanks. It's work.

package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"io"
	"os"
)

type ProgressReader struct {
	r          io.Reader
	totalBytes int64
	bytes      int64
}

func NewProgressReader(totalBytes int64, r io.Reader) *ProgressReader {
	if totalBytes <= 0 {
		panic("the totalBytes must greater than 0")
	}
	return &ProgressReader{
		r:          r,
		totalBytes: totalBytes,
	}
}

func (r *ProgressReader) Read(p []byte) (n int, err error) {
	n, err = r.r.Read(p)
	r.bytes += int64(n)
	return n, err
}

func (r *ProgressReader) Percentage() float64 {
	return 100 * float64(r.bytes) / float64(r.totalBytes)
}

func (r *ProgressReader) PercentageOffset(n int64) float64 {
	return 100 * float64(r.bytes+n) / float64(r.totalBytes)
}

func (r *ProgressReader) Bytes() int64 {
	return r.bytes
}

func main() {
	f, err := os.Open("xxx.csv")
	if err != nil {
		panic(err)
	}
	stat, err := f.Stat()
	if err != nil {
		panic(err)
	}

	totalBytes := stat.Size()
	pr := NewProgressReader(totalBytes, f)
	br := bufio.NewReader(pr)
	r := csv.NewReader(br)

	for {
		_, err := r.Read()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		fmt.Printf(
			"%.3f %.3f %d\n",
			pr.Percentage(),
			pr.PercentageOffset(int64(-br.Buffered())),
			-br.Buffered(),
		)
	}
}

@ianlancetaylor
Copy link
Contributor

OK, should we consider this proposal to be retracted? Thanks.

@veezhang
Copy link
Contributor Author

OK, thanks again!

@veezhang
Copy link
Contributor Author

veezhang commented Dec 6, 2022

#43401

@golang golang locked and limited conversation to collaborators Dec 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Proposal WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

4 participants