Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/csv: add a Reader.IndexFields method #21768

Closed
earthboundkid opened this issue Sep 5, 2017 · 6 comments
Closed

proposal: encoding/csv: add a Reader.IndexFields method #21768

earthboundkid opened this issue Sep 5, 2017 · 6 comments

Comments

@earthboundkid
Copy link
Contributor

earthboundkid commented Sep 5, 2017

I previously opened and closed #21370, which was a proposal to add a "field reader" type, like Python's DictReader. My new proposal is the addition of a method like this:

func (r *csv.Reader) IndexFields(fields map[string]*int) error {
    headerFields, err := r.Read()
    if err != nil {
        return err
    }

    // Initialize pointer values to sentinal
    for _, p := range fields {
        *p = -1
    }

    for i, header := range headerFields {
        if p, ok := fields[header]; ok {
            *p = i
        }
    }

    // Check for unset header
    for header, p := range fields {
        if *p == -1 {
            return fmt.Errorf("missing header %q", header)
        }
    }
    return nil
}

Usage is like this:

	r := csv.NewReader(f)

	var fooIdx, barIdx, bazIdx int
	if err := r.IndexFields(map[string]*int{
		"foo":   &fooIdx,
		"bar":   &barIdx,
		"baz":   &bazIdx,
	}); err != nil {
		return  err
	}

	// Use r as normal below...

I think this strikes a better balance than adding a whole new type to the repo, while serving a very common usecase.

@gopherbot gopherbot added this to the Proposal milestone Sep 5, 2017
@ianlancetaylor
Copy link
Contributor

I don't understand how this would be used. Can you show an example input file, and example code to read that file? Or just write the docs for the IndexFields method? It seems to me that this can only read different orderings of "foo", "bar", and "baz", which seems useless. Probably I misunderstand.

@earthboundkid
Copy link
Contributor Author

Sorry, I didn't see this before. Here is a playground link with a more complete usage example and a first pass at function documentation.

The usecase for this is that you have a CSV file, which you know has certain headers, but you either don't know which index they are at, or don't want to hardcode those indices, in case they change in the future. Obviously, you could just write this yourself because it's a short function that doesn't use csv.Reader internals, but I think it smooths out a common occurrence.

I often see CSV parsing code with magic numbers like record[3] + " <" + record[5] + ">". You can improve that code significantly by writing record[nameIdx] + " <" + record[emailIdx] + ">". To do that you can hardcode const nameIdx, emailIdx = 3, 5, but if the file already has a header, why not use that to get the indices?

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Oct 10, 2017

It occurs to me that it would be a nicer interface to take in a struct and use reflection to set the values. I don't have time ATM to whip up a demo, but usage like:

    in := `first_name,last_name,username
"Rob","Pike",rob
Ken,Thompson,ken
"Robert","Griesemer","gri"
`
    r := csv.NewReader(strings.NewReader(in))
    var idxs struct { 
        Username int 
    }
    if err := r.IndexFields(&idxs); err != nil {
        log.Fatal(err)
    }

    // Print all usernames
    for {
        record, err := r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(record[idxs.Username])
    }

@ianlancetaylor
Copy link
Contributor

Thanks for the example. There are a hundred ways we could extend the encoding/csv package, to account for all the many different ways that people use CSV. Our preference has been to stick to RFC 4180. It's easy enough to implement your suggestion outside of the encoding/csv package, so my vote would be to do that rather than to extend the standard package. See https://golang.org/doc/faq#x_in_std .

@egonelbre
Copy link
Contributor

There is a somewhat nicer solution to the problem (error handling and implementation left as exercise):

data := mycsv.NewReader(input)
firstname := data.String("first_name")
lastname := data.String("last_name")
username := data.String("username")
age := data.Int("age")

for data.Next() {
	fmt.Println(*firstname, *lastname, *username, *age)
}

But, I agree with Ian, that this probably doesn't belong in standard package.

@rsc
Copy link
Contributor

rsc commented Feb 5, 2018

Per discussion above, let's leave this for add-on packages that can more easily customize to very specific use cases.

@rsc rsc closed this as completed Feb 5, 2018
@golang golang locked and limited conversation to collaborators Feb 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants