encoding/csv: BOM presents in fields read with Reader #9588

Closed
tp opened this issue Jan 14, 2015 · 1 comment
Comments

tp commented Jan 14, 2015

I was checking the header fields of an external CSV file and noticed that the file's BOM is part of the first field when reading with csv.Reader.

package main

import (
    "bytes"
    "encoding/csv"
    "fmt"
)

func main() {
    csvData := []byte("\uFEFFa,b")

    r := bytes.NewReader(csvData)

    csvR := csv.NewReader(r)

    header, err := csvR.Read()

    if err != nil {
        fmt.Println(err.Error())
        return
    }

    fmt.Printf("%q", header[0]) // prints "\ufeffa" where I expected "a"
}

snippet on playground

Related: Since U+FEFF is called a "[...] space", I expected strings.TrimSpace to remove it, which it did not. (That would have been my preferred workaround, removing both the BOM and any spaces around fields.) I would guess this is also the reason why csvR.TrimLeadingSpace = true does not remove the BOM.
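
For illustration, here is a minimal sketch of the field-level workaround described above. U+FEFF is not in the set of characters that unicode.IsSpace treats as white space, which is why strings.TrimSpace leaves it in place. The helper name trimBOM is hypothetical and not part of any package; it simply strips one leading U+FEFF with strings.TrimPrefix.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimBOM is a hypothetical helper (not a standard library function).
// It removes a single leading U+FEFF from s, if one is present.
func trimBOM(s string) string {
	return strings.TrimPrefix(s, "\uFEFF")
}

func main() {
	field := "\uFEFFa"

	// U+FEFF is not classified as white space, so TrimSpace keeps it.
	fmt.Println(unicode.IsSpace('\uFEFF'))       // false
	fmt.Printf("%q\n", strings.TrimSpace(field)) // "\ufeffa"

	// Removing the prefix explicitly yields the expected field.
	fmt.Printf("%q\n", trimBOM(field)) // "a"
}
```

This only needs to be applied to the first field of the first record, since that is the only place a leading BOM can end up.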

@mikioh changed the title from "BOM presents in fields read with csv.Reader" to "encoding/csv: BOM presents in fields read with Reader" on Jan 14, 2015
ianlancetaylor (Contributor) commented

The BOM is a bizarre idea in general, and it makes absolutely no sense when using UTF-8. It's not appropriate for encoding/csv to do anything special with a BOM. If you have to deal with it, deal with it before passing your reader to encoding/csv. If you have a file that is not UTF-8, you will need to use a translating reader anyhow, as encoding/csv, like all Go code, expects UTF-8.

While it's true that U+FEFF is a space, the UTF-8 representation of U+FEFF is not the literal bytes FEFF.
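
One way to handle the BOM before the reader reaches encoding/csv, as suggested above, might look like the following sketch. The UTF-8 encoding of U+FEFF is the three bytes EF BB BF, so the wrapper peeks at the first three bytes and discards them only if they match. The names skipBOM and readAll are hypothetical helpers introduced for this example, and the sketch assumes the input is already UTF-8.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM is a hypothetical helper: it wraps r in a bufio.Reader and
// drops a leading UTF-8 BOM if the stream starts with one.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek fails when fewer than three bytes are available; such input
	// cannot start with a BOM, so it is passed through unchanged.
	head, err := br.Peek(len(utf8BOM))
	if err == nil && bytes.Equal(head, utf8BOM) {
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// readAll is a hypothetical convenience wrapper that parses data as CSV
// after any leading BOM has been skipped.
func readAll(data []byte) ([][]string, error) {
	return csv.NewReader(skipBOM(bytes.NewReader(data))).ReadAll()
}

func main() {
	records, err := readAll([]byte("\uFEFFa,b"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", records[0][0]) // "a"
}
```

Because skipBOM returns an ordinary io.Reader, the same wrapper could sit in front of a file or network stream. Input in another encoding would still need a translating reader first.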

@golang golang locked and limited conversation to collaborators Jun 25, 2016