io/ioutil: ReadFile fails on Darwin, FreeBSD reading 2G file #7812
The error is coming out of ioutil.ReadFile, which is part of the Go standard library. Did you run the program on a file that was at least 2GB in size? My sample program is just this:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
    )

    func main() {
        // Calculate the MD5 sum of all files under the specified directory,
        // then print the results sorted by path name.
        _, err := ioutil.ReadFile(os.Args[1])
        if err != nil {
            fmt.Println(err)
            return
        }
    }

I'm not sure where you want me to put a print statement... I also don't have gdb available on OSX Mavericks, as they have replaced gcc and gdb with clang and lldb. I tried using lldb, but am getting errors from lldb.
Did you try it with the binary download for OSX? No offense, but I have neither the time nor the inclination to debug the Go standard library... I am new to Go, and would just be fumbling around. If you think you know where your library might be failing, maybe you could put some print statements in and send me those files so I can build with them? Thanks.
Okay, I just confirmed I can reproduce it on both Go 1.2.1 and Go tip on OS X 10.9.2:

    ba12:~ bradfitz$ ls -l /Users/bradfitz/Downloads/many-ubuntu.iso
    -rw-r--r-- 1 bradfitz staff 2332381184 Apr 21 14:41 /Users/bradfitz/Downloads/many-ubuntu.iso
    ba12:~ bradfitz$ cat test3.go
    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
    )

    func main() {
        // Calculate the MD5 sum of all files under the specified directory,
        // then print the results sorted by path name.
        b, err := ioutil.ReadFile(os.Args[1])
        if err != nil {
            fmt.Println(err)
            return
        }
        println(len(b))
    }

For both:

    go version devel +2b0a7f247bb3 Fri Apr 18 08:11:31 2014 -0700 darwin/amd64

and:

    go version go1.2.1 darwin/amd64

... as built by tip and downloaded from the website, respectively, I get:

    ba12:~ bradfitz$ go run test3.go /Users/bradfitz/Downloads/many-ubuntu.iso
    read /Users/bradfitz/Downloads/many-ubuntu.iso: invalid argument

Labels changed: added release-go1.3, removed release-go1.3maybe.
Status changed to Accepted.
Added logging to ioutil.ReadFile: it's not panicking. It's an error from bytes.Buffer.ReadFrom. Adding more logging there:

    m, e := r.Read(b.buf[len(b.buf):cap(b.buf)])
    if e != nil {
        log.Printf("Read error from %T for read %d to %d: %d, %v", r, len(b.buf), cap(b.buf), m, e)
    }
    b.buf = b.buf[0 : len(b.buf)+m]
    n += int64(m)
    if e == io.EOF {
        break
    }
    if e != nil {
        return n, e
    }

And I get:

    2014/04/21 15:07:37 Read error from *os.File for read 2147483136 to 4294966784: 0, read /Users/bradfitz/Downloads/many-ubuntu.iso: invalid argument

That offset is 512 bytes shy of 2 GiB. 512 bytes is probably bytes.MinRead (also 512). Why is the slice capacity even getting up to 4GB? Oh, because ReadFile does:

    if fi, err := f.Stat(); err == nil {
        // Don't preallocate a huge buffer, just in case.
        if size := fi.Size(); size < 1e9 {
            n = size
        }
    }

So the doubling keeps going, even beyond the os.Stat size. So then why the invalid argument from *os.File?
Confirmed: On OS X, the size passed to syscall.Read can be at most 2<<30 - 1 (2147483647) bytes.

    n, err := syscall.Read(int(f.Fd()), buf)
    log.Printf("For buf size %d: Read = %v, %v", len(buf), n, err)

    2014/04/21 15:20:33 For buf size 2147483647: Read = 2147483647, <nil>
    2014/04/21 15:20:39 For buf size 2147483648: Read = -1, invalid argument
    2014/04/21 15:20:17 For buf size 2147483649: Read = -1, invalid argument
On Linux 3.8 x86_64, big reads are fine:

    $ go run bigread.go
    2014/04/21 15:33:53 os.Stat = 2560001329 bytes in file
    2014/04/21 15:33:53 For buf size 1: Read = 1, <nil>
    2014/04/21 15:33:53 For buf size 500: Read = 500, <nil>
    2014/04/21 15:34:00 For buf size 2147483647: Read = 2147479552, <nil>
    2014/04/21 15:34:01 For buf size 2147483648: Read = 2147479552, <nil>
    2014/04/21 15:34:03 For buf size 2147483649: Read = 2147479552, <nil>
    2014/04/21 15:34:23 For buf size 2560001329: Read = 2147479552, <nil>

... they don't return everything that was requested, which surprises me, but it's at least fine for the io.Reader contract.
CL https://golang.org/cl/89900044 mentions this issue.
You need an Apple Developer account. I have one and would be happy to file it if you guys want to attach your sample code and a description that you think would be useful. If you have an account you can use this url: https://idmsa.apple.com/IDMSWebAuth/login.html?appIdKey=77e2a60d4bdfa6b7311c854a56505800be3c24e3a27a670098ff61b69fc5214b&sslEnabled=true&rv=3
As I summarized in https://golang.org/cl/89900044#msg9, I don't think it's a bug per se. FreeBSD has the same behavior, and then made it configurable with a sysctl. I'd think it's a feature for 32-bit compatibility (e.g. if you don't declare the read/write syscalls, their argument and return types default to int, so they avoided the >2GB case; that way, even in that situation, an error return value is negative and every successful return value is >= 0).
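The arithmetic behind that guess can be sketched in a few lines. This only illustrates the commenter's hypothesis about 32-bit int return values, not a confirmed rationale from either kernel; `asInt32` is a hypothetical helper.

```go
package main

import "fmt"

// asInt32 (hypothetical helper) reinterprets a 32-bit byte count as a
// signed 32-bit int, the way an undeclared C function's int return
// value would be read.
func asInt32(n uint32) int32 {
	return int32(n)
}

func main() {
	// The largest count that still looks like a success (>= 0).
	fmt.Println(asInt32(1<<31 - 1)) // 2147483647

	// One byte more wraps negative and would be mistaken for an error.
	fmt.Println(asInt32(1 << 31)) // -2147483648
}
```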
I see the behaviour you're describing, but I don't know how to characterize that except as a bug. These systems are supposedly POSIX compliant, and I don't see anything in http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html that permits this behaviour (unless they define SSIZE_MAX as 2G). In any case it doesn't matter for our purposes, we still need something along the lines of Brad's patch.
While this appears to be a valid issue, it's quite unnecessary to read the entire large file into memory if all you intend to do is compute its md5. Below is a (completely untested -- I typed it into a comment box) example of how this kind of thing is more generally done when you don't actually need the entire blob resident:

    f, err := os.Open("averylargefile")
    if err != nil {
        // appropriate error handling
    }
    defer f.Close()
    h := md5.New()
    _, err = io.Copy(h, f)
    if err != nil {
        // appropriate error handling
    }

Now h has your md5 and very little memory was consumed.
This issue was closed by revision 8409dea. Status changed to Fixed.
Re comment #23, if you don't think ReadFile is a good solution for md5ing files, it might be helpful if you guys update your bounded.go example found here -> http://blog.golang.org/pipelines/bounded.go to use the preferred method. The bounded.go example script is what led me to discover and file this bug in the first place. Thanks again for all of the work you guys put into this issue!
CL https://golang.org/cl/94070044 mentions this issue.
This issue was closed.
by jeffreydwalter: