io/ioutil: ReadFile fails on Darwin, FreeBSD reading 2G file #7812

Closed
gopherbot opened this issue Apr 17, 2014 · 26 comments

@gopherbot

by jeffreydwalter:

Before filing a bug, please check whether it has been fixed since the
latest release. Search the issue tracker and check that you're running the
latest version of Go:

Run "go version" and compare against
http://golang.org/doc/devel/release.html  If a newer version of Go exists,
install it and retry what you did to reproduce the problem.

Thanks.

What does 'go version' print?
go version go1.2.1 darwin/amd64

What steps reproduce the problem?
If possible, include a link to a program on play.golang.org.

1. Create a 1GB and 2GB file. In this case I just picked a random .img file off of my
computer, but file format/type do not seem to be an issue in reproducing this.
In the following, test.1gb.img is a 1GB file, and I just cat'd it out twice to
test.2gb.img to make a 2GB file to test with.

jw$ cat test.1gb.img >> test.2gb.img
jw$ cat test.1gb.img >> test.2gb.img

jw$ ls -ahl
-rw-r--r--   1 jw  staff   2.0G Apr 17 15:44 test.2gb.img
-rw-r--r--   1 jw  staff   1.0G Apr 17 15:42 test.1gb.img
-rw-r--r--   1 jw  staff   287B Apr 15 19:33 test3.go

2. Run test3.go (attached) against test.1gb.img (the 1GB file.) Notice, it completes
without error.
jw$ go run test3.go test.1gb.img 

3. Run test3.go (attached) against test.2gb.img (the 2GB file.) It returns with
"read test.2gb.img: invalid argument"
jw$ go run test3.go test.2gb.img 
read test.2gb.img: invalid argument

What should have happened instead?
I expect that ioutil.ReadFile() would read in the 2GB file. Or at least fail more
gracefully, or with a better error message if this isn't supported.

Please provide any additional information below.
This appears to be a limitation of ioutil.ReadFile() in handling files over a certain
file size. I was able to reproduce this with .cdr, .iso, .mp4, etc. files, so it doesn't
appear to be related to type of file, or filename.

Attachments:

  1. test3.go (287 bytes)
@ianlancetaylor
Contributor

Comment 1:

I could not reproduce this on my GNU/Linux system.
Can you find out where the error value is coming from?

Labels changed: added repo-main, release-go1.3maybe, os-macosx.

@bradfitz
Contributor

Comment 3:

Works for me at tip at least, and I don't recall this changing anytime recently.
It won't work on a 32-bit machine, but it looks like you're on 64-bit.

@gopherbot
Author

Comment 4 by jeffreydwalter:

I'm running the 64-bit version of OS X 10.9.2 with a 2.4 GHz Intel Core i7 processor and
16GB RAM.
Re comment #1, how do I do that? I assume it's coming from Go.

@ianlancetaylor
Contributor

Comment 5:

Run the program under a debugger or add print statements.  ReadFile is just passing
along an error message that came from somewhere else.  Where did it come from?  Since I
can't reproduce the problem myself, I don't know.

@gopherbot
Author

Comment 6 by jeffreydwalter:

The error is coming out of ioutil.ReadFile, so it is coming from the Go
libraries. Did you run the program on a file that was at least 2GB in size?
My sample program is just this:
package main
import (
  "fmt"
  "io/ioutil"
  "os"
)
func main() {
  // Calculate the MD5 sum of all files under the specified directory,
  // then print the results sorted by path name.
  _, err := ioutil.ReadFile(os.Args[1])
  if err != nil {
    fmt.Println(err)
    return
  }
}
I'm not sure where you want me to put a print statement... I also don't have gdb
available on OSX Mavericks, as they have replaced gcc and gdb with clang and lldb. I
tried using lldb, but am getting errors from lldb.

@bradfitz
Contributor

Comment 7:

I tried with a 2.1 GB file, yes.
You'll want to add some print statements around the Go standard library, which means
building Go from source, and not using the binary downloads.

Status changed to WaitingForReply.

@gopherbot
Author

Comment 8 by jeffreydwalter:

Did you try it with the binary download for OSX?
No offense, but I don't have the time nor inclination to debug the Go standard
library... I am new to Go, and would just be fumbling around. If you think you know
where your library might be failing, maybe you could put some print statements in and
send me those files so I can build with them?
Thanks.

@bradfitz
Contributor

Comment 9:

Okay, I just confirmed I can reproduce it on both Go 1.2.1 and Go tip on OS X 10.9.2:
ba12:~ bradfitz$ ls -l /Users/bradfitz/Downloads/many-ubuntu.iso
-rw-r--r--  1 bradfitz  staff  2332381184 Apr 21 14:41 /Users/bradfitz/Downloads/many-ubuntu.iso
ba12:~ bradfitz$ cat test3.go
package main
import (
    "fmt"
    "io/ioutil"
    "os"
)
func main() {
    // Calculate the MD5 sum of all files under the specified directory,
    // then print the results sorted by path name.
    b, err := ioutil.ReadFile(os.Args[1])
    if err != nil {
        fmt.Println(err)
        return
    }
    println(len(b))
}
For both:
go version devel +2b0a7f247bb3 Fri Apr 18 08:11:31 2014 -0700 darwin/amd64
and:
go version go1.2.1 darwin/amd64
.. as built by tip and downloaded from the website, respectively, I get:
ba12:~ bradfitz$ go run test3.go /Users/bradfitz/Downloads/many-ubuntu.iso
read /Users/bradfitz/Downloads/many-ubuntu.iso: invalid argument

Labels changed: added release-go1.3, removed release-go1.3maybe.

Status changed to Accepted.

@bradfitz
Contributor

Comment 10:

Worth noting: I have 4GB of RAM and had all applications closed, including Chrome.

@bradfitz
Contributor

Comment 11:

Added logging to ioutil.ReadFile: it's not panicking. It's an error from
bytes.Buffer.ReadFrom.
Adding more logging there:
                m, e := r.Read(b.buf[len(b.buf):cap(b.buf)])
                if e != nil {
                        log.Printf("Read error from %T for read %d to %d: %d, %v", r, len(b.buf), cap(b.buf), m, e)
                }
                b.buf = b.buf[0 : len(b.buf)+m]
                n += int64(m)
                if e == io.EOF {
                        break
                }
                if e != nil {
                        return n, e
                }
And I get:
2014/04/21 15:07:37 Read error from *os.File for read 2147483136 to 4294966784: 0, read /Users/bradfitz/Downloads/many-ubuntu.iso: invalid argument
That offset is 512 bytes shy of 2 GiB.  512 bytes is probably bytes.MinRead (also 512).
Why is the slice capacity even getting up to 4GB?
Oh, because ReadFile does:
        if fi, err := f.Stat(); err == nil {
                // Don't preallocate a huge buffer, just in case.
                if size := fi.Size(); size < 1e9 {
                        n = size
                }
        }
So the doubling keeps going, even beyond the os.Stat size.
So then why the invalid argument from *os.File?

@bradfitz
Contributor

Comment 12:

My guess is that the read system call doesn't like a 2GB size_t.

@bradfitz
Contributor

Comment 13:

Confirmed:
On OS X, the buffer passed to syscall.Read can be at most 2<<30 - 1 (2147483647) bytes.
    n, err := syscall.Read(int(f.Fd()), buf)
    log.Printf("For buf size %d: Read = %v, %v", len(buf), n, err)
2014/04/21 15:20:33 For buf size 2147483647: Read = 2147483647, <nil>
2014/04/21 15:20:39 For buf size 2147483648: Read = -1, invalid argument
2014/04/21 15:20:17 For buf size 2147483649: Read = -1, invalid argument

@bradfitz
Contributor

Comment 14:

On Linux 3.8 x86_64, big reads are fine:
$ go run bigread.go
2014/04/21 15:33:53 os.Stat = 2560001329 bytes in file
2014/04/21 15:33:53 For buf size 1: Read = 1, <nil>
2014/04/21 15:33:53 For buf size 500: Read = 500, <nil>
2014/04/21 15:34:00 For buf size 2147483647: Read = 2147479552, <nil>
2014/04/21 15:34:01 For buf size 2147483648: Read = 2147479552, <nil>
2014/04/21 15:34:03 For buf size 2147483649: Read = 2147479552, <nil>
2014/04/21 15:34:23 For buf size 2560001329: Read = 2147479552, <nil>
... the reads don't return everything that was requested, which surprises me, but
that's fine under the io.Reader contract.

@gopherbot
Author

Comment 15:

CL https://golang.org/cl/89900044 mentions this issue.

@ianlancetaylor
Contributor

Comment 16:

That certainly sounds like a bug in Darwin.  The man page says the length argument can
be size_t, and the return type of read is ssize_t, as it should be.  I assume that for
Darwin amd64 size_t is 64 bits.
What happens with the corresponding C code that calls read?

@bradfitz
Contributor

Comment 17:

Same thing with C, for both read and pread.

@ianlancetaylor
Contributor

Comment 18:

Does anybody know how to file a bug against Darwin?

@gopherbot
Author

Comment 19 by jeffreydwalter:

You need an Apple Developer account. I have one and would be happy to file it if you
guys want to attach your sample code and a description that you think would be useful.
If you have an account you can use this url:
https://idmsa.apple.com/IDMSWebAuth/login.html?appIdKey=77e2a60d4bdfa6b7311c854a56505800be3c24e3a27a670098ff61b69fc5214b&sslEnabled=true&rv=3

@minux
Member

minux commented Apr 22, 2014

Comment 20:

As I summarized in https://golang.org/cl/89900044#msg9,
I think it's not a bug per se. FreeBSD has the same behavior, and later
made it configurable with a sysctl.
I'd think it's a feature for 32-bit compatibility (e.g. if you don't declare
the read/write syscalls, their argument and return types default to int, so
they avoided the >2GB case so that even then an error return value is
negative and every success return value is >=0.)
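
The concern about a 32-bit int return type can be illustrated with a short sketch. It only shows how a byte count looks after truncation to 32 bits and makes no claim about how any kernel implements read. The helper name asInt32 is hypothetical:

```go
package main

import "fmt"

// asInt32 shows how a 64-bit byte count appears to a caller that treats the
// return value as a 32-bit signed int.
func asInt32(n int64) int32 {
	return int32(n)
}

func main() {
	for _, n := range []int64{1<<31 - 1, 1 << 31} {
		fmt.Printf("%d -> %d\n", n, asInt32(n))
	}
}
```

A count of 2147483647 survives the truncation, while 2147483648 turns negative and would be indistinguishable from an error return.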

@ianlancetaylor
Contributor

Comment 21:

I see the behaviour you're describing, but I don't know how to characterize that except
as a bug.  These systems are supposedly POSIX compliant, and I don't see anything in
http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html that permits this
behaviour (unless they define SSIZE_MAX as 2G).
In any case it doesn't matter for our purposes, we still need something along the lines
of Brad's patch.

@bradfitz
Contributor

Comment 22:

Labels changed: added os-freebsd.

@dustin

dustin commented Apr 22, 2014

Comment 23:

While this appears to be a valid issue, it's quite unnecessary to read the entire large
file into memory if all you intend to do is compute its md5.
Below is a (completely untested -- I typed it into a comment box) example of how this
kind of thing is more generally done when you don't actually need the entire blob
resident:
f, err := os.Open("averylargefile")
if err != nil {
    // appropriate error handling
}
defer f.Close()
h := md5.New()
_, err = io.Copy(h, f)
if err != nil {
    // appropriate error handling
}
Now h has your md5 and very little memory was consumed.

@bradfitz
Contributor

Comment 24:

This issue was closed by revision 8409dea.

Status changed to Fixed.

@quarnster

Comment 25:

> (unless they define SSIZE_MAX as 2G)
FYI they do on !64bit. From /usr/include/i386/limits.h:
#define SSIZE_MAX   LONG_MAX    /* max value for a ssize_t */
#define LONG_MAX    2147483647L /* max signed long */

@gopherbot
Author

Comment 26 by jeffreydwalter:

Re comment #23, if you don't think ReadFile is a good solution for md5ing files, it
might be helpful if you guys update your bounded.go example found here ->
http://blog.golang.org/pipelines/bounded.go to use the preferred method. The bounded.go
example script is what led me to discover and file this bug in the first place.
Thanks again for all of the work you guys put into this issue!

@gopherbot
Author

Comment 27:

CL https://golang.org/cl/94070044 mentions this issue.

This issue was closed.