
os: "async" file IO #6817

Open
dvyukov opened this issue Nov 22, 2013 · 26 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dvyukov
Member

dvyukov commented Nov 22, 2013

Read/Write file operations should not consume threads; they should use polling similar
to the net package.

This was raised several times before.

Here is a particular example with godoc:
https://groups.google.com/d/msg/golang-nuts/TeNvQqf4tO4/dskZuFH5QVYJ
@minux
Member

minux commented Nov 22, 2013

Comment 1:

what about stat syscalls? readdir?
one more reason to expose the poller?

@dvyukov
Member Author

dvyukov commented Nov 22, 2013

Comment 2:

> what about stat syscalls? readdir?
I don't know for now. Do you have any thoughts?
> one more reason to expose the poller?
Yes, it is related.

@minux
Member

minux commented Nov 22, 2013

Comment 3:

I don't think there is an asynchronous stat syscall available, and IIRC most
event-based web servers take great pains to work around the stat(2)-taking-up-a-thread
problem (e.g. dedicated stat(2) thread pools).
Similarly for readdir: is there a pollable version available?
I don't know whether readdir/stat is contributing to the godoc problem, but I think they
might be a problem if the GOPATH is large enough.

@ianlancetaylor
Contributor

Comment 4:

The pollable version of readdir is getdents, which the syscall package already uses.

@bradfitz
Contributor

Comment 5:

This continually bites me.  I have an interface that has both network and filesystem
implementations, and the network one works great (firing off a bounded number of
goroutines: say, 1000), but the filesystem implementation of the interface kills the
OS, so my code has to limit itself manually, which feels like a layering violation.
The runtime should do the right thing for me.
runtime/debug.SetMaxThreads sets the max threads Go uses before it blows up.
If we can't do async filesystem IO everywhere (and I don't think we can), then I think
we should have something like runtime/debug.SetMaxFSThreads that controls the size of
the pool of threads doing filesystem operations but blocks instead of crashing.  That
means for the read/write syscalls, we'd have to know which fds are marked non-blocking
(network stuff) and which aren't.
Or we put all this capping logic in pkg/os, perhaps opt-in:
pkg os
func SetMaxFSThreads(n int)

@dvyukov
Member Author

dvyukov commented Nov 23, 2013

Comment 6:

Do you use read/write? Or more involved ops like readdir?
Can you create a repro for this issue?

@bradfitz
Contributor

Comment 7:

I use read, write, open, close, readdir, stat, lstat.  godoc is a repro.  camlistore is
a repro.  My original Go bug report to Russ on May 5, 2010 was a repro.
I'll write something small, though, and attach it here.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 8:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 9:

Labels changed: added repo-main.

@dvyukov
Member Author

dvyukov commented Oct 27, 2014

Comment 10:

FTR, net.FileConn does not work for serial ports, etc. FileConn calls newFileFD, which
does syscall.GetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_TYPE), and that fails for
non-sockets.

@niemeyer
Contributor

Comment 11:

Also FTR, the typical way to unblock an f.Read operation that is waiting for more data is
to call f.Close, and the current implementation of these methods makes that process racy. An
implementation similar to the net package would fix that too.

@tv42

tv42 commented Feb 13, 2015

(This came up in a conversation today and I wanted to make sure people don't start off with incorrect assumptions.)

I want to be really clear on this: there is no such thing as regular file or dir I/O that wouldn't wait for disk on a cache miss. I am not talking about serial ports or such here, but files and directories. Regular files are always "ready" as far as select(2) and friends are concerned, so technically they never "block", and "non-blocking" is the wrong word. But they may wait for actual disk IO to happen, and there is realistically no way to avoid that in the general case (in POSIX/Linux).

The network poller / epoll has nothing to contribute here. There is no version of read(2) and friends where the syscall would return early, without waiting for disk, if there's nothing cached. Go really has very little it can do there.

People have been talking about extending Linux to implement non-waiting file IO (e.g. http://lwn.net/Articles/612483/ ) but that's not realistic today.

I don't see Go having much choice beyond threads doing syscalls. The real question in my mind is: is there a way to limit syscall concurrency to avoid swamping the CPU/OS with too many threads, while still avoiding deadlocks?

And just to minimize the chances of confusion: file AIO ("Async I/O") is something very different and not applicable to this conversation. It's a very restrictive API (actually, several APIs), it bypasses useful features like caching, and it doesn't necessarily perform well at all.

@dvyukov
Member Author

dvyukov commented Feb 13, 2015

What is wrong with io_submit?
http://man7.org/linux/man-pages/man2/io_submit.2.html

@tv42

tv42 commented Feb 13, 2015

@dvyukov io_submit is the Linux AIO API (as opposed to POSIX AIO). It's a separate codepath that depends on the filesystem doing the right thing; the implementations have been problematic, and using AIO introduces a whole bunch of risk. The original implementation assumed O_DIRECT, and that legacy remains; non-O_DIRECT operation is even more problematic. O_DIRECT is not safe to use for generic file operations, because other processes accessing the file will go through the buffer cache. Without O_DIRECT, e.g. the generic version of io_submit falls back to synchronous processing. Some filesystems don't handle unaligned accesses well. In some circumstances (journaling details, file space not preallocated, etc.), io_submit has to wait until the operation completes instead of just submitting an async request; this tends to be more typical without O_DIRECT. The default limit for pending requests is only 128; after that, io_submit starts blocking. Finally, io_submit only helps with the basic read/write workload: no open(2), stat(2), etc.

I'm not saying it won't work, but I also would not be surprised if a change moving file IO to io_submit got reverted within a few months.

@dvyukov
Member Author

dvyukov commented Feb 13, 2015

OK, then everybody should switch to Windows :)

@minux
Member

minux commented Feb 13, 2015 via email

@gopherbot

CL https://golang.org/cl/36799 mentions this issue.

@gopherbot

CL https://golang.org/cl/36800 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 13, 2017
This will make it possible to use the poller with the os package.

This is a lot of code movement but the behavior is intended to be
unchanged.

Update #6817.
Update #7903.
Update #15021.
Update #18507.

Change-Id: I1413685928017c32df5654ded73a2643820977ae
Reviewed-on: https://go-review.googlesource.com/36799
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
gopherbot pushed a commit that referenced this issue Feb 15, 2017
This changes the os package to use the runtime poller for file I/O
where possible. When a system call blocks on a pollable descriptor,
the goroutine will be blocked on the poller but the thread will be
released to run other goroutines. When using a non-pollable
descriptor, the os package will continue to use thread-blocking system
calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does
not support ordinary disk files, so they will continue to use blocking
I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this
modifies the runtime to only block waiting for the poller if there is
some goroutine that is waiting on the poller. Otherwise, there is no
point, as the poller will never make any goroutine ready. This
preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD.
This is issue 19093.

Using the poller on Windows requires opening the file with
FILE_FLAG_OVERLAPPED. We should only do that if we can remove that
flag if the program calls the Fd method. This is issue 19098.

Update #6817.
Update #7903.
Update #15021.
Update #18507.
Update #19093.
Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@manishrjain

Just to add some numbers to the discussion, we're facing this problem as well.

Running fio on an Amazon EC2 i3.large instance, we're able to get 64K IOPS on a 4K block size, using 8 jobs, file size = 4G for random reads. (Other times, I've seen it go close to 100K IOPS)

We created a small Go program to do the exact same thing using goroutines, and it doesn't budge above 20K IOPS. In fact, the throughput won't increase any further once the number of goroutines reaches the number of cores. This strongly indicates that Go is paying the cost of context switching, because it does a blocking read in every iteration of the loop.

Full Go code here: https://github.com/dgraph-io/badger-bench/blob/master/randread/main.go
go build . && ./randread --dir /mnt/data/fio --preads 6500000 --jobs <num-cores>

It seems like async IO is the only way to achieve high IO throughput in Go. SSDs are able to push more and more throughput with every generation, so there has to be a way for Go to exploit that.

@bradfitz
Contributor

@manishrjain, what fio command line are you comparing against?

Btw, your benchmark has a global mutex (don't use rand.Intn in goroutines if you want performance). That would show up if you look at contention profiles.

@manishrjain

This is the fio command on my computer, and the output:

$ fio --name=randread --ioengine=libaio --iodepth=32 --rw=randread --bs=4k --direct=0 --size=4G --numjobs=4 --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-2.19
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=159MiB/s,w=0KiB/s][r=40.8k,w=0 IOPS][eta 00m:00s]
randread: (groupid=0, jobs=4): err= 0: pid=19525: Wed May 17 09:47:41 2017
   read: IOPS=43.8k, BW=171MiB/s (179MB/s)(10.3GiB/60001msec)
    slat (usec): min=2, max=13539, avg=90.15, stdev=98.95
    clat (usec): min=1, max=27856, avg=2829.90, stdev=708.48
     lat (usec): min=6, max=27873, avg=2920.05, stdev=724.33
    clat percentiles (usec):
     |  1.00th=[ 1512],  5.00th=[ 1816], 10.00th=[ 1992], 20.00th=[ 2224],
     | 30.00th=[ 2416], 40.00th=[ 2608], 50.00th=[ 2800], 60.00th=[ 2992],
     | 70.00th=[ 3184], 80.00th=[ 3408], 90.00th=[ 3696], 95.00th=[ 3920],
     | 99.00th=[ 4448], 99.50th=[ 4704], 99.90th=[ 7200], 99.95th=[ 8896],
     | 99.99th=[15168]
    lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 250=0.01%
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=10.11%, 4=85.93%, 10=3.92%, 20=0.03%, 50=0.01%
  cpu          : usr=1.10%, sys=10.23%, ctx=1464338, majf=0, minf=153
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=2627651,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=171MiB/s (179MB/s), 171MiB/s-171MiB/s (179MB/s-179MB/s), io=10.3GiB (10.8GB), run=60001-60001msec

And the corresponding Go program:
go build . && ./randread --dir ~/diskfio --jobs 4 --num 1000000 --mode 1

I switched from the global rand to a local rand, and it doesn't show up in the block profiler or the CPU profiler. fio is getting 43.8K IOPS. My Go program gives me ~25K, checked via sar -d 1 -p (the Go program reports less than what I see via sar, so there must be a flaw in my code somewhere).

@bergwolf

bergwolf commented Apr 2, 2019

@tv42 @rsc Sorry for jumping in late on this old discussion.

I'm not saying it won't work, but I also would not be surprised if a change moving file IO to io_submit got reverted within a few months.

Would it be acceptable to expose the file IO semantics (DIO/AIO) and let programmers decide? It is a hard decision for the Go compiler/runtime because it can't know the speed of the underlying storage media. But programmers, especially those writing purpose-built storage components in Go, should know their target storage better. As a motivating example, this would make it possible to write an fio-like program in Go, instead of letting the compiler/runtime decide everything about file IO.

@ianlancetaylor
Contributor

@bergwolf In a sense the semantics are already exposed via the golang.org/x/sys/unix package, which lets you do anything the system supports.

I don't see how it would make sense to expose these semantics in the os package. That would add a lot of complexity for the benefit of very very few users. I've got nothing against rewriting the os package to use a different underlying mechanism, such as io_submit, while retaining the same API, if that makes sense. But I would vote against complicating the API.

@tv42

tv42 commented Jun 22, 2019

New development: io_uring is a pair of ring buffers used to pass request and completion messages for file I/O. It might be promising. Only filesystem files are supported right now; it can't be used on sockets, pipes, etc. at this time.
https://lwn.net/Articles/776703/

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 23, 2019
@ianlancetaylor
Contributor

@tv42 There is some discussion of io_uring on #31908.

@The-Alchemist

It's probably dated, but there's a nice little paper at https://pdfs.semanticscholar.org/d4a6/852f0f4cda6cf0431e04b81771eea08f88e2.pdf:

"An Attempt at Reducing Costs of Disk I/O in Go"
by Sean Wilson, Riccardo Mutschlechner
