Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: broken sendfile on SmartOS/Solaris? large files garbled when sent over network #13892

Closed
doublerebel opened this issue Jan 9, 2016 · 17 comments

Comments

@doublerebel
Copy link

What version of Go are you using (go version)? go version go1.5.1 solaris/amd64
What operating system and processor architecture are you using? SunOS 5.11 joyent_20141030T081701Z i86pc i386 i86pc Solaris, SunOS 5.11 joyent_20151002T091442Z i86pc i386 i86pc Solaris
What did you do? Retrieved a >250K file over http from a golang server
What did you expect to see? The correct contents of the file
What did you see instead? A file with the correct size, but the first ~25% of the file repeated to fill the file.

If I access the file through localhost, the file is served correctly. If I send it over any network interface, the bug appears. Different lengths of the file are repeated to fill the space in each case. All smaller files come through without error.

I attempted to step through using godebug but ran into mailgun/godebug#12, since the issue is clearly somewhere in the standard library. So I have setup a temporary fileserver to demonstrate the issue. text.text.2 contains the beginning of Sherlock Holmes from Project Gutenberg (thanks Norvig).

Control: python -m SimpleHTTPServer
http://165.225.174.156:8000/
http://165.225.174.156:8000/test.txt.2
Serves correct contents of file.

Bug: gohttp (itang/gohttp)
http://165.225.174.156:8080/
http://165.225.174.156:8080/test.txt.2
Serves garbled file, every time.

Localhost: gohttp with a barebones nginx proxy from localhost
http://165.225.174.156/
http://165.225.174.156/test.txt.2
Serves correct contents of file.

I discovered this bug because we just recently have hashicorp/consul building on SmartOS (hashicorp/consul#159), and the Consul UI contains a ~750K minified JS file. Consul uses gorilla/mux and http.FileServer, but I am able to reproduce with the much simpler case of gohttp.

The test suite for the http response methods doesn't have tests from a remote machine, or use large files, so this bug can slip through in most cases unnoticed. Please let me know if I can provide more details. We really need a way to step through and debug the standard libs!

@bradfitz
Copy link
Contributor

bradfitz commented Jan 9, 2016

I suspect the sendfile implementation for Solaris is broken, then.

Can you modify src/net/sendfile_solaris.go and add a return statement on the first line of func sendfile?

func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) {
    return

And see if that fixes it? (it would revert to Read+Write)

/cc @4ad

@bradfitz bradfitz added this to the Go1.6Maybe milestone Jan 9, 2016
@bradfitz bradfitz changed the title net/http: large files are garbled when sent to a network interface on SmartOS/Solaris net: broken sendfile on SmartOS/Solaris? large files garbled when sent over network Jan 9, 2016
@doublerebel
Copy link
Author

@bradfitz You nailed it. I added a log line and the return statement, and I can now see the file being served correctly. Really appreciate the fast diagnosis.

@4ad
Copy link
Member

4ad commented Jan 9, 2016

Could you please try with the latest SmartOS image? Thanks.

@doublerebel
Copy link
Author

@4ad sure thing, tested both versions on the latest base-64 image: SunOS 5.11 joyent_20151002T091442Z i86pc i386 i86pc Solaris

The current version fails as on the earlier SmartOS image, and the patched version suggested by @bradfitz works.

@4ad
Copy link
Member

4ad commented Jan 10, 2016

Just for confirmation, you tested on a recent SmartOS image, a.i. a recent kernel, not an older SmartOS image with a recent base-64 image?

@4ad 4ad self-assigned this Jan 10, 2016
@doublerebel
Copy link
Author

@4ad the tests were all run on instances in the Joyent Public Cloud, base-64 is the name of their default basic 64-bit SmartOS image. joyent_20151002T091442Z is SmartOS Base Image v15.3.0. I can donate an instance for testing, if that helps.

EDIT: FWIW, I just rebuilt Consul with net/sendfile patched, and it now too works as expected.

@bradfitz
Copy link
Contributor

Perhaps we should just disable sendfile on Solaris for Go 1.6 as a safe option. We can re-enable in Go 1.7 once we understand and fix the problem.

@davecheney
Copy link
Contributor

I think that is a good idea

On Tue, 12 Jan 2016, 05:57 Brad Fitzpatrick notifications@github.com
wrote:

Perhaps we should just disable sendfile on Solaris for Go 1.6 as a safe
option. We can re-enable in Go 1.7 once we understand and fix the problem.


Reply to this email directly or view it on GitHub
#13892 (comment).

gopherbot pushed a commit that referenced this issue Jan 11, 2016
There are reports of corruption. Let's disable it for now (for Go 1.6,
especially) until we can investigate and fix properly.

Update #13892

Change-Id: I557275e5142fe616e8a4f89c00ffafb830eb3b78
Reviewed-on: https://go-review.googlesource.com/18540
Reviewed-by: Dave Cheney <dave@cheney.net>
@bradfitz bradfitz modified the milestones: Go1.7, Go1.6Maybe Jan 13, 2016
@bradfitz
Copy link
Contributor

Moved milestone to Go1.7. In Go 1.6, sendfile will just be disabled for Solaris and it will fall back to the path that works.

@bradfitz
Copy link
Contributor

Any news here?

@4ad
Copy link
Member

4ad commented Mar 31, 2016

+cc @binarycrusader

@binarycrusader
Copy link
Contributor

I'll look into this.

@binarycrusader
Copy link
Contributor

Have reproduced on Solaris development build using decompressed Mark.Twain-Tom.Sawyer.txt.bz2 from src/compress/bzip2/testdata.

@binarycrusader
Copy link
Contributor

It consistently fails after the first 262,144 characters (256KiB), where it starts to the repeat the input all over again.

@binarycrusader
Copy link
Contributor

So the root cause of this issue is that the implementation of sendfile() for Solaris is not correct. In particular, for sizes beyond the size of the socket buffer, the same data will be sent (data_len / socket_buffer_size) + (data_len % socket_buffer_size) times.

This is only vaguely referenced in the Solaris sendfile(3c) man page. The FreeBSD sendfile(2) man page however makes this quite clear:

When using a socket marked for non-blocking I/O, sendfile() may send
fewer bytes than requested.  In this case, the number of bytes success-
fully written is returned in *sbytes (if specified), and the error    EAGAIN
is returned.

As a result, I believe the implementations for the following platforms may not actually be correct:

  • dragonfly
  • freebsd
  • solaris

I am currently testing to determine whether any other platforms need a similar fix.

@binarycrusader
Copy link
Contributor

So my new test works on Solaris, but did not fail as expected on DragonFly/FreeBSD, despite the fact that they clearly do not account for the EAGAIN scenario. I would welcome any insight others might have to as to why it did not fail. The only guess I have is that the file is not big enough to trigger the failure on FreeBSD.

@gopherbot
Copy link

CL https://golang.org/cl/21769 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 13, 2017
@rsc rsc unassigned 4ad Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants