
runtime: fatal error: sweep increased allocation count #15762

Closed
mikioh opened this issue May 20, 2016 · 10 comments

mikioh (Contributor) commented May 20, 2016

See https://build.golang.org/log/a4d48c3831004aae0a064e53c67558d84864cd00

ok      golang.org/x/net/lex/httplex    0.032s
runtime: nelems=64 nfree=58 nalloc=6 previous allocCount=5 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x68c14b, 0x20)
    /tmp/workdir/go/src/runtime/panic.go:566 +0x8b
runtime.(*mspan).sweep(0x7fe68201ff40, 0x300000000, 0x201)
    /tmp/workdir/go/src/runtime/mgcsweep.go:287 +0x78a
runtime.sweepone(0x42f84c)
    /tmp/workdir/go/src/runtime/mgcsweep.go:112 +0xf4
runtime.gosweepone.func1()
    /tmp/workdir/go/src/runtime/mgcsweep.go:124 +0x21
runtime.systemstack(0xc42002e000)
    /tmp/workdir/go/src/runtime/asm_amd64.s:298 +0x79
runtime.mstart()
    /tmp/workdir/go/src/runtime/proc.go:1087

goroutine 3 [running]:
runtime.systemstack_switch()
    /tmp/workdir/go/src/runtime/asm_amd64.s:252 fp=0xc420046f58 sp=0xc420046f50
runtime.gosweepone(0x0)
    /tmp/workdir/go/src/runtime/mgcsweep.go:125 +0x40 fp=0xc420046f80 sp=0xc420046f58
runtime.bgsweep(0xc42006e000)
    /tmp/workdir/go/src/runtime/mgcsweep.go:66 +0xb1 fp=0xc420046fb8 sp=0xc420046f80
runtime.goexit()
    /tmp/workdir/go/src/runtime/asm_amd64.s:2059 +0x1 fp=0xc420046fc0 sp=0xc420046fb8
created by runtime.gcenable
    /tmp/workdir/go/src/runtime/mgc.go:195 +0x53
(snip)
goroutine 1127 [runnable]:
net.(*netFD).connect.func2(0x794e20, 0xc420760900, 0xc4204185b0, 0xc4206741e0)
    /tmp/workdir/go/src/net/fd_unix.go:109
created by net.(*netFD).connect
    /tmp/workdir/go/src/net/fd_unix.go:118 +0x1c0
FAIL    golang.org/x/net/netutil    0.272s
mikioh added this to the Go1.7 milestone May 20, 2016
mikioh (Contributor, Author) commented May 20, 2016

/cc @aclements

mikioh (Contributor, Author) commented May 20, 2016

Looks like it's not related to the race detector. Sorry for the noise, Dmitry.

mikioh (Contributor, Author) commented May 20, 2016

aclements (Member) commented:

/cc @RLH

mikioh (Contributor, Author) commented May 21, 2016

Another appearance: https://build.golang.org/log/da60bdb4e8f5429bba33b4a6407b514ae8b31917

ok      golang.org/x/net/lex/httplex    0.003s
panic: runtime error: makeslice: cap out of range

goroutine 906 [running]:
panic(0x643be0, 0xc4209188d0)
    /tmp/workdir/go/src/runtime/panic.go:500 +0x18c
net/http.Header.sortedKeyValues(0xc4209f2060, 0xc42009aa50, 0xc4206ae300, 0xc4206b6c80, 0x53b301, 0x68b10b)
    /tmp/workdir/go/src/net/http/header.go:130 +0x9f
net/http.Header.WriteSubset(0xc4209f2060, 0x79b820, 0xc4206ae300, 0xc42009aa50, 0x0, 0x1)
    /tmp/workdir/go/src/net/http/header.go:150 +0xd9
net/http.(*Request).write(0xc4209eb950, 0x79b820, 0xc4206ae300, 0x0, 0xc4208713e0, 0x0, 0x0, 0x0)
    /tmp/workdir/go/src/net/http/request.go:526 +0x616
net/http.(*persistConn).writeLoop(0xc420090a00)
    /tmp/workdir/go/src/net/http/transport.go:1581 +0x18e
created by net/http.(*Transport).dialConn
    /tmp/workdir/go/src/net/http/transport.go:1002 +0x4c4
FAIL    golang.org/x/net/netutil    0.534s
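
For reference, "makeslice: cap out of range" is the panic the runtime raises when make is handed a capacity that is negative or larger than the maximum slice size for the element type, so memory corruption feeding a garbage capacity into make would surface exactly like this. A minimal standalone illustration (not taken from this failure):

package main

func main() {
	n := -1                  // stand-in for a corrupted length/capacity value
	_ = make([]string, 0, n) // panic: runtime error: makeslice: cap out of range
}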

aclements (Member) commented:

That last failure isn't obviously the same, though the first failure could indicate we're freeing objects we shouldn't be (if things go wrong, but not wrong enough for that check to catch), which could conceivably lead to just about anything.

aclements self-assigned this May 23, 2016
aclements (Member) commented:

I can reproduce this on the linux-amd64-nocgo builder in a few minutes with the following:

cd austin/15762
export GOPATH=$PWD
go get -d golang.org/x/net/...

VM=$(gomote create linux-amd64-nocgo) && \
gomote push $VM && \
gomote run $VM /bin/sh -c 'cd go/src; ./make.bash' && \
gomote run $VM /bin/sh -c 'mkdir -p gopath/src/golang.org/x' && \
tar cz -C src/golang.org/x . | gomote puttar -dir gopath/src/golang.org/x $VM - && \
gomote run $VM /bin/sh -c 'PATH=$PWD/go/bin:$PATH; export GOPATH=$PWD/gopath; while go test -short golang.org/x/net/...; do true; done'

Simply running the httplex test in a loop doesn't seem to do it. Also no luck yet reproducing on my linux/amd64 laptop (even with cgo disabled).

aclements (Member) commented:

This is happening because we're greying an unallocated object, so sweep sees an object that was marked by the last GC cycle but not before that and detects that the "allocated object" count has increased.
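
To make that concrete, here is a minimal sketch of the invariant sweep enforces, using simplified stand-in types rather than the runtime's real mspan: the sweeper recounts allocated objects from the mark bits, and if a GC cycle marked a slot the allocator never handed out, the recount exceeds the previous allocCount and the runtime throws. The nfreed=65535 in the log is just allocCount - nalloc wrapping around as an unsigned 16-bit value.

package main

import "fmt"

// Simplified stand-ins for the mspan fields involved in the check.
type span struct {
	nelems     int    // object slots in the span
	allocCount uint16 // allocated objects recorded before this sweep
	markBits   []bool // per-slot mark bits set by the last GC cycle
}

// sweepCheck recounts "allocated" objects from the mark bits, the way sweep
// does conceptually: every marked slot is treated as live. A mark on a slot
// the allocator never handed out makes nalloc exceed the previous allocCount.
func sweepCheck(s *span) error {
	var nalloc uint16
	for _, marked := range s.markBits {
		if marked {
			nalloc++
		}
	}
	nfreed := s.allocCount - nalloc // uint16 subtraction wraps, hence nfreed=65535
	if nalloc > s.allocCount {
		return fmt.Errorf("nelems=%d nalloc=%d previous allocCount=%d nfreed=%d: sweep increased allocation count",
			s.nelems, nalloc, s.allocCount, nfreed)
	}
	return nil
}

func main() {
	// 5 objects were actually allocated, but a stray pointer got 6 slots marked.
	s := &span{nelems: 64, allocCount: 5, markBits: make([]bool, 64)}
	for i := 0; i < 6; i++ {
		s.markBits[i] = true
	}
	fmt.Println(sweepCheck(s))
}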

The following is with a tweak to set the alloc bit on allocation and check it in greyobject:

runtime: greying unallocated object 0xc420838180 (3) in span 0x7f4a33e2ed60
base=0xc4205d34a0 k=0x62102e9 s.base()=0xc4205d2000 s.limit=0xc4205d3fe0 s.sizeclass=16 s.elemsize=240
 *(base+0) = 0x68520b
 *(base+8) = 0x3
 *(base+16) = 0xc420838180 <==
 *(base+24) = 0x686127
 *(base+32) = 0x8
 *(base+40) = 0x1
 *(base+48) = 0x1
 *(base+56) = 0xc420844060
 *(base+64) = 0x0
 *(base+72) = 0x0
 *(base+80) = 0x0
 *(base+88) = 0x0
 *(base+96) = 0x0
 *(base+104) = 0x0
 *(base+112) = 0x0
 *(base+120) = 0xc420834087
 *(base+128) = 0xf
 *(base+136) = 0x0
 *(base+144) = 0x0
 *(base+152) = 0x0
 *(base+160) = 0x0
 *(base+168) = 0x0
 *(base+176) = 0x0
 *(base+184) = 0x0
 *(base+192) = 0x0
 *(base+200) = 0x0
 *(base+208) = 0xc4206acfc0
 *(base+216) = 0x0
 *(base+224) = 0x0
 *(base+232) = 0x0
obj=0xc420838180 k=0x621041c s.base()=0xc420838000 s.limit=0xc42083a000 s.sizeclass=9 s.elemsize=128
 *(obj+0) = 0xc420834080
 *(obj+8) = 0x4
 *(obj+16) = 0x0
 *(obj+24) = 0x0
 *(obj+32) = 0x0
 *(obj+40) = 0xc420834087
 *(obj+48) = 0xf
 *(obj+56) = 0x0
 *(obj+64) = 0x0
 *(obj+72) = 0x0
 *(obj+80) = 0x0
 *(obj+88) = 0x0
 *(obj+96) = 0x0
 *(obj+104) = 0x0
 *(obj+112) = 0x0
 *(obj+120) = 0x0
runtime: nelems=64 nfree=51 nalloc=13 previous allocCount=12 nfreed=65535
span=0x7f4a33e2ed60 base=0xc420838000 elemsize=128
0xff 0x1f 0x0 0x0 0x0 0x0 0x0 0x0 
0xf7 0x1f 0x0 0x0 0x0 0x0 0x0 0x0 
fatal error: sweep increased allocation count

It typically occurs with elemsize=128, but not always. When it doesn't, I think it's because the "base" object shown above is itself not allocated, and that's what we're detecting instead.
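
For readers following along, the tweak described above amounts to something like the following sketch (simplified stand-ins, not the actual runtime patch): the allocator records which slots it has handed out, and greyobject refuses to mark a slot whose alloc bit is clear, reporting the bad pointer instead of letting sweep discover the inconsistency later.

package main

import "fmt"

// debugSpan is a simplified stand-in for a span carrying an extra per-slot
// "allocated" bit maintained purely for this debugging check.
type debugSpan struct {
	allocBit []bool
}

// alloc records that slot i was handed out by the allocator.
func (s *debugSpan) alloc(i int) { s.allocBit[i] = true }

// greyObject holds the debug check: marking a slot whose alloc bit was never
// set means the GC found a pointer into memory that was never allocated, so
// it is reported instead of being silently marked.
func (s *debugSpan) greyObject(i int) error {
	if !s.allocBit[i] {
		return fmt.Errorf("greying unallocated object in slot %d", i)
	}
	// ...mark the object and queue it for scanning...
	return nil
}

func main() {
	s := &debugSpan{allocBit: make([]bool, 64)}
	s.alloc(3)
	fmt.Println(s.greyObject(3)) // <nil>: slot 3 was allocated
	fmt.Println(s.greyObject(5)) // error: slot 5 was never handed out
}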

aclements (Member) commented:

This failure happens even in stop-the-world mode, which indicates that there's some root we're not finding on one GC cycle, but we then find the pointer on the next GC cycle (either because we do find the root then, or because the object has been made reachable some other way). In particular, in STW mode, it looks like the object we're missing is being found on the stack in the second cycle.

One possible explanation could be that the liveness maps are wrong. If this is the case, this is probably the same bug as #15747 and may have been caused by 3572c64.

aclements (Member) commented:

This failure does not occur at 3572c64^, but does at 3572c64, and the hacky (and somewhat incorrect) workaround for #15747 works for this test case, too, so I'm declaring this a duplicate of #15747. We'll double check this test case once we have a fix for #15747 (the test case over there is apparently easier to reproduce).
