Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: why AllocsPerRun tests fails when GOMAXPROCS>1 #5000

Closed
minux opened this issue Mar 7, 2013 · 19 comments
Closed

testing: why AllocsPerRun tests fails when GOMAXPROCS>1 #5000

minux opened this issue Mar 7, 2013 · 19 comments

Comments

@minux
Copy link
Member

minux commented Mar 7, 2013

revision cb98d0bcf923 skips allocation tests if GOMAXPROCS > 1, however,
AllocsPerRun already set GOMAXPROCS to 1 before starting tests, so the
extra allocations shouldn't be from other goroutines (or the scheduler is
not obeying GOMAXPROCS).

Dig deeper. What does the extra allocations come from?
I expect it's a bug somewhere in the runtime because the allocation test failures
don't come up until recently so this perhaps is a regression.

See also comments in the original issue report: issue #4974.
@dvyukov
Copy link
Member

dvyukov commented Mar 7, 2013

Comment 1:

I have only 1 idea:
I've removed flush of per-P mcache's after GC (because there are fewer caches and they
are constantly in use). Perhaps it's somehow increases/decreases number of mallocs in
the beginning/end of the test.

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 2:

It is flaky with old scheduler as well, here is what I've just observed on revision
15951 from Wed Feb 27 11:59:36 2013 -0800:
$ ./fmt.test -test.v -test.run=Mallocs -test.cpu=1,2,3,4,5,6,7,8,9,10
=== RUN TestCountMallocs
--- PASS: TestCountMallocs (0.00 seconds)
=== RUN TestCountMallocs-2
--- PASS: TestCountMallocs-2 (0.00 seconds)
=== RUN TestCountMallocs-3
--- PASS: TestCountMallocs-3 (0.00 seconds)
=== RUN TestCountMallocs-4
--- PASS: TestCountMallocs-4 (0.00 seconds)
=== RUN TestCountMallocs-5
--- PASS: TestCountMallocs-5 (0.00 seconds)
=== RUN TestCountMallocs-6
--- PASS: TestCountMallocs-6 (0.00 seconds)
=== RUN TestCountMallocs-7
--- FAIL: TestCountMallocs-7 (0.00 seconds)
    fmt_test.go:607: Fprintf(buf, "%s"): got 1.19 allocs, want <=1
=== RUN TestCountMallocs-8
--- PASS: TestCountMallocs-8 (0.00 seconds)
=== RUN TestCountMallocs-9
--- PASS: TestCountMallocs-9 (0.00 seconds)
=== RUN TestCountMallocs-10
--- PASS: TestCountMallocs-10 (0.00 seconds)
FAIL

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 3:

It happens during GC, the rate and value of failures directly correlate to GOGC, with
GOGC=off it does not happen.

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 4:

Here it is:
  #0: 0x1c340 runtime.mallocgc src/pkg/runtime/zmalloc_darwin_amd64.c:36
  #1: 0x1d56b runtime.settype_flush src/pkg/runtime/zmalloc_darwin_amd64.c:449
  #2: 0xe0ee gc src/pkg/runtime/mgc0.c:1822
  #3: 0xe01e runtime.gc src/pkg/runtime/mgc0.c:1790
  #4: 0x1c51c runtime.mallocgc src/pkg/runtime/zmalloc_darwin_amd64.c:116
  #5: 0x1f34b gostringsize src/pkg/runtime/zstring_darwin_amd64.c:47
  #6: 0x1f95f runtime.slicebytetostring src/pkg/runtime/zstring_darwin_amd64.c:300
  #7: 0x7c390 fmt.Sprintf src/pkg/fmt/print.go:229
  #8: 0x3fa0f fmt_test.func·006 src/pkg/fmt/fmt_test.go:597
  #9: 0x2a110 testing.AllocsPerRun src/pkg/testing/allocs.go:32
  #10: 0x323ca fmt_test.TestCountMallocs src/pkg/fmt/fmt_test.go:609
  #11: 0x2d02a testing.tRunner src/pkg/testing/testing.go:347
  #12: 0x14b00 runtime.goexit src/pkg/runtime/proc.c:1217

@gopherbot
Copy link

Comment 6:

This issue is simple to solve if we omit efficiency considerations. The fix would be to
pass "true" as argument to settype_flush() in mgc0.c. However, I don't recommend this
fix because it wastes memory and increases the number of system calls.
The efficient solution appears to be a new memory allocator for internal use by Go's
runtime with support for explicit deallocation only (like malloc+free in C, although
simpler code). The allocated memory would be completely separate from garbage-collected
memory.  Work on this allocator cannot start before Go1.1 is released.

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 7:

>This issue is simple to solve if we omit efficiency considerations. The fix would be to
pass "true" as argument to settype_flush() in mgc0.c.
Yeah it does not look like a good solution.
I have a very dirty "solution":
https://golang.org/cl/7625044/diff/5001/src/pkg/testing/allocs.go
It fixes ./fmt.test -test.v -test.run=Mallocs
-test.cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
on my machine.
I can think of other hacky solutions:
1. Provide testing.runtime_allocTestStart/End() which will somehow offset the stats (use
system memory in settype_flush(), explicitly offset stats, etc).
2. Just explicitly offset mem stats in settype_flush, i.e. pretend that it does not
allocate.

@ianlancetaylor
Copy link
Contributor

Comment 8:

What if we add a field to MemStats that counts the number of allocations for internal
runtime reasons.  Then we let testing.AllocsPerRun ignore that, perhaps with a
parameter.  AllocsPerRun was not in Go 1 so we can still change it.
It's ugly but it could work reasonably well if we are careful about what consider an
internal runtime reason--e.g., we would have to be sure that it doesn't correlate with
something the program is doing.

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 9:

I have some ideas on how to eliminate settype_flush() entirely (as part of making
malloc/GC faster), so perhaps it easier/simple to "undo" memstats in settype_flush()
after mallocgc() pretending that it does not allocate. (but we must be careful to redo
them again when the memory is freed, to not cause persistent slant)

@gopherbot
Copy link

Comment 10:

1. By eliminating settype_flush() entirely do you mean to set the typeinfo in function
mallocgc()?
2. I do not see how to implement the redo operation when the memory is freed.

@dvyukov
Copy link
Member

dvyukov commented Mar 8, 2013

Comment 11:

> 1. By eliminating settype_flush() entirely do you mean to set the typeinfo
> in function mallocgc()?
Yes, but I don't have the exact plan.
> 2. I do not see how to implement the redo operation when the memory is
> freed.
Won't settype_sysfree() do?
I mean we must know when the span is freed/reused.

@gopherbot
Copy link

Comment 12:

settype_sysfree() will do. There is an implicit deallocation in case MTypes_Bytes in
settype_flush().

@rsc
Copy link
Contributor

rsc commented Mar 12, 2013

Comment 13:

Labels changed: added go1.1maybe, removed go1.1.

@alberts
Copy link
Contributor

alberts commented Mar 13, 2013

Comment 14:

hit some more flakiness in the http tests that is probably this issue.
https://golang.org/issue/5005?c=15

@robpike
Copy link
Contributor

robpike commented May 18, 2013

Comment 15:

Labels changed: added go1.2maybe, removed go1.1maybe.

@dvyukov
Copy link
Member

dvyukov commented May 28, 2013

Comment 16:

Another flake:
http://build.golang.org/log/72e239cad5bccb88912758d623659d9231c23c4b
I think we need to offset malloc stats in settype_flush() (pretending that it does not
allocate) and revert it back in settype_sysfree() for now.

@robpike
Copy link
Contributor

robpike commented May 28, 2013

Comment 17:

My input, copied from the mailing list:
AllocsPerRun is inherently flaky. There could be background stuff
going on we'll never control. Perhaps we should instead write our
tests not to be so sensitive (and build-breaking). I suggest all
AllocsPerRun tests should be skipped if -test.short is true, so they
are evaluated by the test owners but not in every build configuration
in the world.

@alberts
Copy link
Contributor

alberts commented May 28, 2013

Comment 18:

The problem comes in with automated builds that regularly run all the tests. Flakiness
causes noise that constantly needs to be triaged. Should automated builds not run the
long tests? Is there some third class of tests? -test.flaky=true?
In my simplistic world view, tests should either pass with any and all values of
GOMAXPROCS, GOGC, GOGCTRACE, test.cpu, etc., or control for these variables, or get
fixed or get deleted.

@robpike
Copy link
Contributor

robpike commented Aug 21, 2013

Comment 19:

This issue was updated by revision f578726.

Should reduce the flakiness a little. Malloc counting is important
to general testing but not to the build dashboard, which uses -short.
R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/12866047

@rsc
Copy link
Contributor

rsc commented Sep 9, 2013

Comment 20:

I think we've done enough here.

Labels changed: removed go1.2maybe.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants