cmd/compile: go statement desugaring results in additional allocations #46349
Comments
Thanks for the report. I believe this is due to a known change in Go 1.17 related to the new register-based calling convention. CC @thanm @cherrymui
Not marking this a release blocker until we have more information. Also because, to some degree, this is expected, and for the most part we've found the benefits to outweigh the costs by a significant margin.
I'm seeing a huge increase in allocations in multiple parts of the codebase that are called from within a loop an unbounded number of times. In some realistic, non-worst-case benchmarks I'm seeing a 1000% allocation increase. The codebase that I'm benchmarking parses software-generated 3D files, such as point clouds or 3D meshes, which can have millions of 3D points. Until now, the parsing algorithm produced very few allocations.

On the other hand, I'm also seeing a 12% speed gain in most of the benchmarks, which is much appreciated when processing big files. Still, I'm not sure that this speed gain is realistic: as there are more allocations, I expect the GC to eat part of the improvement.
@qmuntal I think we need more details as to the nature of these heap allocations. Do they all involve a `go` statement? You can use the memory profiler and the compiler's escape analysis output to track them down.
@mknyszek Not all the offending lines use a `go` statement. For example:

```go
val, err := strconv.ParseUint(string(a.Value), 10, 24) // a.Value is a byte slice
```

```go
if d.OnStart != nil {
	// t is a complex struct holding a byte slice reference
	d.OnStart(t) // <- Offending!
}
```

```go
wn := runtime.GOMAXPROCS(0)
var wg sync.WaitGroup
wg.Add(wn)
for wi := 0; wi < wn; wi++ {
	go func(wi int) { // <- Offending!
		// ...
	}(wi)
}
```

```go
// This comes from https://tip.golang.org/src/mime/mediatype.go?s=3622:3707#152
params = make(map[string]string)
```

I will do the inlining and escape analysis tomorrow morning (CET), I'm running out of time now. Thanks for jumping into this.
Yes, with the register ABI, more go'd closures need allocation now. But I'm not sure whether it is really a problem for performance. We already need to allocate the new goroutine's stack (which is not counted as a memory allocation in the profile, but it's there). Now we need to do an extra allocation for the closure, which is typically a much smaller allocation. As you posted, the overall performance is improved. We could do something clever, e.g. allocating the closure on the new goroutine's stack, but there is some complexity to that. I'm not sure it's worth doing; we could consider it if it turns out to be a real problem.
The escape analysis isn't really relevant here. A go statement runs asynchronously, so in general we always need to allocate the closure, except when it is not a real closure, i.e. it doesn't capture anything. With the register ABI, more closures become "real closures", in that they need to capture their arguments as well.
Right, it's not really relevant to the desugaring. I looped in the escape analysis people in reaction to @qmuntal's claim that lots of other non-go, non-defer sites now end up allocating. But it's still unclear to me where the new allocations are coming from.
My codebase is memory intensive and I've always tried to reduce allocations in order to indirectly gain performance. In this case, I haven't seen any performance regression, just many more allocations everywhere. So, if you say this is the expected behavior when migrating to Go 1.17, I'll let you decide what to do with this issue: I can either close it or provide you with more data to better understand where the new allocations are coming from.
In that case, I'll at least move this issue out of the Go 1.17 milestone. |
@cherrymui would it make sense to also hide the allocations for that closure from the profile? I tend to hunt down allocations for the sake of performance, and from that perspective these are noise. For example, in my codebase the following line allocates in Go 1.17 but not in Go 1.16.
I think we probably don't want to do that. Under the hood it is just a regular Go allocation. In general, while the memory profile counts all allocations, not all of them have the same impact on performance (CPU and memory usage).
At a quick look it is not obvious to me that this is register ABI related. (The only thing expected due to the register ABI is the go statement, and perhaps a defer statement inside a loop, which is rare.) Maybe there is a real issue here. I'll take a careful look.
Could you explain more about this? Could you show the output of -gcflags=-m on this line, for both Go 1.16 and tip? For a simple example I ran, there is an allocation for the []byte-to-string conversion, and it is the same with Go 1.16 and tip. The underlying a.Value does not necessarily escape, also the same with Go 1.16 and tip.
I also saw a change in allocations (some 10 extra) in my benchmarks, stemming from a difference in the escape analysis output. Diffing the -gcflags=-m=2 output between the two versions gives:

```diff
235,281c235,247
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.file (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.file.close() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
< <autogenerated>:1: .this does not escape
< <autogenerated>:1: parameter .this leaks to {heap} with derefs=1:
< <autogenerated>:1: flow: {heap} = *.this:
< <autogenerated>:1: from .this.Common (dot of pointer) at <autogenerated>:1
< <autogenerated>:1: from .this.Common (address-of) at <autogenerated>:1
< <autogenerated>:1: from .this.Common.RunDefaultServeMux() (call parameter) at <autogenerated>:1
< <autogenerated>:1: leaking param content: .this
---
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
> <autogenerated>:1: .this does not escape
```

But maybe there's something else I can inspect. If it's of interest and I'm not sidetracking things, I'm happy to run whatever is needed. I was a little surprised to see more allocations when the regabi was enabled.
I was wrong when saying that line did not allocate in Go 1.16 but did in Go 1.17; in both versions this line is allocating. This is the output of -gcflags=-m for both versions:

I found another strange thing that could be related to this issue when comparing the two versions. This is an svg dump from the memory profile:
@zikaeroh Thanks for the information. If I see it correctly, the difference in the -m=2 output above should not make a difference in allocations, at least by itself. Maybe the extra allocations you saw came from go'd (or deferred-in-a-loop) closures, which is expected.
On that line? Could you post the assembly output for that line, using -gcflags=-S, for 1.16 and tip? And maybe provide more context about that line of code?

I'd be surprised if this were register ABI related. Are you comparing 1.16 vs. tip, or register ABI off (GOEXPERIMENT=noregabi) vs. on?

I'm comparing register ABI off (GOEXPERIMENT=noregabi) vs. on, both at tip.
This line of code is in a public repo, so you can check it yourself: https://github.com/HPInc/go3mf/blob/master/decoder.go#L462

These are the assembly dumps:

Go tip:

```
(decoder.go:462) MOVQ "".a+248(SP), R8
(decoder.go:462) MOVQ "".a+256(SP), CX
(decoder.go:462) XORL AX, AX
(decoder.go:462) MOVQ R8, BX
(decoder.go:462) PCDATA $1, $4
(decoder.go:462) CALL runtime.slicebytetostring(SB)
(decoder.go:462) MOVL $10, CX
(decoder.go:462) MOVL $24, DI
(decoder.go:462) CALL strconv.ParseUint(SB)
```

Go 1.16:

```
(decoder.go:462) MOVQ "".a+256(SP), AX
(decoder.go:462) MOVQ "".a+264(SP), DI
(decoder.go:462) MOVQ $0, (SP)
(decoder.go:462) MOVQ AX, 8(SP)
(decoder.go:462) MOVQ DI, 16(SP)
(decoder.go:462) PCDATA $1, $1
(decoder.go:462) CALL runtime.slicebytetostring(SB)
(decoder.go:462) MOVQ 24(SP), AX
(decoder.go:462) MOVQ 32(SP), CX
(decoder.go:462) MOVQ AX, (SP)
(decoder.go:462) MOVQ CX, 8(SP)
(decoder.go:462) MOVQ $10, 16(SP)
(decoder.go:462) MOVQ $24, 24(SP)
(decoder.go:462) CALL strconv.ParseUint(SB)
(decoder.go:462) MOVQ 32(SP), AX
```

Edited: Headers swapped.
That second excerpt (labeled "Go tip") does not look like Go tip to me; it looks like Go tip plus GOEXPERIMENT=noregabi. All of the values being passed to subroutines are on the stack.
You are partially right, I swapped the headers... I will edit the post to fix it. Thanks! |
@qmuntal thanks for posting the assembly. From an allocation perspective they look the same. It is unclear to me where the extra allocation comes from.
@gopherbot remove WaitingForInfo |
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

What did you do?

`gotip test -run AAAA -bench BenchmarkAlloc -benchmem`

What did you expect to see?

The same result as with `go1.16.3` and `gotip` + `GOEXPERIMENT=noregabi`.

What did you see instead?

The alloc count grows linearly with the size of the loop: the number of allocations is directly proportional to the number of iterations inside the loop.

Context

I found this regression while trying `gotip` in a private codebase. The provided benchmark is just a synthetic, minimal example that reproduces what I'm seeing in the real code. This is a memory profile snapshot for this benchmark, generated using `go tool pprof -alloc_objects mem.out`: