Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/cpu: add type CacheLinePad #25203

Closed
AlexRouSg opened this issue May 1, 2018 · 36 comments
Closed

x/sys/cpu: add type CacheLinePad #25203

AlexRouSg opened this issue May 1, 2018 · 36 comments

Comments

@AlexRouSg
Copy link
Contributor

#24843 added a new package for CPU feature detection, when looking through the package I noticed const cacheLineSize and thought that could be useful if it were exported.

My use case is the same as the package, adding padding to structs to prevent false sharing.

@gopherbot gopherbot added this to the Proposal milestone May 1, 2018
@tklauser
Copy link
Member

tklauser commented May 2, 2018

/cc @bradfitz @ianlancetaylor

I think it'd be fine to export cacheLineSize in x/sys/cpu.

@martisch
Copy link
Contributor

martisch commented May 2, 2018

If these are exported i suggest to add a warning in the documentation that these are best effort guesses based on GOARCH and might not be the cpus actual cache line size.

They shouldn't be used anywhere where their correctness matters for the programs correctness to avoid issues like:
http://www.mono-project.com/news/2016/09/12/arm64-icache/

@mundaym
Copy link
Member

mundaym commented May 2, 2018

C++17 introduced std::hardware_{constructive,destructive}_interference_size, it might be worth considering a similar approach if we are exposing the API:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html

@ianlancetaylor
Copy link
Contributor

Like @martisch, I think it's problematic to export cacheLineSize as a constant. It's not the same on different processors with the same GOARCH value.

I would be OK with exporting it as a variable if we can arrange to set the variable correctly.

@AlexRouSg
Copy link
Contributor Author

AlexRouSg commented May 2, 2018

@ianlancetaylor How about instead of a const CacheLineSize there's a type Pad [cacheLineSize]byte so it's clear the intended usage is only to add padding to structs.

@ianlancetaylor
Copy link
Contributor

An array type doesn't sound very flexible, as it doesn't permit adjusting the padding to be only what is required to avoid cache contention.

A constant like MaxCacheLineSize would be OK with me.

@AlexRouSg
Copy link
Contributor Author

Regarding the var CacheLineSize, Intel's cpuid package has this implemented for x86 so maybe this would be useful as a reference.

https://github.com/intel-go/cpuid

@gopherbot
Copy link

Change https://golang.org/cl/111775 mentions this issue: cpu: export cache line size

@mundaym
Copy link
Member

mundaym commented May 7, 2018

I'm uncomfortable with the name MaxCacheLineSize. It could, for example, refer to the L2 cache line size if that is larger than the L1 cache line size.

Since the only use case identified so far is to avoid false sharing I think we should use a name like DestructiveIntereferenceSize, FalseSharingSize, SharedInterferenceSize or similar. This would also allow us to increase it if there are architectural quirks, such as cache line prefetching, that make it desirable to pad (and maybe align in Go 2) data structures to more than the cache line size.

@rsc
Copy link
Contributor

rsc commented May 7, 2018

Calling it CacheLineSize or MaxCacheLineSize does seem to promise more than we need and maybe more than we can deliver.

Would it be enough to introduce:

type CacheLinePad [64]byte

appropriately sized for each system, and leave it at that? At least then it's clear what it's for, and people would be discouraged from using it for other things.

Still doesn't seem particularly clean but maybe this is just not something we can do cleanly.

@tklauser
Copy link
Member

tklauser commented May 8, 2018

In #25203 (comment) @ianlancetaylor mentioned that an array type might be a bit too inflexible. I also see @mundaym's and @rsc's point about the name being ambiguous and/or promising something we can't deliver.

I think renaming the const to either DestructiveIntereferenceSize, FalseSharingSize, SharedInterferenceSize might make less of a promise while still allowing a certain kind of flexibility. What do you think?

@AlexRouSg
Copy link
Contributor Author

Since I made the proposal thought I'd give my opinions.

I don't really care what you guys call it, but I do see the benefit of a const over a type which allows fine tuning the padding e.g.

type foo struct {
    _ [cpu.CacheLineSize]byte
    x, y, z int
    _ [cpu.CacheLineSize - 24]byte
}

I'm not too sure what's the use case of a var but since @ianlancetaylor brought it up maybe he knows of one?

@ianlancetaylor
Copy link
Contributor

A const is fine with me as long as it's clear that it represents the maximum cache line size of a particular architecture. And, of course, it may change over time. I was thinking of a var because within a single architecture different processors have different cache line sizes. But since the main use seems to be for struct padding, a var is not helpful.

An array type is more awkward than a const in some cases but is perhaps more clear for the intended use. The array type can still be used in the same way as the const by calling len.

@AlexRouSg
Copy link
Contributor Author

I feel like there could be some benefit with naming the const similar to C++17 (less confusion over what it represents), tho that being said DestructiveIntereferenceSize is way to long.

As far as I can tell, DestructiveIntereferenceSize and ConstructiveIntereferenceSize are usually the same value.

So would everyone be ok with just IntereferenceSize?

Unless there is a situation where DestructiveIntereferenceSize would be different from ConstructiveIntereferenceSize then we should probably have both of them?

@mundaym
Copy link
Member

mundaym commented May 11, 2018

Unless there is a situation where DestructiveIntereferenceSize would be different from ConstructiveIntereferenceSize then we should probably have both of them?

To avoid destructive interference we want to make sure that the data is in two separate cache lines. To ensure constructive interference we want data to be in the same cache line. That means that for DestructiveInterferenceSize we would want the maximum cache line for the architecture and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.

That said I'm not sure how you could safely/portably use ConstructiveInterferenceSize in a Go program, but maybe someone else can think of something.

@AlexRouSg
Copy link
Contributor Author

That means that for DestructiveInterferenceSize we would want the maximum cache line for the architecture and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.

Ahh, now that makes more sense.

In the sample code from the site you linked, ConstructiveInterferenceSize usage looks like
static_assert(sizeof(together) <= hardware_constructive_interference_size);

So I was thinking maybe just put this check into a test?

@as
Copy link
Contributor

as commented May 12, 2018

What is the min and max cache line size?

Check ConstructiveInterferenceSize and DestructiveInterferenceSize.

Why is it called that?

To prevent confusion.

There is very little chance that this will prevent confusion outside of a specific set of users who are already familiar with this hideous name. It doesnt even make too much sense in an engineering context unless you hunt down the specific proposal in the c++ realm and extrapolate its meaning from that document. I have never heard this word in the context given here, unlike cachelinesize.

Its also confusing because of the semantics of constructive and destructive -> min, max.

Sorry for leaving the bike without hair, but I think the name is bad.

@AlexRouSg
Copy link
Contributor Author

@as since you down voted every comment with a proposed name, even MaxCacheLineSize
What would you suggest be a good name?

@AlexRouSg
Copy link
Contributor Author

Sorry just noticed you up voted type CacheLinePad [64]byte
In that case, where would the min cache size fit? Or do you not think it's needed?

@as
Copy link
Contributor

as commented May 12, 2018

The c++ proposal:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html

  • Destructive interference size: a number that’s suitable as an offset between two objects to likely avoid false-sharing due to different runtime access patterns from different threads.

  • Constructive interference size: a number that’s suitable as a limit on two objects’ combined memory footprint size and base alignment to likely promote true-sharing between them.

A quote from the author of the proposal who introduced the proposed terms that C++ uses. The author wrote (this was from a stackoverflow question):

I authored the proposal, great answer! To clarify one point you made: "I would almost always expect these values to be the same. I believe the only reason they are declared separately is for completeness.". Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort). On constexpr: it's required to constexpr for the value to be usable in determining struct layout. Runtime values are useful in other circumstances. – JF Bastien

It appears that the two values should be the same, and were only included for completeness. I don't know what completeness means in this context exactly. A few articles suggest that false sharing on L2 is not possible where as at least one family of benchmarks suggests performance gains occur when data are padded to a multiple of the L1 cache line size (it implies that some prefetching is done to L2 in pairs, so that multiple is 2 and the value used is 128 instead of 64).

The question I have is: should Go be clever like C++, and create effect-based variables/values like Constructive/Destructive or should it stick to naming the values for what they represent? To me there is an additional layer of incomprehensible magic with the effect-based names, the primary reason I would prefer not to use those names (the second is they are too long).

Regarding Min/Max cache line size, I don't know what the right answer is because it is not intuitive what Min/Max means either. Whereas cache line size is a phrase almost always associated with the size of the L1 cache lines / cache blocks. To speculatively use min/max without the possibility of >2 qualifiers seems to make a lot of assumptions that I don't think are clearly stated.

@mundaym
Copy link
Member

mundaym commented May 12, 2018

@as the statement:

It appears that the two values should be the same, and were only included for completeness

Would appear to be contradicted by the authors answer :

Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort).

So in situations where an ISA has shipped with different cache line sizes (like the arm64 case linked in @martisch's reply) the values may be different. Since gc doesn't expose a way to specify the target arch the constructive and destructive interference sizes will therefore always be different on arm64. The same is presumably true for the new wasm backend.

I am also starting to think that the type CacheLinePad [N]byte suggestion might be best however. It is easy to understand while not making any promises about being the cache line size (so, for example, it could presumably be double the maximum L1 cache size if that is desirable for a given arch).

@martisch
Copy link
Contributor

martisch commented May 13, 2018

Seems using the cache line size as padding to avoid false sharing is not sufficient on newer Intel architectures and on the amd64 arch we should consider 2x cache line size for struct padding since the spatial prefetcher pulls in the adjacent cache line too:

Intel® 64 and IA-32 Architectures Optimization Reference Manual
2.3.5.4  Data Prefetching
Spatial Prefetcher: This prefetcher strives to complete every cache line fetched to the L2 cache with 
the pair line that completes it to a 128-byte aligned chunk

This was discussed and benchmarked in https://groups.google.com/forum/#!topic/mechanical-sympathy/i3-M2uCYTJE.

Also seems a case were Destructive and Constructive inference size can diverge.

Using CacheLInePad[N]byte or another name sounds like the better option to account for the original use case that should be solved. This also avoids pinning the meaning to another programming language/standard when using e.g. ConstructiveInterferenceSize.

@AlexRouSg
Copy link
Contributor Author

AlexRouSg commented May 13, 2018

For the points given above, I too would agree type CacheLInePad [N]byte makes more sense here.

So should we talk about a const or func to test for true sharing since we're already on the topic? Or should that just go to a new proposal?

@bcmills
Copy link
Contributor

bcmills commented May 14, 2018

@martisch

on the amd64 arch we should consider 2x cache line size for struct padding since the spatial prefetcher pulls in the adjacent cache line too

[…]

Also seems a case were Destructive and Constructive inference size can diverge.

How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.

@martisch
Copy link
Contributor

How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.

I was thinking about it in the context of the amd64 platform as a whole as i assumed we likely wont do runtime detection of these sizes for use in padding structs if we use the current const approach. In amd64 we have some cpu doing spatial prefetching and some not. So Constructive seems to be ok at 64bytes (does somebody know a am64 compatible cpu thats used widely with less than 64byte cache lines?) but for some newer amd64 compatible cpus there is some Destructive interference within 128byte (might not be as bad as within the same cache line).

@AlexRouSg
Copy link
Contributor Author

AlexRouSg commented May 14, 2018

Soo I wrote a false/true sharing benchmark using amd64 numbers (64 and 128). Instead of guessing, if anyone has access to those cpus, just run the benchmark and let the numbers speak for themselves.
https://play.golang.org/p/hp_MXQWEJvB (edit: removed useless middle align tests)

I have an old intel so constructive and destructive sizes are the same.

@randall77
Copy link
Contributor

Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

BenchmarkFalseSharing/NoPad-12         	    5000	    270183 ns/op
BenchmarkFalseSharing/Pad64Start-12    	   10000	    165659 ns/op
BenchmarkFalseSharing/Pad64StartEnd-12 	   10000	    165744 ns/op
BenchmarkFalseSharing/Pad128Start-12   	   10000	    164528 ns/op
BenchmarkFalseSharing/Pad128StartEnd-12         	   10000	    167642 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-12   	   10000	    231060 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-12  	    5000	    241121 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-12         	    5000	    247388 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-12        	   10000	    169730 ns/op
BenchmarkTrueSharing/<64-12                                 	    2000	    602211 ns/op
BenchmarkTrueSharing/>64-12                                 	     500	   2680956 ns/op
BenchmarkTrueSharing/>128-12                                	     500	   3555779 ns/op

I don't see much difference here between 64 and 128 bytes.
Middle has some difference, but alignment is weird for those examples so I'm not sure we can conclude anything from them.

@AlexRouSg
Copy link
Contributor Author

AlexRouSg commented May 21, 2018

If the cache line were 64 bytes then I can understand why middle 128 would be better than 64.

64 - 12 = 52 bytes
So for pad 64 middle the first 64 bytes would contain field x and so it's the same as no padding. I guess I should remove the middle benches.

Weird that on my i7 middle performs the same on 64 vs 128. (cpuid reports cache line of 64 bytes)

BenchmarkFalseSharing/NoPad-4              10000            194714 ns/op
BenchmarkFalseSharing/Pad64Start-4         10000            168577 ns/op
BenchmarkFalseSharing/Pad64StartEnd-4      10000            172899 ns/op
BenchmarkFalseSharing/Pad128Start-4        10000            170881 ns/op
BenchmarkFalseSharing/Pad128StartEnd-4             10000            174534 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-4       10000            156478 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-4      10000            155024 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-4                 10000            148957 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-4                10000            151861 ns/op
BenchmarkTrueSharing/<64-4                                          2000            690212 ns/op
BenchmarkTrueSharing/>64-4                                           500           2205934 ns/op
BenchmarkTrueSharing/>128-4                                          500           2675777 ns/op

@rsc
Copy link
Contributor

rsc commented May 21, 2018

OK, does anyone object to CacheLinePad (as in #25203 (comment)), with a size yet to be determined?

@AlexRouSg
Copy link
Contributor Author

@rsc nope, I think pretty much everyone who commented here think that makes the most sense for the padding.

@rsc
Copy link
Contributor

rsc commented Jun 4, 2018

Revising my earlier comment, instead of:

type CacheLinePad [64]byte

We should make this an opaque struct so that the only interesting property of the type is that it has the given size:

type CacheLinePad struct { _ [64]byte }

Then code cannot depend on it being an array somehow. But otherwise, proposal accepted.

@rsc rsc changed the title proposal: x/sys/cpu: export cacheLineSize proposal: x/sys/cpu: add type CacheLinePad Jun 4, 2018
@rsc rsc modified the milestones: Proposal, Unreleased Jun 4, 2018
@rsc rsc changed the title proposal: x/sys/cpu: add type CacheLinePad x/sys/cpu: add type CacheLinePad Jun 4, 2018
@AlexRouSg
Copy link
Contributor Author

Now that I think about it, using unsafe.Sizeof() is prob better/clearer than len() and other advance usage of the padding size prob would result in using unsafe anyway.

So I 100% agree the struct type is the best so far.

@gopherbot
Copy link

Change https://golang.org/cl/116276 mentions this issue: runtime,internal/cpu: replace sys.CacheLineSize for padding by cpu.CacheLinePad

@martisch
Copy link
Contributor

martisch commented Jun 5, 2018

Made above CL to see how CacheLinePad could be used when added to internal/cpu (the package x/sys/cpu was forked from) and applied to the runtime package which already imports internal/cpu. Most uses of runtime/internal/sys.CacheLineSize (except two) are for padding. This would also remove the redundant definition of CacheLineSize per Arch in runtime/internal/sys and internal/cpu.

@tklauser
Copy link
Member

tklauser commented Jun 5, 2018

https://golang.org/cl/111775 (the CL for x/sys/cpu) is now also updated to provide a CacheLinePad type as suggested in #25203 (comment) and as implemented for internal/cpu by @martisch in https://golang.org/cl/116276.

@gopherbot
Copy link

Change https://golang.org/cl/126601 mentions this issue: runtime: replace sys.CacheLineSize for padding by cpu.CacheLinePad

gopherbot pushed a commit that referenced this issue Aug 20, 2018
Add a CacheLinePad struct type to internal/cpu that has a size of CacheLineSize.
This can be used for padding structs in order to avoid false sharing.

Updates #25203

Change-Id: Icb95ae68d3c711f5f8217140811cad1a1d5be79a
Reviewed-on: https://go-review.googlesource.com/116276
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit that referenced this issue Aug 24, 2018
…t and vars

sys here is runtime/internal/sys.

Replace uses of sys.CacheLineSize for padding by
cpu.CacheLinePad or cpu.CacheLinePadSize.
Replace other uses of sys.CacheLineSize by cpu.CacheLineSize.
Remove now unused sys.CacheLineSize.

Updates #25203

Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f
Reviewed-on: https://go-review.googlesource.com/126601
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Jul 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants