Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: runtime.zerovalue can be too small when using shared libraries #11491

Closed
mwhudson opened this issue Jun 30, 2015 · 27 comments
Closed

Comments

@mwhudson
Copy link
Contributor

AIUI runtime.zerovalue is only used in maps: it is what is copied when a lookup misses. The compiler takes some care to make it be the smallest value needed: it is the size of the largest type that is seen being put into a hash (there is a comment that suggests it's also used by 2-arg type cast, but I'm not sure that's true any more). Anyway, this kind of closed world assumption is not safe in a shared library: just because the compiler / linker does not see a type being put into a map does not mean that it is not going to be put into a hash in some module that links the shared library containing the type data.

I guess the safest fix is to make runtime.zerovalue a local symbol and ensure that the zerovalue for a shared library is as large as any type defined by that shared library (possibly only need to consider exported types?). Not sure if there is a small enough fix to be safe for 1.5.

@randall77
Copy link
Contributor

I'm not sure this is a problem. The size is not associated with the size put into the map, it is associated with the size read during a map access. So any package that looks up in a map[K]V must have a zero value of size at least V. I believe the code does that correctly, as long as the linker does the maximum size calculation for each zerovalue in each shared library.

I think you might be right that this is no longer used for 2-arg type casts.

@mwhudson
Copy link
Contributor Author

mwhudson commented Jul 1, 2015

I promise you that it can be a problem :-) (or I wouldn't have spent as long in gdb this morning)

The situation is like this:

You compile some go packages into a shared library. These go packages contain a type T of say 128 bytes but do not put T into a map, so the shared library contains a reasonably small zerovalue, say 64 bytes long. type.T.zero is set to point at this zerovalue.

Then, you link a package that does put T into a map against this shared library. When a map access misses, it copies type.T.zero into the value it returns. Only the first 64 bytes of this are guaranteed to be zero! Hilarity ensues.

(It's actually a bit more complicated than that because runtime.zerovalue is a global symbol currently, so all type.*.zero pointers end up pointing to the one from the executable, but the effect is similar).

I think all this type.*.zero maintenance is a bit silly really -- it's always the same pointer for all statically defined types, so why not just have a single zeroptr? I implemented this in ee83e7d and it seems to be a net reduction in complexity. The linker could assist by making sure that zeroptr initially points at a buffer at least as large as runtime.zerovalue today but that would only make things a little simpler.

@randall77
Copy link
Contributor

Right, I see, the pointer in T points to the zero value in the wrong shared library.  Yuck.

The problem with a single zero value is reflect.  Map types made by the reflect package might need a larger zero value than any static map type. It's probably solvable, though, by returning nil to the reflect map operations, and conditioning their copy on checking the return value.

@mwhudson
Copy link
Contributor Author

mwhudson commented Jul 1, 2015

I think my change handles the reflect use case? It passes ./all.bash & go test reflect, at any rate...

@randall77
Copy link
Contributor

Yes, locking&allocating at each map access should work. I have been worried about the contention for locking, but I guess >1K values aren't that likely.

I'm thinking there may be a slightly better way. The original reason for returning the zero value was that the caller didn't need to condition on the return value, it could just unconditionally copy the return value to the destination. This is good for small values, but for large values the overhead is in the noise, and it actually hurts because we use memmove and not memzero.

So maybe we should reintroduce the condition on the return value for maps with values >1K? We'd have to compare the return value against &zero1024, but that seems like it would be ok.

@mwhudson
Copy link
Contributor Author

mwhudson commented Jul 2, 2015

It only locks & allocates when a map with a larger value type than ever seen before is created, that doesn't seem too bad?

I can see the point about memset being preferable to memmove at some point -- so the map code would always return &zero1024 on a miss and the compiler, when compiling access to a map with a value type larger than 1K, would insert a check for that pointer being returned and insert a memset in that case? Sounds doable, but I don't know if using large value types in maps is very common...

@randall77
Copy link
Contributor

Right, that's the algorithm I was trying to describe.

I suspect value types >1k are very uncommon. You almost always want to use pointers-to-big-thing instead.

I guess I'm still a bit confused as to how the current setup doesn't work. There should only be a single shared library with the runtime package in it, and zerovalue is in the runtime package. Does the problem occur when compiling other shared libraries with other packages - where the compiler ends up putting a large runtime.zerovalue symbol in that other shared library, and when that shared library is loaded, it can't affect the runtime.zerovalue that was already loaded in the runtime-containing shared library?

If my description is accurate, it sounds like the dynamic linker should at least barf when this happens instead of silently ignoring the problem. Is there any way to make the dynamic linker barf in this situation? I can imagine there might be other cases where the compiler introduces runtime data symbols that might interact badly with dynamic linking.

@mwhudson
Copy link
Contributor Author

mwhudson commented Jul 2, 2015

On 2 July 2015 at 13:47, Keith Randall notifications@github.com wrote:

Right, that's the algorithm I was trying to describe.

I suspect value types >1k are very uncommon. You almost always want to use
pointers-to-big-thing instead.

Right.

I guess I'm still a bit confused as to how the current setup doesn't work.
There should only be a single shared library with the runtime package in
it, and zerovalue is in the runtime package. Does the problem occur when
compiling other shared libraries with other packages - where the compiler
ends up putting a large runtime.zerovalue symbol in that other shared
library, and when that shared library is loaded, it can't affect the
runtime.zerovalue that was already loaded in the runtime-containing shared
library?

Mostly -- it's always the runtime.zerovalue from the executable that "wins"
because the executable is the first entry in the "global symbol lookup
scope".

In trying to explain exactly what goes wrong, I also became very confused.
I think I could make things work by always defining when linking a
runtime.zerovalue that is the same size as the largest of all the
zerovalues from the packages being linked /and/ the largest zerovalue of
all the dependent libraries. And actually I'm a bit confused as to why that
doesn't happen already.

If my description is accurate, it sounds like the dynamic linker should at
least barf when this happens instead of silently ignoring the problem. Is
there any way to make the dynamic linker barf in this situation? I can
imagine there might be other cases where the compiler introduces runtime
data symbols that might interact badly with dynamic linking.

I don't think so, welcome to wonderful wacky world of symbol interposition.

@ianlancetaylor
Copy link
Contributor

The way to make this work is for runtime.zerovalue to be a common symbol. Common symbols have the exact semantics we want, courtesy of FORTRAN. In ELF that means giving setting the section index of the symbol to SHN_COMMON (== 0xfff2).

But changing the behaviour for large value types seems better to me. I'm not sure we even need to go to 1024, probably 256 would be fine. If the value type is <= 256 bytes, the compiler generates code to copy. If the value type > 256 bytes, have the map code return nil for zero, and have the compiler generate if ret == nil { memset } else { memcpy }.

@ianlancetaylor
Copy link
Contributor

@mwhudson Do you want to try using SHN_COMMON for zerovalue when compiling shared or dynlink? That change might be small enough for 1.5.

@mwhudson
Copy link
Contributor Author

Sure, I'll give it a go.

@mwhudson
Copy link
Contributor Author

So it turns out that this only works with gold, unless your binutils is built with --enable-elf-stt-common, which Ubuntu's isn't (the default is apparently to not enable in case you want to copy your binary onto a system that has a dynamic loader from before 1999, sigh). I'm fine with depending on gold though.

@randall77
Copy link
Contributor

We can definitely do the large-value-by-reference fix for 1.6. I think that change is too big for 1.5 though. Anything simpler you'd like to do for 1.5?

@mwhudson
Copy link
Contributor Author

I was trying Ian's suggestion but I ran into https://sourceware.org/bugzilla/show_bug.cgi?id=18288, which is fixed but not part of any release yet. Feels like we should leave it for now, unless Ian has any other ideas.

@ianlancetaylor
Copy link
Contributor

My suggestion was to use SHN_COMMON. Did you try that? Using STT_COMMON too does no harm, but as you've discovered in practice you need SHN_COMMON too.

@mwhudson
Copy link
Contributor Author

Yes, but unless I'm missing something the objects in the .so need to be STT_COMMON or the dynamic linker won't use the special "largest one with that name wins" behaviour we need. And that needs either ld.bfd built with non standard options or an unreleased gold. I'd love to be wrong about something here btw :)

@ianlancetaylor
Copy link
Contributor

The GNU linker uses a heuristic to decide whether a symbol in a shared library is a common symbol: if the symbol is defined in a section that is allocated but not loaded, it will be treated as a common symbol when merged with a symbol in a regular object in the SHN_COMMON section.

Unfortunately, gold doesn't do this.

But now it occurs to me that we are being distracted by this. Why not simply make runtime.zerovalue a hidden symbol?

@mwhudson
Copy link
Contributor Author

Because then the type.zero of all types defined in a module is the size of the largest type that is put into a map in that module -- if a type is defined in one module but only put into a hash in another, it can still be too small.

@ianlancetaylor
Copy link
Contributor

I'm not quite following. The type descriptor points to a specific zero value, and that pointer was set up at link time (leaving out types created by the reflect package, which are rare). Adding that type to a map is not going to use the local runtime.zerovalue value.

@mwhudson
Copy link
Contributor Author

If zerovalue is local, a map miss will copy from the zerovalue of the module that defines the type, and that zerovalue is the size of the largest type seen in a map among the packages that go into that module. It's easy to make this go wrong -- put a large type but no use of it into a module and then link another module against it that puts this large type into a map. 66719bc makes zerovalue local and also contains a test that fails with that change. (A slightly different test with TortureZeroValue in zvtlib rather than the exe fails on tip).

@ianlancetaylor
Copy link
Contributor

Ah, I see. OK, then, if a type is not seen as going into a map, we need to set its zero field to nil, rather than pointing to runtime.zerovalue. I think the map code will do the right thing.

@mwhudson
Copy link
Contributor Author

No, it's always set to runtime.zerovalue and the map code does not do the right thing. But this is too much discussion of code that should just be deleted early in 1.6.

@ianlancetaylor
Copy link
Contributor

I think the map code does the right thing if the zero field is nil because of the mapzero function in runtime/hashmap.go.

@rsc rsc modified the milestones: Go1.6Early, Go1.5Maybe Jul 20, 2015
@mwhudson
Copy link
Contributor Author

So I started to implement the idea of having the map code return a pointer to a fixed size buffer of zeros on a miss and I've gotten stuck. In particular, I don't see the best way of handling the miss case for OINDEXMAP. Basicallly I don't know what to rewrite OINDEXMAP into when the value type is large: the access needs to turn into code like

var tmp1 T
tmp2 := p(key)
if tmp2 == runtime.zeroptr {
    tmp1 = T{}
} else {
    tmp1 = *tmp2
}
"return" tmp1

but that's really a statement, and a map index is obviously an expression. This sort of expansion is clearly done by function inlining, but I don't know how to do that in walkexpr :/

@randall77
Copy link
Contributor

I'm not sure either, but there must be a way. walkexpr can handle things like && which also have implicit control flow.

mwhudson added a commit to mwhudson/go that referenced this issue Aug 20, 2015
This replaces the scheme where the compiler tracked the largest type that was
the value of a hashtable and allocate a block of zeroes large enough to return
as a hash miss with a scheme that is simpler on the runtime but more
complicated in the compiler: just have a fixed size block of zeroes (256 bytes
here) and handle hash access differently in the compiler when the value type is
too large.

What I have here doesn't quite work (try cranking the <= 256 in walk.go down to
<= 8).

Update golang#11491.

Change-Id: I1da3aca96bc4e0f32129f3a53e31dd8079f79100
@mwhudson
Copy link
Contributor Author

Getting closer,,, the Node level stuff is pretty unfamiliar.

@gopherbot
Copy link

CL https://golang.org/cl/13784 mentions this issue.

@golang golang locked and limited conversation to collaborators Sep 4, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants