
cmd/compile: allocate slices lazily #46114

Open
bemasc opened this issue May 12, 2021 · 10 comments
Labels: NeedsInvestigation, Performance
Milestone: Unplanned
Comments

@bemasc (Contributor) commented May 12, 2021

This is an alternative to #15735.

When a slice is constructed (make([]foo, N)), the Go runtime could avoid allocating the backing array until something actually tries to read or write through the slice. In most cases the slice is used immediately, so this would have little effect, but when the slice is passed to a blocking implementation of io.Reader.Read, the internal call to pollDesc.waitRead() delays the first use of the slice until after the wait. This would improve the memory efficiency of Go socket servers with many blocked sockets (provided they are written in a particular style and the garbage collector is keeping up).

This is not a language change, although it might influence coders to use different slice creation strategies.
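
To illustrate the pattern this targets, here is a minimal sketch of a blocking-read server loop (handleConn, process, and the 64 KiB buffer size are placeholders, not part of the proposal):

func handleConn(conn net.Conn) {
	defer conn.Close()
	for {
		b := make([]byte, 64*1024) // today this pins 64 KiB per blocked connection;
		n, err := conn.Read(b)     // lazily, the backing array would only exist once Read unblocks
		if err != nil {
			return
		}
		process(b[:n]) // placeholder for application logic
	}
}

With thousands of connections blocked in Read, those idle buffers are exactly the memory the proposal would avoid committing.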

gopherbot added this to the Proposal milestone May 12, 2021
@ianlancetaylor (Contributor)

If I understand this correctly, there is no API change here, so this does not have to be a proposal. Taking this out of the proposal process.

I don't really see how to implement this. For the cases we care about, I think that the first use of the slice will not be in the same function as the make call that creates the slice. So we need some way to record the slice as being unallocated. And then we need to allocate it. And then we need to somehow communicate that allocation up the call stack. How can we do this?

ianlancetaylor changed the title from "proposal: Lazy allocation of fresh slices" to "cmd/compile: allocate slices lazily" May 12, 2021
ianlancetaylor added the NeedsInvestigation label and removed the Proposal label May 12, 2021
ianlancetaylor modified the milestones: Proposal → Unplanned May 12, 2021
@dsnet (Member) commented May 12, 2021

What happens in the following?

b := make([]byte, 2)
var wg sync.WaitGroup
wg.Add(2)
go func(b1 []byte) {
	defer wg.Done()
	b1[0] = 'h' // lazily allocate backing store for b1
}(b)
go func(b2 []byte) {
	defer wg.Done()
	b2[1] = 'i' // lazily allocate backing store for b2
}(b)
wg.Wait()
fmt.Println(string(b))

I expect this to print "hi", but under the proposed semantics the backing store would be allocated upon the first load or store, which happens separately in each goroutine. At that point each goroutine only updates its own local slice header, and thus fmt.Println(string(b)) would report the wrong result.

@bemasc (Contributor, Author) commented May 12, 2021

@dsnet is right: this would require all copies of the slice to share a pointer to the backing store, so that the allocation is visible to all of them.

Dreamily, I imagine that runtime/slice.go would look like:

type sliceStorage struct {
  array unsafe.Pointer // nil until the first read or write through the slice
  et    *_type         // element type, kept so the allocation can happen later
  cap   int
}

func (s *sliceStorage) get() unsafe.Pointer {
  // This needs to be atomic.
  if s.array == nil {
    mem, _ := math.MulUintptr(s.et.size, uintptr(s.cap))
    s.array = mallocgc(mem, s.et, true)
  }
  return s.array
}

type slice struct {
  storage *sliceStorage // indirection shared by every copy of the slice header
  len     int
}

It's of course entirely possible that this is unimplementable, or irretrievably inefficient.

@randall77 (Contributor)

@bemasc the implementation you sketched would require two loads (instead of one) every time you read a value from a slice. Not wildly implausible, but I think we'd want a really good reason to take that performance hit.
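
To make the cost concrete, here is a toy model of the extra indirection (not real runtime code; the names are invented for illustration):

// Every element access pays an extra pointer load plus a nil check.
type lazyStorage struct {
	array []byte // nil until first access
}

type lazySlice struct {
	storage *lazyStorage
	length  int
}

func (s lazySlice) index(i int) byte {
	st := s.storage // load 1: fetch the shared storage record
	if st.array == nil {
		st.array = make([]byte, s.length) // lazy allocation on first access
	}
	return st.array[i] // load 2: go through the array pointer to reach the element
}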

@davecheney (Contributor)

ISTM that this proposal is about deferring the allocation of the Read buffer for network sockets. I think this should be addressed directly with a targeted proposal, not a generalised change to the way slices work in the language.

@bemasc (Contributor, Author) commented May 12, 2021

The use is a little broader than just network sockets (e.g. any io.Reader.Read() call that actually blocks), but yes, it's a very deep change.

The double load might be avoidable when it's not necessary (e.g. with a direct/indirect mode flag), but that could get complicated and might not be faster.

@davecheney (Contributor)

What other sources of readables do you have in mind? ISTM the other major source is readers backed by files, which are limited by the I/O subsystem and where calls to read are expected to return quickly. That is to say, programs that open hundreds of thousands of files and tail them are rare.

@bemasc (Contributor, Author) commented May 12, 2021

Yes, the main use is definitely network sockets. In my main Go application (a network server), these idle buffers for blocked reads are the dominant memory cost.

I think an approach like this is potentially consistent with Go's philosophy of encouraging users to write naive blocking code, then transforming it into an efficient non-blocking implementation. However, I can certainly imagine that the double load would make this a non-starter (although with inlining, that should probably only apply once per function).

I just wanted to share the idea; I'm happy to close the ticket if it seems unlikely. A similar but (much) less invasive approach would be something like net.Conn.ReadNext(limit int) ([]byte, error), allocating a slice on-demand.
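
Usage of such a hypothetical method might look like this (ReadNext is not a real net.Conn method, and process stands in for application logic):

for {
	buf, err := conn.ReadNext(64 * 1024) // hypothetical: allocates a buffer only once data is available
	if err != nil {
		return err
	}
	process(buf)
}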

@jfesler commented May 13, 2021

I could see net.Conn.ReadNext(limit int) ([]byte, error) filling a niche but valuable need. I don't see it being useful to any app working through higher-level wrappers around a reader.


Without knowing anything about the compiler, I could see something like this being recognized:

  b := make([]byte, n)   // recognize: allocation immediately before a read
  n, err := conn.Read(b)

into

  [wait for conn to become "readable"]
  b := make([]byte, n)   // alloc deferred until just before the read
  n, err := conn.Read(b)

Heck, if we simply had a way to block until a reader (or specifically a socket) were readable, we could do it without compiler tricks. Then the user would have the ability to either get a fresh allocation or grab memory from a pool. But that would be an API change.
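
For what it's worth, something close to "block until readable" can be approximated today via syscall.RawConn, whose Read method waits for read-readiness between invocations of its callback. A rough sketch (it assumes the connection implements syscall.Conn, as *net.TCPConn does, and is not production-hardened, e.g. spurious wakeups aren't handled):

import "syscall"

// waitReadable blocks until the connection is (probably) readable, without
// allocating a read buffer. The first callback invocation returns false so the
// runtime poller waits; the second returns true once we have been woken up.
func waitReadable(conn syscall.Conn) error {
	rc, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	first := true
	return rc.Read(func(fd uintptr) bool {
		if first {
			first = false
			return false // ask the poller to wait for readability
		}
		return true // woken up: data is likely available
	})
}

After waitReadable returns, the caller can allocate a fresh buffer or take one from a sync.Pool and only then call Read.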

@CAFxX (Contributor) commented May 16, 2021

I have in the past had to move allocations out of critical regions protected by a mutex to minimize contention. If I'm understanding this proposal as currently written correctly, it could potentially lead to surprising performance regressions, since the allocation would sneakily happen inside the critical section again (the workaround would obviously be to allocate and then do a dummy write before entering the critical section, but it would be surprising nevertheless).
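
Concretely, the kind of code that could regress (illustrative only; the names are made up):

var mu sync.Mutex
var results [][]byte

func record(n int) {
	buf := make([]byte, n) // today the allocation cost is paid here, outside the lock
	mu.Lock()
	buf[0] = 1 // under lazy allocation the backing array would actually be allocated
	// here, on the first write, i.e. back inside the critical section
	results = append(results, buf)
	mu.Unlock()
}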

Regardless, I was under the impression that during GC we were already using MADV_DONTNEED, and that therefore, until something is written to each page, the pages don't consume actual memory... Are we talking exclusively about slices that are comparable in size to, or smaller than, a single memory page?
