
cmd/compile: allocate slices lazily #46114

Open
bemasc opened this issue May 12, 2021 · 10 comments
Labels: NeedsInvestigation, Performance
Milestone: Unplanned
Comments

@bemasc (Contributor) commented May 12, 2021

This is an alternative to #15735.

When a slice is constructed (make([]foo, N)), the Go runtime could avoid allocating the backing array until something actually tries to read or write through the slice. In most cases the slice is used immediately, so this would have little effect, but when the slice is passed to a blocking implementation of io.Reader.Read, the internal call to pollDesc.waitRead() delays the first use of the slice until after the wait. This would improve the memory efficiency of Go socket servers with many blocked sockets (provided they are written in a particular style and the garbage collector is keeping up).

This is not a language change, although it might influence coders to use different slice creation strategies.
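
To illustrate the pattern this targets, here is a minimal sketch of a blocking-read server loop (handleConn, process, and the 64 KiB buffer size are placeholders, not part of the proposal):

func handleConn(conn net.Conn) {
	defer conn.Close()
	for {
		b := make([]byte, 64*1024) // today this pins 64 KiB per blocked connection;
		n, err := conn.Read(b)     // lazily, the backing array would only exist once Read unblocks
		if err != nil {
			return
		}
		process(b[:n]) // placeholder for application logic
	}
}

With thousands of connections blocked in Read, those idle buffers are exactly the memory the proposal would avoid committing.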

gopherbot added this to the Proposal milestone May 12, 2021
@ianlancetaylor (Contributor)

If I understand this correctly, there is no API change here, so this does not have to be a proposal. Taking this out of the proposal process.

I don't really see how to implement this. For the cases we care about, I think that the first use of the slice will not be in the same function as the make call that creates the slice. So we need some way to record the slice as being unallocated. And then we need to allocate it. And then we need to somehow communicate that allocation up the call stack. How can we do this?

ianlancetaylor changed the title from "proposal: Lazy allocation of fresh slices" to "cmd/compile: allocate slices lazily" May 12, 2021
ianlancetaylor added the NeedsInvestigation label and removed the Proposal label May 12, 2021
ianlancetaylor modified the milestones: Proposal → Unplanned May 12, 2021
@dsnet (Member) commented May 12, 2021

What happens in the following?

b := make([]byte, 2)
var wg sync.WaitGroup
wg.Add(2)
go func(b1 []byte) {
	defer wg.Done()
	b1[0] = 'h' // lazily allocate backing store for b1
}(b)
go func(b2 []byte) {
	defer wg.Done()
	b2[1] = 'i' // lazily allocate backing store for b2
}(b)
wg.Wait()
fmt.Println(string(b))

I expect this to print "hi", but under the proposed semantics the backing store would be allocated upon the first load or store, which happens separately in each goroutine. At that point each goroutine only updates its own local slice header, and thus fmt.Println(string(b)) would report the wrong result.

@bemasc (Contributor, Author) commented May 12, 2021

@dsnet is right: this would require all copies of the slice to share a pointer to the backing store, so that the allocation is visible to all of them.

Dreamily, I imagine that runtime/slice.go would look like:

type sliceStorage struct {
  array unsafe.Pointer // nil until the first read or write through the slice
  et    *_type         // element type, kept so the allocation can happen later
  cap   int
}

func (s *sliceStorage) get() unsafe.Pointer {
  // This needs to be atomic.
  if s.array == nil {
    mem, _ := math.MulUintptr(s.et.size, uintptr(s.cap))
    s.array = mallocgc(mem, s.et, true)
  }
  return s.array
}

type slice struct {
  storage *sliceStorage // indirection shared by every copy of the slice header
  len     int
}

It's of course entirely possible that this is unimplementable, or irretrievably inefficient.

@randall77 (Contributor)

@bemasc the implementation you sketched would require two loads (instead of one) every time you read a value from a slice. Not wildly implausible, but I think we'd want a really good reason to take that performance hit.
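
To make the cost concrete, here is a toy model of the extra indirection (not real runtime code; the names are invented for illustration):

// Every element access pays an extra pointer load plus a nil check.
type lazyStorage struct {
	array []byte // nil until first access
}

type lazySlice struct {
	storage *lazyStorage
	length  int
}

func (s lazySlice) index(i int) byte {
	st := s.storage // load 1: fetch the shared storage record
	if st.array == nil {
		st.array = make([]byte, s.length) // lazy allocation on first access
	}
	return st.array[i] // load 2: go through the array pointer to reach the element
}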

@davecheney (Contributor)

ISTM that this proposal is about deferring the allocation of the Read buffer for network sockets. I think this should be addressed directly with a targeted proposal, not a generalised change to the way slices work in the language.

@bemasc (Contributor, Author) commented May 12, 2021

The use is a little broader than just network sockets (e.g. any io.Reader.Read() call that actually blocks), but yes, it's a very deep change.

The double load might be avoidable when it's not necessary (e.g. with a direct/indirect mode flag), but that could get complicated and might not be faster.

@davecheney (Contributor)

What other sources of readables do you have in mind? ISTM the other major source is readers backed by files, which are limited by the I/O subsystem and where calls to read are expected to return quickly. That is to say, programs that open hundreds of thousands of files and tail them are rare.

@bemasc (Contributor, Author) commented May 12, 2021

Yes, the main use is definitely network sockets. In my main Go application (a network server), these idle buffers for blocked reads are the dominant memory cost.

I think an approach like this is potentially consistent with Go's philosophy of encouraging users to write naive blocking code, then transforming it into an efficient non-blocking implementation. However, I can certainly imagine that the double load would make this a non-starter (although with inlining, that should probably only apply once per function).

I just wanted to share the idea; I'm happy to close the ticket if it seems unlikely. A similar but (much) less invasive approach would be something like net.Conn.ReadNext(limit int) ([]byte, error), allocating a slice on-demand.
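
Usage of such a hypothetical method might look like this (ReadNext is not a real net.Conn method, and process stands in for application logic):

for {
	buf, err := conn.ReadNext(64 * 1024) // hypothetical: allocates a buffer only once data is available
	if err != nil {
		return err
	}
	process(buf)
}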

@jfesler commented May 13, 2021

I could see net.Conn.ReadNext(limit int) ([]byte, error) filling a niche but valuable need. I don't see it being useful to any app working through higher-level wrappers around a reader.


Without knowing anything about the compiler, I could see something like this being recognized:

  b := make([]byte, n)   // recognize: allocation immediately before a read
  n, err := conn.Read(b)

into

  [wait for conn to become "readable"]
  b := make([]byte, n)   // alloc deferred until just before the read
  n, err := conn.Read(b)

Heck, if we simply had a way to block until a reader (or specifically a socket) were readable, we could do it without compiler tricks. Then the user would have the ability to either get a fresh allocation or grab memory from a pool. But that would be an API change.
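
For what it's worth, something close to "block until readable" can be approximated today via syscall.RawConn, whose Read method waits for read-readiness between invocations of its callback. A rough sketch (it assumes the connection implements syscall.Conn, as *net.TCPConn does, and is not production-hardened, e.g. spurious wakeups aren't handled):

import "syscall"

// waitReadable blocks until the connection is (probably) readable, without
// allocating a read buffer. The first callback invocation returns false so the
// runtime poller waits; the second returns true once we have been woken up.
func waitReadable(conn syscall.Conn) error {
	rc, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	first := true
	return rc.Read(func(fd uintptr) bool {
		if first {
			first = false
			return false // ask the poller to wait for readability
		}
		return true // woken up: data is likely available
	})
}

After waitReadable returns, the caller can allocate a fresh buffer or take one from a sync.Pool and only then call Read.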

@CAFxX (Contributor) commented May 16, 2021

I have in the past had to move allocations out of critical regions protected by a mutex to minimize contention. If I'm understanding this proposal as currently written correctly, it could potentially lead to surprising performance regressions, since the allocation would sneakily happen inside the critical section again (the workaround would obviously be to allocate and then do a dummy write before entering the critical section, but it would be surprising nevertheless).
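
Concretely, the kind of code that could regress (illustrative only; the names are made up):

var mu sync.Mutex
var results [][]byte

func record(n int) {
	buf := make([]byte, n) // today the allocation cost is paid here, outside the lock
	mu.Lock()
	buf[0] = 1 // under lazy allocation the backing array would actually be allocated
	// here, on the first write, i.e. back inside the critical section
	results = append(results, buf)
	mu.Unlock()
}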

Regardless, I was under the impression that during GC we were already using MADV_DONTNEED, and that therefore, until something is written to each page, the pages don't consume actual memory... Are we talking exclusively about slices that are comparable in size to, or smaller than, a single memory page?
