Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: read-only escape analysis and avoiding string -> []byte copies #2205

Closed
bradfitz opened this issue Aug 29, 2011 · 43 comments
Closed
Assignees
Labels
GarbageCollector NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@bradfitz
Copy link
Contributor

Many functions take a []byte but only read from it.

If the escape analysis code could flag parameters of a function as read-only, then code
which passes in a []byte(string) conversion could be cheaper.

bytes.IndexByte is one example.

For example, http/sniff.go does:

// -------------------
  // Index of the first non-whitespace byte in data.                                                                                               
  firstNonWS := 0
  for ; firstNonWS < len(data) && isWS(data[firstNonWS]); firstNonWS++ {
  }

func isWS(b byte) bool {
        return bytes.IndexByte([]byte("\t\n\x0C\n "), b) != -1
}

// -------------------

But it's faster to avoid re-creating the []byte:

func BenchmarkIndexByteLiteral(b *testing.B) {
        for i := 0; i < b.N; i++ {
        IndexByte([]byte("\t\n\x0C\n "), 'x')
        }
}

func BenchmarkIndexByteVariable(b *testing.B) {
        var whitespace = []byte("\t\n\x0C\n ")
        for i := 0; i < b.N; i++ {
                IndexByte(whitespace, 'x')
        }
}

bytes_test.BenchmarkIndexByteLiteral    20000000           125 ns/op
bytes_test.BenchmarkIndexByteVariable   100000000           25.4 ns/op

Related is issue #2204.
@gopherbot
Copy link

Comment 1 by jp@webmaster.ms:

It seem to be much easier to add the possibility of zerocopish taking string slices from
byte slices/arrays.
As far I understand, string slices are actually readonly and cap()'less byte slices:
http://research.swtch.com/2009/11/go-data-structures.html

@bradfitz
Copy link
Contributor Author

Comment 2:

zerocopish?

@rsc
Copy link
Contributor

rsc commented Aug 30, 2011

Comment 3:

Can you be more specific?
Examples of programs that you think should be
handled specially would be a good way to do that.

@rsc
Copy link
Contributor

rsc commented Sep 15, 2011

Comment 4:

Owner changed to @rsc.

Status changed to Accepted.

@lvdlvd
Copy link

lvdlvd commented Nov 7, 2011

Comment 5:

Labels changed: added compilerbug, performance.

@rsc
Copy link
Contributor

rsc commented Dec 9, 2011

Comment 6:

Labels changed: added priority-later.

@rsc
Copy link
Contributor

rsc commented Sep 12, 2012

Comment 8:

Labels changed: added go1.1maybe.

@bradfitz
Copy link
Contributor Author

Comment 9:

I run into this often, but I should start listing examples.
In goprotobuf, text.go calls writeString with both a string and a []byte converted to a
string:
// writeAny writes an arbitrary field.
func writeAny(w *textWriter, v reflect.Value, props *Properties) {
        v = reflect.Indirect(v)
        // We don't attempt to serialise every possible value type; only those
        // that can occur in protocol buffers, plus a few extra that were easy.
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, string(v.Interface().([]byte)))
        case reflect.String:
                writeString(w, v.String())
Note that the function writeString reallly just wants a read-only slice of bytes:
func writeString(w *textWriter, s string) {
        w.WriteByte('"')
        // Loop over the bytes, not the runes.
        for i := 0; i < len(s); i++ {
                // Divergence from C++: we don't escape apostrophes.
                // There's no need to escape them, and the C++ parser
                // copes with a naked apostrophe.
                switch c := s[i]; c {
                case '\n':
                        w.Write([]byte{'\\', 'n'})
                case '\r':
                        w.Write([]byte{'\\', 'r'})
                case '\t':
                        w.Write([]byte{'\\', 't'})
                case '"':
                        w.Write([]byte{'\\', '"'})
                case '\\':
                        w.Write([]byte{'\\', '\\'})
                default:
                        if isprint(c) {
                                w.WriteByte(c)
                        } else {
                                fmt.Fprintf(w, "\\%03o", c)
                        }
                }
        }
        w.WriteByte('"')
}
It doesn't matter that it's frozen (like a string), nor writable (like a []byte).  But
Go lacks that type, so if instead it'd be nice to write writeAny with a []byte parameter
and invert the switch above to be like:
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, v.Interface().([]byte))
        case reflect.String:
                writeString(w, []byte(v.String())) // no copy!
Where the []byte(v.String()) just makes a slice header pointing in to the string's
memory, since the compiler can verify that writeAny never mutates its slice.

@bradfitz
Copy link
Contributor Author

Comment 10:

See patch and CL description at http://golang.org/cl/6850067 for the opposite
but very related case: strconv.ParseUint, ParseBool, etc take a string but calling code
has a []byte.

@robpike
Copy link
Contributor

robpike commented Mar 7, 2013

Comment 11:

Labels changed: removed go1.1maybe.

@rsc
Copy link
Contributor

rsc commented Mar 12, 2013

Comment 12:

[The time for maybe has passed.]

@bradfitz
Copy link
Contributor Author

Comment 13:

People who like this bug also like issue #3512 (cmd/gc: optimized map[string] lookup from
[]byte key)

@dvyukov
Copy link
Member

dvyukov commented Mar 31, 2013

Comment 14:

FWIW
var m map[string]int
var b []byte
_ = m[string(b)]
case must be radically simpler to implement than general read-only analysis.
It's peephole optimization, when the compiler sees such code it can generate hacky
string object using the []byte pointer.
Another example would be len(string(b)), but this seems useless.

@bradfitz
Copy link
Contributor Author

Comment 15:

Yes. That's why they're separate bugs.
I want issue #3512 first, because it's easy.

@bradfitz
Copy link
Contributor Author

Comment 16:

Labels changed: added garbage.

@rsc
Copy link
Contributor

rsc commented Jul 30, 2013

Comment 18:

Labels changed: added priority-someday, removed priority-later.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@rsc
Copy link
Contributor

rsc commented Mar 3, 2014

Comment 20:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

@methane
Copy link
Contributor

methane commented Mar 10, 2022

As my understanding, []byte -> string conversion don't require read-only checks. It require only escape analysis that current Go has. Am I correct?

We have several duplicated functions for bytes and strings. Implementing only []byte -> string conversion first would very helpful for us.

func escapeString(in string) []byte {
    // some escape logic here.
    // do not escape *in*
}

func escapeBytes(in []byte) []byte {
    return escapeString(string(in)) // slicebytetostringtmp, instead of slicebytetostring.
}

@randall77
Copy link
Contributor

As my understanding, []byte -> string conversion don't require read-only checks. It require only escape analysis that current Go has. Am I correct?

No, at least if I understand you correctly. We need to handle this case:

b := []byte{'x'}
s := string(b)
b[0] = 'y'

The string s should remain "x" after the last assignment.

So it isn't that b is read-only always, but that it is read-only after the conversion to string.
And "after" is hard, as b may not be the only []byte pointing at its backing array. Some sort of alias analysis is required. (Of course in code I showed nothing else aliases the backing store, but that may not be the case in more realistic scenarios.)

In any case, this issue is for the conversion the other way, string -> []byte.

@methane
Copy link
Contributor

methane commented Mar 15, 2022

Thank you.

I thought about string() in argument list like data := encodeString(string([]data)).
But after reading your response, I noticed even encodeString(string([]data)) can have aliasing via global variable. That is why restrict keyword is added for C99.
Now I understand why it is difficult than just escape analysis.

In any case, this issue is for the conversion the other way, string -> []byte.

Oh, I didn't know that. Would you change the issue title from string <-> []byte to string -> []byte?

@mattn
Copy link
Member

mattn commented Mar 15, 2022

IMO, I think that it's difficult to provide way for converting types since that bytes might be on stack not heap allocator.

var s string
// s = "hello" // This should be crash because "hello" might be on stack.
s = string([]byte("hello"))
header := (*reflect.StringHeader)(unsafe.Pointer(&s))
bytesheader := &reflect.SliceHeader{
	Data: header.Data,
	Len:  header.Len,
	Cap:  header.Len,
}
b := *(*[]byte)(unsafe.Pointer(bytesheader))
b[0] = 0x48

@randall77 randall77 changed the title cmd/compile: read-only escape analysis and avoiding string <-> []byte copies cmd/compile: read-only escape analysis and avoiding string -> []byte copies Mar 15, 2022
@jfcg
Copy link

jfcg commented Mar 25, 2022

In any case, this issue is for the conversion the other way, string -> []byte.

Simpler example (tested with Go 1.17.8 / 1.18):
pkg.go

package pkg

// ToLower converts ascii string s to lower-case
func ToLower(s string) string {
	if s == "" {
		return ""
	}
	buf := make([]byte, len(s))

	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c |= 32
		}
		buf[i] = c
	}
	return string(buf)
}

pkg_test.go

package pkg

import "testing"

func BenchmarkToLower(b *testing.B) {
	str := "SomE StrInG"
	want := "some string"
	var res string

	for i := 0; i < b.N; i++ {
		res = ToLower(str)
	}
	if res != want {
		b.Fatal("ToLower error")
	}
}

go test -v -bench ToLower -benchmem -count=3 prints

BenchmarkToLower-4      18710726                66.86 ns/op           32 B/op          2 allocs/op
BenchmarkToLower-4      17420910                65.57 ns/op           32 B/op          2 allocs/op
BenchmarkToLower-4      17085405                63.16 ns/op           32 B/op          2 allocs/op

There is a redundant allocation & copy in return string(buf). It looks too trivial to me that I am surprised Go compiler cannot handle this.

@gopherbot
Copy link

Change https://go.dev/cl/520259 mentions this issue: cmd/compile/internal/escape: optimize indirect closure calls

@gopherbot
Copy link

Change https://go.dev/cl/520395 mentions this issue: cmd/compile: enable zero-copy string->[]byte conversions

gopherbot pushed a commit that referenced this issue Aug 17, 2023
This CL extends escape analysis in two ways.

First, we already optimize directly called closures. For example,
given:

	var x int  // already stack allocated today
	p := func() *int { return &x }()

we don't need to move x to the heap, because we can statically track
where &x flows. This CL extends the same idea to work for indirectly
called closures too, as long as we know everywhere that they're
called. For example:

	var x int  // stack allocated after this CL
	f := func() *int { return &x }
	p := f()

This will allow a subsequent CL to move the generation of go/defer
wrappers earlier.

Second, this CL adds tracking to detect when pointer values flow to
the pointee operand of an indirect assignment statement (i.e., flows
to p in "*p = x") or to builtins that modify memory (append, copy,
clear). This isn't utilized in the current CL, but a subsequent CL
will make use of it to better optimize string->[]byte conversions.

Updates #2205.

Change-Id: I610f9c531e135129c947684833e288ce64406f35
Reviewed-on: https://go-review.googlesource.com/c/go/+/520259
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@dmitshur dmitshur modified the milestones: Unplanned, Go1.22 Aug 17, 2023
@dmitshur
Copy link
Contributor

CL 520395 that resolved this issue was reverted in CL 520596, so reopening.

@dmitshur dmitshur reopened this Aug 17, 2023
@gopherbot
Copy link

Change https://go.dev/cl/520599 mentions this issue: cmd/compile: restore zero-copy string->[]byte optimization

@gopherbot
Copy link

Change https://go.dev/cl/520600 mentions this issue: cmd/compile: enable -d=zerocopy by default

@jfcg
Copy link

jfcg commented Aug 18, 2023

Hi,
These changes address redundant copying for string <--> []byte conversions both ways, right?

@mdempsky
Copy link
Member

@jfcg No, at this time only string->[]byte.

@cespare
Copy link
Contributor

cespare commented Aug 18, 2023

I believe that #43752 discusses at least some of the []byte->string cases.

(@mdempsky it's exciting that this 4-digit issue is so close to being solved!)

gopherbot pushed a commit that referenced this issue Aug 18, 2023
This CL implements the remainder of the zero-copy string->[]byte
conversion optimization initially attempted in go.dev/cl/520395, but
fixes the tracking of mutations due to ODEREF/ODOTPTR assignments, and
adds more comprehensive tests that I should have included originally.

However, this CL also keeps it behind the -d=zerocopy flag. The next
CL will enable it by default (for easier rollback).

Updates #2205.

Change-Id: Ic330260099ead27fc00e2680a59c6ff23cb63c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/520599
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
@go101
Copy link

go101 commented Nov 28, 2023

The optimization is cool, but it is a little pity that the following conversions still need allocations.

func CompareStrings(a, b string) int {
	return bytes.Compare([]byte(a), []byte(b))
}
package main

import "bytes"
import t "testing"

var y = "abcdefghijklmnopqrstuvwxyz0123456789"

func compareNonConstants() {
	_ = bytes.Compare([]byte(y), []byte(y))
}

func main() {
	stat := func(f func()) int {
		allocs := t.AllocsPerRun(10, f)
		return int(allocs)
	}
	println(stat(compareNonConstants)) // 2
}
package main

import t "testing"

var y = "abcdefghijklmnopqrstuvwxyz0123456789"
var s = []byte(y)

func concatBytesAndString() {
	_ = append([]byte(y), s...)
}

func main() {
	stat := func(f func()) int {
		allocs := t.AllocsPerRun(10, f)
		return int(allocs)
	}
	println(stat(concatBytesAndString)) // 2
}

@go101
Copy link

go101 commented Nov 28, 2023

BTW, should the old optimization for for range []byte(aString) be unified into the new optimization? Maybe it has been?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GarbageCollector NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests