Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize Write([]byte(stringVal)) to not copy the string #18822

Open
ianlancetaylor opened this issue Jan 27, 2017 · 11 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

It would be nice to optimize Write([]byte(stringVal)) to not copy the string value. This is normally safe because most Write methods do not modify the byte slice passed in. In fact, the documentation for io.Writer requires that the Write method not modify the byte slice, although this is not (and can not be) enforced.

Here is how we can do this.

For each function and method that takes a parameter of slice type, record whether the slice's underlying array is modified. This will require generating annotations similar to the ones we generate for escape analysis. Assembly functions will require an annotation similar to //go:noescape. (Naturally, a slice that escapes must be treated as modified.)

For each call of the form F([]byte(stringVal)), where we know that F does not modify the slice, we can pass the string value directly. This would do essentially the same thing as the existing optimization for map lookups in which a []byte is converted to a string. This fixes direct calls, but of course the interesting cases all involve calls to methods of values of type io.Writer.

For any type with a Write method that does modify the slice, generate at compile time an additional Write·2 method that makes a copy of the slice and then calls the Write method. The method is named Write·2 to ensure that it does not conflict with any user written method.

When converting any type to io.Writer (or any interface type that inherits from io.Writer), check for the presence of a Write·2 method. If that method exists, add it to the interface as an additional entry in the itab.

For any call to an interface method Write([]byte(stringVal)), modify the call to call a special function doWrite in the io package, without copying the string. doWrite will check for a Write·2 method, and call it if it exists; that will cause the string to be copied as happens today. If the Write·2 method does not exist, which will be the normal case, doWrite will call the Write method as usual, knowing that the Write method does not modify the slice.

Generalizing this to other methods is left as an exercise for the reader.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jan 27, 2017
@bradfitz
Copy link
Contributor

I feel like there was a dup bug with the same plan but rejected because as I recall: nobody should be working with such large strings and stay in bytes if they want performance.

Was this motivated by something concrete?

@ianlancetaylor
Copy link
Contributor Author

There was no specific motivation; it was just an idea I had.

@philhofer
Copy link
Contributor

Your compiler annotation for Write sounds a lot like the compiler secretly interpreting Write([]byte) to be Write(immutable []byte); the same could be done for other pointer arguments that are provably unmodified.

Such an (internal) annotation would have additional benefits:

  • Methods that do not modify their receiver won't require callers to re-load fields (I don't know how common this is or how much it would matter, but it's something)
  • Functions/methods with only immutable arguments (including implicit globals) may be hoisted, etc. (Similar to __attribute__((const)) and __attribute__((pure)).

@dsnet
Copy link
Member

dsnet commented Jan 27, 2017

This would remove the need for my attempted optimization in #13848, which tried to optimize string writing at the standard library level.

@dsnet
Copy link
Member

dsnet commented Jan 27, 2017

How does this work for nested calls of io.Writer?

func (t *T) Write(p []byte) (int, error) {
	return t.wr.Write(p)
}

You do not know if T.Write mutates p or not because it depends on whether t.wr.Write mutates p. To be conservative, the compiler will need to assume that it is mutated and then it will emit T.Write·2, removing any benefit from the proposed optimization.

@ianlancetaylor
Copy link
Contributor Author

@dsnet For a case like that you turn T.Write·2 into a copy of the entire method, that passes p along uninterpreted for possibly copying by t.wr.Write·2.

@dsnet
Copy link
Member

dsnet commented Jan 27, 2017

@ianlancetaylor, Doesn't that approach fail for this example (that nobody should ever write):

	t := &T{wr: ...}
	t.Write([]byte("hello"))

func (t *T) Write(p []byte) (int, error) {
	n, err := t.wr.Write(p) // Assume t.wr.Write mutates p, but not known at compile time
	fmt.Println(p[0]) // Read p, expect to see mutated value
	return n, err
}

So T.Write·2 is a copy of T.Write but the call to t.wr.Write is replaced by doWrite(t.wr, p). If t.wr.Write does mutate p, then we are guaranteed that the "hello" string is not accidentally mutated (correct). However, the fmt.Println(p[0]) will not be able to observe the mutation that t.wr.Write caused because of the copy (wrong).

I guess that this is solvable if doWrite was changed such that the new copy was known to the caller. Something like func doWrite(w io.Writer, p *[]byte)

@ianlancetaylor
Copy link
Contributor Author

You're right: in general, if the method does something with the buffer after passing it to Write, it may not be possible to apply this optimization to callers of that method.

@minux
Copy link
Member

minux commented Jan 28, 2017 via email

@ALTree
Copy link
Member

ALTree commented Feb 1, 2017

Related to #2205 (cmd/compile: read-only escape analysis and avoiding string <-> []byte copies).

@go101
Copy link

go101 commented Jan 16, 2020

I create a thread on #golang-dev to discuss some possible optimizations for this.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
None yet
Development

No branches or pull requests

8 participants