proposal: runtime: allow access to runtime.moduledata.text #28864

Closed
larytet opened this issue Nov 19, 2018 · 14 comments

@larytet

larytet commented Nov 19, 2018

I keep format strings (things similar to the first argument of fmt.Sprintf) in a hash table. I use the address of the string as the key instead of hashing the string. I assume that the strings are constant data stored in the text segment of the executable. The existing code relies on /proc/self/maps, which works only on Linux. Given the starting address of the text segment, I can calculate the string's offset and use that value as an index into the cache table.

An API returning runtime.moduledata.text could help improve performance in functions that handle strings. An example of such a function is a custom version of fmt.Sprintf().
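For readers unfamiliar with the Linux-only approach mentioned above, here is a minimal sketch of reading address ranges from /proc/self/maps. The `mapping` type and `parseMapsLine` helper are hypothetical names chosen for illustration and are not taken from the binlog code. The parser assumes the usual "start-end perms offset dev inode path" layout and does not handle paths that contain spaces.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// mapping is one parsed line of /proc/self/maps (hypothetical helper type).
type mapping struct {
	start, end uint64
	perms      string
	path       string
}

// parseMapsLine parses a line such as
// "00400000-0048a000 r-xp 00000000 08:01 123 /usr/bin/app".
func parseMapsLine(line string) (mapping, bool) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return mapping{}, false
	}
	lo, hi, found := strings.Cut(fields[0], "-")
	if !found {
		return mapping{}, false
	}
	start, err1 := strconv.ParseUint(lo, 16, 64)
	end, err2 := strconv.ParseUint(hi, 16, 64)
	if err1 != nil || err2 != nil {
		return mapping{}, false
	}
	m := mapping{start: start, end: end, perms: fields[1]}
	if len(fields) >= 6 {
		m.path = fields[5] // anonymous mappings have no path
	}
	return m, true
}

func main() {
	f, err := os.Open("/proc/self/maps") // exists on Linux only
	if err != nil {
		fmt.Println("cannot read /proc/self/maps:", err)
		return
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if m, ok := parseMapsLine(sc.Text()); ok && m.path != "" {
			fmt.Printf("%#x-%#x %s %s\n", m.start, m.end, m.perms, m.path)
		}
	}
}
```

The mappings that belong to the executable itself can then be picked out by comparing `path` with the program's own path.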

@bcmills bcmills changed the title Allow access to runtime.moduledata.text proposal: runtime: allow access to runtime.moduledata.text Nov 19, 2018
@gopherbot gopherbot added this to the Proposal milestone Nov 19, 2018
@bcmills
Contributor

bcmills commented Nov 19, 2018

> I use address of the string instead of hashing the strings.

More detail would be helpful. You can access the address of a string easily today using reflect.StringHeader:
https://play.golang.org/p/rvMHxQ_xoc-
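As a rough illustration of reading a string's data address, here is a minimal sketch. It uses `unsafe.StringData` (available since Go 1.20) rather than `reflect.StringHeader`, which is marked deprecated in recent Go releases. The `stringAddr` helper and the `format` constant are hypothetical names, and nothing in the language guarantees where the bytes live.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringAddr returns the address of the first byte of s, or 0 for the
// empty string (whose data pointer is unspecified).
func stringAddr(s string) uintptr {
	if len(s) == 0 {
		return 0
	}
	return uintptr(unsafe.Pointer(unsafe.StringData(s)))
}

// format stands in for a format string passed to a logging call.
const format = "user %d logged in"

func main() {
	a := stringAddr(format)
	b := stringAddr(format[5:]) // a substring shares the same backing bytes
	fmt.Printf("%#x %#x delta=%d\n", a, b, b-a)
}
```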

  • Are you using reflect.StringHeader?
    • If not, why not?
  • What's the connection to runtime.moduledata.text?
    • What kind of API would be useful, and how would you specifically use it?
    • Can you give other examples of code in the wild that would benefit similarly?
  • What other compiler and/or runtime optimizations, if any, would allow you to avoid the need for exposing runtime internals?

@bcmills
Contributor

bcmills commented Nov 19, 2018

(CC @aclements @randall77 for runtime.)

@larytet
Author

larytet commented Nov 19, 2018

> I use address of the string instead of hashing the strings.
>
> More detail would be helpful. You can access the address of a string easily today using reflect.StringHeader:
> https://play.golang.org/p/rvMHxQ_xoc-

I see that my original description was not very helpful. I have modified it slightly. After I get the address of the string I want to use the address as an index in my cache/hashtable.

* Are you using `reflect.StringHeader`?
  
  * If not, why not?

* What's the connection to `runtime.moduledata.text`?

If I know the address of the text section, I know the base address of all constant data in the executable and the (theoretical maximum) size of the constant data. With the base address and size known, I know how large my cache should be (how many strings should fit). My hashtable key is a trivial (addressOfTheString-TextBaseAddress)/8, where 8 is the alignment of strings in Go (x64?). In other words, a key in my cache is the string's offset in the text section. The cache in this case is a simple array. No hashing is involved.
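The key computation described above can be sketched as follows. This is an illustration only: `slotFor`, `stringAlign`, and the segment bounds are hypothetical, and the alignment of 8 is the assumption stated in the comment, not something Go guarantees.

```go
package main

import "fmt"

// stringAlign is the alignment assumed in the comment above; Go does not
// guarantee it.
const stringAlign = 8

// slotFor maps a string address to an index in a flat cache that covers
// [base, base+size). ok is false when addr lies outside that range.
func slotFor(addr, base, size uintptr) (idx int, ok bool) {
	if addr < base || addr-base >= size {
		return 0, false
	}
	return int((addr - base) / stringAlign), true
}

func main() {
	base, size := uintptr(0x400000), uintptr(0x1000) // hypothetical segment bounds
	cache := make([]string, size/stringAlign)        // one slot per aligned address
	if i, ok := slotFor(0x400018, base, size); ok {
		cache[i] = "compiled format"
		fmt.Println("slot", i, "of", len(cache))
	}
}
```

An address outside the range reports `ok == false`, which is where a slower map-based lookup would take over.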

  * What kind of API would be useful, and how would you specifically use it?

A binary log like my small proof of concept here https://github.com/larytet/binlog/blob/master/binlog.go#L113 could use the address of the text section, moduledata.text, instead of relying on /proc/self/maps.
Caching intermediate results in calls to Log() appears to be the only way to implement a sub-100ns/op logging framework. Here is a typical benchmark result: https://stackoverflow.com/questions/10571182/go-disable-a-log-logger

  * Can you give other examples of code in the wild that would benefit similarly?

Functions that operate on strings which rarely change can benefit. Such functions can easily cache results and retrieve them quickly. A good example is log.Printf(). The log API could "compile" the format strings and, arguably, process the arguments faster.

* What other compiler and/or runtime optimizations, if any, would allow you to avoid the need for exposing runtime internals?

This is not an easy question. Often a linker script adds a global variable pointing to the start of ".text". The linker script/linker avenue is one approach.

I can hash the 64-bit virtual address of the string. This is slower than calculating the offset, but probably fast enough. The distribution of keys can be a problem. The string offset from the .text base promises a perfect distribution and a sparse hashtable: better performance, worse memory footprint.

I can hash the string itself. This is what I do when I see the address of the format string does not fit the address range. This is the slowest option. Lookups in a small map alone will take 40-50ns/op.
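The two fallbacks described above could look roughly like this. It is a sketch under stated assumptions: the multiplicative (Fibonacci) hash is one common way to spread addresses over a power-of-two table and is not necessarily what binlog uses, `tableBits` and the sample address are hypothetical, and collisions are not handled.

```go
package main

import "fmt"

const tableBits = 10 // hypothetical table size: 1<<10 slots

// addrSlot spreads a 64-bit address over a power-of-two table using
// multiplicative (Fibonacci) hashing. Collisions are not resolved here.
func addrSlot(addr uint64) uint64 {
	const golden = 0x9E3779B97F4A7C15 // odd constant close to 2^64 / golden ratio
	return (addr * golden) >> (64 - tableBits)
}

func main() {
	byAddr := make([]string, 1<<tableBits) // fallback keyed by hashed address
	byText := map[string]string{}          // slowest fallback, keyed by contents

	addr := uint64(0x4b2a18) // hypothetical address of a format string
	byAddr[addrSlot(addr)] = "compiled(%d)"
	byText["%d"] = "compiled(%d)"

	fmt.Println(byAddr[addrSlot(addr)], byText["%d"])
}
```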

runtime.moduledata is an absolute gold mine. Bootstrap (?) does a lot of work to collect all that priceless data in one place. I tried go:linkname, but could not figure out how to do it without duplicating the structures in my code, as these projects do.

@ianlancetaylor
Contributor

Your suggestion relies on specific details about the current runtime implementation that are not guaranteed to be true for all implementations. If we exposed this information, and people relied on it in the way you suggest, we would be tying the hands of all later implementations. I think this would be a bad idea. If you want to be portable to other platforms and other versions of Go, use a map with a string key. If you don't care about being portable, use go:linkname and type reflection.

@larytet
Author

larytet commented Nov 20, 2018

I need an instance of the type to use the type reflection, don't I?

@ianlancetaylor
Contributor

Yes, I suppose you're right. Sorry.

@aclements
Member

For the reasons Ian pointed out, we're definitely not going to expose a public API around runtime.moduledata. It's far too tied to the details of the implementation.

You're right that you can't linkname runtime.moduledata without duplicating the struct definition. linkname lets you access symbols directly, completely bypassing Go's type system, so you have to provide the type yourself (and nothing will check that it's right).

If you're willing to linkname things, however, you can get the base of the text segment without runtime.moduledata:

import "unsafe" // needed for unsafe.Pointer and to permit go:linkname

//go:linkname text runtime.text
var (
	text     struct{}
	textAddr = uintptr(unsafe.Pointer(&text))
)

The runtime.text symbol is placed at the beginning of the text segment, so the address of that symbol is the beginning of the text segment.

@larytet
Author

larytet commented Nov 20, 2018

@aclements, this is a great tip. Is there a way to get the size of the .text section as well?

@cherrymui
Member

Do you have to use the start of .text as the base? Would any fixed address in the text section, like the address of main.main or runtime.main or a known string variable suffice? It is just a fixed offset.
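To make the "fixed offset" idea concrete, here is a minimal sketch that measures string addresses relative to a known string instead of the start of .text. The `anchor` variable and both helpers are hypothetical. The offset is signed because a string may be placed before or after the anchor, and nothing guarantees the layout across builds or Go versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// anchor is a hypothetical known string used as the reference address.
var anchor = "binlog anchor"

// dataAddr returns the address of the first byte of a non-empty string.
func dataAddr(s string) uintptr {
	return uintptr(unsafe.Pointer(unsafe.StringData(s)))
}

// offsetFrom returns the signed distance of s's data from the anchor.
func offsetFrom(s string) int64 {
	return int64(dataAddr(s)) - int64(dataAddr(anchor))
}

func main() {
	fmt.Println(offsetFrom(anchor))         // the anchor itself is at offset 0
	fmt.Println(offsetFrom("some %d text")) // a build-dependent offset
}
```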

@larytet
Author

larytet commented Nov 20, 2018

@cherrymui, I am using the offset as an index into the array of processed (compiled) format strings. The address of the first constant string in the executable would be the best base and would ensure that my index starts from zero.
My Log(fmtStr string, args ...interface{}) API looks in the array of processed strings (the cache) first. This way I can hit 30-50ns/op for logging integers and can probably do better. For comparison, the fastest loggers are in the 200-300ns range.

The sad truth about production logging is that

  • we do not know when we will need a log
  • when we need the log, we want it to have been turned on since last week

Most choose not to log and instead enable the logs manually when required.

@larytet
Author

larytet commented Nov 20, 2018

I guess I will keep using /proc/self/maps

@larytet larytet closed this as completed Nov 20, 2018
@cherrymui
Member

I agree that it is probably ideal to use the address of the first string, but it should suffice to use any given text symbol.

Another possibility that uses only public API: use os.Executable to get the executable file, and use the debug/elf package (or another debug/* package for non-ELF) to find the addresses of the sections or symbols in the file. For position-independent code, you can compute the fixed offset to the actual mapped address using any given symbol.
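A minimal sketch of the public-API route, assuming an ELF executable: `sectionRange` is a hypothetical helper that reports a section's link-time address and size, which also covers the question of section size. For position-independent executables the reported address would still need to be adjusted by the load offset, which this sketch does not do.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// sectionRange returns the link-time address and size of a named ELF
// section in the file at path.
func sectionRange(path, name string) (addr, size uint64, err error) {
	f, err := elf.Open(path)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	s := f.Section(name)
	if s == nil {
		return 0, 0, fmt.Errorf("section %s not found in %s", name, path)
	}
	return s.Addr, s.Size, nil
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Println("os.Executable:", err)
		return
	}
	// String constants typically live in .rodata rather than .text.
	for _, name := range []string{".text", ".rodata"} {
		addr, size, err := sectionRange(exe, name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-8s addr=%#x size=%#x\n", name, addr, size)
	}
}
```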

@larytet
Author

larytet commented Nov 20, 2018

@cherrymui, I still need the size of the .text section

Re ELF: this is what https://github.com/martende/restartable does. It is platform dependent and a lot of code.
os.Executable can return an empty string if the kernel removed the path. The Linux kernel needs only the filesystem inode to keep the file on disk read-only while the process runs. The process path is best effort.

@larytet
Author

larytet commented Nov 23, 2018

Looks like I will end up duplicating moduledata in my code anyway. I want to find all Go modules the executable (ELF) depends on, parse the relevant Go source files with go/parser, walk the go/ast for all calls to binlog.Log(), collect the arguments, and hash the format strings.

The first step, getting the list of modules from the executable, has more than one possible approach:

  • Read the ELF, look for pattern g\:(\/.+\.go)
  • moduledata (this one seems most reasonable)
  • debug information (?)
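The later step of that plan, collecting format strings from calls to binlog.Log(), could be sketched with go/parser and go/ast as below. This is an illustration under assumptions: `logFormats` is a hypothetical helper, it matches purely on the identifier names `binlog` and `Log` without any type resolution, and it only picks up string literals passed directly as the first argument.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// logFormats returns the literal format strings passed as the first
// argument to calls written as binlog.Log(...) in src.
func logFormats(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "Log" {
			return true
		}
		if recv, ok := sel.X.(*ast.Ident); !ok || recv.Name != "binlog" {
			return true
		}
		// Keep only string literals; variables and expressions are skipped.
		if lit, ok := call.Args[0].(*ast.BasicLit); ok && lit.Kind == token.STRING {
			if s, err := strconv.Unquote(lit.Value); err == nil {
				out = append(out, s)
			}
		}
		return true
	})
	return out, nil
}

func main() {
	src := `package p

func f() {
	binlog.Log("hello %d", 1)
	other.Log("skipped")
}`
	fmt.Println(logFormats(src))
}
```

Resolving which `Log` is really being called would need type information, for example from go/types, which this sketch leaves out.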

@golang golang locked and limited conversation to collaborators Nov 23, 2019