cmd/compile: coalesce consecutive calls to print/println #21417
Comments
Although special care is required if evaluating the arguments to the second println could panic. Maybe not a good starter issue after all.
I came here to say, "don't do that!" because of argument evaluation. I use multiple printlns like this to diagnose certain kinds of failure. Coalescing them into a single print would completely eliminate that possibility. In any case, we shouldn't care if println is especially efficient. You shouldn't be using it in production code.
Of course; we can only do this if it is not detectable from user code. (This would be easier in SSA form, but it is still possible, using package gc's safeexpr.)
My goal here is shrinking runtime routines that use println, for smaller binaries and better instruction density. I would happily make println slower if it made the call sites smaller or used less stack.
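To make the soundness concern concrete, here is a minimal sketch of a case where coalescing would be detectable from user code. The `point` type and the `demo` function are hypothetical names used only for illustration. The first println must reach stderr before the second println's argument is evaluated, because that evaluation can panic.

```go
package main

type point struct{ x int }

// demo reports whether evaluating the second println's argument panicked.
func demo(p *point) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	println("about to read p.x") // reaches stderr before the line below runs
	println("p.x =", p.x)        // evaluating p.x panics when p is nil
	return false
}

func main() {
	println("panicked:", demo(nil))
	println("panicked:", demo(&point{x: 7}))
}
```

If the two printlns in `demo` were merged naively, p.x would be evaluated before anything was printed, and the "about to read p.x" line would be lost in the nil case. A sound version of the optimization would have to skip this pair, or evaluate arguments in the original order relative to the output.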
Tangential, but what about emitting blocks that contain print at the end of the function, as unlikely blocks? This would increase density of likely paths that never call print.
This doesn't completely eliminate the ability to do that sort of debugging, though it does make it a lot more annoying:
But how much do we gain from doing this?
Both clang and GCC do this optimization for C++ programs, for what that's worth. E.g.
gets collapsed into a single 'puts("abc")'. Q: why do this only with println? Why not apply the same coalescing to fmt.Printf calls?
Good idea.
If done soundly (which was my intent), it doesn't impact it at all. It just limits the scope of the optimization.
My simple optimizations cut 0.5% from hello world. Given that all of that is from the runtime, it seems low priority but still worthwhile (as I said initially).
A few reasons. fmt is a bit high level for the compiler to be messing with. fmt's printing routines return values. In general, fmt's printing routines take care to make exactly one underlying write call.
CLs 55095–55098 change how the compiler (and runtime) implement the built-ins print and println, in order to reduce the amount of code they generate. (The goal is to both reduce the runtime's binary footprint and to increase instruction density in important runtime routines.)
This issue spells out one further obvious optimization. Consider:
After CL 55098, this gets compiled into: printlock, print "a\n", printunlock, printlock, print "b\n", printunlock. But it could get compiled into: printlock, print "a\nb\n", printunlock. In short, coalesce consecutive calls to print/println in a single printlock-protected sequence of calls, combining string constants as possible, taking care to correctly handle the spaces and newlines introduced by println.
Low priority, but might be an interesting learning exercise for someone interested in the mid-tier of the compiler. It's not super straightforward, since walkprint looks at one node at a time, but it also shouldn't be too difficult.