Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize TrailingZerosNN by using info from prove pass #25077

Closed
josharian opened this issue Apr 25, 2018 · 6 comments
Closed

cmd/compile: optimize TrailingZerosNN by using info from prove pass #25077

josharian opened this issue Apr 25, 2018 · 6 comments

Comments

@josharian
Copy link
Contributor

A very typical use of math/bits.TrailingZerosNN is to visit all set bits.

var sink int

func visitbits(x uint64) {
	for x != 0 {
		sink = bits.TrailingZeros64(x) // use result
		x &= x - 1
	}
}

(Among other things, I'd like to do this in some hot code in the runtime.)

This currently compiles on amd64 to:

"".visitbits STEXT nosplit size=40 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".visitbits(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVQ	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	JMP	34
	0x0007 00007 (x.go:9)	BSFQ	AX, CX
	0x000b 00011 (x.go:9)	MOVL	$64, DX
	0x0010 00016 (x.go:9)	CMOVQEQ	DX, CX
	0x0014 00020 (x.go:9)	MOVQ	CX, "".sink(SB)
	0x001b 00027 (x.go:10)	LEAQ	-1(AX), CX
	0x001f 00031 (x.go:10)	ANDQ	CX, AX
	0x0022 00034 (x.go:8)	TESTQ	AX, AX
	0x0025 00037 (x.go:8)	JNE	7
	0x0027 00039 (<unknown line number>)	RET

However, since we know that x != 0 in the loop, we should be able to skip the CMOVEQ stuff and just use BSFQ directly. Similar (though less dramatic) optimizations apply for small int widths.

This seems like information that the prove pass has readily available. I haven't been able to figure out how to use it, though; the prove pass is complicated, and I'm not sure where best to make the change.

There seem like two reasonable options for encoding the information: Add new Ctz8NonZero ops, and change the value's op from Ctz8 to Ctz8NonZero, or giving Ctz8 an auxint indicating non-zero-ness. I weakly prefer the former.

@rasky any chance you could help out on this, if it is easy?

cc also @aclements

@josharian josharian added this to the Go1.11 milestone Apr 25, 2018
@rasky
Copy link
Member

rasky commented Apr 25, 2018

It's actually easy to add to prove, once you know where to look :)

You need to add this transformation (Ctz8 -> Ctz8NonZero) within simplifyBlock. It currently just handles OpSlicemask, so change it to a switch to make it handle more ops. Once you're there, use ft.isNonNegative() to gate the transformation.

@josharian
Copy link
Contributor Author

We need x > 0, not x >= 0. Hints for doing that? Thanks for being my guide. :)

@rasky
Copy link
Member

rasky commented Apr 25, 2018

Right, so copy the ft.limits[v.ID] check from isNonNegative, and tweak it.

I'm not 100% that ft.limits has the right information though. Try dumping the limits content for the right variable at the right point.

If it doesn't, we need to teach update() that v!=0 for an unsigned variable implies lim.umin>0. I can help you with that.

@rasky
Copy link
Member

rasky commented Apr 25, 2018

And BTW, this variant surely has the right limit information already:

for x > 0 {
    sink = bits.TrailingZeros64(x) // use result
    x &= x - 1
}

@josharian
Copy link
Contributor Author

Thanks, @rasky, this helps a lot.

@gopherbot
Copy link

Change https://golang.org/cl/109358 mentions this issue: cmd/compile: use prove pass to detect Ctz of non-zero values

@golang golang locked and limited conversation to collaborators Apr 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants