cmd/compile: generate wider loads when feasible #23907

klauspost · 2018-02-18T12:51:12Z

What version of Go are you using (`go version`)?

go version go1.10 windows/amd64

Does this issue reproduce with the latest release?

Yes. Tip not tested.

What did you do?

When working with file compression, a typical time-critical part is reading multiple bytes from a bit stream.

A fairly typical function looks like this:

// Uint32 will read a little endian uint32 from current offset.
func (b byteReader) Uint32() uint32 {
	b2 := b.b[b.off : b.off+4 : b.off+4]
	v3 := uint32(b2[3])
	v2 := uint32(b2[2])
	v1 := uint32(b2[1])
	v0 := uint32(b2[0])
	return (v3 << 24) | (v2 << 16) | (v1 << 8) | v0
}

This has a single bounds check, but has 4 loads to get each byte, before shifting and or'ing them into place. On my test setup this operates at 1.49 ns/op.

It seems like there is some SSA logic in place that can combine two adjacent byte loads (with shifts) into a single word load. This can be seen changing the last line to ((v3 << 24) | (v2 << 16)) | ((v1 << 8) | v0). On my machine, this improves time to 1.24 ns/op.

It is possible to also load the high word using a single word, bringing this to 0.85 ns/op, which it the best I can get with the current compiler. The assembler shows two 16 bit loads.

However, it should be possible for the compiler to automatically reduce the original test case to a single 32 bit load on little endian systems.

You can download the example file and benchmark file here.

What did you expect to see?

The compiler combining this to a single load on little endian platforms.

What did you see instead?

4 single byte loads.

The text was updated successfully, but these errors were encountered:

ALTree · 2018-02-18T13:07:36Z

You can change the last line of Uint32Typical to

return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)

i.e. reverse the order of the or operands, and a single load will be generated. So the compiler does recognize that pattern, but the mechanism is not very robust and you need to specify the or operators in ascending order.

klauspost · 2018-02-18T13:16:50Z

You are completely correct. Awesome! This seems like a small issue, then. I will close for now.

ALTree changed the title ~~cmd/compile: Generate wider loads when feasible~~ cmd/compile: generate wider loads when feasible Feb 18, 2018

ALTree added the Performance label Feb 18, 2018

klauspost closed this as completed Feb 18, 2018

golang locked and limited conversation to collaborators Feb 18, 2019

gopherbot added the FrozenDueToAge label Feb 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: generate wider loads when feasible #23907

cmd/compile: generate wider loads when feasible #23907

klauspost commented Feb 18, 2018 •

edited

Loading

ALTree commented Feb 18, 2018 •

edited

Loading

klauspost commented Feb 18, 2018

cmd/compile: generate wider loads when feasible #23907

cmd/compile: generate wider loads when feasible #23907

Comments

klauspost commented Feb 18, 2018 • edited Loading

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

What did you see instead?

ALTree commented Feb 18, 2018 • edited Loading

klauspost commented Feb 18, 2018

klauspost commented Feb 18, 2018 •

edited

Loading

What version of Go are you using (`go version`)?

ALTree commented Feb 18, 2018 •

edited

Loading