Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: assembly generated is bigger than previous versions #30229

Closed
mariecurried opened this issue Feb 14, 2019 · 7 comments
Closed

cmd/compile: assembly generated is bigger than previous versions #30229

mariecurried opened this issue Feb 14, 2019 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mariecurried
Copy link

mariecurried commented Feb 14, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.4 windows/amd64

Does this issue reproduce with the latest release?

Yes. I tried on tip and the same happens.

What did you do?

I compiled the following code and inspected the assembly instructions generated by the compiler:

func test(slc [][]int) (int, int) {
	var lentotal, lenslc int
	for _, x := range slc {
		lentotal += len(x)
		lenslc++
	}
	return lentotal, lenslc
}

Assembly code generated:

What did you expect to see?

I expected the compiler to generate code similar to the 1.10.1 version, because on tip it generates unnecessary jumps and an extra block of XOR's.

What did you see instead?

Instead, the compiler generated more code than what is necessary.
In the 1.10.1 version, the only thing that I think could be different is that, on line 5, the slice address is moved to CX, but it might not be necessary in the case that len(slc) is 0, which is well handled on tip.

Summing up, I believe the code should look something like:

        pcdata  $2, $0
        pcdata  $0, $0
        xorl    DX, DX
        xorl    BX, BX
        movq    "".slc+16(SP), AX
        testq   AX, AX
        jle     test_pc37
        pcdata  $2, $1
        pcdata  $0, $1
        movq    "".slc+8(SP), CX
        jmp     test_pc25
test_pc21:
        addq    $24, CX
test_pc25:
        addq    8(CX), BX
        incq    DX
        cmpq    DX, AX
        jlt     test_pc21
test_pc37:
        pcdata  $2, $0
        movq    BX, "".~r1+32(SP)
        movq    DX, "".~r2+40(SP)
        ret

Regarding the test_pc21 block, it could disappear, as is done in the 1.10.1 version.

@mvdan mvdan added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 14, 2019
@mvdan
Copy link
Member

mvdan commented Feb 14, 2019

Is this about performance, binary size, or just correctness?

@mariecurried
Copy link
Author

In terms of correctness, I believe both versions are correct in the sense that they produce the desired behavior.
In this issue, my point is more related towards better performance and smaller binary size, by eliminating redundant JUMP instructions and unneeded assembly code blocks.

@mvdan
Copy link
Member

mvdan commented Feb 14, 2019

Fair enough - was just wondering to appropriately label the issue :)

If you'd like to help get this issue fixed faster, you could try bisecting which Go commit introduced the regression. For example, you could bisect between the go1.10 and go1.11 tags, running make.bash at each step, building a tiny program, and checking its size or number of assembly instructions.

@mariecurried
Copy link
Author

After following your advice, I found that the commit that started generating this code was 837ed98, which makes sense.
According to the commit message, I believe the test_pc21 block makes sense to exist to prevent the past-the-end pointer. Now, what I think could be made better is the duplicated XOR's, which could be put together as the first instructions in the function, as shown in the assembly I wrote above.

@mvdan
Copy link
Member

mvdan commented Feb 14, 2019

/cc @aclements @randall77 @dr2chase; please see the comments above.

@randall77
Copy link
Contributor

Looks like the phi tighten pass introduces the duplicate xors. If I turn that pass off, then the register allocator also introduces duplicate xors. I can't turn off the register allocator :(

The reason for both of those behaviors is that we're trying to avoid excess register pressure by loading constants into registers as late as possible. For an example reason why, see #16407. It sounds like we're being a bit too aggressive in that regard, but I'm not sure how to design a knob to adjust it, or how to decide where to set it.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
@mariecurried
Copy link
Author

Fixed in Go 1.20

@github-project-automation github-project-automation bot moved this from Triage Backlog to Done in Go Compiler / Runtime Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

5 participants