reflect: large performance regression in reflect.New #16161

Closed
rhysh opened this issue Jun 23, 2016 · 16 comments

@rhysh
Contributor

rhysh commented Jun 23, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?

go1.7beta2: go version devel +fca9fc5 Thu Jun 16 19:45:33 2016 +0000 linux/amd64

  2. What operating system and processor architecture are you using (go env)?

linux/amd64

  3. What did you do?

I have a program that spends most of its time marshaling and unmarshaling data that's json- and protobuf-encoded. It's been running on go1.6.x for several months (currently go1.6.2), and I upgraded it to go1.7beta2.

It's a closed-source project, and I don't currently have a small reproducer.

  4. What did you expect to see?

I expected no major performance regressions.

  5. What did you see instead?

The program is significantly slower, with many more cycles spent in calls to reflect.New. Over 1/3rd of the program's execution time is spent in calls to reflect.(*rtype).ptrTo, which didn't even show up on profiles of the program under go1.6.2. Most of the time with ptrTo on the stack is in calls to RLock and RUnlock on a *sync.RWMutex. The ptrTo cache appears to have been removed in February in 30f93f0.

Here's a bit of a go1.6.2 profile:

(pprof) peek reflect.New
126.07s of 138.79s total (90.84%)
Dropped 526 nodes (cum <= 0.69s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             2.38s 54.71% |   redacted.path/vendor/github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
                                             1.97s 45.29% |   redacted.path/vendor/github.com/golang/protobuf/proto.(*Buffer).dec_slice_struct
     0.33s  0.24%  0.24%      4.35s  3.13%                | reflect.New
                                             3.90s   100% |   reflect.unsafe_New
----------------------------------------------------------+-------------

And here's a bit of the profile with go1.7beta2:

(pprof) peek reflect.New
184.09s of 197.76s total (93.09%)
Dropped 587 nodes (cum <= 0.99s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                            39.29s 51.75% |   redacted.path/vendor/github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
                                            36.64s 48.25% |   redacted.path/vendor/github.com/golang/protobuf/proto.(*Buffer).dec_slice_struct
     0.35s  0.18%  0.18%     75.93s 38.40%                | reflect.New
                                            67.65s 89.57% |   reflect.(*rtype).ptrTo
                                             7.88s 10.43% |   reflect.unsafe_New
----------------------------------------------------------+-------------
(pprof) peek ptrTo
184.09s of 197.76s total (93.09%)
Dropped 587 nodes (cum <= 0.99s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                            67.65s   100% |   reflect.New
     4.87s  2.46%  2.46%     67.65s 34.21%                | reflect.(*rtype).ptrTo
                                            37.70s 60.06% |   sync.(*RWMutex).RLock
                                            23.76s 37.85% |   sync.(*RWMutex).RUnlock
                                             1.31s  2.09% |   runtime.mapaccess1_fast64
----------------------------------------------------------+-------------
@randall77
Contributor

@crawshaw

@potocnyj
Contributor

I just tested this and observed a 25% slowdown on JSON unmarshaling for one of our own systems. I can work on making a reproducer program if that would help.

@crawshaw
Member

What a pity. We have a spare 4 bytes I can use for a ptrTo type offset, though I had been hoping to remove that in 1.8. I suppose for now I can add it back and try to come up with some better solution in 1.8.

@ianlancetaylor
Member

We could also consider adding NewPtr

// NewPtr returns a Value representing a pointer of the specific type initialized to point to a new zero
// value of the pointer's element type. This is equivalent to New(typ.Elem()).
func NewPtr(typ Type) Value {
    if typ == nil || typ.Kind() != Ptr {
        panic("reflect.NewPtr of non-pointer type")
    }
    ptr := unsafe_New(typ.Elem().(*rtype))
    return Value{typ.(*rtype), ptr, flag(Ptr)}
}

@gopherbot
Contributor

CL https://golang.org/cl/24400 mentions this issue.

@crawshaw
Member

@rhysh if you get a chance I'd appreciate it if you could run your performance test again with tip.

@rhysh
Contributor Author

rhysh commented Jun 23, 2016

@crawshaw the performance is back to usual go1.6.2 levels with e75c899. Thanks for turning this around so quickly.

@cespare
Contributor

cespare commented Jun 23, 2016

Do the JSON decoding benchmarks not capture this regression?

@potocnyj
Contributor

I retested my benchmark with e75c899, and I still see a 15% regression relative to Go1.6.2.

Looking at a profile comparison, I see fairly substantial time increases in reflect.(*rtype).Name (7% vs. 2%) and reflect.(*rtype).NumMethod (8% vs. 1%). Also there's more time being spent in sync/atomic.AddUint32, although that appears to be coming from reflect.(*rtype).NumMethod.

I've got a copy of my benchmark available here for those interested: https://play.golang.org/p/CiB34ge82s

@quentinmit
Contributor

@cespare: The JSON decoding benchmarks do capture this regression, yes. See our analysis in #16117.

@crawshaw
Member

@potocnyj I tried that benchmark and found it too noisy to see a 15% variation. How exactly are you running it?

@potocnyj
Contributor

@crawshaw I'm running it in batches of 10 and using https://github.com/rsc/benchstat to compare the average runtime of the benchmark with Go1.6.2 vs Go tip. Specifically, here's what my commands are:

go test -bench=. -benchtime=2s -count=10 >go1.6.txt
gotip test -bench=. -benchtime=2s -count=10 >gotip.txt
benchstat go1.6.txt gotip.txt

Note that in my local environment, go is go1.6.2 and gotip is my local source copy of the Go development tree.

@gopherbot
Contributor

CL https://golang.org/cl/24433 mentions this issue.

@crawshaw
Member

@potocnyj just a quick note on your benchmark:

With GOARCH=386 it is currently 16% slower than 1.6. If I add an IsNamed() bool method to reflect.Type and replace v.Type().Name() == "" with v.Type().IsNamed() in encoding/json/decode.go, it becomes 11% faster than 1.6. So the cost of computing the name in non-empty cases can still be significant in JSON decoding.

I'm reluctant to propose adding a method for 1.7, because while it's a small change, it's an API change. But maybe for 1.8. Alternatively, at the cost of some binary size, I can reduce the cost of calculating the name, but it is unlikely to ever be as cheap as a boolean method.

@potocnyj
Contributor

@crawshaw thanks for the update here - as a comparison I ran my benchmark this morning with the latest from tip (go version devel +25a6095) and it looks like there may be a net improvement now for amd64 compared to 1.6. It's still pretty noisy as you mentioned previously but things look to be substantially improved 👍

@gopherbot
Contributor

CL https://golang.org/cl/24521 mentions this issue.

@golang golang locked and limited conversation to collaborators Sep 30, 2017