
runtime/race: report TSAN memory statistics #37233

Open
kevinburke1 opened this issue Feb 14, 2020 · 10 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
Backlog
Comments

@kevinburke1

kevinburke1 commented Feb 14, 2020

I don't really have specific recommendations here, but I wanted to report this troubleshooting session, because this was the worst time I've ever had trying to troubleshoot Go program performance and I was at my wit's end trying to debug it. Go has a reputation for being easy to troubleshoot, and "I can find the problems quickly" is a big part of the reason I like the language. So some time spent thinking about and fixing these issues seems useful.

The details of the program are not that important, but it was a pretty simple program that checks for rows in a Postgres database, updates their status if it finds one, and then makes an HTTP request with the contents of the database row. I tested it by writing a single database row once per second, with a single worker. Via sql.DBStats I never had more than 4 database connections open at a time.
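
For context, the connection count was monitored via database/sql's pool statistics. Here is a minimal sketch of that kind of check; the connection string and polling interval are illustrative, not from the real program:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Periodically log the pool statistics; in the program described above,
	// OpenConnections never exceeded 4.
	for range time.Tick(30 * time.Second) {
		s := db.Stats()
		log.Printf("open=%d in_use=%d idle=%d", s.OpenConnections, s.InUse, s.Idle)
	}
}
```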

What I observed was high and growing RSS usage. In production this would result in the program being OOM killed after some length of time. Locally on my Mac, RAM use grew from 140MB to 850MB when I left it running overnight.

I tried all of the tricks I could find via blog posts (a rough sketch of a few of them follows the list):

  • disabling HTTP2 support
  • disabling MADV_DONTNEED
  • setting Connection: close on the HTTP Client
  • manually inspecting the code for goroutine leaks (constant use of around 20 goroutines)
  • manually inspecting the code for defers in a for loop in a long-running function, unbounded slice appends, etc. (none)
  • ensuring HTTP bodies were fully consumed, DB queries closed after use, etc.
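
For reference, here is a rough sketch of what a few of those mitigations look like in code. None of this is the original program; the client construction and request are illustrative only:

```go
package main

import (
	"crypto/tls"
	"io"
	"log"
	"net/http"
	"time"
)

// newClient approximates two of the mitigations above: HTTP/2 disabled and
// keep-alives off (which forces "Connection: close" on outgoing requests).
func newClient() *http.Client {
	transport := &http.Transport{
		// A non-nil, empty TLSNextProto map disables HTTP/2 support.
		TLSNextProto:      map[string]func(string, *tls.Conn) http.RoundTripper{},
		DisableKeepAlives: true,
	}
	return &http.Client{Transport: transport, Timeout: 30 * time.Second}
}

func main() {
	// MADV_DONTNEED behaviour is toggled from the environment rather than in
	// code, e.g. by running the binary with GODEBUG=madvdontneed=1 (or =0).
	resp, err := newClient().Get("https://example.com")
	if err != nil {
		log.Fatal(err)
	}
	// Fully consume and close the body, as in the last item above.
	_, _ = io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
}
```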

None of these made a difference; the memory usage kept growing. Running pprof and looking at the runtime memory statistics, I was struck by the difference between the reported values and the actual observed memory usage. These numbers were taken after the program had been running all night, when Activity Monitor told me it was using 850MB of RAM.

# runtime.MemStats
# Alloc = 3410240
# TotalAlloc = 2834416608
# Sys = 213078288
# Lookups = 0
# Mallocs = 38955235
# Frees = 38943138
# HeapAlloc = 3410240
# HeapSys = 199819264
# HeapIdle = 194469888
# HeapInuse = 5349376
# HeapReleased = 192643072
# HeapObjects = 12097
# Stack = 1376256 / 1376256
# MSpan = 132600 / 196608
# MCache = 13888 / 16384
# BuckHashSys = 2045579
# GCSys = 8042776
# OtherSys = 1581421
# NextGC = 4194304
# LastGC = 1581710840991223000
# PauseNs = [...]
# PauseEnd = [...]
# NumGC = 1187
# NumForcedGC = 0
# GCCPUFraction = 4.524177278457185e-06
# DebugGC = false

Specifically, only about 200MB of "Sys" is reported, and that number stayed pretty constant even though Activity Monitor reported the program was using about 850MB of RSS.
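
For comparison, these numbers can also be read programmatically. A minimal sketch (the figures above came from pprof's debug output, not from this exact code):

```go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	// Periodically log the runtime's own view of memory. Sys is the total
	// memory the Go runtime knows it obtained from the OS; it stayed near
	// 200MB here even while the OS reported ~850MB of RSS, because memory
	// allocated by the race detector (TSAN) is not counted in MemStats.
	for range time.Tick(time.Minute) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("Sys=%d HeapInuse=%d HeapReleased=%d NumGC=%d",
			m.Sys, m.HeapInuse, m.HeapReleased, m.NumGC)
	}
}
```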

Another odd thing was that the inuse_space heap profile reported only about 340kB of memory in use. This sample was taken after reducing runtime.MemProfileRate to 512 and after leaving the program running all night (a sketch of how the profile was gathered appears after the listing).

(pprof) top20
Showing nodes accounting for 313.47kB, 91.83% of 341.36kB total
Dropped 100 nodes (cum <= 1.71kB)
Showing top 20 nodes out of 130
      flat  flat%   sum%        cum   cum%
   76.76kB 22.49% 22.49%    76.76kB 22.49%  runtime.malg
   68.02kB 19.93% 42.41%    68.02kB 19.93%  bufio.NewReaderSize (inline)
      54kB 15.82% 58.23%       54kB 15.82%  github.com/rcrowley/go-metrics.newExpDecaySampleHeap (inline)
   15.35kB  4.50% 62.73%    57.89kB 16.96%  github.com/kevinburke/rickover/dequeuer.(*Dequeuer).Work
   15.35kB  4.50% 67.23%    15.35kB  4.50%  net/http.(*persistConn).roundTrip
   14.80kB  4.34% 71.56%    15.90kB  4.66%  database/sql.(*Rows).awaitDone
    9.99kB  2.93% 74.49%    13.55kB  3.97%  github.com/lib/pq.NewConnector
    8.77kB  2.57% 77.06%     8.77kB  2.57%  database/sql.(*Tx).awaitDone
    8.28kB  2.43% 79.48%    69.90kB 20.48%  github.com/lib/pq.(*Connector).open
       8kB  2.34% 81.83%        8kB  2.34%  bufio.NewWriterSize
       8kB  2.34% 84.17%        8kB  2.34%  hash/crc32.slicingMakeTable
    5.24kB  1.53% 85.71%    10.23kB  3.00%  github.com/lib/pq.(*conn).prepareTo
       4kB  1.17% 86.88%        4kB  1.17%  runtime.allgadd

I would have expected that number to be a lot higher.
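
For reference, here is a minimal sketch of gathering such a profile with a lowered sample rate. Whether the real program wrote the profile to a file or served it via net/http/pprof isn't shown in this issue; the file name here is illustrative:

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Lower the sampling rate as early as possible, before significant
	// allocation happens, so more allocations are recorded in the profile.
	runtime.MemProfileRate = 512

	// ... the program's real work would run here ...

	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	// Inspect with: go tool pprof -sample_index=inuse_space heap.pprof
}
```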

Eventually I realized that I had been running this program with the race detector on; performance isn't super important for this program, but correctness is, so I wanted to ensure I caught any unsafe reads or writes. Some searching around in the issue tracker revealed #26813, which seems to indicate that the race detector might not clean up correctly after defers and recovers.

Recompiling and running the program without the race detector seems to eliminate the leak. It would be great to have confirmation that #26813 is actually the problem, though. github.com/lib/pq uses panic and recover quite heavily to do fast stack unwinds, similar to the JSON package, but as far as I can tell from adding print statements to my test program, it never actually panics, so maybe the heavy use of recover plus my use of deferred context cancel functions is enough to set it off.
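
To make the suspected combination concrete, here is a hedged sketch of the pattern described above: lib/pq-style panic/recover error handling plus a deferred context cancel in the worker loop. The function and variable names are illustrative and not taken from the real code:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// doQuery mimics lib/pq's internal style: errors are panicked deep in the
// call stack and recovered at the API boundary, so callers see a normal
// error value and no panic actually escapes.
func doQuery() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("query failed: %v", r)
		}
	}()
	// Deep inside, an error condition would be reported via panic(...).
	return nil
}

// workOnce is one iteration of the worker loop: a deferred context cancel
// plus a call into the recover-heavy driver code.
func workOnce(parent context.Context) error {
	ctx, cancel := context.WithTimeout(parent, 5*time.Second)
	defer cancel()
	_ = ctx // the real code would pass ctx to the query
	return doQuery()
}

func main() {
	ctx := context.Background()
	for range time.Tick(time.Second) {
		if err := workOnce(ctx); err != nil {
			fmt.Println("work error:", err)
		}
	}
}
```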

Look, it is fine if the race detector leaks memory, and I understand running programs in production for days on end with the race detector on is atypical. What bothered me more was how hard this was to track down, and how poorly the tooling did at identifying the actual problem. The only documentation on the runtime performance of the race detector states:

The cost of race detection varies by program, but for a typical program, memory usage may increase by 5-10x and execution time by 2-20x.

I read pretty much every blog post anyone had written on using pprof and none of them mentioned this issue, either.

It would be nice if runtime.MemStats had a section about memory allocated for the race detector, or if there was more awareness/documentation of this issue somewhere. I'm sure there are other possible solutions that I'm probably not aware of because I'm not super familiar with the runtime internals.

@dgryski
Contributor

dgryski commented Feb 14, 2020

/cc @dvyukov

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 15, 2020
@dmitshur dmitshur added this to the Backlog milestone Feb 15, 2020
@gopherbot

Change https://golang.org/cl/220586 mentions this issue: doc/articles: mention memory leak in race detector docs

@gopherbot

Change https://golang.org/cl/220586 mentions this issue: doc/articles/race_detector: mention memory leak potential

gopherbot pushed a commit that referenced this issue Feb 25, 2020
As far as I can tell, there is no public documentation on this topic,
which cost me several days of debugging.

I am possibly unusual in that I run binaries in production with the
race detector turned on, but I think that others who do the same may
want to be aware of the risk.

Updates #26813.
Updates #37233.

Change-Id: I1f8111bd01d0000596e6057b7cb5ed017d5dc655
Reviewed-on: https://go-review.googlesource.com/c/go/+/220586
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot

Change https://golang.org/cl/221019 mentions this issue: [release-branch.go1.14] doc/articles/race_detector: mention memory leak potential

gopherbot pushed a commit that referenced this issue Feb 25, 2020
[release-branch.go1.14] doc/articles/race_detector: mention memory leak potential

As far as I can tell, there is no public documentation on this topic,
which cost me several days of debugging.

I am possibly unusual in that I run binaries in production with the
race detector turned on, but I think that others who do the same may
want to be aware of the risk.

Updates #26813.
Updates #37233.

Change-Id: I1f8111bd01d0000596e6057b7cb5ed017d5dc655
Reviewed-on: https://go-review.googlesource.com/c/go/+/220586
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
(cherry picked from commit ba093c4)
Reviewed-on: https://go-review.googlesource.com/c/go/+/221019
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@ainar-g
Contributor

ainar-g commented Sep 5, 2022

Sorry for the noise, but is this still the case as of Go 1.19's update to thread sanitizer v3?

@gopherbot

Change https://go.dev/cl/458218 mentions this issue: runtime: implement traceback iterator

gopherbot pushed a commit that referenced this issue Mar 10, 2023
Currently, all stack walking logic is in one venerable, large, and
very, very complicated function: runtime.gentraceback. This function
has three distinct operating modes: printing, populating a PC buffer,
or invoking a callback. And it has three different modes of unwinding:
physical Go frames, inlined Go frames, and cgo frames. It also has
several flags. All of this logic is very interwoven.

This CL reimplements the monolithic gentraceback function as an
"unwinder" type with an iterator API. It moves all of the logic for
stack walking into this new type, and gentraceback is now a
much-simplified wrapper around the new unwinder type that still
implements printing, populating a PC buffer, and invoking a callback.
Follow-up CLs will replace uses of gentraceback with direct uses of
unwinder.

Exposing traceback functionality as an iterator API will enable a lot
of follow-up work such as simplifying the open-coded defer
implementation (which should in turn help with #26813 and #37233),
printing the bottom of deep stacks (#7181), and eliminating the small
limit on CPU stacks in profiles (#56029).

Fixes #54466.

Change-Id: I36e046dc423c9429c4f286d47162af61aff49a0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/458218
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
@nikandfor

Looks like I've had the same entertainment for a couple of days. I vote for exposing info about race detector allocations and frees in runtime.MemStats.

@ayushmpb95

It is still the case with Go 1.19.5, at least.

@thepudds
Contributor

thepudds commented May 22, 2023

Sorry for the noise, but is this still the case as of Go 1.19's update to thread sanitizer v3?

As far as I understand (which might not be correct), the “bug” is an unimplemented feature on the Go side rather than on the TSAN side. See Dmitry’s comment here: #26813 (comment)

If that’s correct, I would expect that upgrading TSAN would not help.

Instead, the new iterator-based approach for stack unwinding would hopefully enable fixing the root cause of this memory growth. See for example the reference to #26813 by @aclements in the CL description for https://go.dev/cl/458218.

In any event, #26813 is probably the bug to follow if you want to track a fix for the root cause.

@aclements aclements changed the title runtime/race: leak in long-running programs, more transparent memory statistics runtime/race: report TSAN memory statistics Jul 25, 2023
@aclements
Member

Since #26813 is about the bug that caused the memory leak, I've retitled this bug to focus on reporting memory used by the TSAN runtime.
