x/tools/gopls: deadlock due to telemetry worker #33692
Thank you for filing a gopls issue! Please take a look at the Troubleshooting section of the gopls Wiki page, and make sure that you have provided all of the relevant information here.
Thank you for reporting this issue @muirrn! I shall send a fix for this immediately.
Change https://golang.org/cl/190637 mentions this issue: internal/telemetry/export: add load-shedding for workQueue
Could you please try that patch and see if it fixes the problem for you? It performs some best-effort load shedding.
Thanks for looking at this. It looks like your change would avoid the deadlock, but it would introduce substantial latency under heavy contention, and it guarantees loss of telemetry data whenever it hits the current deadlock case. It seems to me that the telemetry worker needs to be rearchitected in a more fundamental way.

I'm sure @ianthehat already has great ideas, but I'll offer mine anyway: we could accumulate all telemetry messages for a single request in memory, and then kick off a goroutine to send them after the request completes. The telemetry messages for a given request would still be sent in order, but they would always be handled asynchronously with respect to LSP requests. If there is some requirement that the telemetry messages be processed "live", then this obviously wouldn't work.
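For concreteness, here is a rough sketch of that per-request batching idea. Everything in it (the `Event` type, `requestBuffer`, the export callback) is made up for illustration and is not the actual `internal/telemetry` API:

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a stand-in for a telemetry event; hypothetical, not the real type.
type Event struct {
	Msg string
}

// requestBuffer accumulates the telemetry events for a single LSP request in
// memory instead of pushing them onto a shared work queue.
type requestBuffer struct {
	mu     sync.Mutex
	events []Event
}

func (b *requestBuffer) Add(ev Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.events = append(b.events, ev)
}

// Flush hands the accumulated events to an exporter on a new goroutine once
// the request completes: ordering within the request is preserved, but the
// request handler never blocks on telemetry.
func (b *requestBuffer) Flush(export func(Event)) <-chan struct{} {
	b.mu.Lock()
	events := b.events
	b.events = nil
	b.mu.Unlock()

	done := make(chan struct{})
	go func() {
		defer close(done)
		for _, ev := range events {
			export(ev)
		}
	}()
	return done
}

func main() {
	buf := &requestBuffer{}
	buf.Add(Event{Msg: "span start: textDocument/definition"})
	buf.Add(Event{Msg: "span end: textDocument/definition"})

	// After the request completes, flush asynchronously.
	done := buf.Flush(func(ev Event) { fmt.Println("exported:", ev.Msg) })
	<-done
}
```

Note that with this scheme events from different requests would interleave at the exporter, which is the global-ordering concern raised in the next comment.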
Right, it implements load shedding as a best effort and then drops overflowing events, since the queue is a circular buffer of sorts, but it ensures no deadlocks.

Events are currently processed online; the suggested batching would mean that the global ordering of events is lost, and once metrics are collected we'd also have to include timestamps. Anyway, good thoughts and concerns @muirrn! We can discuss more as we work on this issue.
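For reference, the load shedding being discussed boils down to a non-blocking send that drops events when the bounded queue is full. A minimal sketch, with hypothetical names rather than the CL's actual code:

```go
package main

import "fmt"

// Event is a hypothetical telemetry event type.
type Event struct{ Msg string }

const queueSize = 64

var workQueue = make(chan Event, queueSize)

// enqueue never blocks: if the queue is full, the event is shed. The
// trade-off is silent data loss under contention, which is what the
// follow-up comments object to (e.g. a dropped "finish span" event).
func enqueue(ev Event) bool {
	select {
	case workQueue <- ev:
		return true
	default:
		return false // queue full: drop the event instead of blocking
	}
}

func main() {
	dropped := 0
	for i := 0; i < queueSize+10; i++ {
		if !enqueue(Event{Msg: fmt.Sprintf("event %d", i)}) {
			dropped++
		}
	}
	// Nothing drains the queue in this toy program, so the overflow is shed.
	fmt.Println("dropped", dropped, "events")
}
```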
Change https://golang.org/cl/190737 mentions this issue:
I think the load shedding may cause other significant issues: incorrect counts for rare events would be very bad, but memory leaks caused by unmatched start/finish span pairs, for instance, would be much worse.
Thank you for the discussion @muirrn and @ianthehat! Sure, for now we can use the mutex approach Ian has posted.
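For anyone following along, a mutex-based approach in this vein amounts to exporting synchronously on the caller's goroutine under a lock, instead of handing events to a worker through a bounded queue. A rough sketch under that assumption (illustrative names only, not the code from the CL):

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical telemetry event type.
type Event struct{ Msg string }

var exportMu sync.Mutex

// Export runs on the caller's goroutine; the mutex serializes concurrent
// callers, so there is no queue to fill up and no worker goroutine to
// deadlock, at the cost of doing the export work inline.
func Export(ev Event) {
	exportMu.Lock()
	defer exportMu.Unlock()
	fmt.Println("exported:", ev.Msg)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			Export(Event{Msg: fmt.Sprintf("event %d", i)})
		}()
	}
	wg.Wait()
}
```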
My gopls on master (caa95bb) keeps locking up. The debug interface doesn't even respond. I attached via dlv and I saw goroutines were blocked here:
Furthermore, one of the goroutines was the telemetry worker goroutine itself. It was invoking a task that published to the workQueue channel (which must have been full, causing a deadlock).
/cc @ianthehat
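For illustration, here is a minimal, self-contained program that reproduces the same shape of failure. The names and buffer size are made up for the example (this is not the actual internal/telemetry code); it just shows the pattern described above, where the worker executes a task that publishes back into its own full queue, so the queue is never drained and every other sender blocks behind it:

```go
package main

import "fmt"

func main() {
	// A small buffer makes the failure quick to reach; the real queue is
	// just a larger buffered channel.
	workQueue := make(chan func(), 4)

	// The telemetry worker: pops tasks off the queue and runs them one at a time.
	go func() {
		for task := range workQueue {
			task()
		}
	}()

	// A task that itself publishes to workQueue while the worker is busy
	// executing it. It emits more events than the buffer can absorb, and
	// nothing drains the queue while the worker is stuck inside it.
	workQueue <- func() {
		for i := 0; i < cap(workQueue)+1; i++ {
			i := i
			workQueue <- func() { fmt.Println("nested event", i) }
		}
	}

	// Other goroutines (the LSP request handlers in gopls) keep enqueuing
	// telemetry and eventually block too; the Go runtime then reports
	// "all goroutines are asleep - deadlock!".
	for {
		workQueue <- func() {}
	}
}
```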