Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows #49471

Open
zx2c4 opened this issue Nov 9, 2021 · 37 comments
Open

runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows #49471

zx2c4 opened this issue Nov 9, 2021 · 37 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Milestone

Comments

@zx2c4
Copy link
Contributor

zx2c4 commented Nov 9, 2021

CL 307372 was committed, which contained a nice change to produce crashdumps under certain circumstances. However, several questions were proposed about when those crashdumps would be automatically uploaded to Microsoft in the context of using Insider Builds. In the absence of good answers to that, the otherwise okay CL was reverted (by me) with CL 362454. This proposal is essentially about reverting the revert, and having an extended discussion about under which circumstances these should be enabled and whether we need a new flag for it, as suggested by @bufflig.

Before this proposal moves to an "under consideration" step, we should get a clearer idea about what we're proposing. I'll defer to @zhizheng052 and @bhiggins on this, as they wrote the original CL.

CC @zhizheng052 @bhiggins @bufflig @ianlancetaylor @alexbrainman @aarzilli @aclements

@gopherbot gopherbot added this to the Proposal milestone Nov 9, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Nov 9, 2021
@zx2c4
Copy link
Contributor Author

zx2c4 commented Nov 9, 2021

In the CL discussion, there was some question as to when "optional" diagnostics gets enabled. I was just installing a fresh Windows 11 board a few minutes ago and noticed that this was checked by default:

image

@aarzilli
Copy link
Contributor

aarzilli commented Nov 9, 2021

It seems to me that if you opt to send you crash dumps to microsoft during install (by not opting out) and then opt into creating a crash dump by setting GOTRACEBACK=crash, then having your crash dump uploaded to microsoft is the expected behavior. Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected.
Since this is already gated by two options I don't think adding a third opt-in (through a new flag) is going to make a difference. IMO it's either:

  • add a courtesy reminder that windows does this in the documentation of GOTRACEBACK (and maybe the release notes), or
  • do not implement GOTRACEBACK=crash on windows and document that this is the reason.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Nov 9, 2021

Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected.

This wasn't an insider build; I was installing the Windows 11 final release.

@aarzilli
Copy link
Contributor

aarzilli commented Nov 9, 2021

Given that this only happens on insider builds, it isn't even that sketchy that the option is preselected.

This wasn't an insider build; I was installing the Windows 11 final release.

That sucks. Still, I don't think it changes the rest of my opinion.

@bufflig
Copy link
Contributor

bufflig commented Nov 9, 2021

I assume there's something in the registry that enables the "sending to MS" behavior. I wonder if we could detect that and either:

a) Not enable WER or
b) Enable WER but warn the user or
c) Simply refuse to run when you have that combination (GOTRACEBACK=crash and uploading is enabled)?

Enabling GOTRACEBACK=crash even when you don't actually want the file on disk is not an unreasonable functionality, and was there before this change (it gives you some additional crash information). To get the added behavior that you upload crashdumps to a third party does not seem expected in those cases. At least a stern warning seems mandated, but I'd actually prefer to disable/revert the change for 1.18 and decide exactly how we want to proceed.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Nov 9, 2021

I'd actually prefer to disable/revert the change for 1.18 and decide exactly how we want to proceed.

This has already happened, FYI.

There's also the WER option to pop up a UI asking for consent. I suspect this won't work in services, though.

Also, I wonder if we could ditch this whole WER set of options and just create a minidump ourselves? https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/nf-minidumpapiset-minidumpwritedump

@elagergren-spideroak
Copy link

To clarify: this is only when GOTRACEBACK=crash is set, right?

@rsc rsc changed the title proposal: enable WER dumps for Windows under certain circumstances proposal: runtime: enable WER dumps for Windows under certain circumstances Dec 1, 2021
@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Dec 1, 2021
@zx2c4
Copy link
Contributor Author

zx2c4 commented Dec 1, 2021

From the top post:

Before this proposal moves to an "under consideration" step, we should get a clearer idea about what we're proposing.

For the record, I still don't quite know what we're proposing here.

@rsc
Copy link
Contributor

rsc commented Dec 8, 2021

At least the proposal discussion will get more eyes on it.
What are the options? Is it possible to write a crash dump that doesn't get collected and sent off?

@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

It sounds like we are still waiting to find out what exactly is being proposed here.
Does anyone know?

@rsc
Copy link
Contributor

rsc commented Jan 19, 2022

We are still waiting to find out exactly what is being proposed here.
If we can't figure that out, it seems like this would be a likely decline.

@alexbrainman
Copy link
Member

To clarify: this is only when GOTRACEBACK=crash is set, right?

Correct.

It sounds like we are still waiting to find out what exactly is being proposed here.
Does anyone know?

I reviewed CL 307372. The change makes Go runtime create process crash dumps that can be used by Delve to debug crashes.

The feature is enabled when you set GOTRACEBACK=crash environment variable.

There are also other things that you need to do to enable this feature. Here are the instructions

https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

If we can't figure that out, it seems like this would be a likely decline.

Hopefully my explanation is good enough for you to reconsider.

I am not planing to use this feature today. But I can see how it can be life saving for me and others when they have nothing but crashing program. And crash dump is not good enough.

Also these crash dumps are standard on Windows. Other vendors use them for exactly the same purpose. So I don't see why Go should not implement features available to other tool developers.

Alex

@alexbrainman
Copy link
Member

What I also want to add to my comment above is that this feature was requested nearly 5 years ago in #20498. Brad labeled it as NeedsFix and Ian as help wanted, so assumed it was OK to implement this feature.

Different people tried to implement this feature. But the implementation required quite a bit of effort and manual testing.

What made me put extra effort with CL 307372 (most recent attempt) is that I discovered that Delve can now actually use the dump files generated by this change. Which makes this feature quite more useful to more people.

I understand concerns that Jason raised about his crash dumps sent to Microsoft. But he can use GOTRACEBACK environment variable or some settings from

https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps

to stop that.

This feature is what Windows does. If we want crash dumps, we have to accept whatever policies Microsoft enforce onto that feature. We can make it clear in the docs what happens when user sets GOTRACEBACK=crash.

Alex

@rsc
Copy link
Contributor

rsc commented Jan 26, 2022

Hi @alexbrainman, thanks for the information. This sounds like a very useful, important feature for users.

I tried to read https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps and am confused about the implications with sending data to Microsoft.

It says "This feature is not enabled by default."
Does that mean that you have to edit a registry setting to make GOTRACEBACK=crash work?
Does it mean that MS doesn't collect the crashes by default?
Or if not, what exactly is the registry setting that has to be changed to stop uploads of GOTRACEBACK=crash-generated files to MS?

Thanks.

@alexbrainman
Copy link
Member

It says "This feature is not enabled by default."
Does that mean that you have to edit a registry setting to make GOTRACEBACK=crash work?

GOTRACEBACK=crash will still output appropriate stack trace. But GOTRACEBACK=crash will not create crash dump files until you edit your registry. That is how I read Microsoft doco. And that is what I see on my PC.

Does it mean that MS doesn't collect the crashes by default?

I don't believe so. But don't take my word for it. Reading

https://docs.microsoft.com/en-us/windows/win32/wer/using-wer

it talks about Windows OS crashes - see WER sends the report to Microsoft (Watson Server) if the user consented. bit. But we are talking about our program crash dumps. That would be too wasteful for Microsoft to collect all our program dumps.

what exactly is the registry setting that has to be changed to stop uploads of GOTRACEBACK=crash-generated files to MS?

I looked at my friend's PC (with registry untouched by hackers like me), and it does have HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting key, but there is no LocalDumps key inside. See https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps for details.

This registry config does not create any crash dumps regardless of GOTRACEBACK=crash setting.

If you have any Windows PC, you can check its registry yourself. You can use regedit program, if you have GUI access to your PC (direct console or Remote Desktop).

Alternatively you can write Go program to list that registry key. If you start with internal/syscall/windows/registry.TestReadSubKeyNames and adjust couple of lines, you should be able to run that test on Go builders to see what their regstry settings are. If you care.

Alex

@beoran
Copy link

beoran commented Feb 2, 2022

@zx2c4 's suggestion to create a minidump should be considered also as a more privacy focused alternative.

@rsc
Copy link
Contributor

rsc commented Feb 2, 2022

It sounds like if we make this change, then developers who have not edited the registry will not see any change at all.
And developers who did edit the registry will get exactly what those edits imply. That sounds OK too.
So it sounds like we should do this?

Does anyone object to accepting this proposal?

@alexbrainman
Copy link
Member

@zx2c4 's suggestion to create a minidump should be considered also as a more privacy focused alternative.

I don't know what you mean by "minidump". But I tested CL 307372 with DumpType registry value set to 2. I also tried dumps created with DumpType registry value of 1, but these are not recognised by Delve.

You can see CL 307372 commit message explaining DumpType values that we tried. Just like I explained it here.

Alex

@aarzilli
Copy link
Contributor

aarzilli commented Feb 3, 2022

@alexbrainman @beoran It seems to me that there is some confusion brewing here around the term "minidump" stemming from microsoft decision to use "minidump" and "mini dump" to mean two different things:

  • minidump is a file format used on windows for userspace core dumps, roughly equivalent to ET_CORE elf files. The term is used in contrast to crashdump, which is the format of operating system core dumps. The contents of a minidump are determined by the MINIDUMP_TYPE enumeration
  • mini dump is a minidump with MINIDUMP_TYPE set to MiniDumpNormal. This is in contrast with a "full dump" which includes the full memory of the process.

@zx2c4 was suggesting above to call MiniDumpWriteDump from dbghelp.dll directly, instead of letting WER do it. Either way a minidump is produced and either way the result can be a mini dump or a full dump.

AFAIK mini dumps are recognized by Delve but there is nothing useful in them (because of how goroutines are implemented), which is why it might seem that they aren't.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Feb 3, 2022

It sounds like if we make this change, then developers who have not edited the registry will not see any change at all.
And developers who did edit the registry will get exactly what those edits imply. That sounds OK too.
So it sounds like we should do this?

Does anyone object to accepting this proposal?

Didn't we learn at some point that Windows beta users (called "insiders") get this registry key turned on?

@zx2c4
Copy link
Contributor Author

zx2c4 commented Feb 3, 2022

@zx2c4 was suggesting above to call MiniDumpWriteDump from dbghelp.dll directly, instead of letting WER do it. Either way a minidump is produced and either way the result can be a mini dump or a full dump.

Indeed this still seems like the best solution.

@aarzilli
Copy link
Contributor

aarzilli commented Feb 3, 2022

Indeed this still seems like the best solution.

In theory it makes no difference, if microsoft wants to upload your core dumps they can hook MiniDumpWriteDump just like they can hook WER. The consent they ask is broad.
In practice I don't know if there is any difference, has anyone checked that WER actually uploads anything by default and MiniDumpWriteDump doesn't?

@zx2c4
Copy link
Contributor Author

zx2c4 commented Feb 3, 2022

WER means we hand the state of a crashing app to the WER API and say "do something with it." As an app developer, I can actually register to receive those dumps on Microsoft Partner Portal. With MiniDumpWriteDump, the third param is a file handle, which means we're in control. I mean sure, Microsoft could shim this and intercept everything; it could do all this behind the scenes too without any API call. But that seems a bit much tinfoil right? That'd be a Big Deal if they were doing that, whereas WER is kind of already marked as part of Microsoft's "crash into this blackbox" apparatus.

@rsc
Copy link
Contributor

rsc commented Feb 9, 2022

@aarzilli Thanks for the tutorial on "minidumps" vs "mini dumps". You said that "mini dumps" are recognized by Delve but basically useless because they don't have memory contents. What about "minidumps" as generated by MiniDumpWriteDump? Do they work with Delve?

If the WER API and MiniDumpWriteDump both write files that work with Delve, and
if we have uncertainty about the WER API and uploads to MS but have no uncertainty about MiniDumpWriteDump,
then it seems like the choice is clear: use MiniDumpWriteDump.

But are both of those if statements true?

@aarzilli
Copy link
Contributor

aarzilli commented Feb 9, 2022

What about "minidumps" as generated by MiniDumpWriteDump? Do they work with Delve?

As far as I can tell procdump.exe uses MiniDumpWriteDump to write its minidumps and Delve works with those.

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

OK, well if MiniDumpWriteDump works with Delve, then my previous complex conditional reduces to "the choice is clear: use MiniDumpWriteDump".

Does anyone object to using MiniDumpWriteDump instead of the WER API?

@rsc rsc changed the title proposal: runtime: enable WER dumps for Windows under certain circumstances proposal: runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows Feb 23, 2022
@rsc
Copy link
Contributor

rsc commented Feb 23, 2022

Retitled to what I believe we converged on. Any objections?

@zx2c4
Copy link
Contributor Author

zx2c4 commented Feb 23, 2022

Seems okay to me.

@rsc
Copy link
Contributor

rsc commented Mar 2, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Mar 2, 2022
@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Mar 9, 2022
@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows Mar 9, 2022
@rsc rsc modified the milestones: Proposal, Backlog Mar 9, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@iglendd
Copy link

iglendd commented Oct 3, 2022

Thank you guys for considering this parity (with Linux) implementation. I would like to humbly mention a few hopefully relevant items.

  • The fact that Go-program panics are not crashing is a bit odd but not uncommon. As far as I understand, the Java-runtime does the same thing. Java though allows to dump "crash" data in textual form (if you add -XX:ErrorFile=file flag. This makes it very easy to post process crashing call stack. It is like a Go Panic output but it persists. In many circumstances collecting panics from stderr is not practical. We would be happy if panic is persisted in a file too if some configuration specified, e.g. GOTRACEBACK=2+filepath.
  • Additional oddity with lack of Go-based crashes is absence of Event Log ID 1000 by "Application Crash" source which is automatically generated by Windows when an application is really crashing. Number of solutions are tracking this event log entry EventTracker KB --Event Id: 1000 Source: Application Error yet currently there is no way to detect Go-based panics from that. I was hoping that GOTACEBACK=crash would at least enabled it but it is still in TODO state. Your previous merge which you have rolled back would be perfect for this, even if a dump had never been created.
  • Ability to generate a dump from within crashing application, which had been discussed in this thread, via MiniDumpWriteDump directly would be indeed welcomed and appears trivial but it is a nuanced feature. Ideally it needs to be done from other thread (which is documented) or even process (this is one of the reason crashpad replacement of breakpad). The other problem that you have to be careful to manage space and have some constrained in places. Seems to be Collecting User-Mode Dumps already provides globally and per app. But again doing it programmatically would be useful.
  • I do not get the utility of WER moving my applications mini crash dump to Microsoft. Sure it will help them to make their program better. At some point one could also login into their portal and browse one's application dump but I do not think it is possible nowadays. And so I see a little utility, IMHO, to facilitate that although it may be useful for some people. And yes it appears that how dumps are generated and handled via WER and Collecting User-Mode Dumps are not related to each other.
  • Philosophically, I like the approach used in Linux and in formally stated the GOTRACEBACK=crash documentation which I can rephrase as, we crash, OS handle that. I am Ok with that. Current WER or User Crash Dump or AeDebug configurations would be observed (it is not a bad thing) and behavior would follow other applications crashing on Windows behavior. Additional goodies (like custom dump creation, perhaps adding ability to customize further panic control, etc) are certainly welcome but at least default Windows behavior would be a big step forward IMHO.
  • Last but not least I would love to have a global defer where I could handle go panics and even generate dumps via call to Go runtime. It is not possible now. Instrumenting 1k goroutes in a large applications to add panic handler is a huge pain and still does not handle 3rd party Go modules goroutines panics. Moreover it will not handle a crash within CGO callback thread/code.

Sorry for a large wish list :) but may be you can accommodate or at least consider some of it :)

@qmuntal
Copy link
Contributor

qmuntal commented Nov 11, 2022

These are all good comments @iglendd, I suggest you submit a separate proposal so we can discuss without polluting this thread.

Heads up: I'm implementing this proposal in the form it has been approved. Won't be able to submit it before go1.20 freeze though.

@gopherbot
Copy link

Change https://go.dev/cl/458955 mentions this issue: runtime: write minidump when crashing on windows

@qmuntal
Copy link
Contributor

qmuntal commented Dec 21, 2022

⚠️ After testing CL 458955, using it for a while and sharing it internally, I would not recommend merging it, even more, I would step back and rethink this proposal.

The downsides of the current proposal have already been presented by @iglendd in his previous comment (without much attention from myself and others):

  • The elephant in the room: it can deadlock. MiniDumpWriteDump suspends all the threads (even the non-Go threads) and tries to acquire the global heap lock. If the lock was already hold by on of the suspended threads, the process hangs. The only recommended and reliable way is to call it from another process. Which process should we launch? Who provides it?
  • A minidump of a simple helloworld app weights 40Mb. This means that setting GOTRACEBACK=crash will flood the temporary directory pretty quickly. This proposal don't say how to define an alternate directory nor a rotation policy.
  • There are many things that can go inside a minidump, This proposal don't say how to specify the minidump granularity.
  • I don't want to set GOTRACEBACK=crash every time I run a Go app just in case it crashes, but I can't set GOTRACEBACK=crash globally because I don't want to generate minidumps of applications I don't care about (they weight a lot!).

This makes me think that creating a minidump by just setting GOTRACEBACK=crash is not enough, and even unexpected. Generating a minidump should be more intentional and we should provide the knobs to tweak the generation process.

Then you have WER. WER solves all the previous issues, and more, without adding any burden on our side. IMO, we would provide more value if we focus on deciding how to enable WER rather than implementing an alternate minidump experience.

CL 458955 is still there in case my arguments haven't convinced you.

@iglendd
Copy link

iglendd commented Dec 21, 2022

@qmuntal thank you for attempting to address this long standing issue. I hope you do not mind me offering a few additional comments.

  • I agree with the first 3 bullet points. There are ways to deal with them but they would require significant error, making platform specific knobs/requirements in the Go specification and making the whole endeavor involved more than anyone would want to.
  • Last point of granularity of GOTRACEBACK=crash I this is a general evil, this is how Linux part is implemented anyway(?) and moreover I was always wondering if I set GOTRACEBACK=crash explicitly in my code on the Go application start (e.g. because my own configuration "requested" it) - would it have Go runtime still consider it or it read this variable before Go entry point is invoked. If it is the former then we have more flexibility with this.
  • I still think that one of my previous comments still holds the water due to its simplicity and its philosophical cross-platform consistency. Here is the quote again (but in short, let it crash, it is debugging feature, do not swallow exception, OS specific handler can be used/setup to do just that, there are built-in and 3rd party utilities capable of that)

Philosophically, I like the approach used in Linux and it formally stated the GOTRACEBACK=crash documentation which I can rephrase as, we crash, OS handle that. I am Ok with that. Current WER or User Crash Dump or AeDebug configurations would be observed (it is not a bad thing) and behavior would follow other applications crashing on Windows behavior. Additional goodies (like custom dump creation, perhaps adding ability to customize further panic control, etc) are certainly welcome but at least default Windows behavior would be a big step forward IMHO....

@qmuntal
Copy link
Contributor

qmuntal commented Dec 25, 2022

@iglendd I've submitted the proposal #57441, please take a look to see if it resonates with your idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Projects
Status: Triage Backlog
Status: Accepted
Development

No branches or pull requests

10 participants