Date: | February 15, 2013 / year-entry #45 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20130215-00/?p=5233 |
Comments: | 16 |
Summary: | A particular scenario was hanging, and the team responsible for the scenario debugged it to the point where they saw that their X component was waiting for their Y component, which was waiting for Explorer, so they asked for help chasing the hang into Explorer. The team was kind enough to have shared what they've learned so... |
A particular scenario was hanging, and the team responsible for the scenario debugged it to the point where they saw that their X component was waiting for their Y component, which was waiting for Explorer, so they asked for help chasing the hang into Explorer. The team was kind enough to have shared what they've learned so far: kd> !alpc /m 9c14d020 Message 9c14d020 MessageID : 0x0274 (628) CallbackID : 0xCCA5 (52389) SequenceNumber : 0x00000016 (22) Type : LPC_REQUEST DataLength : 0x0094 (148) TotalLength : 0x00AC (172) Canceled : No Release : No ReplyWaitReply : No Continuation : Yes OwnerPort : 82bb9db8 [ALPC_CLIENT_COMMUNICATION_PORT] WaitingThread : 834553c0 QueueType : ALPC_MSGQUEUE_MAIN QueuePort : 84646730 [ALPC_CONNECTION_PORT] QueuePortOwnerProcess : 846209c0 (explorer.exe) ServerThread : 00000000 <----------------------- QuotaCharged : No CancelQueuePort : 00000000 CancelSequencePort : 00000000 CancelSequenceNumber : 0x00000000 (0) ClientContext : 02a56b80 ServerContext : 00000000 PortContext : 0701ea20 CancelPortContext : 00000000 SecurityData : 962f89b8 View : 00000000 kd> !process 846209c0 0 PROCESS 846209c0 SessionId: 1 Cid: 0804 Peb: 7fbac000 ParentCid: 0724 DirBase: 3e546380 ObjectTable: 97195300 HandleCount: 1041. Image: explorer.exe Yikes, there is no thread signed up to service the request. I don't know much about ALPC, but I can fumble around. Fortunately, this is debugging and not rocket surgery, so you still get full points if you stumble across the answer by guessing.
I decided to start guessing by looking at what the
kd> !alpc -? !alpc /m MessageAddress Dumps the message at the specified address. !alpc /p PortAddress Dumps the port at the specified address. Well, I already saw what the result was for dumping the message, so I may as well dump the port. kd> !alpc /p 84646730 ... 8 thread(s) are registered with port IO completion object: THREAD 84658d40 Cid 0804.0888 Teb: 7fa7e000 Win32Thread: 8214a748 WAIT THREAD 8466a040 Cid 0804.08c4 Teb: 7fa74000 Win32Thread: 8214c800 WAIT THREAD 84659a00 Cid 0804.08ec Teb: 7fa72000 Win32Thread: 82158d08 WAIT THREAD 8466c8c0 Cid 0804.08f0 Teb: 7fa6e000 Win32Thread: 82160420 WAIT THREAD 84671040 Cid 0804.0910 Teb: 7fa68000 Win32Thread: 8217c4e8 WAIT THREAD 8460d180 Cid 0804.099c Teb: 7fa5e000 Win32Thread: 820bad08 WAIT THREAD 834278c0 Cid 0804.0c80 Teb: 7fa6b000 Win32Thread: 820b9620 WAIT THREAD 8345ad40 Cid 0804.0da0 Teb: 7fba9000 Win32Thread: 821c6d08 WAIT ... So it looks like there are eight threads signed up to process events on this port. (Is that what this means? I don't know, but I'm going to assume that it does, because this is debugging. Debugging is an exercise in optimism.) Let's see what they're doing. kd> .thread 84658d40;k Implicit thread is now 84658d40 *** Stack trace for last set context - .thread/.cxr resets it ChildEBP RetAddr 940ef394 80f1505f nt!KiSwapContext+0x19 940ef3d0 80f184e0 nt!KiSwapThread+0x34b 940ef3fc 80f163fc nt!KiCommitThreadWait+0x26f 940ef46c 80f4d2df nt!KeWaitForSingleObject+0x459 940ef4b4 80e20838 nt!KiSchedulerApc+0x298 940ef4c8 00000000 hal!KfLowerIrql+0x2c [the others look the same] Well, I don't know what they're doing, but it looks like they're waiting for something. But one of the threads looks different: kd> .thread 84671040;k Implicit thread is now 84671040 *** Stack trace for last set context - .thread/.cxr resets it ChildEBP RetAddr 9415f864 80f1505f nt!KiSwapContext+0x19 9415f8a0 80f184e0 nt!KiSwapThread+0x34b 9415f8cc 80eb3d6e nt!KiCommitThreadWait+0x26f 9415f934 810c0527 nt!KeWaitForMultipleObjects+0x4e3 9415fbe4 810c0703 nt!ObWaitForMultipleObjects+0x2fd 9415fd38 80ef113c nt!NtWaitForMultipleObjects+0xca 9415fd38 77945e04 nt!KiFastCallEntry+0x12c 07e2f1c4 779437f6 ntdll!KiFastSystemCallRet 07e2f1c8 7515c136 ntdll!NtWaitForMultipleObjects+0xa 07e2f34c 77752658 KERNELBASE!WaitForMultipleObjectsEx+0xee 07e2f368 777fbe60 KERNEL32!WaitForMultipleObjects+0x19 07e2f3d4 777fc5de KERNEL32!WerpReportFaultInternal+0x1a3 07e2f3e8 777df654 KERNEL32!WerpReportFault+0x6d 07e2f3f4 751e517c KERNEL32!BasepReportFault+0x19 07e2f490 77a0f95a KERNELBASE!UnhandledExceptionFilter+0x1e0 07e2f4a0 77a0fd4d ntdll!TppExceptionFilter+0x1b 07e2f4b4 77a1c66b ntdll!TppWorkerpInnerExceptionFilter+0x13 07e2fb34 77753278 ntdll!TppWorkerThread+0xa6092 07e2fb40 779761a6 KERNEL32!BaseThreadInitThunk+0xe 07e2fb80 77976152 ntdll!__RtlUserThreadStart+0x4a 07e2fb90 00000000 ntdll!_RtlUserThreadStart+0x1c Ah, well that explains why Explorer isn't responding: It crashed on an unhandled exception! Windows Error Reporting is busy trying to generate a report.
Now to see what the crash was.
I don't know for sure, but I'm pretty confident that one
of the parameters to
kd> dd 07e2f3f4 l4 07e2f3f4 07e2f490 77a0f95a 07e2f4e8 00000001 ChildEBP RetAddr Param1 kd> dd 07e2f4e8 l2 07e2f4e8 07e2f620 07e2f63c ^ ^ ExceptionRecord ContextRecord kd> .cxr 0x07e2f63c eax=00000000 ebx=0451e2f8 ecx=e2af034f edx=77945e00 esi=00000000 edi=0451e2e0 eip=1df7fc6a esp=07e2f920 ebp=07e2f938 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 contoso!ContosoPower::Disconnect+0xdd: 001b:1df7fc6a 8b08 mov ecx,dword ptr [eax] ds:0023:00000000=????????
Aha, Explorer crashed due to a null pointer crash in the
Passing the buck onward to Contoso, the report back was that this was a known issue, and a hotfix was available. |
Comments (16)
Comments are closed. |
More often than not it seems the fault lies with Contoso; those guys suck!
So, was the original team related to Contoso?
Nice. Ah yes the downside in allowing garbage into your memory space is you get the support requests for somebody else's garbage. Unfortunately the classical model of processes is the only model understood by managers.
"Fortunately, this is debugging and not rocket surgery"
Does one require less knowledge and skill than the other?
So if they just waited a bit they'd see the exception report with clear traces of Contoso?
@John: Just look at their wiki page. Contoso Bank, Contoso Pharmaceuticals, Contoso University, Contoso News Agency, Software Developement. It's like they're some evil corporation taking over the world, but this can't be because they're just a fictional entity thought up by Microsoft, right? Ha-ha. Ha-ha, ha… Gotta lock my door.
"Fortunately, this is debugging and not rocket surgery"
"Does one require less knowledge and skill than the other?"
not really about skill but if you mess up the rocket surgery the rocket could exploade costing millions of dollars and lives. If you messe up debugging you have to emm, try again.
The big problem with rocket surgery is the malpractice insurance.
It seems to me like this hang should be temporary. Shouldn't WEP generate the report and then "unhang", somehow releasing the request made by the Y component, while simultaneously producing a report of the cause of the crash?
Maybe the next question is "Why did WEP hang?"
@kog: "if you mess up the rocket surgery the rocket could exploade costing millions of dollars and lives. If you messe up debugging you have to emm, try again."
Unless you're debugging a rocket.
Debugging a rocket: ti.arc.nasa.gov/…/0176%20(Havelund).pdf
[The only way to "unhang" the process is to terminate it, at which point it's too late to generate the crash report. WER can't generate the report until the user gives permission, because generating the report includes transmitting information to Microsoft to ask, "Hey, I hit this failure, do you want me to capture any data beyond the default crash dump?" -Raymond]
Close all open handles. Anybody waiting for it gets WAIT_ABANDONED_*. Since they're external to the calling process, WAIT_ABANDONED_* should already be handled. I can't believe I had to write that.
@Joshua: Yanking a program's infrastructure out from under it is simply delaying when the program is going to crash and burn. See blogs.msdn.com/…/463977.aspx for possible things that can happen when a program is presented with WAIT_ABANDONED; the answer to "What can I do with invalid handles?" is "Not much."
Terminating a hung process is probably better all around because you don't leave programs in unstable states where they think that everything's OK but in reality nothing is!
Wow. Raymond got it ErikF didn't. My general idea is whatever handles that are being used for the LPC would be closed anyway when WER terminates the process so it could potentially close them now. I overlooked LPC by Windows messaging as opposed to kernel32 handles.
Microsoft is just an umbrella corporation for Contoso.
I had assumed that WER was going to silently create a small crash dump and then at some later point send it to MS, whereupon MS might say "next time you hit this crash, capture additional information X".
Regardless, it sounds like you're saying WER put up UI to ask if the user wants to give permission. Did the user who reported the issue not notice the UI, or did they ignore it because they didn't realize it had to do with their issue?