Crash Dump Analysis Patterns (Part 7)

Crash dump 불펌스페샬 2007. 5. 13. 23:58 posted by CecilDeSK

We have to live with tools that produce inconsistent dumps. One example is LiveKd.exe from sysinternals.com, which is widely used by Microsoft and Citrix technical support to save complete memory dumps without a server reboot. I even wrote an article for Citrix customers:

Using LiveKD to Save a Complete Memory Dump for Session or System Hangs

If you read it you will find an important note which is reproduced here:

LiveKd.exe-generated dumps are always inconsistent and cannot be a reliable source for certain types of dump analysis, for example, looking at resource contention. This is because it takes a considerable amount of time to save a dump on a live system and the system is being changed during that process. The instantaneous traditional “CrashOnCtrlScroll” method or SystemDump tool always save a reliable and consistent dump because the system is frozen first (any process or kernel activity is disabled), then a dump is saved to a page file.

If you look at such an inconsistent dump you will find that many useful kernel structures, such as the ERESOURCE list (!locks), are broken or even circularly referenced, and therefore WinDbg commands display “strange” output.

Easy and painless (for customers) dump generation with such “Live” tools means that they are widely used, and we have to analyze the dumps they save and customers send us. This brings us to the next crash dump analysis pattern, called “Inconsistent Dump”.

If you have such a dump you should look at it in order to extract the maximum useful information that helps in identifying the root cause or gives you further directions. Not all information is inconsistent in such dumps. For example, drivers, processes, thread stacks and IRP lists can give you some clues about activities. Some information not visible in a consistent dump can even surface in an inconsistent dump (subject to the commands used).

For example, I had a LiveKd dump where I looked at process stacks by running the script I created earlier:

Yet another WinDbg script

and I found that for some processes, in addition to their own threads, the script lists additional terminated threads that belong to a completely different process (I have never seen this in a consistent dump):

Process 89d97d88 is not visible in the active process list (script mentioned above or !process 0 0 command). However, if we feed this memory address to the !process command (or explore it as an _EPROCESS structure with the dt command) we get its contents:

What might have happened there: terminated process 89d97d88 was excluded from the active process list but its structure was left in memory and, due to the inconsistency, thread lists were also broken. Terminated threads therefore surfaced when listing other processes and their threads.
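The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Entry`, `Memory`, `unlink_entry`, `is_consistent` and `live_snapshot` are hypothetical names, array indices stand in for kernel pointers, and the "dump" is a slot-by-slot copy that is interrupted by one list removal. It is not how LiveKd or the kernel is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a kernel LIST_ENTRY; links are array indices, not pointers.
struct Entry {
    std::size_t flink;  // index of the next entry
    std::size_t blink;  // index of the previous entry
};

constexpr std::size_t kCount = 4;
using Memory = std::array<Entry, kCount>;

// Build the circular list 0 -> 1 -> 2 -> 3 -> 0.
Memory make_ring() {
    Memory m{};
    for (std::size_t i = 0; i < kCount; ++i) {
        m[i].flink = (i + 1) % kCount;
        m[i].blink = (i + kCount - 1) % kCount;
    }
    return m;
}

// Remove entry i from the list: only the neighbours are patched,
// the removed entry keeps its old links and stays in memory.
void unlink_entry(Memory& m, std::size_t i) {
    assert(i < kCount);
    m[m[i].blink].flink = m[i].flink;
    m[m[i].flink].blink = m[i].blink;
}

// Walk forward from head and verify that next.blink points back at every step.
bool is_consistent(const Memory& m, std::size_t head) {
    std::size_t cur = head;
    for (std::size_t steps = 0; steps < kCount; ++steps) {
        const std::size_t next = m[cur].flink;
        if (m[next].blink != cur) return false;
        if (next == head) return true;
        cur = next;
    }
    return false;  // never came back to head
}

// Copy memory one slot at a time while the live list changes halfway through.
Memory live_snapshot(Memory& live, std::size_t removed) {
    Memory snap{};
    for (std::size_t i = 0; i < kCount; ++i) {
        if (i == kCount / 2) unlink_entry(live, removed);
        snap[i] = live[i];
    }
    return snap;
}
```

Calling `live_snapshot(live, 2)` on a fresh ring leaves the live list consistent, but the copy still reaches the removed entry 2 through entry 1 and fails the back-link check at entry 3. That is the same kind of mismatch that makes stale threads show up under the wrong process.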

I suspected here that winlogon.exe died in session 2 and left an empty desktop window, which the customer saw and complained about. The only remaining visible process from session 2 was csrss.exe. The conclusion was to enable NTSD as the default postmortem debugger to catch the winlogon.exe crash the next time it happens.

- Dmitry Vostokov -


Crash Dump Analysis Patterns (Part 4)

Crash dump 불펌스페샬 2007. 5. 13. 23:57 posted by CecilDeSK

After looking at one dump today where all thread environment blocks were zeroed and the import table was corrupt, and recalling some similar cases I encountered previously, I came up with the next pattern: Lateral Damage.


When this problem happens you don’t have much choice, and your first temptation is to apply the Alien Component anti-pattern, unless your module list is corrupt and you have a manifestation of another common problem I will talk about next time: Corrupt Dump.



An anti-pattern is not always a bad solution if it is complemented by subsequent verification and backed by experience. If you get damaged process and thread structures you can point to a suspicious component (supported by some evidence such as raw stack analysis and an educated guess) and request additional dumps in the hope of getting a less damaged process space or seeing that component again. In the end, if removing it stabilizes the customer environment, it proves you were right.

- Dmitry Vostokov -


Crash Dump Analysis Patterns (Part 5)

Crash dump 불펌스페샬 2007. 5. 13. 23:57 posted by CecilDeSK

The next pattern I would like to talk about is Optimized Code. If you have such cases you should not trust your crash dump analysis tools like WinDbg. Always suspect that compiler-generated code might have been optimized if you see any suspicious or strange behaviour of your tool. Let’s consider this fragment of a stack:

Args to Child
77e44c24 000001ac 00000000 ntdll!KiFastSystemCallRet
000001ac 00000000 00000000 ntdll!NtFsControlFile+0xc
00000034 00000bb8 0013e3f4 kernel32!WaitNamedPipeW+0x2c3
0016fc60 00000000 67c14804 MyModule!PipeCreate+0x48

The 3rd-party function PipeCreate from MyModule opens a named pipe and its first parameter (0016fc60) points to a pipe name L"\\.\pipe\MyPipe". Inside the source code it calls the Win32 API function WaitNamedPipeW (to wait for the pipe to be available for connection) and passes the same pipe name. But we see that the first parameter to WaitNamedPipeW is 00000034, which cannot be a pointer to a valid Unicode string. The program would have crashed if 00000034 had been used as a pointer value.

Everything becomes clear if we look at the WaitNamedPipeW disassembly (comments are mine):

0:000> uf kernel32!WaitNamedPipeW
mov edi,edi
push ebp
mov ebp,esp
sub esp,50h
push dword ptr [ebp+8] ;Use pipe name
lea eax,[ebp-18h]
push eax
call dword ptr [kernel32!_imp__RtlCreateUnicodeString (77e411c8)]




call dword ptr [kernel32!_imp__NtOpenFile (77e41014)]
cmp dword ptr [ebp-4],edi
mov esi,eax
jne kernel32!WaitNamedPipeW+0x1d5 (77e93316)
cmp esi,edi
jl kernel32!WaitNamedPipeW+0x1ef (77e93331)
movzx eax,word ptr [ebp-10h]
mov ecx,dword ptr fs:[18h]
add eax,0Eh
push eax
push dword ptr [kernel32!BaseDllTag (77ecd14c)]
mov dword ptr [ebp+8],eax ; reuse parameter slot

As we know [ebp+8] is the first function parameter in non-FPO calls:

Parameters and Local Variables

And we see it is reused: after we convert LPWSTR to UNICODE_STRING and call NtOpenFile to get a handle, we no longer need our parameter slot, and the compiler can reuse it to store other information.
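The effect on a stack trace can be modelled in a few lines of portable C++. This is a minimal sketch, not the real kernel32 code: `Frame`, `consume_name_then_reuse_slot` and `first_arg_as_shown_by_debugger` are hypothetical names, one struct field stands in for the [ebp+8] slot, and the stored value (byte length plus 0Eh) only imitates the shape of the instructions above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of a non-FPO frame: one field standing in for [ebp+8].
struct Frame {
    std::uintptr_t param1;
};

// Hypothetical callee: reads the pipe name through the slot, then reuses the
// slot for an unrelated size, in the spirit of "mov dword ptr [ebp+8],eax".
std::size_t consume_name_then_reuse_slot(Frame& frame) {
    assert(frame.param1 != 0);
    const char16_t* name = reinterpret_cast<const char16_t*>(frame.param1);
    const std::size_t bytes =
        std::char_traits<char16_t>::length(name) * sizeof(char16_t);
    frame.param1 = bytes + 0x0E;  // the original pointer is gone from the slot
    return bytes;
}

// A stack trace reports whatever the slot holds at the time of the dump.
std::uintptr_t first_arg_as_shown_by_debugger(const Frame& frame) {
    return frame.param1;
}
```

Before the call the slot holds a valid pointer. After it, the "debugger" view returns a small integer, even though the caller passed a perfectly good string. The value in the Args to Child column is therefore what the slot contains at dump time, not necessarily what was passed.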

There is another compiler optimization we should be aware of, called OMAP. It moves code inside the code section and puts the most frequently accessed code fragments together. In that case if you type in WinDbg, for example,

0:000> uf nt!someFunction

you get different code than if you type (assuming f4794100 is the address of the function you obtained from stack or disassembly)

0:000> uf f4794100

In conclusion, the advice is to be alert and conscious during crash dump analysis and to inspect any inconsistencies more closely.

Happy debugging!

- Dmitry Vostokov -


Crash Dump Analysis Patterns (Part 3)

Crash dump 불펌스페샬 2007. 5. 13. 23:56 posted by CecilDeSK

Another pattern I observe frequently is False Positive Dump. We get dumps pointing in the wrong direction or not useful for analysis, and this usually happens when the wrong tool was selected or the right one was not properly configured for capturing crash dumps. Here is one example I investigated in detail.

The customer experienced frequent spooler crashes. The dump was sent for investigation to find the offending component; usually it is a printer driver. WinDbg revealed the following exception thread stack (parameters are not shown here for readability):

KERNEL32!RaiseException+0x56
KERNEL32!OutputDebugStringA+0x55
KERNEL32!OutputDebugStringW+0x39
HPZUI041!ConvertTicket+0x3c90
HPZUI041!DllGetClassObject+0x5d9b
HPZUI041!DllGetClassObject+0x11bb

The immediate response is to point to HPZUI041.DLL, but if we look at the parameters to KERNEL32!OutputDebugStringA we see that the string passed to it is a valid NULL-terminated string:

0:010> da 000d0040
000d0040 ".Lower DWORD of elapsed time = 3"
000d0060 "750000."

If we disassemble OutputDebugStringA up to the RaiseException call we see:

0:010> u KERNEL32!OutputDebugStringA KERNEL32!OutputDebugStringA+0x55
KERNEL32!OutputDebugStringA:
push ebp
mov ebp,esp
push 0FFFFFFFFh
push offset KERNEL32!'string'+0x10
push offset KERNEL32!_except_handler3
mov eax,dword ptr fs:[00000000h]
push eax
mov dword ptr fs:[0],esp
push ecx
push ecx
sub esp,228h
push ebx
push esi
push edi
mov dword ptr [ebp-18h],esp
and dword ptr [ebp-4],0
mov edx,dword ptr [ebp+8]
mov edi,edx
or ecx,0FFFFFFFFh
xor eax,eax
repne scas byte ptr es:[edi]
not ecx
mov dword ptr [ebp-20h],ecx
mov dword ptr [ebp-1Ch],edx
lea eax,[ebp-20h]
push eax
push 2
push 0
push 40010006h
call KERNEL32!RaiseException

There are no jumps in the code prior to the KERNEL32!RaiseException call, which means that raising the exception was expected. The MSDN documentation also says:

“If the application has no debugger, the system debugger displays the string. If the application has no debugger and the system debugger is not active, OutputDebugString does nothing.”

So spoolsv.exe might have been monitored by a debugger which caught that exception and instead of dismissing it dumped the spooler process.
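To make the disassembly easier to follow, here is a portable sketch of the values it pushes before the call: the status code 40010006h (DBG_PRINTEXCEPTION_C), zero flags, an argument count of 2, and an array holding the string length including the terminator plus the string pointer. `RaiseRequest`, `model_output_debug_string` and `ntstatus_severity` are hypothetical names used for illustration only; nothing here calls the Windows API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Status code pushed before the call in the disassembly (DBG_PRINTEXCEPTION_C).
constexpr std::uint32_t kDbgPrintExceptionC = 0x40010006;

// Hypothetical plain struct mirroring the four values pushed for RaiseException.
struct RaiseRequest {
    std::uint32_t code;                  // push 40010006h
    std::uint32_t flags;                 // push 0
    std::uint32_t arg_count;             // push 2
    std::array<std::uintptr_t, 2> args;  // [ebp-20h] = length, [ebp-1Ch] = pointer
};

// Model of what the listed instructions compute for a given string.
RaiseRequest model_output_debug_string(const char* text) {
    assert(text != nullptr);
    return RaiseRequest{
        kDbgPrintExceptionC,
        0,
        2,
        {static_cast<std::uintptr_t>(std::strlen(text) + 1),  // repne scas / not ecx
         reinterpret_cast<std::uintptr_t>(text)}};
}

// The top two bits of an NTSTATUS value encode severity:
// 0 = success, 1 = informational, 2 = warning, 3 = error.
constexpr unsigned ntstatus_severity(std::uint32_t status) {
    return status >> 30;
}
```

For the code in this dump, `ntstatus_severity(0x40010006)` evaluates to 1 (informational), whereas an access violation such as 0xC0000005 evaluates to 3 (error). That is consistent with reading the exception as a notification to a debugger and not as a fault in HPZUI041.DLL.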

If we look at the ‘!analyze -v’ output we see the following:

Comment: 'Userdump generated complete user-mode minidump
with Exception Monitor function on WS002E0O-01-MFP'

ERROR_CODE: (NTSTATUS) 0x40010006 -
Debugger printed exception on control C.

Now we see that the debugger was User Mode Process Dumper, which you can download from the Microsoft web site:

How to use the Userdump.exe tool to create a dump file

If we download it, install it and write a small console program in Visual C++ to reproduce this crash:

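#include "stdafx.h"
#include <windows.h>  // declares OutputDebugString

int _tmain(int argc, _TCHAR* argv[])
{
    // Raises exception 0x40010006 so that an attached debugger can show the text
    OutputDebugString(_T("Sample string"));
    return 0;
}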

and if we compile it in Release mode and configure the Process Dumper applet in Control Panel to include TestOutputDebugString.exe with the following properties:

userdump5.JPG

and then run our program, we see Process Dumper catching KERNEL32!RaiseException and saving the dump.

Even if we choose to ignore exceptions that occur inside kernel32.dll, this tool still dumps our process. Now we can see that the customer most probably enabled the ‘All Exceptions’ check box too. What the customer should have done is use the default rules, as in the picture below:

userdump4.JPG

Or select exception codes manually. In this case no dump is generated even if we manually select all of them. To check that the latter configuration still catches access violations, we can add a line of code dereferencing a NULL pointer, and Process Dumper will catch it and save the dump.

Conclusion: the customer should have used NTSD as the default postmortem debugger from the start. Then, if a crash happened, we would have seen the real offending component or could have applied other patterns and requested additional dumps.

- Dmitry Vostokov -
