Crash Dump Analysis Patterns (Part 2)

Crash dump | Repost Special | 2007. 5. 13. 23:55 | posted by CecilDeSK

Another pattern I would like to discuss is Dynamic Memory Corruption (and its user and kernel variants called Heap Corruption and Pool Corruption). You might have already guessed it :-) It is ubiquitous. Its manifestations are random, and crashes usually happen far away from the original corruption point. In the user space portion of the stacks of exception threads (don’t forget about the Multiple Exceptions pattern) you would see something like this:

ntdll!RtlpCoalesceFreeBlocks+0x10c
ntdll!RtlFreeHeap+0x142
MSVCRT!free+0xda
componentA!xxx

or this

ntdll!RtlpCoalesceFreeBlocks+0x10c
ntdll!RtlpExtendHeap+0x1c1
ntdll!RtlAllocateHeap+0x3b6
componentA!xxx

or any similar variant, and you need to know the exact component that corrupted the application heap (which is usually not the componentA.dll you see in the crashed thread stack).

For this common recurrent problem we have a general solution: enable heap checking. This general solution has many variants applied in a specific context:

  • parameter value checking for heap functions
  • user space software heap checks before or after certain checkpoints (like “malloc”/“new” and/or “free”/“delete” calls): usually implemented by checking various fill patterns, etc.
  • hardware/OS supported heap checks (like using guard and nonaccessible pages to trap buffer overruns)
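
The fill-pattern variant from the list above can be illustrated with a toy model. This is only a sketch and not how the Windows heap or any C runtime implements its checks: the class name `GuardedBlock`, the guard size, and the 0xAB fill byte are all assumptions chosen for illustration. It surrounds a user buffer with a known pattern and verifies that pattern at a checkpoint (an explicit `check()` call standing in for “free”).

```python
GUARD_SIZE = 8     # assumed guard length in bytes, on each side
FILL_BYTE = 0xAB   # assumed fill pattern


class GuardedBlock:
    """Toy user buffer surrounded by fill-pattern guard zones."""

    def __init__(self, size):
        self.size = size
        guard = bytes([FILL_BYTE]) * GUARD_SIZE
        # Layout: [head guard][user data][tail guard]
        self.raw = bytearray(guard + bytes(size) + guard)

    def write(self, offset, data):
        """Unchecked write relative to the user area, like a raw memcpy.

        Nothing stops the caller from running past the user area,
        which is exactly the bug the guards are meant to expose.
        Offsets below -GUARD_SIZE are outside this toy model.
        """
        start = GUARD_SIZE + offset
        self.raw[start:start + len(data)] = data

    def check(self):
        """Return the damaged guard zones: 'head' and/or 'tail'."""
        guard = bytes([FILL_BYTE]) * GUARD_SIZE
        tail_start = GUARD_SIZE + self.size
        damaged = []
        if self.raw[:GUARD_SIZE] != guard:
            damaged.append("head")
        if self.raw[tail_start:tail_start + GUARD_SIZE] != guard:
            damaged.append("tail")
        return damaged
```

Writing 8 bytes at offset 12 of a 16-byte block spills 4 bytes into the tail guard, and the next `check()` reports it. Note that the damage is noticed only when `check()` runs, not at the moment of the bad write, so the detection point can still be far from the code that caused the corruption.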

The last variant is the most widely used in my experience, mainly because most heap corruptions originate from buffer overflows, and it is easier to rely on instant MMU support than on checking fill patterns. Here is an article from the Citrix support web site describing how you can enable full page heap. It uses a specific process as an example, the Citrix Independent Management Architecture (IMA) service, but you can substitute the name of any application you are interested in debugging:

How to enable full page heap

and another article:

How to check in a user dump that full page heap was enabled

The following Microsoft article discusses various heap related checks:

How to use Pageheap.exe in Windows XP and Windows 2000
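
When checking which heap checks are active for a process, one place to look is the NtGlobalFlag value (WinDbg reports it with the `!gflag` extension command). The following is a minimal sketch of decoding the heap-related bits. The bit values and abbreviations are taken from the GFlags flag table as I understand it and should be verified against the Microsoft documentation; the function name `heap_checks` is hypothetical.

```python
# Heap-related NtGlobalFlag bits, keyed by bit value.
# Descriptions start with the GFlags abbreviation for the flag.
HEAP_FLAGS = {
    0x00000010: "htc: enable heap tail checking",
    0x00000020: "hfc: enable heap free checking",
    0x00000040: "hpc: enable heap parameter checking",
    0x00000080: "hvc: enable heap validation on call",
    0x02000000: "hpa: enable page heap",
}


def heap_checks(nt_global_flag):
    """Return descriptions of the heap checks set in nt_global_flag."""
    return [
        description
        for bit, description in sorted(HEAP_FLAGS.items())
        if nt_global_flag & bit
    ]
```

For example, `heap_checks(0x02000000)` returns only the page heap entry, while `heap_checks(0x70)` returns the tail, free and parameter checking entries.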

The Windows kernel analog of user space heap corruption is called paged and nonpaged pool corruption. If we consider Windows kernel pools as variants of a heap, then exactly the same techniques apply there. For example, the so-called special pool enabled by Driver Verifier is implemented with nonaccessible pages. Refer to the following Microsoft article for further details:

How to use the special pool feature to isolate pool damage

- Dmitry Vostokov -


Crash Dump Analysis Patterns (Part 1)

Crash dump | Repost Special | 2007. 5. 13. 23:54 | posted by CecilDeSK

After doing crash dump analysis exclusively for more than 3 years, I decided to organize my knowledge into a set of patterns (so to speak, in a dump analysis pattern language, and thereby try to facilitate its common vocabulary).

What is a pattern? It is a general solution you can apply in a specific context to a common recurrent problem.

There are many patterns and pattern languages in software engineering. For example, look at the following almanac that lists 700+ patterns:

The Pattern Almanac 2000

and the following link is very useful:

Patterns Library

The first pattern I’m going to introduce today is Multiple Exceptions. This pattern captures the known fact that there could be as many exceptions (“crashes”) as there are threads in a process. The following UML diagram depicts the relationship between Process, Thread and Exception entities:

[Figure: UML diagram relating Process, Thread and Exception (da_pattern_1_corrected.JPG)]

Every process in Windows has at least one execution thread, so there could be at least one exception per thread (like an invalid memory reference) if things go wrong. There could be a second exception in that thread if the exception handling code experiences another exception, or if the first exception was handled and then another one occurs, and so on.

So what is the general solution to the common problem where an application or service crashes and you have a crash dump file (common recurrent problem) from a customer (specific context)? The general solution is to look at all threads and their stacks and not rely on what tools say.

Here is a concrete example from one of the dumps I got today:

Internet Explorer crashed, and I opened the dump in WinDbg and ran the ‘!analyze -v’ command. This is what I got in my WinDbg output:

ExceptionAddress: 7c822583 (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 8fb834b8
Parameter[2]: 00000003

A break instruction, you might think, shows that the dump was taken manually from the running application and there was no crash: the customer sent the wrong dump or misunderstood the instructions. However, I looked at all threads and noticed the following two stacks (threads 15 and 16):

0:016> ~*kL
...

15 Id: 1734.8f4 Suspend: 1 Teb: 7ffab000 Unfrozen
ntdll!KiFastSystemCallRet
ntdll!NtRaiseHardError+0xc
kernel32!UnhandledExceptionFilter+0x54b
kernel32!BaseThreadStart+0x4a
kernel32!_except_handler3+0x61
ntdll!ExecuteHandler2+0x26
ntdll!ExecuteHandler+0x24
ntdll!KiUserExceptionDispatcher+0xe
componentA!xxx
componentB!xxx
mshtml!xxx
kernel32!BaseThreadStart+0x34

# 16 Id: 1734.11a4 Suspend: 1 Teb: 7ffaa000 Unfrozen
ntdll!DbgBreakPoint
ntdll!DbgUiRemoteBreakin+0x36

So we see here that the real crash happened in componentA.dll, and componentB.dll or mshtml.dll might have influenced it. Why did this happen? The customer might have dumped Internet Explorer manually while it was displaying an exception message box. The following reference says that ZwRaiseHardError displays a message box containing an error message:

Windows NT/2000 Native API Reference

Or perhaps something else happened. Many cases where we see multiple thread exceptions in one process dump happen because crashed threads display message boxes, like the Visual C++ debug message box, and thereby prevent the process from terminating. In the dump under discussion, the WinDbg automatic analysis command recognized only the last breakpoint exception (shown as # 16). In conclusion, we shouldn’t rely on “automatic analysis” too often anyway, and we should probably write our own extension to list possible multiple exceptions (based on some heuristics I will talk about later).
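
The author defers his own heuristics to a later post, so the following is only a rough sketch under assumptions of mine. The marker frame names are simply the ones visible in the two stacks above, the thread-header pattern assumes the `~*kL` layout shown in this post, and the function name `find_exception_threads` is hypothetical. It flags every thread whose stack contains a frame that suggests exception processing.

```python
import re

# Frames seen in the stacks above that suggest exception processing.
# This list is an assumption for illustration, not a complete set.
MARKERS = (
    "KiUserExceptionDispatcher",
    "UnhandledExceptionFilter",
    "NtRaiseHardError",
    "DbgBreakPoint",
)

# Thread header as printed by ~*kL, e.g. "# 16 Id: 1734.11a4 Suspend: 1 ..."
HEADER = re.compile(r"^[#.\s]*(\d+)\s+Id:\s*[0-9a-fA-F]+\.[0-9a-fA-F]+")


def find_exception_threads(text):
    """Map thread number -> marker frames found in that thread's stack."""
    hits = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new thread starts; following frames belong to it.
            current = int(header.group(1))
            continue
        if current is None:
            continue  # skip anything before the first thread header
        for marker in MARKERS:
            if marker in line:
                hits.setdefault(current, []).append(marker)
    return hits
```

Fed the saved text of the `~*kL` output above, this would report both thread 15 (hard error raised from the unhandled exception filter) and thread 16 (the breakpoint), whereas the automatic analysis mentioned only the latter. Plain substring matching like this will produce false positives on real dumps, so treat the result as a list of threads worth a manual look.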

- Dmitry Vostokov -


About the author of the reposted articles

Crash dump | Repost Special | 2007. 5. 13. 23:52 | posted by CecilDeSK
About Dmitry Vostokov


My current position at Citrix is EMEA Development Analysis Team Lead, and I work in the Dublin office, Ireland. I joined Citrix on 14th of October, 2003 as an Escalation Development Analysis Engineer.

I’m the author of several Citrix debugging and troubleshooting tools (download requires free registration) including:

and the Dump2Wave tool.

I’m also a co-author of the very popular Citrix StressPrinters tool.

I have my personal web site www.vostokov.com and I’m a founder of

The following links are currently mirrors of this blog

I’m planning to put something useful there in the future. For example, forensic memory analysis is similar to crash dump analysis.

If you want to know more about me please read my interview:

Inside Citrix November 2006


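I copied the About page of the author's blog here as is.

It looks like it will be very helpful, so I sent him an email...

For lack of time I'm wrapping up with an unauthorized repost for now -_- I have no internet at home at the moment -_- so I'll just do some quick reposting from a junior colleague's office and then leave, putting off the cleanup until later -_-

I found this blog by chance, and I can't help being amazed that there are so many people in the world who share my interests.

Later, when I have time, I should tidy this up some more and repay him with a sincere thank-you email for letting me read such good articles...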

The Elements of Crash Dump Analysis Style


After looking at a multitude of crash dump analysis reports from different companies and engineers, I would like to highlight several rules for good analysis reports:

  • Format your debugger output in fixed size font (Courier New or Lucida Console). This is very important for readability
  • Bold and highlight (using different colors) important addresses or data
  • Keep the same color for the same address or data consistently
  • Use red color for bug manifestation points
  • If you refer to some dump put a link to it

What is considered bad crash dump analysis style? These are:

  • Variable size font (you copy your debugger output to an Outlook e-mail as is and it uses the default font)
  • Highlight the whole data set (for example, stack trace) in red
  • Too much irrelevant information

As an example of the good style I advocate (albeit not perfect) please look at the previous post Crash Dump Analysis Case Study

These are my first thoughts about crash and memory dump analysis style, and I will continue to elaborate on it and present more examples later.

- Dmitry Vostokov -
