Copyright © Microsoft Corporation. This document is an archived reproduction of a version originally published by Microsoft. It may have slight formatting modifications for consistency and to improve readability.
Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at NuMega Technologies Inc., and can be reached at email@example.com.
In last month's column (April 1997), I created an MSJExceptionHandler class for generating report files when an unhandled exception occurs. At the end of the column, I described basic stack walking on the Intel CPU. However, the code addresses that you'd see from my stack-walking code are logical addresses. That is, they'd have the name of the EXE or DLL that encompasses the address, along with the section and offset within the section. While these addresses are what you'd see in a MAP file, most people would rather see symbolic function names in their stack traces. In addition, my stack-walking code didn't help you if the program's code wasn't generated with stack frames.
This month, I'll show you how to remedy both of these problems by using IMAGEHLP.DLL. Until Windows NT® 4.0 arrived, IMAGEHLP.DLL lurked in the backwaters of the Win32® SDK. In Windows NT 4.0, it became an integral part of the operating system, and is a redistributable component for Windows® 95 users. IMAGEHLP.DLL has many useful functions that provide services such as executable file modification, symbol table access, and security certificate manipulation. I'll use just a few of the functions here, but you'll find that quite a lot can be accomplished with this DLL.
A few of the functions that I'll describe use debug information of one sort or another. I get a fair number of inquiries about the various types of debug information, since not much has been written about this topic. For this reason, I'm going to spend a little bit of time describing the various types of information before extending my MSJExceptionHandler class to use symbolic debug information. As a side note, the terms "symbol table" and "debug information" are often used interchangeably; a distinction could be made, but I won't split hairs.
Types of Debug Information
The most well-known form of debug information you'll see in Win32 executables is the information that debuggers work with directly. For example, this form of debug information lets the debugger convert between an address and the name of the function or variable that it corresponds to. Likewise, it lets the debugger translate between program addresses and the source file and line number that generated the code. This debug information even lets a debugger know about the parameters and local variables a function uses, and where they can be found on the stack. In addition, this format includes type information, which describes the size and type (for example, void *, or BOOL) of variables and functions.
Until a few years ago, Microsoft® compilers used a symbol table format known as CodeView information. This format has been documented in a variety of places, including the MSDN CD-ROM. A number of other compiler vendors have adopted CodeView as their debug format. The notable exception is Borland, which uses a proprietary format of debug information in Borland C++ and Delphi. Up until Visual C++® 4.1, you could still force the linker to produce CodeView-style symbols. CodeView symbol tables, like most other types of debug information, are stored at the end of the executable file for which they were created.
Starting in Visual C++ 2.0, Microsoft introduced a new type of symbol table. This format is known as the program database or, more commonly, the PDB. The shortened name comes from the fact that this information is kept in a file with a PDB extension separate from the executable. The primary reason was to support the Microsoft linker's incremental linking feature. If the debug info were to be kept at the end of the executable file, it would require the linker to do significantly more file I/O when writing a file with debug information. Microsoft's solution was to put the debug information in a separate file and make the executable file contain a reference to the external symbol table.
The format of PDB symbol tables isn't publicly documented. (Even I don't know the exact format, especially as it continues to evolve with each new release of Visual C++.) However, PDB information is essentially the chunks of CodeView information pulled from throughout the project's source files. So how are debuggers supposed to use PDB information? If you look in the BIN directory of all versions of Visual C++ going back to Visual C++ 2.0, you'll see DLLs with names like dbi.dll, mspdb40.dll, and mspdb41.dll. These DLLs know how to read PDB information and present it in a consistent format to the client program (typically a debugger). The APIs that these DLLs export aren't publicly documented, to my knowledge.
Another type of debug information that Visual C++ can emit is the Common Object File Format (COFF), which preceded Win32 by many years. When the Windows NT team was writing tools for their early work, COFF symbol tables made sense because many development tools ported from other platforms worked with COFF symbols. Even today, you can force Visual C++ to generate COFF symbols by specifying the /DEBUGTYPE:COFF or /DEBUGTYPE:BOTH linker options. One disadvantage of Microsoft's COFF information is that it doesn't contain type information that tells the debugger whether a particular variable is an int, a double, and so forth.
You'll find documentation on the COFF format on the MSDN CD-ROM. WINNT.H contains most of the data structures that COFF symbol tables use. My overview of COFF symbols can be found in chapter 8 of Windows 95 System Programming Secrets (IDG Books, 1995).
The next type of debug information is Frame Pointer Omission (FPO) data, which is specific to the Intel CPU architecture. Briefly, FPO is helper information that stack-walking code can use to walk past functions that weren't generated with a standard EBP frame (as I described last month). Using FPO information, a stack-walking routine can piece together what the stack looks like for this type of function. By knowing what the stack looks like, the code can detect the location of the return address and the next higher frame on the stack. FPO information is usually stored as part of the executable to which it corresponds.
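To make the idea concrete, here is a minimal sketch of how a stack walker might use an FPO record to find a return address. The FpoRecord type and both function names are hypothetical; the real record layout is the FPO_DATA structure in WINNT.H. The sketch also assumes a simplified model in which the function has finished its prolog and its locals and saved registers sit directly below the return address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified mirror of a per-function FPO record.
// The real layout is the FPO_DATA structure in WINNT.H.
struct FpoRecord {
    uint32_t startRva;     // offset of the first byte of the function
    uint32_t procSize;     // number of bytes in the function
    uint32_t localDwords;  // DWORDs of local variables
    uint32_t savedRegs;    // number of registers pushed by the prolog
    uint32_t prologSize;   // bytes of prolog code
};

// Find the record whose [startRva, startRva + procSize) range covers rva.
const FpoRecord* FindFpoRecord(const std::vector<FpoRecord>& table,
                               uint32_t rva) {
    for (const FpoRecord& r : table) {
        if (rva >= r.startRva && rva - r.startRva < r.procSize)
            return &r;
    }
    return nullptr;  // no FPO data for this address
}

// Simplified model (an assumption): once the prolog has run, ESP points at
// the lowest of the locals and saved registers, and the return address is
// the DWORD just above them.
uint32_t ReturnAddressSlot(const FpoRecord& r, uint32_t esp) {
    assert(esp % 4 == 0);  // the stack is expected to be DWORD aligned
    return esp + 4 * (r.localDwords + r.savedRegs);
}
```

With two DWORDs of locals and one saved register, the return address would be read from ESP + 12. A real walker also has to cope with addresses inside the prolog, which this sketch ignores.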
You'll see FPO information generated for your own code if you force the compiler to generate debug information and perform optimizations. For example, if you compile FOO.CPP with this command line:
CL /Zi /O1 FOO.CPP
and use the obscure /FPO option on the resulting EXE
DUMPBIN /FPO FOO.EXE
you'll see something like this:
FPO Data (41)
Address   Proc Size  Locals  Prolog  Use BP  Has SEH  Frame Type  Params
00001014          8       0       0       N        N         fpo       4
0000101E         10       0       0       N        N         fpo       8
Until recently, FPO information was undocumented. However, documentation eventually showed up in an obscure corner of the MSDN CD-ROM. More recently, WINNT.H from the Windows NT 4.0 Win32 SDK included a definition for the FPO_DATA structure, which is essentially all you need.
Yet another form of debug information is relatively new and undocumented, except for a few obscure references in WINNT.H and the Win32 SDK help. This type of information is known as OMAP. Apparently, as part of Microsoft's internal build procedure, small fragments of code in EXEs and DLLs are moved around to put the most commonly used code at the beginning of the code section. This presumably keeps the process memory working set as small as possible. However, when shifting around the blocks of code, the corresponding debug information isn't updated. Instead, OMAP information is created. It lets symbol table code translate between the original address in a symbol table and the modified address where the variable or line of code really exists in memory.
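The translation step can be pictured with a short sketch. Since the OMAP format is undocumented, everything here is an assumption: the OmapEntry layout (a source address paired with a target address), the requirement that the table be sorted by source address, and the convention that a zero target means the code has no counterpart. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: each entry maps the start of a block at one address
// to the start of the same block at another address.
struct OmapEntry {
    uint32_t rva;    // address on the "from" side
    uint32_t rvaTo;  // matching address on the "to" side, 0 if none
};

static bool ByRva(const OmapEntry& a, const OmapEntry& b) {
    return a.rva < b.rva;
}

// Translate an address through the table. Returns 0 when the address falls
// before the first entry or inside a block that has no counterpart.
uint32_t TranslateOmap(const std::vector<OmapEntry>& table, uint32_t rva) {
    assert(std::is_sorted(table.begin(), table.end(), ByRva));

    // Find the last entry whose rva is <= the address being translated.
    auto it = std::upper_bound(
        table.begin(), table.end(), rva,
        [](uint32_t value, const OmapEntry& e) { return value < e.rva; });
    if (it == table.begin())
        return 0;
    --it;
    if (it->rvaTo == 0)
        return 0;

    // Keep the same distance from the start of the block.
    return it->rvaTo + (rva - it->rva);
}
```

The same routine would work in either direction, given a table built for that direction, which is presumably why WINNT.H names both a TO_SRC and a FROM_SRC flavor.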
In WINNT.H, you'll see two #defines, IMAGE_DEBUG_TYPE_OMAP_TO_SRC and IMAGE_DEBUG_TYPE_OMAP_FROM_SRC, that provide evidence for the existence of OMAP information. Likewise, in the description of the IMAGEHLP_SYMBOL structure in the Win32 SDK, you'll see the #define SYMF_OMAP_GENERATED. The question is, where can you find examples of OMAP information? If you use Windows NT, you can find it nearly everywhere. Prowling through the DBG files, which Microsoft provides in the Win32 SDK for most system components, you'll find that many of them have OMAP information.
The DBG files provide the debug information for your use without including it in the executable. Microsoft provides symbol tables for all the components of Windows NT. However, by putting them in separate DBG files, you don't have to pay the overhead of increased disk usage if you don't need the symbol tables. You can simply copy the DBG files for the system components you use.
DBG files are nothing more than a collection of the various types of debug information. Following a standard header at the beginning of a DBG file is a directory of the various types of debug information in the file. If you've written code that works with symbol tables in executables, it's really not hard to modify them to work with DBG files as well.
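As a rough illustration of that layout, the sketch below locates the debug directory inside a DBG file image held in memory. The DbgHeader type mirrors the fields of IMAGE_SEPARATE_DEBUG_HEADER from WINNT.H, but it is a stand-in defined here so the example does not depend on Windows headers. The assumption that the section headers and exported names sit between the header and the directory, the two size constants, and the function name are this sketch's reading of WINNT.H, not something the text above spells out. It also assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for IMAGE_SEPARATE_DEBUG_HEADER (see WINNT.H).
struct DbgHeader {
    uint16_t Signature, Flags, Machine, Characteristics;
    uint32_t TimeDateStamp, CheckSum, ImageBase, SizeOfImage;
    uint32_t NumberOfSections, ExportedNamesSize, DebugDirectorySize;
    uint32_t SectionAlignment, Reserved[2];
};
static_assert(sizeof(DbgHeader) == 48, "unexpected padding in DbgHeader");

const uint16_t kDbgSignature = 0x4944;  // the characters "DI"
const size_t kSectionHeaderSize = 40;   // sizeof(IMAGE_SECTION_HEADER)
const size_t kDebugDirEntrySize = 28;   // sizeof(IMAGE_DEBUG_DIRECTORY)

struct DbgLayout {
    size_t directoryOffset;   // file offset of the first directory entry
    size_t directoryEntries;  // number of entries in the directory
};

// Validates the header and computes where the debug directory lives.
bool LocateDebugDirectory(const std::vector<uint8_t>& file, DbgLayout* out) {
    assert(out != nullptr);
    if (file.size() < sizeof(DbgHeader))
        return false;

    DbgHeader h;
    std::memcpy(&h, file.data(), sizeof h);
    if (h.Signature != kDbgSignature)
        return false;

    // Assumed order: header, section headers, exported names, directory.
    size_t offset = sizeof(DbgHeader)
                  + h.NumberOfSections * kSectionHeaderSize
                  + h.ExportedNamesSize;
    if (offset > file.size() || h.DebugDirectorySize > file.size() - offset)
        return false;

    out->directoryOffset = offset;
    out->directoryEntries = h.DebugDirectorySize / kDebugDirEntrySize;
    return true;
}
```

Each directory entry then says what kind of debug information it describes (CodeView, COFF, FPO, OMAP, and so on) and where in the file to find it.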
The standard method for creating a DBG file is to build your executable file with whatever types of debug information you want. Remember, doing a debug build doesn't necessarily mean that you have to disable optimizations. Once the executable is created, use the REBASE program from the Win32 SDK to strip the symbols out of the executable and put them into a DBG file. Alternatively, if you write your own tools, IMAGEHLP.DLL has a SplitSymbols API that can create a DBG file.
If you're going to use my MSJEXHND framework (or something like it), you should seriously consider making DBG files for your final release. In your release build, leave all your optimizations on, but enable debug information (and optionally, line-number information). After building your executable, use REBASE to strip the symbol information out into a DBG file. This way, you'll have symbols for debugging your release build, but your users won't.
Now that I've rambled on about symbol tables and DBG files, let me tie this back together with the original topic: symbolic stack traces in an exception report. IMAGEHLP.DLL supports and uses nearly everything that I've described above. For starters, it can read CodeView information, PDB files, and COFF debug information to translate symbolic names to addresses. If FPO data is present, IMAGEHLP uses it to walk the stack even when EBP-style stack frames are missing. For Microsoft executable files that have undergone working set optimization, IMAGEHLP uses the OMAP information to provide correct symbolic addresses. And IMAGEHLP can do all of this either from executable files or from separate .DBG files.
[...] aren't very user friendly. IMAGEHLP.DLL's symbol table functions can make quick work of linear addresses that you feed it, and spit back the corresponding function name from your code.
Figure 1 shows the revised MSJExceptionHandler class with IMAGEHLP support. There are two new methods: InitImagehlpFunctions and ImagehlpStackWalk. At the end of the class declaration is a slew of typedefs and member variables, all related to IMAGEHLP functions. These additions make the MSJExceptionHandler code independent of the presence of IMAGEHLP.DLL. If IMAGEHLP is present, my code connects to it via LoadLibrary and GetProcAddress; if not, the code falls back to the same behavior as last month's version. Because of this behavior, there are two different stack-walking methods. The ImagehlpStackWalk method is used when IMAGEHLP.DLL is available, while IntelStackWalk is used when it's not.
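The run-time binding pattern can be sketched in a few lines. This is not the code from Figure 1; the Walker enum, the SymInitializeFn typedef, and both function names are hypothetical, and only one entry point is bound to keep the example short. On platforms other than Windows the sketch compiles to a stub that always picks the fallback.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Which stack walker the exception handler should use (hypothetical enum).
enum class Walker { Imagehlp, Intel };

#ifdef _WIN32
// Pointer type for SymInitialize, declared locally so the program has no
// load-time dependency on IMAGEHLP.DLL.
typedef BOOL (WINAPI *SymInitializeFn)(HANDLE, PSTR, BOOL);
static SymInitializeFn g_symInitialize = nullptr;
#endif

// Try to connect to IMAGEHLP.DLL at run time. Returns false if the DLL or
// the entry point is missing.
bool BindImagehlp() {
#ifdef _WIN32
    HMODULE dll = LoadLibraryA("IMAGEHLP.DLL");
    if (!dll)
        return false;
    g_symInitialize = reinterpret_cast<SymInitializeFn>(
        GetProcAddress(dll, "SymInitialize"));
    return g_symInitialize != nullptr;
#else
    return false;  // IMAGEHLP.DLL only exists on Windows
#endif
}

// Pick the IMAGEHLP-based walker when the DLL is usable, otherwise fall
// back to the plain EBP-chain walker.
Walker ChooseStackWalker() {
    return BindImagehlp() ? Walker::Imagehlp : Walker::Intel;
}
```

A fuller version would bind every IMAGEHLP function it intends to call and treat any missing entry point as a reason to fall back.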
The only change I made to last month's code (besides adding new methods) is in the GenerateExceptionReport method (see Figure 1). In last month's code, the method simply finished with a call to IntelStackWalk. For the revised code, I call the new InitImagehlpFunctions method first, then ImagehlpStackWalk. I let IMAGEHLP.DLL clean up by calling the SymCleanup API.
Before I get to the really interesting code in ImagehlpStackWalk, let me point out something in the InitImagehlpFunctions method (see Figure 1). The current IMAGEHLP.DLL documentation is lacking in some key areas that aren't obvious at first. I found out the hard way that, before the symbol table or stack-walking APIs will work, you have to call the SymInitialize API. For the third parameter, fInvadeProcess, I pass TRUE. This causes IMAGEHLP to attempt to load symbol tables for every module in the process, including the DBG files that are provided for the Windows NT system components. If you want to defer some of this work and let IMAGEHLP demand-load the symbol tables, you can use the SymSetOptions API with the SYMOPT_DEFERRED_LOADS flag.
The code for MSJExceptionHandler::ImagehlpStackWalk method can be found in Figure 1. The focus of this routine is a loop based on the IMAGEHLP StackWalk API. Again, I found out the hard way (because it's not in any documentation) that some preliminary setup is needed before calling StackWalk for the first time on the Intel platform. A STACKFRAME structure must be created and initialized with the instruction pointer, frame pointer, and stack pointer. Then you can just spin in a loop, calling StackWalk until it returns FALSE.
Each successful call to StackWalk yields the next higher frame in the call stack. The StackWalk function implicitly uses FPO data, so you don't have to do anything special. Also notice that since I'm executing in the same process context as the stack I'm walking, I can pass the values returned by GetCurrentProcess and GetCurrentThread as the process and thread handles. If I walked a thread in another process context (like debuggers do), I'd somehow have to get hold of a valid process and thread handle.
The last four parameters to StackWalk might seem a little strange. IMAGEHLP.DLL is designed to be very flexible, and to not make assumptions about the environment that it's operating under. Therefore, it uses caller-supplied callback functions to read memory, find the FPO (or similar data), correlate addresses to DLLs, and convert between segmented 16-bit addresses and 32-bit linear addresses. For two of these parameters, you can pass zero, and the StackWalk API will do the right thing. For the other two parameters, you can pass the address of a function you wrote or the addresses of built-in IMAGEHLP APIs that provide acceptable default behavior. That is what I've done in the MSJEXHND code.
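The setup and loop described above might look roughly like the following sketch. It is not the Figure 1 code: the WalkStack name is hypothetical, the sketch links to IMAGEHLP statically for brevity instead of binding through GetProcAddress, and it assumes SymInitialize has already been called for the current process. It only does real work in a 32-bit x86 Windows build; elsewhere it compiles to a stub that reports zero frames.

```cpp
#include <cassert>
#include <cstdio>

#if defined(_WIN32) && defined(_M_IX86)
#include <windows.h>
#include <imagehlp.h>
#pragma comment(lib, "imagehlp.lib")

// contextRecord is expected to point at the CONTEXT captured for the
// exception (for example, EXCEPTION_POINTERS::ContextRecord).
// Returns the number of frames printed.
int WalkStack(void* contextRecord) {
    if (!contextRecord)
        return 0;
    CONTEXT ctx = *static_cast<CONTEXT*>(contextRecord);  // work on a copy

    // StackWalk needs the starting instruction, stack, and frame pointers.
    STACKFRAME sf;
    ZeroMemory(&sf, sizeof sf);
    sf.AddrPC.Offset    = ctx.Eip;  sf.AddrPC.Mode    = AddrModeFlat;
    sf.AddrStack.Offset = ctx.Esp;  sf.AddrStack.Mode = AddrModeFlat;
    sf.AddrFrame.Offset = ctx.Ebp;  sf.AddrFrame.Mode = AddrModeFlat;

    int frames = 0;
    while (StackWalk(IMAGE_FILE_MACHINE_I386,
                     GetCurrentProcess(), GetCurrentThread(),
                     &sf, &ctx,
                     0,                       // default memory reader
                     SymFunctionTableAccess,  // finds FPO data
                     SymGetModuleBase,        // maps address to module
                     0)) {                    // no address translation
        if (sf.AddrFrame.Offset == 0)  // sanity check added by this sketch
            break;
        std::printf("%08lX  %08lX\n", sf.AddrPC.Offset, sf.AddrFrame.Offset);
        ++frames;
    }
    return frames;
}
#else
// Stub for other platforms so the example still compiles.
int WalkStack(void* contextRecord) {
    (void)contextRecord;
    return 0;
}
#endif
```

Passing zero for the first and last callbacks and the two built-in IMAGEHLP functions for the middle pair matches the split described above.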
After each successful call to the StackWalk API you get back the linear memory address of some piece of code. In the IntelStackWalk method, I converted these almost useless linear addresses into logical addresses that you could look up in a MAP file. However, with the aid of IMAGEHLP and its SymGetSymFromAddr API, I can do much better (assuming symbol table information is available). If everything works out, I can correlate that linear address to a specific function.
SymGetSymFromAddr takes four parameters. The first is the process handle for the process in which you want to look up the symbol. The second parameter is the linear address that you're asking about. The third parameter is a DWORD that the API fills in with a displacement. For example, if the address you're asking about is 0x30 bytes inside a function, the API will write 0x30 to the DWORD. The final parameter is a pointer to an IMAGEHLP_SYMBOL structure, which SymGetSymFromAddr fills in with all sorts of goodies, including the symbol name.
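The relationship between the address, the symbol, and the displacement can be shown with a small stand-alone sketch. It does not call IMAGEHLP; the Symbol type, the FindSymbol function, and the sample table are all hypothetical, and the symbol list is assumed to be sorted by address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory symbol table entry.
struct Symbol {
    uint32_t address;  // linear address where the function starts
    std::string name;
};

// Find the symbol at or immediately below addr and report how far addr is
// from the start of that symbol. The table must be sorted by address.
bool FindSymbol(const std::vector<Symbol>& sorted, uint32_t addr,
                std::string* name, uint32_t* displacement) {
    assert(name != nullptr && displacement != nullptr);

    auto it = std::upper_bound(
        sorted.begin(), sorted.end(), addr,
        [](uint32_t value, const Symbol& s) { return value < s.address; });
    if (it == sorted.begin())
        return false;  // addr is below every known symbol
    --it;

    *name = it->name;
    *displacement = addr - it->address;
    return true;
}
```

For a table holding a function named Helper at 0x00401080, looking up 0x004010B0 would yield the name Helper and a displacement of 0x30, matching the example in the text.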
Filling in and using the fourth parameter (the IMAGEHLP_SYMBOL pointer) can be confusing at first. The problem is that the IMAGEHLP_SYMBOL structure doesn't leave room for the symbol name at the end. You have to create a buffer in memory that's at least the size of an IMAGEHLP_SYMBOL structure plus the size of the largest symbol name that you're expecting. In the MSJEXHND code, I did this by making a BYTE array of the desired size (including a 512-byte name buffer). I then made a pointer of type IMAGEHLP_SYMBOL * that points at the BYTE buffer. Also, before passing the buffer to SymGetSymFromAddr, you have to initialize several fields in the buffer. Again, the Win32 documentation is vague in this area, so see the ImagehlpStackWalk code for an example of what's necessary.
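The buffer trick is easier to see in isolation. In the sketch below, SymbolInfo is a stand-in that copies the field names of IMAGEHLP_SYMBOL, and FakeLookup is a hypothetical replacement for SymGetSymFromAddr that always "finds" the same name, so the example runs without IMAGEHLP. Treating SizeOfStruct and MaxNameLength as the fields that need initializing is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in with the same fields as IMAGEHLP_SYMBOL. Note that Name has
// room for only one character; the caller supplies the rest.
struct SymbolInfo {
    uint32_t SizeOfStruct;
    uint32_t Address;
    uint32_t Size;
    uint32_t Flags;
    uint32_t MaxNameLength;
    char Name[1];
};

// Hypothetical stand-in for SymGetSymFromAddr. It refuses to work unless
// the caller has filled in the two size fields.
bool FakeLookup(uint32_t address, SymbolInfo* sym) {
    assert(sym != nullptr);
    if (sym->SizeOfStruct != sizeof(SymbolInfo) || sym->MaxNameLength == 0)
        return false;
    const char* found = "MSJExceptionHandler::GenerateExceptionReport";
    char* name = reinterpret_cast<char*>(sym) + offsetof(SymbolInfo, Name);
    std::strncpy(name, found, sym->MaxNameLength - 1);
    name[sym->MaxNameLength - 1] = '\0';
    sym->Address = address;
    return true;
}

const size_t kMaxName = 512;  // room reserved for the symbol name

// Caller side: a byte buffer big enough for the structure plus the name,
// viewed through a SymbolInfo pointer.
bool LookupName(uint32_t address, char* out, size_t outSize) {
    assert(out != nullptr && outSize > 0);
    alignas(SymbolInfo) unsigned char buffer[sizeof(SymbolInfo) + kMaxName];
    std::memset(buffer, 0, sizeof buffer);

    SymbolInfo* sym = reinterpret_cast<SymbolInfo*>(buffer);
    sym->SizeOfStruct = sizeof(SymbolInfo);
    sym->MaxNameLength = kMaxName;

    if (!FakeLookup(address, sym))
        return false;
    const char* name = reinterpret_cast<const char*>(buffer)
                     + offsetof(SymbolInfo, Name);
    std::strncpy(out, name, outSize - 1);
    out[outSize - 1] = '\0';
    return true;
}
```

Forgetting the two size fields is the easy mistake here, which is why the stand-in lookup fails loudly when they are left at zero.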
For each entry in the call stack, the ImagehlpStackWalk method uses SymGetSymFromAddr to look for a symbolic name for the address. If a symbol is found, the code prints out the function name, along with how far the address is from the beginning of the function (in bytes). If IMAGEHLP had line-number functions, I'd try to correlate the linear address to a source file and line number as well. If SymGetSymFromAddr can't locate a symbol (perhaps because there's no symbol table available), my code converts the linear address into a logical address and emits that.
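The fallback conversion from a linear address to a logical address can be sketched without any Windows APIs. The Section and Module types, the ToLogicalAddress function, and the output format are hypothetical; a real implementation would read the section table from the module's PE header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical description of one section of a loaded module.
struct Section {
    uint32_t rva;   // start of the section, relative to the module base
    uint32_t size;  // size of the section in bytes
};

// Hypothetical description of a loaded EXE or DLL.
struct Module {
    std::string name;
    uint32_t base;  // linear address where the module is loaded
    std::vector<Section> sections;
};

// Convert a linear address into "section:offset module" form, the style
// used in MAP files. Section numbers are 1-based. Returns an empty string
// if the address is not inside any section.
std::string ToLogicalAddress(const Module& m, uint32_t linear) {
    assert(linear >= m.base);
    uint32_t rva = linear - m.base;
    for (size_t i = 0; i < m.sections.size(); ++i) {
        const Section& s = m.sections[i];
        if (rva >= s.rva && rva - s.rva < s.size) {
            char buf[32];
            std::snprintf(buf, sizeof buf, "%04X:%08X ",
                          static_cast<unsigned>(i + 1),
                          static_cast<unsigned>(rva - s.rva));
            return buf + m.name;
        }
    }
    return std::string();
}
```

For a module loaded at 0x00400000 whose first section starts at RVA 0x1000, the linear address 0x00401234 becomes 0001:00000234 followed by the module name.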
Earlier, I mentioned that some IMAGEHLP functions have problems under Windows 95. The problem is that the Windows NT 4.0 version of IMAGEHLP doesn't enumerate the list of loaded EXEs and DLLs in a process under Windows 95. IMAGEHLP.DLL needs to know what (and where) modules are loaded so it can correlate a linear address to the EXE or DLL that it belongs to. Without knowing which module an address comes from, IMAGEHLP can't know which symbol table to use. Because of this problem, both the StackWalk and SymGetSymFromAddr APIs currently don't work on Windows 95. Hopefully this will be fixed in a subsequent release of IMAGEHLP.DLL.
This wraps up our tour of unhandled exception reporting, stack walking, and symbol tables. Everything that I've described here is just part of the basic nuts and bolts that debugger writers work with nearly every day. These issues may seem complicated; however, with the aid of IMAGEHLP.DLL, much of the hard work has been done for you. Likewise, my MSJEXHND framework shows that the topics I've described can be useful outside of writing a debugger.
Have a question about programming in Windows? Send it to Matt at firstname.lastname@example.org
From the May 1997 issue of Microsoft Systems Journal.