A while back, a nifty DLL called IMAGEHLP.DLL appeared from deep within
the Windows NT® team. IMAGEHLP provided APIs for reading and modifying
Portable Executable files, as well as rudimentary code for working with
debug symbols. In a nutshell, IMAGEHLP was a dumping ground for various
routines used by linkers, debuggers, and related tools. At first,
IMAGEHLP.DLL was a redistributable DLL, but eventually it became
important enough to make part of the operating system. As
time went on, IMAGEHLP's debugging and symbol management functionality
grew. Eventually, Microsoft split the executable image manipulation code
away from the symbol-related code. The resultant symbol code ended up
in DBGHELP.DLL. DBGHELP has become an official, Microsoft-endorsed
mechanism for reading all types of debug information generated by
Microsoft tools. In addition, DBGHELP knows all the tricks to walk a
call stack on multiple CPU platforms. DBGHELP.DLL encapsulates a lot of
nasty, tricky, OS- and version-specific code so that you don't have to
write it yourself.

Both John
Robbins (of Bugslayer fame) and I have covered features of IMAGEHLP and
DBGHELP in past columns in MSDN® Magazine. This month,
I'll cover some newly added features that take DBGHELP to a whole new
level. I've also provided a nifty class (WheatyExceptionReport) that can
be easily dropped into C++-based projects to provide a detailed crash
report, including the names and values of all local and global variables
at the time of a program crash.

DBGHELP
has gone through a number of iterations. I'm using version 5.1 here,
which comes with Microsoft® Windows® XP. To run the sample code for this
column, you'll need this latest version (which is redistributable).

DBGHELP
is not the only way to read debug information. The COFF and CodeView
symbol formats are documented, so you can read them directly. In
addition, there is the PDB format, which somewhat resembles CodeView®
internally. Microsoft has not documented the PDB format, but instead has
provided tool vendors with a private API for reading and writing PDB
files.

More recently, Visual
Studio® .NET introduced a new version of the PDB format, which is
identified by an RSDS signature. The Debug Information Access (DIA) SDK,
which is currently supplied in Visual Studio .NET betas, can read RSDS
and earlier format files. Going forward, the DIA APIs are the official,
Microsoft-endorsed method of accessing symbolic information from
Microsoft code, both managed and unmanaged. However, the DIA APIs are
COM-based and not for the faint of heart.

The
beauty of DBGHELP is that its mission in life is to protect you from
all the hassles of needing to know how to read all the various formats.
DBGHELP provides a nice, relatively simple API on top of a rat's nest of
symbol management and stack walking code. Having looked at both DIA and
DBGHELP, I think DBGHELP has a flatter learning curve.

What's New in DBGHELP 5.1?

Version
5.1 of DBGHELP.DLL offers several new sets of functions, as well as
support for the latest Visual Studio .NET debug formats. Most of the new
functions offer brand new functionality. However, a few just revise or
extend previous DBGHELP APIs to provide better functionality. All of the
old APIs are still there, so your existing code that uses DBGHELP
shouldn't break.

The first detail
I noticed when examining the new features of DBGHELP 5.1 was that the
existing documentation lacked some important information. While not
incorrect, it had major gaps that needed to be filled before the
scope of the new functionality could be appreciated. In this column,
I'll attempt to fill in some of those gaps.

To
start the tour of new DBGHELP features, consider the symbol information
obtained by older APIs such as SymGetSymFromAddr and
SymEnumerateSymbols. There's minimal information about a symbol such as
its name and address, but there's no type information whatsoever.
DBGHELP 5.1 introduces a much more complete and consistent way of
describing symbols.

Part of the
reason for reworking the way DBGHELP works with symbols is that it now
supports local variables and parameters. Using the pre-5.1 APIs, you'd
only get symbol information for functions and variables at global scope.
The way that the CPU references locals and parameters is different from
globals, so a new, more descriptive means of describing a symbol is
needed.

Another reason for
reworking the symbol support is that DBGHELP 5.1 includes partial
support for symbol types. A symbol type can be a basic type (such as an
integer or a float) or a more complex user-defined type (that is,
structures, unions, enums, and so on). Future versions of DBGHELP may
offer even more complete symbol type support. However, using the new
SYMBOL_INFO structure and some additional code, you can do things like
deconstruct all the members of a structure down to their basic types.

DBGHELP
5.1 contains a few new functions for source file support. Although not
documented in the August 2001 SDK, the SymEnumSourceFiles API is
self-explanatory and follows the same enumeration model as the other DBGHELP
enumeration APIs. The SymFindFileInPath API lets applications use the
standard debugger logic for locating symbol files. It's not uncommon for
an executable and its associated symbol file to be in separate
directories, so debuggers often have to go hunting around to find the
appropriate symbol files.

The
next new addition to DBGHELP 5.1 relates to hunting down symbol files.
With version 5.1 of DBGHELP comes symbol provider capabilities.
Essentially, a symbol provider DLL is a way for DBGHELP to call an
external DLL and have the DLL provide the symbol file. The primary
purpose of this feature is to allow symbol files to be provided
on-demand over the Internet. In the latest downloadable versions of
WinDbg, the debugger uses a Microsoft-written symbol provider that
automatically downloads the latest symbol files on demand for newer
Microsoft operating systems and other selected products.

Finally,
DBGHELP 5.1 provides MiniDump capabilities. At any point (including
when a program crashes), DBGHELP can create a minidump file that
contains basic information about the program state such as its threads,
modules, and certain locations in memory. In theory, a minidump file can
be read by current versions of WinDbg and Visual Studio .NET, although
I've had trouble doing so in my simple test cases.

Drilling into the New Symbol and Type Information

DBGHELP
5.1 introduces the SYMBOL_INFO structure, which is used by a new set of
APIs for looking up and enumerating symbols. A SYMBOL_INFO structure
contains much more than just a name and address. The new DBGHELP 5.1
APIs, namely SymEnumSymbols, SymFromAddr, and SymFromName, all return
SYMBOL_INFO structures.

Now let's
take a quick look at some of the fields of a SYMBOL_INFO structure. For
starters, it has a TypeIndex field from which information about the
symbol's type can be learned. For instance, if a symbol represents a
structure, it's possible to enumerate the members of structures and
classes, as well as get their types, sizes, and offsets within the
structure.

The
SYMBOL_INFO.TypeIndex member is useless by itself. It doesn't contain any
magic values (such as CodeView format constants) that can be decoded.
However, the TypeIndex member can be passed to the new SymGetTypeInfo API
to retrieve information about the symbol or its underlying type.
SymGetTypeInfo returns a wide variety of information, but its
documentation is less than clear. I'll describe some of the more
interesting options later and leave you to experiment with other
options.

The next interesting
field in the SYMBOL_INFO structure is the ModBase, which is the load
address for the module containing the symbol. You'll need to pass this
value to the SymGetTypeInfo API to get correct results whenever you
happen to be working with multiple symbol tables.

The
SYMBOL_INFO.Flags member provides useful information about a symbol,
such as whether it's a local variable, a parameter, or a global
variable. If the symbol is a frame-based local variable or parameter,
the IMAGEHLP_SYMBOL_INFO_LOCAL flag is set. If the symbol is a
parameter, the IMAGEHLP_SYMBOL_INFO_PARAMETER flag is also set. The use
of these two flags can be confusing at first, but at least the behavior
is consistent.

If the symbol is a
frame-based local variable or parameter, the SYMBOL_INFO.Address member
provides the offset of the variable relative to the stack frame. In
this case, the IMAGEHLP_SYMBOL_INFO_REGRELATIVE flag is on, and the
SYMBOL_INFO.Register field contains an undocumented enumeration
indicating which register holds the frame pointer. Currently, this value
seems to always be 8, indicating the EBP register. Be warned, however,
that the encoding of the Register member may change at some point in the
future.

If the parameter or
local variable resides in a register, the IMAGEHLP_SYMBOL_INFO_REGISTER
flag is set in the SYMBOL_INFO.Flags member, and the SYMBOL_INFO.Register
field uses the same enumeration just described to indicate which
register holds the value. Of course, if the symbol is a
global variable, the SYMBOL_INFO.Address field is an actual linear
address for the variable.
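To make the flag rules concrete, here is a small portable C++ sketch that mirrors the decisions described above. It does not call DBGHELP: SymbolInfoLite is a hypothetical stand-in for the few SYMBOL_INFO fields involved, the numeric flag values are the IMAGEHLP_SYMBOL_INFO_* values as I read them in DbgHelp.h (verify them against your own header), and the frame pointer passed in is assumed to be EBP on x86, in line with the observation about the Register field.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the SYMBOL_INFO fields discussed above. The real
// structure is declared in DbgHelp.h and has many more members.
struct SymbolInfoLite {
    uint32_t Flags;
    uint32_t Register;  // undocumented register enumeration (8 == EBP observed)
    uint64_t Address;   // linear address, or frame offset for locals/params
};

// Flag values as I read them in DbgHelp.h; verify against your own header.
constexpr uint32_t kSymInfoRegister    = 0x08;  // IMAGEHLP_SYMBOL_INFO_REGISTER
constexpr uint32_t kSymInfoRegRelative = 0x10;  // IMAGEHLP_SYMBOL_INFO_REGRELATIVE
constexpr uint32_t kSymInfoParameter   = 0x40;  // IMAGEHLP_SYMBOL_INFO_PARAMETER
constexpr uint32_t kSymInfoLocal       = 0x80;  // IMAGEHLP_SYMBOL_INFO_LOCAL

// Parameters carry both the LOCAL and PARAMETER flags, so test PARAMETER first.
std::string DescribeSymbol(const SymbolInfoLite& s) {
    std::string kind = (s.Flags & kSymInfoParameter) ? "Parameter"
                     : (s.Flags & kSymInfoLocal)     ? "Local"
                                                     : "Global";
    if (s.Flags & kSymInfoRegister)    return kind + " (in register)";
    if (s.Flags & kSymInfoRegRelative) return kind + " (frame-relative)";
    return kind;
}

// x86 sketch: for a register-relative symbol, Address is an offset from the
// frame pointer; 32-bit unsigned wraparound handles negative offsets.
// Register-resident symbols have no memory address, so this returns 0.
uint32_t ResolveAddress(const SymbolInfoLite& s, uint32_t framePointer) {
    if (s.Flags & kSymInfoRegister) return 0;
    if (s.Flags & kSymInfoRegRelative)
        return framePointer + static_cast<uint32_t>(s.Address);
    return static_cast<uint32_t>(s.Address);  // globals: already linear
}
```

With EBP at 0x0012FF80, a frame-relative local at offset -8 resolves to 0x0012FF78 and a parameter at +8 resolves to 0x0012FF88, while a global's Address passes through unchanged.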
Continuing through the SYMBOL_INFO structure, the Tag member indicates what type
of symbol is being described. The values correspond to the SymTagEnum
enumeration from the CVCONST.H file which is part of the DIA SDK. Some
of the more commonly encountered Tag values are as follows:
enum SymTagEnum
{
SymTagFunction=5,
SymTagData=7,
SymTagPublicSymbol=10,
SymTagUDT=11,
SymTagEnum=12,
SymTagTypedef=17,
};
In my experience, you'll see all of these Tag values in
executables with full PDB information. If you're working with the
stripped PDB files that Microsoft provides for operating system
components, you'll only see SymTagPublicSymbol types.
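A symbol dumper typically branches on SYMBOL_INFO.Tag. The following sketch uses only the Tag values listed above to label symbols; ClassifyTag and the scoped SymTag enum are illustrative names of my own, not part of DBGHELP or the DIA SDK, and the enum is renamed so it cannot clash with CVCONST.H if both are included.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tag values copied from the SymTagEnum excerpt above (CVCONST.H, DIA SDK),
// placed in a scoped enum to avoid clashing with the real header.
enum class SymTag : uint32_t {
    Function = 5, Data = 7, PublicSymbol = 10, UDT = 11, Enum = 12, Typedef = 17,
};

// Hypothetical helper: label a SYMBOL_INFO.Tag value for a symbol dump.
std::string ClassifyTag(uint32_t tag) {
    switch (static_cast<SymTag>(tag)) {
        case SymTag::Function:     return "function";
        case SymTag::Data:         return "data (variable)";
        case SymTag::PublicSymbol: return "public symbol (name and address only)";
        case SymTag::UDT:          return "user-defined type";
        case SymTag::Enum:         return "enumeration";
        case SymTag::Typedef:      return "typedef";
    }
    return "other tag " + std::to_string(tag);  // any value not listed above
}
```

Run against a stripped operating system PDB, you would expect every symbol to take the public-symbol branch.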
The Mysterious SymGetTypeInfo

The
SymGetTypeInfo API is central to DBGHELP 5.1's support of type
information. The API takes a process handle, a module load address, a
type index, an IMAGEHLP_SYMBOL_TYPE_INFO enum value, and a pointer to an
output buffer. The IMAGEHLP_SYMBOL_TYPE_INFO value specifies what
information to return in the output buffer. I won't cover all the
possible enum values, but I will describe the ones that I've found to be
most useful.

It's
important to know up front that the meaning of Type Index values is
less than clear. For one thing, they're not like CodeView format type
indices, where values below 0x1000 are predefined types (like integer)
and values above 0x1000 are user-defined types. Also, starting with a
Type Index in a SYMBOL_INFO structure, you can get a TypeId, and that
TypeId must be used in certain other calls to SymGetTypeInfo. The best
thing I can suggest for documentation is to look at my sample code and
see how it uses Type Indexes and what it passes to SymGetTypeInfo.

To
determine whether a symbol is a user-defined type, pass its Type
Index, along with TI_GET_CHILDRENCOUNT, to SymGetTypeInfo. If it returns
TRUE, the output parameter indicates how many children it has. If there
are children, call SymGetTypeInfo again, this time passing
TI_FINDCHILDREN and an appropriately initialized TI_FINDCHILDREN_PARAMS
structure. See the sample code for this column for details on how to do
this. Assuming you've done things correctly, you'll get back a
collection of DWORD-sized Type Indexes, one for each structure or
enumeration member.

What can you
do with these Child Type Indexes? For starters, you can pass them to
SymGetTypeInfo, with the TI_GET_SYMNAME option. This time, you'll get
back the name of the child member as a Unicode string. Using just the
three calls to SymGetTypeInfo that I've described, you can list all
members of any structure. Don't forget to free the returned Unicode
strings by calling LocalFree.
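The three-call sequence can be sketched without Windows by substituting a hypothetical in-memory type table for the PDB. GetChildrenCount, FindChildren, and GetSymName below are illustrative stand-ins for SymGetTypeInfo with TI_GET_CHILDRENCOUNT, TI_FINDCHILDREN, and TI_GET_SYMNAME; the real calls also take a process handle and module base, fill a TI_FINDCHILDREN_PARAMS structure, and return names as Unicode strings you must LocalFree.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory type table standing in for a PDB. Each entry has a
// name and the Type Indexes of its children (members), if any.
struct MockType {
    std::string name;
    std::vector<uint32_t> children;
};
using TypeTable = std::map<uint32_t, MockType>;

// Stand-in for SymGetTypeInfo(..., TI_GET_CHILDRENCOUNT, &count).
bool GetChildrenCount(const TypeTable& t, uint32_t index, uint32_t* count) {
    auto it = t.find(index);
    if (it == t.end()) return false;
    *count = static_cast<uint32_t>(it->second.children.size());
    return true;
}

// Stand-in for TI_FINDCHILDREN: the caller supplies room for `count` IDs,
// much as TI_FINDCHILDREN_PARAMS.Count sizes its trailing ChildId array.
bool FindChildren(const TypeTable& t, uint32_t index, uint32_t* ids, uint32_t count) {
    auto it = t.find(index);
    if (it == t.end() || it->second.children.size() != count) return false;
    for (uint32_t i = 0; i < count; ++i) ids[i] = it->second.children[i];
    return true;
}

// Stand-in for TI_GET_SYMNAME (std::string avoids the LocalFree step here).
bool GetSymName(const TypeTable& t, uint32_t index, std::string* name) {
    auto it = t.find(index);
    if (it == t.end()) return false;
    *name = it->second.name;
    return true;
}

// The sequence described above: count the children, fetch their IDs,
// then ask for each child's name.
std::vector<std::string> ListMembers(const TypeTable& t, uint32_t typeIndex) {
    std::vector<std::string> names;
    uint32_t count = 0;
    if (!GetChildrenCount(t, typeIndex, &count) || count == 0) return names;
    std::vector<uint32_t> ids(count);
    if (!FindChildren(t, typeIndex, ids.data(), count)) return names;
    for (uint32_t id : ids) {
        std::string name;
        if (GetSymName(t, id, &name)) names.push_back(name);
    }
    return names;
}
```

A type with no children, such as a plain int member, simply yields an empty list, which is also how the sketch distinguishes leaves from user-defined types.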
While the names of the structure members are important, odds are that you'll
want more information about them. That's where additional calls to
SymGetTypeInfo can help out. Some of these calls work directly with the
values returned from the TI_FINDCHILDREN call, while others require a
different value, which you get by calling SymGetTypeInfo with
TI_GET_TYPEID. There are no hard-and-fast rules here. The sample sources
are a good place to see what I've found works.

To
get the offset of the member within the structure (or class), pass the
type index and TI_GET_OFFSET to SymGetTypeInfo. Likewise, to get the
size of the member, use TI_GET_LENGTH. If the type under examination is
an array, you can use TI_GET_COUNT to determine how many elements are
present.
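Once you have a member's offset, length, and (for arrays) element count, formatting a layout line is straightforward. This sketch assumes those values were already gathered with TI_GET_OFFSET, TI_GET_LENGTH, and TI_GET_COUNT; the MemberLayout structure and FormatMember are hypothetical, and computing the element size as length divided by count assumes TI_GET_LENGTH reports the size of the whole array.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical record of the values gathered for one member via
// SymGetTypeInfo: TI_GET_SYMNAME, TI_GET_OFFSET, TI_GET_LENGTH, and (for
// arrays only) TI_GET_COUNT. count == 0 means "not an array" in this sketch.
struct MemberLayout {
    std::string name;
    uint32_t offset;   // byte offset within the enclosing structure
    uint64_t length;   // size in bytes (assumed: whole array for arrays)
    uint32_t count;    // number of array elements, or 0
};

// Formats one layout line, e.g. "+0x0004 dwPageSize (4 bytes)".
std::string FormatMember(const MemberLayout& m) {
    char buf[128];
    if (m.count > 0) {
        // Assumption: element size = total length / element count.
        std::snprintf(buf, sizeof buf, "+0x%04X %s[%u] (%llu bytes, %llu each)",
                      m.offset, m.name.c_str(), m.count,
                      static_cast<unsigned long long>(m.length),
                      static_cast<unsigned long long>(m.length / m.count));
    } else {
        std::snprintf(buf, sizeof buf, "+0x%04X %s (%llu bytes)",
                      m.offset, m.name.c_str(),
                      static_cast<unsigned long long>(m.length));
    }
    return buf;
}

// Offset just past the member; comparing it with the next member's offset
// reveals any padding the compiler inserted.
uint64_t EndOffset(const MemberLayout& m) { return m.offset + m.length; }
```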
The TI_GET_BASETYPE form of a SymGetTypeInfo call yields a value known as a base type. These
values correlate to the BasicType enumeration in the previously
mentioned CVCONST.H file from the DIA SDK. From the BasicType, you can
learn if the member is a char, int, unsigned int, float, and so on. Note
that the size is not implicitly part of some of these values. For
example, both floats and doubles show up as btFloat. However, by
determining the size of the member (which you can obtain via a different
call), you can deduce whether it's a float or a double.
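The float-versus-double deduction can be expressed as a small formatter keyed on both base type and size. This is a sketch rather than WheatyExceptionReport's own code: the BasicType numbers are my reading of CVCONST.H (check them against your copy), and FormatValue is a hypothetical helper that copies raw bytes with memcpy to avoid alignment and aliasing problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// A few BasicType values as I read them in CVCONST.H (DIA SDK); verify
// against your copy before relying on the numbers.
enum BasicTypeLite : uint32_t { btInt = 6, btUInt = 7, btFloat = 8 };

// Formats raw variable bytes using the base type plus the length from
// TI_GET_LENGTH. btFloat covers both float and double, so size decides.
std::string FormatValue(uint32_t baseType, uint64_t length, const void* data) {
    char buf[64];
    if (baseType == btFloat && length == 4) {
        float f;
        std::memcpy(&f, data, sizeof f);
        std::snprintf(buf, sizeof buf, "%g", f);
    } else if (baseType == btFloat && length == 8) {
        double d;
        std::memcpy(&d, data, sizeof d);
        std::snprintf(buf, sizeof buf, "%g", d);
    } else if (baseType == btInt && length == 4) {
        int32_t i;
        std::memcpy(&i, data, sizeof i);
        std::snprintf(buf, sizeof buf, "%d", i);
    } else if (length == 1) {             // bytes and chars: show as hex
        uint8_t c;
        std::memcpy(&c, data, 1);
        std::snprintf(buf, sizeof buf, "0x%02X", c);
    } else if (length == 4) {             // other 4-byte values: show as hex
        uint32_t u;
        std::memcpy(&u, data, sizeof u);
        std::snprintf(buf, sizeof buf, "0x%08X", u);
    } else {
        return "<unformatted " + std::to_string(length) + "-byte value>";
    }
    return buf;
}
```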
Enumerating Locals

Enumerating the local variables and parameters of a method is tricky if you don't
know the secret. The first key detail to know is the SymSetContext API,
which currently isn't documented very well. The purpose of SymSetContext
is to tell DBGHELP which method you want to enumerate locals and
parameters for.

To call
SymSetContext correctly, you need to pass a pointer to a correctly
initialized structure of type IMAGEHLP_STACK_FRAME. Although there are
many fields in the structure, only one, InstructionOffset, is important
for x86 usage. The InstructionOffset field should be set to the address
within the routine for which you want to enumerate symbols. It should be
a linear address, not a relative virtual address.

Having
called SymSetContext correctly, call SymEnumSymbols next. This is a new
API that effectively replaces the older SymEnumerateSymbols API. When
you call SymEnumSymbols to enumerate locals and parameters, it's
critical to pass 0 as the BaseOfDll parameter. If you pass a non-zero
BaseOfDll value, the API enumerates the global variables of the
specified module.

If you use
SymSetContext to access the local variables in a function, be aware that
the debug information is scope-aware. That is, if you declare local
variables other than at the top of the function (inside of an if block,
for example) you won't see that symbol if you just pass the function's
starting address to SymSetContext. If you want the complete set of
locals at a given point in a function, you'll need to call SymSetContext
for the desired address.
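Why the address matters can be modeled with a hypothetical table of lexical blocks, each covering a range of linear addresses and declaring some locals. VisibleLocals and Block are illustrative names, not DBGHELP APIs; the function simply returns the locals of every block containing the given address, which is the effect that your choice of IMAGEHLP_STACK_FRAME.InstructionOffset has on SymSetContext.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of scope-aware debug information: each lexical block
// covers a half-open range of linear addresses and declares some locals.
struct Block {
    uint32_t start, end;  // [start, end)
    std::vector<std::string> locals;
};

// Locals visible at `ip`: everything declared in a block whose range
// contains ip. Outer blocks are listed first in this sketch's table.
std::vector<std::string> VisibleLocals(const std::vector<Block>& blocks, uint32_t ip) {
    std::vector<std::string> out;
    for (const Block& b : blocks)
        if (ip >= b.start && ip < b.end)
            out.insert(out.end(), b.locals.begin(), b.locals.end());
    return out;
}
```

With a function spanning 0x401000 to 0x401080 and a hypothetical if block from 0x401020 to 0x401040 that declares nTemp, the function's start address yields only sysinfo, while an address inside the block yields both sysinfo and nTemp.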
Showing Off the New DBGHELP APIs

In my April 1997 column in Microsoft Systems Journal,
I presented the MSJExceptionHandler code. The idea was that when an
unhandled exception occurred, the code wrote out a report file with some
basic information like the registers and a stack trace. To make this
happen, the code installed an unhandled exception handler. Since the
code relied heavily on the original IMAGEHLP APIs, this was a perfect
opportunity to upgrade the code to take advantage of the newer DBGHELP
APIs. The new version is the WheatyExceptionReport class in Figure 1.
This code does everything the original code did and adds a few new
features. The most important new feature is the display of names and
values for parameters and locals of all functions in the faulting
thread. To do this, the code takes advantage of both the new symbol
enumeration and the type APIs in DBGHELP 5.1.

In
addition, the WheatyExceptionReport code attempts to show the names and
values of all the global variables. I say "attempts" because the type
information available from DBGHELP isn't always complete enough to
accurately know how to format a particular value. This is actually the
case for both local and global variables. While my code for formatting
symbol values can probably be improved, it does a good job using what
I'd call a fairly small amount of code.

The
last new feature of WheatyExceptionReport is that the call stack
entries now include the source file and line number, if available. This
capability comes from the SymGetLineFromAddr API, introduced in DBGHELP
5.0.

Using WheatyExceptionReport
from unmanaged C++ code is incredibly easy. Simply include the source
and header files in your project and rebuild. The code in Figure 1
defines a global variable called g_WheatyExceptionReport that's of type
WheatyExceptionReport. In the WheatyExceptionReport constructor, the
code calls SetUnhandledExceptionFilter, and sets the handler to
WheatyExceptionReport::WheatyUnhandledExceptionFilter.
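The install-and-chain pattern can be sketched portably. Here Filter, SetFilter, and kContinueSearch are hypothetical stand-ins for LPTOP_LEVEL_EXCEPTION_FILTER, SetUnhandledExceptionFilter, and EXCEPTION_CONTINUE_SEARCH (which is 0 in the Windows headers), so the logic can run anywhere; ReportSketch is an illustrative class, not the column's WheatyExceptionReport. Remembering the previous filter is what lets the handler pass control onward after writing its report.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Win32 filter type and installer.
using Filter = long (*)(void* exceptionPointers);
constexpr long kContinueSearch = 0;  // models EXCEPTION_CONTINUE_SEARCH
static Filter g_installed = nullptr;

// Like SetUnhandledExceptionFilter: installs f and returns the old filter.
Filter SetFilter(Filter f) {
    Filter old = g_installed;
    g_installed = f;
    return old;
}

class ReportSketch {
public:
    // Install our filter at construction; a global instance does this
    // before main() runs.
    ReportSketch() { s_previous = SetFilter(&OnUnhandled); }
    // This sketch restores the earlier filter on destruction.
    ~ReportSketch() { SetFilter(s_previous); }
    static int reportsWritten;

private:
    static long OnUnhandled(void* ep) {
        ++reportsWritten;                   // the real code writes the .RPT here
        return s_previous ? s_previous(ep)  // then chains to the earlier filter
                          : kContinueSearch;
    }
    static Filter s_previous;
};

Filter ReportSketch::s_previous = nullptr;
int ReportSketch::reportsWritten = 0;
```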
WheatyExceptionReport is primarily intended for use in unmanaged programs written using a
Microsoft language compiler. Conceptually, there's no reason why it
couldn't be used in a Microsoft .NET program, but .NET provides much
better error reporting capabilities, so this code wouldn't add much
value. To use WheatyExceptionReport with a language like Visual Basic®
6.0, you'll need to compile the code into a DLL, and make sure the DLL
is loaded early in the application's startup so that the unhandled
exception filter is installed.

When
an exception occurs, the WheatyUnhandledExceptionFilter method is
called with a pointer to an EXCEPTION_POINTERS structure. This structure
is interrogated to find the exception type and address, as well as the
register values. The filter function either creates a new file to write
the report to, or appends a new report to an existing file. The report
file is in the same directory as the application EXE file, has the same
base file name, and an .RPT extension. For instance, if C:\FOO\BAR.EXE
faulted, the report file would be C:\FOO\BAR.RPT.
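Deriving the report name is simple string surgery. This sketch is my own illustration rather than the column's implementation: it takes the executable path as a parameter (the real code would obtain it with GetModuleFileName), and ReportPathFor is a hypothetical name. It only treats a dot as an extension separator if the dot follows the last path separator, so a dotted directory name is left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same directory, same base name, ".RPT" extension.
std::string ReportPathFor(const std::string& exePath) {
    const std::size_t slash = exePath.find_last_of("\\/");
    const std::size_t dot = exePath.find_last_of('.');
    // A dot counts as an extension separator only if it follows the last slash.
    const bool hasExt = dot != std::string::npos &&
                        (slash == std::string::npos || dot > slash);
    return (hasExt ? exePath.substr(0, dot) : exePath) + ".RPT";
}
```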
Figure 2 shows a portion of an .RPT file. The top few lines indicate the type of
the exception (in this case, an access violation) and its address. The
address is given both as a linear address and a logical address
(section:offset) within a module. Following that is a display of the
register values at the time of the exception.
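The section:offset form comes from subtracting the module's load address to get an RVA, then finding the section whose range contains it. SectionLite and GetLogicalAddress are hypothetical names; SectionLite stands in for the IMAGE_SECTION_HEADER fields involved, and this sketch considers only VirtualSize. In a real program, the module base and section table would come from the loaded module's PE headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the IMAGE_SECTION_HEADER fields used here.
struct SectionLite {
    uint32_t VirtualAddress;  // section RVA
    uint32_t VirtualSize;
};

// Converts a linear address into a 1-based section number and an offset
// within that section. Returns false if no section contains the address.
bool GetLogicalAddress(uint32_t addr, uint32_t moduleBase,
                       const std::vector<SectionLite>& sections,
                       uint32_t* section, uint32_t* offset) {
    if (addr < moduleBase) return false;
    const uint32_t rva = addr - moduleBase;
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const SectionLite& s = sections[i];
        if (rva >= s.VirtualAddress && rva < s.VirtualAddress + s.VirtualSize) {
            *section = static_cast<uint32_t>(i + 1);
            *offset = rva - s.VirtualAddress;
            return true;
        }
    }
    return false;
}
```

Because section numbers are 1-based, an address 0x1234 bytes into a module whose first section starts at RVA 0x1000 reports as 0001:00000234.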
If you study the figure closely, you'll notice that there are actually two
call stacks. The first call stack is the terse version, with one stack
frame entry per line. Besides the addresses of the instruction and frame
pointers, the lines also contain the name of the function and its
source file information, if available. This call stack is intended to
let you see quickly how control got to the faulting code.

The
second call stack comes in the section titled "Local Variables and
Parameters." Here, the first line of each stack entry is identical, as
in the first stack walk. Immediately following each frame is the name
and value of each local variable and parameter. When the variable is a
structure, the name and type of the structure are shown. For instance,
the following is a local variable of type _SYSTEM_INFO, called sysinfo.
Local 'sysinfo' _SYSTEM_INFO
When showing the data members of a structure, each member
is indented. The display code uses recursion to drill down through all
of the nested structures, so nested members are indented in an
appropriate manner.
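The recursion behind the indented display can be sketched with a hypothetical type table in which each entry lists its children. Node, DumpType, and Dump are illustrative names; in the real code the children would come from TI_FINDCHILDREN and the names from TI_GET_SYMNAME. The depth limit of 16 is an arbitrary guard of my own against malformed or cyclic data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type table: an entry with children is a nested structure;
// an entry without children is a leaf member.
struct Node {
    std::string name;
    std::vector<uint32_t> children;
};

// Recursively prints a structure, indenting one tab per nesting level.
void DumpType(const std::map<uint32_t, Node>& table, uint32_t index,
              int depth, std::ostringstream& out) {
    auto it = table.find(index);
    if (it == table.end() || depth > 16) return;  // guard against bad data
    out << std::string(static_cast<std::size_t>(depth), '\t')
        << it->second.name << '\n';
    for (uint32_t child : it->second.children)
        DumpType(table, child, depth + 1, out);
}

std::string Dump(const std::map<uint32_t, Node>& table, uint32_t root) {
    std::ostringstream out;
    DumpType(table, root, 0, out);
    return out.str();
}
```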
The last section of the .RPT file is entitled "Global Variables." Here the code
uses SymEnumSymbols, but passes the module handle of the faulting
module. It uses the same symbol formatting function
(WheatyExceptionReport::FormatSymbolValue) as used in the stack frame
walk code. Again, I would caution you that this code is not bulletproof,
but in my testing it does a pretty good job with most simple variable
types, as well as structures. It specifically doesn't attempt to display
any sort of data referenced via a pointer, with the exception of ANSI
strings.

To test out
WheatyExceptionReport, I created the TestExceptionHandler project. It
consists of a small executable (TestExceptionHandler.CPP), along with
the WheatyExceptionReport code. The executable just calls down through a
few functions and creates some interesting local variables of simple
and complex types. It then causes an intentional access violation. After
running the program, you should end up with a TestExceptionHandler.RPT
file, which is viewable with any text editor of your choice. Since the
project chains control onto the previous handler (after writing out the
report), you will see the system dialog informing you of an unhandled
exception.

The information from
an .RPT file isn't nearly as complete as the information from a minidump
crash file. However, you may not want to use the minidump facilities,
or you may have different needs. Feel free to extend the
WheatyExceptionReport code any way you find useful and make sure to let
me know if you do anything interesting with it!

Wrap-up

DBGHELP
5.1, introduced in Windows XP, provides a healthy collection of new
functions. In this column, I've shown you some of the new type and
symbol enumeration APIs. There are even a few I didn't have space to
cover here, but which are interesting nonetheless (for example,
SymEnumTypes).

Using these new
APIs, I've constructed an updated exception reporting facility that's
incredibly simple to use. The code is well commented and shows how to
call the new DBGHELP APIs. On my Web site (http://www.wheaty.net)
I've also put the source for another utility (DbgHelpDemo) that does a
more formal dump of the symbol and type information within a PDB file.
While the DBGHELP information certainly isn't enough to write a
full-blown debugger all by itself, the source for these two programs
shows how DBGHELP can be a valuable asset in many cases.
Send questions and comments for Matt to hood@microsoft.com.