In my May 1997
column, I used some APIs from IMAGEHLP.DLL as part of a framework for
reporting on unhandled exceptions. Since then I've received quite a bit
of email about the use of those APIs, indicating that IMAGEHLP.DLL is an
area of widespread interest. Unfortunately, in many ways the IMAGEHLP
documentation assumes that you're comfortable working with executable
files and symbol tables. It's also weak in explaining which APIs need to
be used, and in what particular order to perform a given task. The
result is that many developers who would benefit from using IMAGEHLP.DLL
get lost in the documentation.
This
month, I'll go over a different subset of the IMAGEHLP APIs to show how
their powerful features can be implemented with a few simple lines of
code. To demonstrate how easy it is to use IMAGEHLP APIs, I wrote the
EZPE program, a PE file-display program that also displays debug symbols
belonging to an executable (that is, EXEs, DLLs, OCXs, and so on). It
displays information similar to programs like DUMPBIN from Visual C++® or PEDUMP from my book, Windows 95 System Programming Secrets
(IDG Books, 1995). The key difference is that EZPE never touches the
executable file directly, and it doesn't grovel through data structures
like other PE file-display programs. Instead, EZPE lets the IMAGEHLP
APIs do all the hard work and effectively demonstrates the proper use of
the APIs as a by-product.
Another
nice feature that falls out from EZPE's use of the IMAGEHLP APIs is
that you can see the symbol names and addresses contained within debug
information, such as the DBG files that are provided for Windows NT
components. You can also use EZPE to see the symbols contained within PDB
files, something that even DUMPBIN can't do. All you have to do is make
sure that the symbol table file (for example, PDB or DBG) is in the same
directory as the executable it belongs to. Running EZPE on the
executable file causes it to automatically find the symbols in the PDB
or DBG file, as appropriate. The beauty is that EZPE kicks back and lets
IMAGEHLP.DLL do the hard work of finding and loading the symbol tables.
More on this later.
Before
jumping into a description of the IMAGEHLP APIs, a quick review of
IMAGEHLP's availability is worthwhile. IMAGEHLP is a standard component
of Windows NT® 4.0. However, it's a redistributable DLL, so you can ship it with your app if it needs to run on Windows®
95. Be aware, though, that certain functions in IMAGEHLP don't work
under Windows 95 (at least not in the IMAGEHLP.DLL that was available
when I wrote this). The import library and header file for IMAGEHLP.DLL
can be found in any Win32®
SDK that shipped on or after the release date of Windows NT 4.0 (July
31, 1996). IMAGEHLP isn't specific to any one CPU platform. I built my
EZPE program on a DEC Alpha, and it worked perfectly the first time.
IMAGEHLP APIs
The
first IMAGEHLP API to look at is MapAndLoad. You'd use this if you were
interested only in the contents of an executable and didn't care about
any debug information that might be available. Although the IMAGEHLP
documentation is vague about exactly what MapAndLoad does, it's really
quite simple. First, the function goes through
the necessary gyrations to make a memory mapped file corresponding to
the specified executable. Internally, MapAndLoad goes through the
standard OpenFile, CreateFileMapping, MapViewOfFile sequence. Because
these underlying APIs open up handles, it's important that you call the
matching UnMapAndLoad API when you're done to close all the handles.
After
memory mapping the executable file, MapAndLoad fills in the
LOADED_IMAGE structure that was passed in. There are a number of key
fields in this structure that are likely to be valuable to you. The
MappedAddress field is where the executable is mapped into memory (that
is, it's what the internal call to MapViewOfFile returned). The
FileHeader field contains a pointer to an IMAGE_NT_HEADERS
structure, which is defined in WINNT.H. The IMAGE_NT_HEADERS
structure is better known as the PE header, and contains all the vital
values for the executable. This structure has been described in numerous
articles (many of which are in the Microsoft KnowledgeBase), so I won't
dwell on it here. However, EZPE does a rudimentary printout of the PE
header contents without putting too much effort into interpreting the
fields.
The
Sections field in the LOADED_IMAGE structure is a pointer to the PE
section table, which is an array of IMAGE_SECTION_HEADER structures (also
defined in WINNT.H). The number of sections in the array is given by the
(you guessed it) NumberOfSections field. An IMAGE_SECTION_HEADER
structure contains the name of a section, its size, its
attributes, and its location within the executable file. The EZPE
program prints out the important contents of each IMAGE_SECTION_HEADER
in sequence, again without putting much effort into things
such as breaking down the attributes into meaningful
flags like PAGE_READONLY.
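The attribute breakdown that EZPE skips can be sketched in a few lines. The following is a minimal illustration, not code from EZPE: it assumes the three IMAGE_SCN_MEM_* bit values from WINNT.H (redeclared under local names so the sketch builds without the Windows headers) and uses a hypothetical helper, sectionProtectionName, to name the page protection that a section's Characteristics value implies.

```cpp
#include <cstdint>
#include <string>

// Values of the IMAGE_SCN_MEM_* flags from WINNT.H, redeclared under
// local (hypothetical) names so no Windows header is required.
constexpr std::uint32_t kScnMemExecute = 0x20000000;
constexpr std::uint32_t kScnMemRead    = 0x40000000;
constexpr std::uint32_t kScnMemWrite   = 0x80000000;

// Hypothetical helper: returns the name of the page protection that
// matches the memory flags in a section's Characteristics field.
std::string sectionProtectionName(std::uint32_t characteristics)
{
    const bool canExecute = (characteristics & kScnMemExecute) != 0;
    const bool canRead    = (characteristics & kScnMemRead) != 0;
    const bool canWrite   = (characteristics & kScnMemWrite) != 0;

    // Win32 has no write-only protection, so writable implies readable.
    if (canWrite)
        return canExecute ? "PAGE_EXECUTE_READWRITE" : "PAGE_READWRITE";
    if (canRead)
        return canExecute ? "PAGE_EXECUTE_READ" : "PAGE_READONLY";
    return canExecute ? "PAGE_EXECUTE" : "PAGE_NOACCESS";
}
```

A typical read-only data section would come back as PAGE_READONLY, and a typical code section as PAGE_EXECUTE_READ.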
The
final field in the LOADED_IMAGE structure worth mentioning here is the
Characteristics field. Using this is just a shortcut to grabbing the
Characteristics field out of the PE header. The characteristics flags
are defined in WINNT.H and include values such as IMAGE_FILE_DLL, which
means the executable is a DLL rather than a program (EXE) file.
As the main function in EZPE.CPP shows (see Figure 1),
the MapAndLoad and UnMapAndLoad APIs can be used without any advance
preparation, unlike the symbol table APIs that I'll get to shortly.
MapAndLoad is relatively lightweight and executes quickly. Using just
MapAndLoad, and knowing the contents of various PE file data structures,
you can quickly access nearly everything of importance in an executable
file.
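The MapAndLoad / UnMapAndLoad pairing described above might look roughly like the following. This is a sketch under stated assumptions rather than the EZPE source: printSectionTable is a hypothetical helper, the code assumes SDK headers that declare MapAndLoad's string parameters as const, the #pragma is an MSVC-specific way to link the import library, and the non-Windows branch exists only so the sketch still builds where IMAGEHLP is unavailable.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#include <imagehlp.h>
#pragma comment(lib, "imagehlp.lib")   // MSVC-specific linking of IMAGEHLP
#endif

// Hypothetical helper: maps the named executable, prints its section
// table, and unmaps it. Returns false if the file could not be mapped.
bool printSectionTable(const char* imageName)
{
#ifdef _WIN32
    LOADED_IMAGE image = {};
    // DllPath = NULL (default search), DotDll = FALSE (assume ".exe" when
    // no extension is given), ReadOnly = TRUE.
    if (!MapAndLoad(imageName, NULL, &image, FALSE, TRUE))
        return false;

    std::printf("Mapped at %p, %lu sections\n",
                (void*)image.MappedAddress, image.NumberOfSections);
    for (ULONG i = 0; i < image.NumberOfSections; ++i) {
        const IMAGE_SECTION_HEADER& s = image.Sections[i];
        std::printf("%-8.8s RVA %08lX  raw size %08lX  file offset %08lX\n",
                    (const char*)s.Name, s.VirtualAddress,
                    s.SizeOfRawData, s.PointerToRawData);
    }

    UnMapAndLoad(&image);   // releases the handles MapAndLoad opened
    return true;
#else
    (void)imageName;        // placeholder: IMAGEHLP exists only on Windows
    return false;
#endif
}
```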
As a
final note on MapAndLoad, it's important to remember that it creates a
linear mapping of the entire file in one contiguous chunk. This is
different from what the Win32 loader does when it brings an executable
module into memory: the loader creates a distinct mapping for each section
so that each one starts on a page boundary. The result of this linear mapping is that
any Relative Virtual Addresses (RVA) that you might see in the PE header
aren't directly usable with the image as loaded by MapAndLoad. To use
an RVA in this situation, you'd have to adjust it to account for the
difference between the section's file offset and its in-memory address.
Luckily, IMAGEHLP.DLL provides an API, ImageRvaToVa, that will do this
for you.
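To make the adjustment concrete, here is a minimal sketch of the arithmetic involved. It is an illustration of the idea, not IMAGEHLP's implementation: SectionInfo is a hypothetical stand-in for the three IMAGE_SECTION_HEADER fields that matter, and rvaToFileOffset is a hypothetical helper. Adding the resulting offset to the base of the linear mapping gives the usable address.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the IMAGE_SECTION_HEADER fields needed here.
struct SectionInfo {
    std::uint32_t virtualAddress;    // RVA where the section starts in memory
    std::uint32_t sizeOfRawData;     // bytes the section occupies in the file
    std::uint32_t pointerToRawData;  // file offset of the section's data
};

// Finds the section containing 'rva' and converts the RVA to an offset
// within the linearly mapped file. Returns false if no section's file
// data covers the RVA.
bool rvaToFileOffset(const std::vector<SectionInfo>& sections,
                     std::uint32_t rva, std::uint32_t& fileOffset)
{
    for (const SectionInfo& s : sections) {
        if (rva >= s.virtualAddress &&
            rva - s.virtualAddress < s.sizeOfRawData) {
            // Distance into the section is the same in memory and on disk.
            fileOffset = s.pointerToRawData + (rva - s.virtualAddress);
            return true;
        }
    }
    return false;
}
```

For a section with an RVA of 0x1000 whose data sits at file offset 0x400, an RVA of 0x1010 corresponds to file offset 0x410.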
If
it's symbol table information you're after, the equivalent to MapAndLoad
is the MapDebugInformation API. You can think of MapDebugInformation as
a superset of the MapAndLoad API. Besides mapping the executable file
into memory, this API also figures out which type of symbol
information is best and collects some basic information about that symbol
table. What exactly do I mean by "best"? It turns out that an executable
can be built with more than one type of debug information. For example,
you can create an executable with both CodeView (PDB) information and a
COFF symbol table. IMAGEHLP knows how to read both formats, as well as a
few others, and knows which one is optimal for your executable. More on
this later. Just as the MapAndLoad API eventually needs to be followed
by a call to UnMapAndLoad, MapDebugInformation also needs to be cleaned
up by calling UnmapDebugInformation.
Because
the symbols for an executable may be in a file other than the executable
itself, the MapDebugInformation API takes a parameter not needed for
the MapAndLoad API—the symbol search path. By default, IMAGEHLP searches
for symbol files in a series of paths that I'll describe later.
However, the MapDebugInformation API lets you override these paths. This
is what I've done in the EZPE source where it calls
MapDebugInformation.
Besides
mapping and loading the executable and its symbols (if present), the
MapDebugInformation API returns a pointer to an IMAGE_DEBUG_INFORMATION
structure. This structure contains many more fields than a LOADED_IMAGE
structure, although nearly every field in the LOADED_IMAGE structure can
be found in the IMAGE_DEBUG_INFORMATION structure. For example, the
MappedBase field contains the address where the executable was mapped,
and is the same as the MappedAddress field in a LOADED_IMAGE structure.
Similarly, the Sections field is a pointer to the executable's section
table, and so forth.
More
useful information found in the IMAGE_DEBUG_INFORMATION
structure includes the preferred load address (the ImageBase
field), and the size of the executable in memory (the SizeOfImage
field). There are also pointers to the table of names for the exported
functions, as well as the executable's time/date stamp DWORD. You can
copy this DWORD into a time_t and pass it to the C run-time ctime function to get the time and date when
the executable was built. For more information on the time/date stamp,
see my February 1997 column.
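As a small illustration of the time/date stamp conversion, the sketch below treats the DWORD as a count of seconds since January 1, 1970 (UTC). The helper name formatTimeDateStamp is hypothetical, and the sketch swaps ctime for gmtime plus strftime so that the result does not depend on the local time zone.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical helper: formats a PE time/date stamp as UTC text.
// Returns an empty string if the value cannot be converted.
std::string formatTimeDateStamp(std::uint32_t stamp)
{
    // Copy into a time_t first; the stamp is 32 bits, time_t may be wider.
    const std::time_t seconds = static_cast<std::time_t>(stamp);
    const std::tm* utc = std::gmtime(&seconds);
    if (utc == nullptr)
        return std::string();

    char text[32];
    const std::size_t length =
        std::strftime(text, sizeof text, "%Y-%m-%d %H:%M:%S", utc);
    return std::string(text, length);
}
```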
The
meaning of some fields in the IMAGE_DEBUG_INFORMATION
structure isn't so obvious—like the pointers to Function and
FPO tables. The Function table is data used by the structured exception
handling code on the Alpha and MIPS platforms (it's not encountered
with Intel-based executables). FPO information is seen only on the Intel
platform; it helps debuggers walk the call stack in the absence of
standard EBP register stack frames.
Finally,
the IMAGE_DEBUG_INFORMATION structure has a variety of fields that indicate if
CodeView and COFF information are present, and if so, where. There's
even a pointer to the debug directory. This is the data structure in the
PE file that tells you what types of debug information are present and
where. The MapDebugInformation API does a good job of extracting this
information and presenting it in the IMAGE_DEBUG_INFORMATION structure.
Still, if you're so inclined, you can go straight to the same raw data
that IMAGEHLP uses to generate the IMAGE_DEBUG_INFORMATION
structure. Remember though, the whole advantage of
using IMAGEHLP is to avoid such low-level grunginess.
So far,
the two APIs I've examined (MapAndLoad and MapDebugInformation) simply
map an executable into memory and extract some useful information from
it. Neither API loads a symbol table, although calling one of them is
effectively a prerequisite to using the symbol table APIs. The key piece
of data needed is the mapped address of the executable. The symbol
table APIs work from the mapped executable to find and load the
appropriate symbol table into memory.
The first
IMAGEHLP symbol table API you should be aware of is SymInitialize,
which sets up internal variables in IMAGEHLP so that the DLL is prepared
to load symbol tables for the executable and possibly multiple DLLs
within a process. As you might expect, there's a corresponding shutdown
API, SymCleanup, that should be called when you're finished working with
symbols.
The first
parameter to SymInitialize is an identifier for a process that you want
to use when working with symbols. If you were using IMAGEHLP as part of
a real debugger, you'd want to pass a valid process handle. This allows
IMAGEHLP to enumerate through all the loaded modules in a process
address space and load the associated symbol tables. You can turn off
this automatic module enumeration by passing FALSE as the third
parameter to SymInitialize. If you're not a debugger process (EZPE
isn't), you can pass whatever value you'd like as the process handle.
Just remember to pass the same value to subsequent symbol APIs that
expect a process handle. In the case of EZPE, I used the value zero
through a #define called MY_PROCESS_HANDLE.
(As a
side note to the automatic module enumeration I referred to, the Windows
NT 4.0-supplied IMAGEHLP won't do this under Windows 95. The module
enumeration APIs under Windows NT are different from those in Windows
95. However, these differences are slated to be resolved in a subsequent
release.)
The
second parameter to SymInitialize is the symbol search path. If you pass
a valid string pointer in the form of a path (that is, directories
separated by semicolons), IMAGEHLP searches those directories when
looking for a symbol table that's in a different file than the
executable. Passing zero causes IMAGEHLP to use three environment
variables as the path: _NT_SYMBOL_PATH, _NT_ALTERNATE_SYMBOL_PATH, and
SYSTEMROOT.
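The search path format is simple enough to show in code. The sketch below only illustrates the semicolon-separated form; splitSearchPath is a hypothetical helper and says nothing about how IMAGEHLP itself parses or orders the directories.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a symbol search path of the form
// "dir1;dir2;dir3" into its directories, skipping empty entries.
std::vector<std::string> splitSearchPath(const std::string& path)
{
    std::vector<std::string> directories;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find(';', start);
        if (end == std::string::npos)
            end = path.size();
        if (end > start)
            directories.push_back(path.substr(start, end - start));
        start = end + 1;   // step past the semicolon
    }
    return directories;
}
```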
After
you've called SymInitialize, the next step is to load the symbol tables
you're interested in—that is, assuming you didn't pass a valid process
handle to SymInitialize so that it enumerated and loaded all the symbol
tables automatically. EZPE doesn't do this, so it's necessary to
manually load the symbol table for the executable file that it's working
with. The API that manually loads a symbol table is SymLoadModule. Not
surprisingly, there's a SymUnloadModule to use when you're done with a
given symbol table.
Although
the SymLoadModule API takes six parameters, only three are required for a
simple program like EZPE. The first parameter is the process handle
value that was passed to SymInitialize earlier in the program. Parameter
three is the name of the executable file whose symbols are to be
loaded. Parameter five is the address where the executable is mapped
into memory. As I alluded to earlier, this value can be obtained easily
by calling MapAndLoad or MapDebugInformation. Assuming all goes well,
SymLoadModule returns TRUE.
After loading a symbol table, there are a variety of actions available. For example, in my
May 1997
column I used the SymGetSymFromAddr function to take an address and
find the name of the nearest symbol. The end result was a stack trace
containing symbolic function names. If I were writing a debugger, I
could use the SymGetSymFromName API to look up the address of a function
or variable name that the user requested.
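The "nearest symbol" idea behind an address-to-name lookup can be modeled without any debugging API. The following sketch is a conceptual model only, not how IMAGEHLP stores or searches its tables: Symbol and findNearestSymbol are hypothetical names, and the table is assumed to be sorted by ascending address.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol record: a start address and a name.
struct Symbol {
    std::uint32_t address;
    std::string   name;
};

// Returns the symbol with the highest address that is <= 'address', or
// nullptr if the address precedes every symbol. 'displacement' receives
// the distance from the symbol's start. The table must be sorted by address.
const Symbol* findNearestSymbol(const std::vector<Symbol>& table,
                                std::uint32_t address,
                                std::uint32_t& displacement)
{
    // First symbol whose address is strictly greater than the target.
    auto next = std::upper_bound(
        table.begin(), table.end(), address,
        [](std::uint32_t target, const Symbol& s) { return target < s.address; });
    if (next == table.begin())
        return nullptr;

    const Symbol& nearest = *(next - 1);
    displacement = address - nearest.address;
    return &nearest;
}
```

A stack trace entry is then just the symbol name plus the displacement, for example a return address 0x10 bytes past the start of a function.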
With
EZPE, the first action after loading a symbol table is to find out more
about what was just loaded. This can be done with the SymGetModuleInfo
API. The first parameter is the process handle value used with the other
symbol APIs. The second parameter needs to be an address somewhere
within the module that the symbol table belongs to. In the EZPE code,
the easiest thing to use is the base address to which the executable was
memory mapped. The third parameter to SymGetModuleInfo is a pointer to
an IMAGEHLP_MODULE structure that the API fills with information about
the module and its symbol table.
The first
group of fields in an IMAGEHLP_MODULE structure is standard stuff that
you can get in ways that I described earlier. More interesting are the
SymType and NumSyms fields. The SymType field contains an enum that
indicates what type of symbol table was loaded (for example, SymCoff,
SymCv, SymPdb, or SymExport).
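Put together, the sequence of calls described so far (SymInitialize, SymLoadModule, SymGetModuleInfo, then the matching cleanup calls) might be sketched as follows. This is not the EZPE source. It assumes SDK headers that declare the string parameters as const, uses zero as the placeholder process value as described above (whether every version of the DLL accepts zero is not something the text confirms), and includes a non-Windows branch purely so the sketch builds elsewhere. printSymbolSummary is a hypothetical helper.

```cpp
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#include <imagehlp.h>
#pragma comment(lib, "imagehlp.lib")   // MSVC-specific linking of IMAGEHLP
#endif

// Hypothetical helper: loads the symbol table for an executable and prints
// its type and symbol count. Returns false if any step fails.
bool printSymbolSummary(const char* imageName)
{
#ifdef _WIN32
    const HANDLE pseudoProcess = 0;   // placeholder value, not a real process

    LOADED_IMAGE image = {};
    if (!MapAndLoad(imageName, NULL, &image, FALSE, TRUE))
        return false;

    bool succeeded = false;
    // NULL search path = default environment variables; FALSE = do not
    // enumerate the modules of a process.
    if (SymInitialize(pseudoProcess, NULL, FALSE)) {
        const ULONG_PTR base = (ULONG_PTR)image.MappedAddress;
        // Parameters 1, 3, and 5 are the ones that matter here.
        if (SymLoadModule(pseudoProcess, NULL, imageName, NULL, base, 0)) {
            IMAGEHLP_MODULE info = {};
            info.SizeOfStruct = sizeof(info);   // must be set by the caller
            if (SymGetModuleInfo(pseudoProcess, base, &info)) {
                std::printf("%s: SymType %d, %lu symbols\n",
                            info.ModuleName, (int)info.SymType, info.NumSyms);
                succeeded = true;
            }
            SymUnloadModule(pseudoProcess, base);
        }
        SymCleanup(pseudoProcess);
    }

    UnMapAndLoad(&image);
    return succeeded;
#else
    (void)imageName;        // placeholder: IMAGEHLP exists only on Windows
    return false;
#endif
}
```

Each acquisition is paired with its release in reverse order, so a failure partway through still unwinds whatever was set up before it.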
The
SymExport type is worth a mention. Exports aren't formally considered to
be debug information. However, the information stored for an exported
function (its name and address) is the bare minimum required for
inclusion in a symbol table. Therefore, IMAGEHLP can synthesize a symbol
table out of an executable's exports. The upshot is that any executable
that exports symbols can be considered to have at least a minimal
symbol table available. (By the way, if you're a SoftIce user, the Load
EXPORTS capability works along the same lines.)
Another,
more useful action you can take with a loaded symbol table is to
enumerate through all the symbols. For this purpose, IMAGEHLP has the
SymEnumerateSymbols API. The first parameter is the process handle value
used with the other symbol APIs. Parameter two is the base address of
the executable whose symbols you're interested in. The third parameter
is the address of a callback function that will be called once for each
symbol in the symbol table. The fourth parameter can be whatever you'd
like. It's passed on to the callback function, unmodified. If I didn't
want to use the SymEnumerateSymbols API, I could use the SymGetSymNext
API in a loop instead. Both APIs have their strengths and weaknesses, so
I just picked one arbitrarily for EZPE to use.
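The callback-plus-context pattern is worth seeing in miniature. The sketch below is a portable model of it, not the IMAGEHLP API: Symbol, enumerateSymbols, and collectName are hypothetical names, and the callback shape (return true to continue, with a caller-supplied pointer passed through untouched) is only loosely modeled on what SymEnumerateSymbols expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol record.
struct Symbol {
    std::uint32_t address;
    std::uint32_t size;
    std::string   name;
};

// Callback type: return true to keep enumerating, false to stop early.
using SymbolCallback = bool (*)(const Symbol& symbol, void* userContext);

// Hypothetical stand-in for the enumeration API: calls 'callback' once per
// symbol, handing 'userContext' through unmodified. Returns the number of
// symbols visited.
std::size_t enumerateSymbols(const std::vector<Symbol>& table,
                             SymbolCallback callback, void* userContext)
{
    std::size_t visited = 0;
    for (const Symbol& symbol : table) {
        ++visited;
        if (!callback(symbol, userContext))
            break;
    }
    return visited;
}

// Example callback: the context pointer is a vector that collects names.
bool collectName(const Symbol& symbol, void* userContext)
{
    auto* names = static_cast<std::vector<std::string>*>(userContext);
    names->push_back(symbol.name);
    return true;
}
```

The context pointer is what lets a plain function callback accumulate results without resorting to global variables.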
The EZPE Code
Now let's
look at the EZPE program and its code. EZPE is a command-line program
that accepts arguments. The source file EZPE.CPP is shown in Figure 1. You can see its usage by running EZPE with no arguments: