Copyright © Microsoft Corporation. This document is an archived reproduction of a version originally published by Microsoft. It may have slight formatting modifications for consistency and to improve readability.
In my May 1997
column, I used some APIs from IMAGEHLP.DLL as part of a framework for
reporting on unhandled exceptions. Since then I've received quite a bit
of email about the use of those APIs, indicating that IMAGEHLP.DLL is an
area of widespread interest. Unfortunately, in many ways the IMAGEHLP
documentation assumes that you're comfortable working with executable
files and symbol tables. It's also weak in explaining which APIs need to
be used, and in what particular order to perform a given task. The
result is that many developers who would benefit from using IMAGEHLP.DLL
get lost in the documentation.
This month, I'll go over a different subset of the IMAGEHLP APIs to show how their powerful features can be implemented with a few simple lines of code. To demonstrate how easy it is to use IMAGEHLP APIs, I wrote the EZPE program, a PE file-display program that also displays debug symbols belonging to an executable (that is, EXEs, DLLs, OCXs, and so on). It displays information similar to programs like DUMPBIN from Visual C++® or PEDUMP from my book, Windows 95 System Programming Secrets (IDG Books, 1995). The key difference is that EZPE never touches the executable file directly, and it doesn't grovel through data structures like other PE file-display programs. Instead, EZPE lets the IMAGEHLP APIs do all the hard work and effectively demonstrates the proper use of the APIs as a by-product.
Another nice feature that falls out from EZPE's use of the IMAGEHLP APIs is that you can see the symbol names and addresses contained within debug information, such as the DBG files that are provided for Windows NT components. You can also use EZPE to see the symbols contained within PDB files, something that even DUMPBIN can't do. All you have to do is make sure that the symbol table file (for example, PDB or DBG) is in the same directory as the executable it belongs to. Running EZPE on the executable file causes it to automatically find the symbols in the PDB or DBG file, as appropriate. The beauty is that EZPE kicks back and lets IMAGEHLP.DLL do the hard work of finding and loading the symbol tables. More on this later.
Before jumping into a description of the IMAGEHLP APIs, a quick review of IMAGEHLP's availability is worthwhile. IMAGEHLP is a standard component of Windows NT® 4.0. However, it's a redistributable DLL, so you can ship it with your app if it needs to run on Windows® 95. Be aware, though, that certain functions in IMAGEHLP don't work under Windows 95 (at least not in the IMAGEHLP.DLL that was available when I wrote this). The import library and header file for IMAGEHLP.DLL can be found in any Win32® SDK that shipped on or after the release date of Windows NT 4.0 (July 31, 1996). IMAGEHLP isn't specific to any one CPU platform. I built my EZPE program on a DEC Alpha, and it worked perfectly the first time.
The first IMAGEHLP API to look at is MapAndLoad. You'd use this if you were interested only in the contents of an executable and didn't care about any debug information that might be available. Although the IMAGEHLP documentation is vague about exactly what MapAndLoad does, it's really quite simple. First, the function goes through the necessary gyrations to make a memory mapped file corresponding to the specified executable. Internally, MapAndLoad goes through the standard CreateFile, CreateFileMapping, MapViewOfFile sequence. Because these underlying APIs open up handles, it's important that you call the matching UnMapAndLoad API when you're done to close all the handles.
After memory mapping the executable file, MapAndLoad fills in the LOADED_IMAGE structure that was passed in. There are a number of key fields in this structure that are likely to be valuable to you. The MappedAddress field is where the executable is mapped into memory (that is, it's what the internal call to MapViewOfFile returned). The FileHeader field contains a pointer to an IMAGE_NT_HEADERS structure, which is defined in WINNT.H. The IMAGE_NT_HEADERS structure is better known as the PE header, and contains all the vital values for the executable. This structure has been described in numerous articles (many of which are in the Microsoft Knowledge Base), so I won't dwell on it here. However, EZPE does a rudimentary printout of the PE header contents without putting too much effort into interpreting the fields.
The Sections field in the LOADED_IMAGE structure is a pointer to the PE section table, which is an array of IMAGE_SECTION_HEADER structures that is also defined in WINNT.H. The number of sections in the array is given by the (you guessed it) NumberOfSections field. An IMAGE_SECTION_HEADER structure contains the name of a section, its size, its attributes, and its location within the executable file. The EZPE program prints out the important contents of each IMAGE_SECTION_HEADER in sequence, again without too much effort doing things such as breaking down the attributes into meaningful flags like PAGE_READONLY.
The final field in the LOADED_IMAGE structure worth mentioning here is the Characteristics field. Using this is just a shortcut to grabbing the Characteristics field out of the PE header. The characteristics flags are defined in WINNT.H and include values such as IMAGE_FILE_DLL, which means the executable is a DLL rather than a program (EXE) file.
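As a rough illustration of how the Characteristics word can be tested, here is a minimal portable C++ sketch. The two constants repeat the WINNT.H values of IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL so that the code builds without the Windows headers; the helper name DescribeImageKind is hypothetical and is not part of IMAGEHLP or EZPE.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values as defined in WINNT.H, repeated here so the sketch builds anywhere.
constexpr std::uint16_t kImageFileExecutableImage = 0x0002; // IMAGE_FILE_EXECUTABLE_IMAGE
constexpr std::uint16_t kImageFileDll             = 0x2000; // IMAGE_FILE_DLL

// Hypothetical helper: classify an image from its Characteristics word.
std::string DescribeImageKind(std::uint16_t characteristics)
{
    if ((characteristics & kImageFileExecutableImage) == 0)
        return "not an executable image";
    return (characteristics & kImageFileDll) ? "DLL" : "EXE";
}
```

With a real LOADED_IMAGE, the value to pass would be its Characteristics field; the same bit tests extend naturally to the other IMAGE_FILE_xxx flags.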
As the main function in EZPE.CPP shows (see Figure 1), the MapAndLoad and UnMapAndLoad APIs can be used without any advance preparation, unlike the symbol table APIs that I'll get to shortly. MapAndLoad is relatively lightweight and executes quickly. Using just MapAndLoad, and knowing the contents of various PE file data structures, you can quickly access nearly everything of importance in an executable file.
As a final note on MapAndLoad, it's important to remember that it creates a linear mapping of the entire file in one contiguous chunk. This is different from the Win32 loader bringing an executable module into memory, creating distinct mappings for each section so that it starts on a page boundary in memory. The result of this linear mapping is that any Relative Virtual Addresses (RVAs) that you might see in the PE header aren't directly usable with the image as loaded by MapAndLoad. To use an RVA in this situation, you'd have to adjust it to account for the difference between the section's file offset and its in-memory address. Luckily, IMAGEHLP.DLL provides an API, ImageRvaToVa, that will do this for you.
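The adjustment described above can be sketched in portable C++. This is only an illustration of the arithmetic, not IMAGEHLP's implementation: SectionInfo is a hypothetical, simplified stand-in for the three IMAGE_SECTION_HEADER fields involved, and RvaToFileOffset is an invented name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the IMAGE_SECTION_HEADER fields that matter here.
struct SectionInfo {
    std::uint32_t virtualAddress;   // RVA where the section starts in memory
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // file offset where the section starts
};

// Translate an RVA to an offset in the linearly mapped file.
// Returns false when no section contains the RVA.
bool RvaToFileOffset(const std::vector<SectionInfo>& sections,
                     std::uint32_t rva, std::uint32_t& fileOffset)
{
    for (const SectionInfo& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize) {
            fileOffset = rva - s.virtualAddress + s.pointerToRawData;
            return true;
        }
    }
    return false;
}
```

Adding the resulting offset to the address where the file is mapped gives a pointer that can be dereferenced, which is the kind of value the IMAGEHLP routine hands back.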
If it's symbol table information you're after, the equivalent to MapAndLoad is the MapDebugInformation API. You can think of MapDebugInformation as a superset of the MapAndLoad API. Besides mapping the executable file into memory, this API also figures out what the best type of symbol information is as well as some basic information about that symbol table. What exactly do I mean by "best"? It turns out that an executable can be built with more than one type of debug information. For example, you can create an executable with both CodeView (PDB) information and a COFF symbol table. IMAGEHLP knows how to read both formats, as well as a few others, and knows which one is optimal for your executable. More on this later. Just as the MapAndLoad API eventually needs to be followed by a call to UnMapAndLoad, MapDebugInformation also needs to be cleaned up by calling UnmapDebugInformation.
Because the symbols for an executable may be in a file other than the executable itself, the MapDebugInformation API takes a parameter not needed for the MapAndLoad API—the symbol search path. By default, IMAGEHLP searches for symbol files in a series of paths that I'll describe later. However, the MapDebugInformation API lets you override these paths. This is what I've done in the EZPE source where it calls MapDebugInformation.
Besides mapping and loading the executable and its symbols (if present), the MapDebugInformation API returns a pointer to an IMAGE_DEBUG_INFORMATION structure. This structure contains many more fields than a LOADED_IMAGE structure, although nearly every field in the LOADED_IMAGE structure can be found in the IMAGE_DEBUG_INFORMATION structure. For example, the MappedBase field contains the address where the executable was mapped, and is the same as the MappedAddress field in a LOADED_IMAGE structure. Similarly, the Sections field is a pointer to the executable's section table, and so forth.
More useful information found in the IMAGE_DEBUG_INFORMATION structure includes the preferred load address (the ImageBase field), and the size of the executable in memory (the SizeOfImage field). There are also pointers to the table of names for the exported functions, as well as the executable's time/date stamp DWORD. You can pass this DWORD to the C++ ctime function to get the time and date when the executable was built. For more information on the time/date stamp, see my February 1997 column.
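A minimal sketch of turning the time/date stamp into readable text follows. It assumes the stamp counts seconds since January 1, 1970 (UTC), and it uses gmtime with strftime rather than ctime so that the result does not depend on the local time zone. The function name FormatTimeDateStamp is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>

// Format a PE time/date stamp (seconds since January 1, 1970 UTC).
std::string FormatTimeDateStamp(std::uint32_t timeDateStamp)
{
    const std::time_t t = static_cast<std::time_t>(timeDateStamp);
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr)
        return "";
    char buffer[32];
    const std::size_t n =
        std::strftime(buffer, sizeof buffer, "%Y-%m-%d %H:%M:%S", utc);
    return std::string(buffer, n);
}
```

If local time in the classic ctime layout is preferred, the same time_t value can be handed to ctime through a pointer instead.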
The meaning of some fields in the IMAGE_DEBUG_INFORMATION structure isn't so obvious, like the pointers to Function and FPO tables. The Function table is data used by the structured exception handling code on the Alpha and MIPS platforms (it's not encountered with Intel-based executables). FPO information is seen only on the Intel platform; it helps debuggers walk the call stack in the absence of standard EBP register stack frames.
Finally, the IMAGE_DEBUG_INFORMATION structure has a variety of fields that indicate if CodeView and COFF information are present, and if so, where. There's even a pointer to the debug directory. This is the data structure in the PE file that tells you what types of debug information are present and where. The MapDebugInformation API does a good job of extracting this information and presenting it in the IMAGE_DEBUG_INFORMATION structure. Still, if you're so inclined, you can go straight to the same raw data that IMAGEHLP uses to generate the IMAGE_DEBUG_INFORMATION structure. Remember though, the whole advantage of using IMAGEHLP is to avoid such low-level grunginess.
So far, the two APIs I've examined (MapAndLoad and MapDebugInformation) simply map an executable into memory and extract some useful information from it. Neither API loads a symbol table, although calling one of them is effectively a prerequisite to using the symbol table APIs. The key piece of data needed is the mapped address of the executable. The symbol table APIs work from the mapped executable to find and load the appropriate symbol table into memory.
The first IMAGEHLP symbol table API you should be aware of is SymInitialize, which sets up internal variables in IMAGEHLP so that the DLL is prepared to load symbol tables for the executable and possibly multiple DLLs within a process. As you might expect, there's a corresponding shutdown API, SymCleanup, that should be called when you're finished working with symbols.
The first parameter to SymInitialize is an identifier for a process that you want to use when working with symbols. If you were using IMAGEHLP as part of a real debugger, you'd want to pass a valid process handle. This allows IMAGEHLP to enumerate through all the loaded modules in a process address space and load the associated symbol tables. You can turn off this automatic module enumeration by passing FALSE as the third parameter to SymInitialize. If you're not a debugger process (EZPE isn't), you can pass whatever value you'd like as the process handle. Just remember to pass the same value to subsequent symbol APIs that expect a process handle. In the case of EZPE, I used the value zero through a #define called MY_PROCESS_HANDLE.
(As a side note to the automatic module enumeration I referred to, the Windows NT 4.0-supplied IMAGEHLP won't do this under Windows 95. The module enumeration APIs under Windows NT are different than those in Windows 95. However, these differences are slated to be resolved in a subsequent release.)
The second parameter to SymInitialize is the symbol search path. If you pass a valid string pointer in the form of a path (that is, directories separated by semicolons), IMAGEHLP searches those directories when looking for a symbol table that's in a different file than the executable. Passing zero causes IMAGEHLP to use three environment variables as the path: _NT_SYMBOL_PATH, _NT_ALTERNATE_SYMBOL_PATH, and SYSTEMROOT.
After you've called SymInitialize, the next step is to load the symbol tables you're interested in—that is, assuming you didn't pass a valid process handle to SymInitialize so that it enumerated and loaded all the symbol tables automatically. EZPE doesn't do this, so it's necessary to manually load the symbol table for the executable file that it's working with. The API that manually loads a symbol table is SymLoadModule. Not surprisingly, there's a SymUnloadModule to use when you're done with a given symbol table.
Although the SymLoadModule API takes six parameters, only three are required for a simple program like EZPE. The first parameter is the process handle value that was passed to SymInitialize earlier in the program. Parameter three is the name of the executable file whose symbols are to be loaded. Parameter five is the address where the executable is mapped into memory. As I alluded to earlier, this value can be obtained easily by calling MapAndLoad or MapDebugInformation. Assuming all goes well, SymLoadModule returns TRUE.
After loading a symbol table, there are a variety of actions available. For example, in my May 1997 column I used the SymGetSymFromAddr function to take an address and find the name of the nearest symbol. The end result was a stack trace containing symbolic function names. If I were writing a debugger, I could use the SymGetSymFromName API to look up the address of a function or variable name that the user requested.
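The idea of mapping an address to its nearest symbol can be sketched without IMAGEHLP at all. The code below is an assumption about the general technique (a binary search over a table sorted by address), not a description of how IMAGEHLP implements it; the Symbol structure and FindNearestSymbol are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol {
    std::uint32_t address;
    std::string   name;
};

// Returns the symbol with the highest address that is <= target, or nullptr
// if target lies below every symbol. `sorted` must be ordered by address.
const Symbol* FindNearestSymbol(const std::vector<Symbol>& sorted,
                                std::uint32_t target)
{
    assert(std::is_sorted(sorted.begin(), sorted.end(),
        [](const Symbol& a, const Symbol& b) { return a.address < b.address; }));

    // First symbol that starts strictly above the target address.
    auto it = std::upper_bound(sorted.begin(), sorted.end(), target,
        [](std::uint32_t addr, const Symbol& s) { return addr < s.address; });
    if (it == sorted.begin())
        return nullptr;
    return &*(it - 1);
}
```

Subtracting the found symbol's address from the target gives the offset into the function, which is what makes a stack-trace entry such as "name+offset" possible.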
With EZPE, the first action after loading a symbol table is to find out more about what was just loaded. This can be done with the SymGetModuleInfo API. The first parameter is the process handle value used with the other symbol APIs. The second parameter needs to be an address somewhere within the module that the symbol table belongs to. In the EZPE code, the easiest thing to use is the base address to which the executable was memory mapped. The third parameter to SymGetModuleInfo is a pointer to an IMAGEHLP_MODULE structure that the API fills with information about the module and its symbol table.
The first group of fields in an IMAGEHLP_MODULE structure is standard stuff that you can get in ways that I described earlier. More interesting are the SymType and NumSyms fields. The SymType field contains an enum that indicates what type of symbol table was loaded (for example, SymCoff, SymCv, SymPdb, or SymExport).
The SymExport type is worth a mention. Exports aren't formally considered to be debug information. However, the information stored for an exported function (its name and address) is the bare minimum required for inclusion in a symbol table. Therefore, IMAGEHLP can synthesize a symbol table out of an executable's exports. The upshot is that any executable that exports symbols can be considered to have at least a minimal symbol table available. (By the way, if you're a SoftIce user, the Load EXPORTS capability works along the same lines.)
Another, more useful action you can take with a loaded symbol table is to enumerate through all the symbols. For this purpose, IMAGEHLP has the SymEnumerateSymbols API. The first parameter is the process handle value used with the other symbol APIs. Parameter two is the base address of the executable whose symbols you're interested in. The third parameter is the address of a callback function that will be called once for each symbol in the symbol table. The fourth parameter can be whatever you'd like. It's passed on to the callback function, unmodified. If I didn't want to use the SymEnumerateSymbols API, I could use the SymGetSymNext API in a loop instead. Both APIs have their strengths and weaknesses, so I just picked one arbitrarily for EZPE to use.
The EZPE Code
Now let's look at the EZPE program and its code. EZPE is a command-line program that accepts arguments. The source file EZPE.CPP is shown in Figure 1. You can see its usage by running EZPE with no arguments:
Syntax: EZPE [options] <filename>
  -d Decorated C++ names
  -n No symbol display
In the simplest case, you'd give EZPE the name of an executable file to display. EZPE writes to stdout, so its output can be redirected to a file. For example:
EZPE C:\WINNT\SYSTEM32\KERNEL32.DLL > results
Figure 2 shows the results of running EZPE on its own EXE. The -n option tells EZPE not to bother loading and displaying the symbols. If you use the -n option, everything after the "==== IMAGE_DEBUG_INFORMATION ====" line in Figure 2 is omitted from the program output.
The -d option tells EZPE to display the decorated (mangled) names of any C++ symbols in the symbol table. By default, when SymLoadModule creates the symbol table, it undecorates any C++ symbols into human readable form. The undecorated name consists solely of the class name and member function name, such as foo::bar. This is the default output mode that EZPE uses. The -d option tells EZPE to emit the raw, decorated names instead: ?ParseCommandLine@@YAHHQAPADPAD1@Z.
While I was writing EZPE, it occurred to me that the default undecoration strips out lots of potentially useful information such as the parameters, calling convention, return type, and so forth. Therefore, when using the -d option, EZPE displays the decorated name as well as an undecorated version that contains much more information about the symbol.
Coming up with a better undecorated version of a symbol name turned out to be a bit of a challenge. The first requirement was to force SymLoadModule to leave the symbol names alone when loading the symbol table. Luckily, there's another IMAGEHLP API that makes this easy: the SymSetOptions API takes a flag called SYMOPT_UNDNAME, which is on by default. Because I wanted to change only that option and leave the others alone, the code calls SymGetOptions to get the current options. It then clears the SYMOPT_UNDNAME flag and calls SymSetOptions with the result.
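The read-modify-write step on the options mask can be sketched in portable C++. The constant repeats the IMAGEHLP.H value of SYMOPT_UNDNAME; the helper names WithFlag and WithoutFlag are hypothetical. With the flag off, symbol names stay decorated; with it on, they are undecorated as the table is loaded.

```cpp
#include <cassert>
#include <cstdint>

// Value of SYMOPT_UNDNAME as defined in IMAGEHLP.H.
constexpr std::uint32_t kSymoptUndname = 0x00000002;

// Return `options` with one flag switched on, every other bit untouched.
constexpr std::uint32_t WithFlag(std::uint32_t options, std::uint32_t flag)
{
    return options | flag;
}

// Return `options` with one flag switched off, every other bit untouched.
constexpr std::uint32_t WithoutFlag(std::uint32_t options, std::uint32_t flag)
{
    return options & ~flag;
}
```

In a program that links against IMAGEHLP, keeping the raw names would then look something like SymSetOptions(WithoutFlag(SymGetOptions(), SYMOPT_UNDNAME)), issued before the symbol table is loaded.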
The remaining work of displaying a better undecorated symbol name is to call yet another IMAGEHLP API, UnDecorateSymbolName, for any name that appears to be decorated (decorated names begin with a "?"). UnDecorateSymbolName takes a whole slew of flags that tell it what parts of an undecorated name to include or not include. The EZPE code uses the set of options that should produce the most information in the undecorated name.
When I tested EZPE, UnDecorateSymbolName failed on certain symbol names. A little investigation proved that some symbols had garbage characters at the end of their names, whether the name was normal or decorated. Apparently, IMAGEHLP leaves garbage at the end of certain symbol names when operating with the SYMOPT_UNDNAME option disabled. For normal names, I didn't go to the trouble of trying to strip off the garbage characters. However, I did notice that most C++ symbol names end with a capital Z. In the EnumSymbolsCallback function from EZPE.CPP, you'll find that the code works backwards from the end of C++ symbol names, stripping off characters until it encounters a Z. Not pretty, but it seems to work OK.
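The stripping heuristic can be sketched as follows. This is a reconstruction from the description above rather than the code in EZPE.CPP, and the function name TrimDecoratedName is hypothetical. Like the heuristic itself, it assumes that a decorated name ends in a capital Z and that the trailing garbage contains no Z of its own.

```cpp
#include <cassert>
#include <string>

// Drop trailing characters until the name ends in 'Z'. Names that are not
// decorated (no leading '?') or contain no 'Z' are returned unchanged.
std::string TrimDecoratedName(const std::string& name)
{
    if (name.empty() || name[0] != '?')
        return name;
    const std::string::size_type lastZ = name.rfind('Z');
    if (lastZ == std::string::npos)
        return name;
    return name.substr(0, lastZ + 1);
}
```

Searching backwards with rfind is equivalent to peeling characters off the end one at a time, and it leaves already clean names untouched.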
Another interesting thing about the EZPE EnumSymbolsCallback function concerns the second parameter. It turns out that when IMAGEHLP calls the function, the symbol address it passes is a linear address and is connected to where the executable was mapped into memory. For a debugger operating on a live process, this is just fine. However, in a symbol display program, it's worthless. The executable could be mapped nearly anywhere.
To resolve this situation, I made EZPE emit the RVA of the symbol rather than the value IMAGEHLP passes to the callback function. (An RVA is independent of the executable's mapped address, and just makes more sense since PE files themselves store all addresses as RVAs.) To calculate the RVA of each symbol, the EnumSymbolsCallback has to know where the executable is mapped into memory. Luckily, SymEnumerateSymbols has a parameter that it passes on, unmodified, to the enumeration callback function. EZPE uses this parameter to convey the executable's mapped address to the enumeration callback function. In the callback, the code subtracts this value from the symbol address to obtain the symbol's RVA. You'll see this in the portion of Figure 2 that begins with the header "==== Symbols ====". In particular, note that the addresses for the symbols are relatively small and fall within the RVAs listed for the various PE file sections.
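The subtraction can be sketched in portable C++. The callback below only imitates the shape of a symbol-enumeration callback (name, address, size, caller-supplied context) and is not the IMAGEHLP prototype. As an assumption of this sketch, the context pointer refers to a variable holding the mapped base address, and the names SymbolRva and PrintSymbolCallback are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Convert the linear address reported for a symbol into an RVA.
std::uintptr_t SymbolRva(std::uintptr_t symbolAddress, std::uintptr_t mappedBase)
{
    assert(symbolAddress >= mappedBase); // symbol must lie inside the mapping
    return symbolAddress - mappedBase;
}

// Shaped like a symbol-enumeration callback: name, address, size, and an
// opaque context pointer that here carries the mapped base address.
bool PrintSymbolCallback(const char* name, std::uintptr_t address,
                         unsigned long /*size*/, void* userContext)
{
    const std::uintptr_t mappedBase =
        *static_cast<const std::uintptr_t*>(userContext);
    std::printf("%08lX  %s\n",
                static_cast<unsigned long>(SymbolRva(address, mappedBase)), name);
    return true; // keep enumerating
}
```

Because the base travels through the context parameter, the callback needs no global variable to know where the executable was mapped.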
As a final wrap-up, let me first apologize for the macro madness at the beginning (for example, the DisplayPtrFieldD macro). When I was writing EZPE, I knew that it would display many fields from numerous structures. I wanted these fields to be formatted in a nice, consistent manner. If I had used printf directly, I would need to modify each printf individually if I wanted to change any output formatting. Making EZPE into a GUI app would have been even more of a pain. By using nested macros and the preprocessor stringize feature, I was able to isolate all the details of how the structure fields should be displayed into one location.
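The stringize approach can be illustrated with a small sketch. The macro and helper names here (FORMAT_HEX_FIELD, FormatHexField) and the DemoHeader structure are hypothetical and are not the macros used in EZPE.CPP; the point is only that the field name is written once and the output format lives in a single function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Single place that decides how every field is rendered.
inline std::string FormatHexField(const char* fieldName, unsigned long value)
{
    char buffer[96];
    std::snprintf(buffer, sizeof buffer, "%s: %08lX", fieldName, value);
    return buffer;
}

// #field turns the member name into a string literal (preprocessor stringizing).
#define FORMAT_HEX_FIELD(object, field) \
    FormatHexField(#field, static_cast<unsigned long>((object).field))

// Hypothetical structure, for illustration only.
struct DemoHeader {
    unsigned long ImageBase;
    unsigned long SizeOfImage;
};
```

Changing the layout of every displayed field, or routing the text to a GUI control instead of the console, then means editing FormatHexField alone.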
If you're ambitious and want to extend or customize EZPE, there are a number of things you can do. For example, you could remove the display of the various IMAGEHLP-specific data structures. I included them to show what sort of information IMAGEHLP gives you. The resulting output would be smaller and would include only information from the executable and symbol tables. Another nice feature would be to decode the various fields containing flags, such as the Characteristics field in the PE header, or the PE section attributes. Even with this extra code, you'd have a very compact program, which is a testament to the power that IMAGEHLP.DLL provides.
Have a question about programming in Windows? Send it to Matt at email@example.com