Copyright © Microsoft Corporation. This document is an archived reproduction of a version originally published by Microsoft. It may have slight formatting modifications for consistency and to improve readability.
November 1997

Code for this article: Hood1197.exe (9KB)

Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at NuMega Technologies Inc., and can be reached at mpietrek@tiac.com or at http://www.wheaty.net.

When I first started writing the article on Windows NT® 5.0 that appears in this issue, I didn't have much to work with other than the operating system itself. With no SDK docs or white papers, figuring out what's new came down to comparing the Windows NT 4.0 and 5.0 system DLLs to see what new exported APIs were added (or possibly deleted).
In the past, I would have compared the exports from two similar DLLs like this:
  • Run DUMPBIN /EXPORTS on both DLLs, redirecting the output to separate files
  • Run the SORT program on both files, redirecting the output
  • Run the FC (File Compare) program on the two sorted files
The output from the FC program would consist of the APIs that had changed between the two DLLs.
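The SORT and FC steps roughly amount to a set difference on two sorted lists of names. The same result can be sketched in C++ with std::set_difference, for comparison with the approach PEDIFF takes later. OnlyInFirst is a hypothetical helper, and its input vectors stand in for the two DUMPBIN /EXPORTS listings.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Returns the names present in 'a' but not in 'b' (hypothetical helper).
// Sorting both inputs mirrors the SORT step; std::set_difference then does
// the job that FC does on the two sorted files.
std::vector<std::string> OnlyInFirst(std::vector<std::string> a,
                                     std::vector<std::string> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<std::string> result;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(result));
    return result;
}
```

Calling OnlyInFirst(oldExports, newExports) yields the removed APIs. Swapping the arguments yields the added ones.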
Although this process isn't horrible and can be done in under a minute, it obviously isn't suitable when you have dozens or hundreds of DLLs to examine. While I could have spent considerable time designing an elaborate executable file-comparison program, there's something to be said for just hacking out something good enough for the job at hand. No GUI, no fancy algorithms—just grab the information and let the CPU burn cycles to give you the results.
The program I came up with is called PEDIFF, and I was pleasantly surprised that I spent only an hour getting the first version to work well enough for my own needs. Alas, there were some restrictions and limitations that made it unsuitable for a column. Therefore, for you, the intrepid reader, I went back in and made it more robust (and, as a by-product, faster). The name PEDIFF is somewhat of a misnomer since it only compares PE file exports, but the code can easily be extended to list other PE file differences.

PEDIFF Basics
The structure of PEDIFF operations can be expressed very simply in pseudocode:
  • Get the names of the two files to compare
  • Load the exports for both files into two separate lists
  • Walk through list 1, flagging all entries that are also in list 2
  • Walk through list 1, printing out all entries not previously flagged
  • Walk through list 2, printing out all entries not previously flagged
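The flag-and-print steps above can be sketched in C++ roughly as follows. This is an illustration, not the code in PEDIFF.CPP. It stores names in std::string rather than the char * used by the real ExportedSymbolInfo, and the numeric value of EXPSYMINFO_MATCH is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed value; the article names the flag but not its value.
const unsigned EXPSYMINFO_MATCH = 1;

struct ExportedSymbolInfo
{
    std::string m_name;   // std::string here instead of the article's char *
    unsigned    m_flags;
};

// Step 3: flag every list1 entry that also appears in list2, and flag the
// matching list2 entry as well. Brute force: O(n*m) name comparisons.
void FlagMatches(std::vector<ExportedSymbolInfo>& list1,
                 std::vector<ExportedSymbolInfo>& list2)
{
    for (auto& sym1 : list1)
        for (auto& sym2 : list2)
            if (sym1.m_name == sym2.m_name)
            {
                sym1.m_flags |= EXPSYMINFO_MATCH;
                sym2.m_flags |= EXPSYMINFO_MATCH;
                break;   // export names are unique within one DLL
            }
}

// Steps 4 and 5: collect the names that were never flagged.
std::vector<std::string> Unmatched(const std::vector<ExportedSymbolInfo>& list)
{
    std::vector<std::string> names;
    for (const auto& sym : list)
        if (!(sym.m_flags & EXPSYMINFO_MATCH))
            names.push_back(sym.m_name);
    return names;
}
```

After FlagMatches runs, Unmatched(list1) holds APIs that exist only in the first DLL, and Unmatched(list2) holds APIs that exist only in the second.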
The code that implements this sequence of steps is PEDIFF.CPP (see Figure 1). Function main begins by calling ProcessCommandLine, which parses the command-line arguments to come up with the names of the two files to compare. Assuming the command line is reasonable, function main creates two instances of a class that I called PEExportList. Both PEExportList class instances are passed separately to a helper function called LoadExportInfo, which I'll describe later.
The PEExportList class is defined and implemented in PEExportList.H and PEExportList.CPP (see Figure 2). Internally, the PEExportList class maintains a list (actually, an array) of ExportedSymbolInfo structures. An ExportedSymbolInfo struct is minimalist and really just associates an exported function name with a set of flags:
 struct ExportedSymbolInfo
 {
     char *      m_pszName;
     unsigned    m_flags;
 };
The three methods of the PEExportList class all work with ExportedSymbolInfo structures. The PEExportList::AddSymbol method adds a new entry to the end of the list. The PEExportList::LookupSymbol method takes the name of an exported function and returns a pointer to the ExportedSymbolInfo previously created for it, or NULL if the symbol isn't found.
The PEExportList::GetNextSymbol method allows for list enumeration. To start an enumeration, pass in a NULL pointer. The method returns a pointer to the first ExportedSymbolInfo in the list. To continue the enumeration, pass in the ExportedSymbolInfo pointer returned by the previous call. I won't go into the class's implementation details any further, except to say that the list is really an array that's dynamically reallocated if the number of entries grows beyond the current limits. Perhaps not the most elegant implementation, but it was fast and easy to write.
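The following sketch shows one possible class with the three methods described above. The exact signatures are guesses. std::vector stands in for the hand-managed array, and like that array it reallocates as it grows. As a result, pointers returned by LookupSymbol or GetNextSymbol stay valid only until the next AddSymbol call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ExportedSymbolInfo
{
    std::string m_name;
    unsigned    m_flags;
};

class PEExportList
{
public:
    // Appends a new entry, with no flags set, to the end of the list.
    void AddSymbol(const char* pszName)
    {
        m_symbols.push_back(ExportedSymbolInfo{pszName, 0});
    }

    // Returns the entry for pszName, or nullptr if it isn't in the list.
    ExportedSymbolInfo* LookupSymbol(const char* pszName)
    {
        for (auto& sym : m_symbols)
            if (sym.m_name == pszName)
                return &sym;
        return nullptr;
    }

    // Pass nullptr to get the first entry; pass the previous result to get
    // the next one. Returns nullptr when the enumeration is finished.
    ExportedSymbolInfo* GetNextSymbol(ExportedSymbolInfo* pPrev)
    {
        assert(!pPrev || (pPrev >= m_symbols.data() &&
                          pPrev < m_symbols.data() + m_symbols.size()));
        std::size_t next = pPrev
            ? static_cast<std::size_t>(pPrev - m_symbols.data()) + 1
            : 0;
        return next < m_symbols.size() ? &m_symbols[next] : nullptr;
    }

private:
    std::vector<ExportedSymbolInfo> m_symbols;   // grows by reallocation
};
```

A typical enumeration loop is for (p = list.GetNextSymbol(nullptr); p; p = list.GetNextSymbol(p)).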
Now let's look at the LoadExportInfo code, which fills up a PEExportList instance with API goodies to compare. In keeping with my fast, brute-force mandate, I needed to enumerate the exported functions from a DLL with minimum fuss. My first implementation used the IMAGEHLP MapDebugInformation, SymLoadModule, and SymEnumerateSymbols APIs. The underlying idea is that IMAGEHLP.DLL synthesizes a symbol table from the exports of an image (in the absence of debug information). The flaw with this approach is that if you have symbol tables lying around, SymLoadModule won't bother loading the exports. An example of having a symbol table lying around is if you installed the .DBG files that are available for Windows NT system DLLs.
Using IMAGEHLP's symbol table routines ultimately didn't work out, but it did get me thinking about other ways that I could use IMAGEHLP to get at the exported API list. The approach I took was to read the export list directly, but to use IMAGEHLP APIs to save me from messy low-level details. I ended up using the MapAndLoad API to map the DLL into memory and get a pointer to its PE header. From the PE header, it's easy to get the relative virtual address (RVA) of the export table.
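MapAndLoad returns a LOADED_IMAGE structure whose FileHeader field already points at the PE header, so PEDIFF never needs to do this step by hand. To illustrate what getting the export table's RVA involves, the following sketch finds the export data directory entry in a raw file buffer. GetExportDirectory and DataDirectory are hypothetical names. The offsets come from the PE format: e_lfanew is at 0x3C, followed by a 4-byte signature and a 20-byte IMAGE_FILE_HEADER, and DataDirectory begins at offset 96 (PE32) or 112 (PE32+) of the optional header. The code assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static std::uint16_t ReadU16(const std::vector<std::uint8_t>& f, std::size_t off)
{
    assert(off + 2 <= f.size());
    std::uint16_t v;
    std::memcpy(&v, &f[off], sizeof v);   // little-endian host assumed
    return v;
}

static std::uint32_t ReadU32(const std::vector<std::uint8_t>& f, std::size_t off)
{
    assert(off + 4 <= f.size());
    std::uint32_t v;
    std::memcpy(&v, &f[off], sizeof v);
    return v;
}

struct DataDirectory { std::uint32_t VirtualAddress; std::uint32_t Size; };

// Returns DataDirectory[0] (the export table) of a raw PE file, or {0, 0}
// if the buffer doesn't look like a PE file. This simplified version does
// not check NumberOfRvaAndSizes.
DataDirectory GetExportDirectory(const std::vector<std::uint8_t>& f)
{
    const DataDirectory none = {0, 0};
    if (f.size() < 0x40 || f[0] != 'M' || f[1] != 'Z') return none;
    std::size_t peOff = ReadU32(f, 0x3C);                // e_lfanew
    if (peOff + 26 > f.size() || std::memcmp(&f[peOff], "PE\0\0", 4) != 0)
        return none;
    std::size_t optOff = peOff + 4 + 20;                 // skip signature + file header
    std::uint16_t magic = ReadU16(f, optOff);
    std::size_t dirOff;
    if (magic == 0x10B)      dirOff = optOff + 96;       // PE32
    else if (magic == 0x20B) dirOff = optOff + 112;      // PE32+
    else return none;
    if (dirOff + 8 > f.size()) return none;
    return DataDirectory{ReadU32(f, dirOff), ReadU32(f, dirOff + 4)};
}
```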
While knowing the RVA of the exports section is great, an RVA isn't a real pointer. Luckily, IMAGEHLP has the ImageRvaToVa API. You pass in information about the previously mapped-in DLL and the RVA you want to translate. What comes out is a usable pointer. Once I had a usable pointer to the exports table, it wasn't that hard to walk the array of named exports. The exact format of an export table is described in WINNT.H, so I won't give the details here. However, it's worth mentioning that many of the fields in an export table are given as RVAs, so I had to make use of the ImageRvaToVa API a few more times. As the code loops through the entries in the export table, it calls the PEExportList::AddSymbol method to add each API to the DLL's list of APIs. Mission accomplished.
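For a file mapped as plain data, ImageRvaToVa roughly works like this: it finds the section that contains the RVA and rebases the address from that section's VirtualAddress to its PointerToRawData in the mapped file. The sketch below does the same with file offsets instead of pointers, then walks the named-export array. SectionInfo, RvaToFileOffset, and ReadExportNames are hypothetical names. The export directory offsets (NumberOfNames at +24, AddressOfNames at +32) follow the IMAGE_EXPORT_DIRECTORY layout in WINNT.H. For brevity, the sketch assumes a little-endian host and a well-formed file, and it uses 0 as a "not found" offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// The IMAGE_SECTION_HEADER fields needed for RVA translation.
struct SectionInfo
{
    std::uint32_t VirtualAddress;
    std::uint32_t SizeOfRawData;
    std::uint32_t PointerToRawData;
};

static std::uint32_t ReadU32(const std::vector<std::uint8_t>& file, std::size_t off)
{
    assert(off + 4 <= file.size());
    std::uint32_t v;
    std::memcpy(&v, &file[off], sizeof v);   // little-endian host assumed
    return v;
}

// Finds the section whose raw data covers the RVA and rebases the RVA to a
// file offset. Returns 0 if no section holds it.
std::uint32_t RvaToFileOffset(const std::vector<SectionInfo>& sections, std::uint32_t rva)
{
    for (const auto& s : sections)
        if (rva >= s.VirtualAddress && rva - s.VirtualAddress < s.SizeOfRawData)
            return s.PointerToRawData + (rva - s.VirtualAddress);
    return 0;
}

// Returns the names in the AddressOfNames array of the export directory
// found at exportDirRva.
std::vector<std::string> ReadExportNames(const std::vector<std::uint8_t>& file,
                                         const std::vector<SectionInfo>& sections,
                                         std::uint32_t exportDirRva)
{
    std::vector<std::string> names;
    std::uint32_t dirOff = RvaToFileOffset(sections, exportDirRva);
    if (dirOff == 0) return names;
    std::uint32_t numberOfNames  = ReadU32(file, dirOff + 24);
    std::uint32_t addressOfNames = ReadU32(file, dirOff + 32);
    std::uint32_t namesOff = RvaToFileOffset(sections, addressOfNames);
    for (std::uint32_t i = 0; i < numberOfNames; ++i)
    {
        // Each element is itself an RVA of a null-terminated name string.
        std::uint32_t nameOff = RvaToFileOffset(sections, ReadU32(file, namesOff + 4 * i));
        names.emplace_back(reinterpret_cast<const char*>(&file[nameOff]));
    }
    return names;
}
```

In PEDIFF, each name would go to PEExportList::AddSymbol rather than into a vector.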
After loading up the two PEExportList class instances, function main flags APIs that are in both DLLs. Using the PEExportList::GetNextSymbol method, the code iterates through all the APIs in the first DLL. For each API, the code checks to see if an identically named API is also in the second DLL's list. If so, the EXPSYMINFO_MATCH flag is set in the ExportedSymbolInfo struct for that API in both lists.
The last task for function main is to iterate through both lists again. For each API that doesn't have the EXPSYMINFO_MATCH flag set, the code prints the name of the API. Once again, simple brute force is used. To make things easier, the code precedes both API enumerations by printing out the name of the DLL for which it is about to start spewing out API names.

Using PEDIFF
The comprehensive online help for PEDIFF can be obtained by running PEDIFF from the command line with no arguments. The normal use of PEDIFF is to specify the names of both files to compare on the command line. For example, on my system the beta copy of Windows NT 5.0 is on my E: drive, while Windows NT 4.0 is on my C: drive. Running

 PEDIFF C:\WINNT\SYSTEM32\USER32.DLL E:\WINNT\SYSTEM32\USER32.DLL
writes the output shown in Figure 3 to stdout. The figure shows that there is one USER32 API that was cut in Windows NT 5.0, and many new APIs added.
To make PEDIFF easier to use, I stole an idea from the way cool WINDIFF program that's part of the Win32® SDK. It turns out that when you're comparing PE files, the odds are high that the DLLs you're comparing will have the same base file name (for instance, USER32.DLL). There's also a good chance that your current working directory contains one of the DLLs that you'd like to compare. With this in mind, if you specify only one file name, and if that file name contains a path, PEDIFF will compare that file to the equivalently named file in the current directory. For instance,

 PEDIFF C:\WINNT\SYSTEM32\USER32.DLL
causes the exports of that file to be compared to USER32.DLL in the current directory.
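The one-argument shortcut mostly comes down to stripping the path from the file name. The sketch below guesses at that logic and is not the actual ProcessCommandLine. GetFilesToCompare and BaseName are hypothetical names, and treating a bare file name with no path as an error is an assumption about how PEDIFF handles that case.

```cpp
#include <cassert>
#include <string>

// Returns the file name portion of a path: the text after the last '\\',
// '/', or ':' (hypothetical helper).
std::string BaseName(const std::string& path)
{
    std::string::size_type pos = path.find_last_of("\\/:");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Fills in the two files to compare. With two arguments, they are used as
// given. With one argument that contains a path, the second file is the same
// base name in the current directory. Any other form returns false.
bool GetFilesToCompare(int argc, const char* const argv[],
                       std::string& file1, std::string& file2)
{
    if (argc == 3) { file1 = argv[1]; file2 = argv[2]; return true; }
    if (argc == 2)
    {
        file1 = argv[1];
        file2 = BaseName(file1);
        // No path (or nothing after the path) means no second file to pair with.
        return !file2.empty() && file2 != file1;
    }
    return false;
}
```

A real main can pass its argc and argv straight through, because char ** converts implicitly to const char * const * in C++.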
While this shortcut syntax is great for comparing the exports of two versions of a particular DLL, it would be terribly tedious to compare the exports of hundreds of DLLs (as I need to do for my Windows NT 5.0 article research). This is where a little knowledge of the command processor comes in handy. There's a built-in command called FOR that causes a command you specify to be executed for each file matching a given filespec:

 FOR %a in (*.DLL) DO PEDIFF c:\winnt\system32\%a >> my_diffs.txt
For each file that meets the filespec in the parentheses, the command processor fills in the variable %a with the name of the file, and then executes whatever follows the DO keyword. Note also that I used >> rather than > to concatenate all PEDIFF output to a file rather than have the output file be overwritten by each execution of PEDIFF.
The finished program is under 4KB in size. I achieved a significant size reduction by using the LIBCTINY replacement runtime library from my October 1996 column. Using the standard Visual C++® runtime library, the EXE would be about three times the size. PEDIFF is a perfect candidate for using LIBCTINY: the Visual C++ runtime library isn't used extensively, and the speed of the replacement functions like malloc and printf isn't an issue. I've included LIBCTINY.LIB in the program sources this month in case you want to build or modify PEDIFF yourself.

Have a question about programming in Windows? Send it to Matt at mpietrek@tiac.com

From the November 1997 issue of Microsoft Systems Journal.