When I first started writing the article on Windows NT® 5.0 that appears in this issue, I
didn't have much to work with other than the operating system itself.
With no SDK docs or white papers, figuring out what's new came down to
comparing the Windows NT 4.0 and 5.0 system DLLs to see which
exported APIs had been added (or possibly removed).
In the past, I would have compared the exports from two similar DLLs like this:
- Run DUMPBIN /EXPORTS on both DLLs, redirecting the output to separate files
- Run the SORT program on both files, redirecting the output
- Run the FC (File Compare) program on the two sorted files
The output from the FC program would consist of the APIs that had changed between the two DLLs.
Although this process isn't horrible and can be done in under a minute, it
obviously isn't suitable when you have dozens or hundreds of DLLs to
examine. While I could have spent considerable time designing an
elaborate executable file-comparison program, there's something to be
said for just hacking out something good enough for the job at hand. No
GUI, no fancy algorithms—just grab the information and let the CPU burn
cycles to give you the results.
The program I came up with is called PEDIFF, and I was pleasantly surprised
that I spent only an hour getting the first version to work well enough
for my own needs. Alas, there were some restrictions and limitations
that made it unsuitable for a column. Therefore, for you, the intrepid
reader, I went back in and made it more robust (and, as a by-product,
faster). The name PEDIFF is something of a misnomer, since it compares
only PE file exports, but the code can easily be extended to list
other PE file differences.
PEDIFF Basics
The structure of PEDIFF operations can be expressed very simply in pseudocode:
- Get the names of the two files to compare
- Load the exports for both files into two separate lists
- Walk through list 1, flagging all entries that are also in list 2
- Walk through list 1, printing out all entries not previously flagged
- Walk through list 2, printing out all entries not previously flagged
The code that implements this sequence of steps is PEDIFF.CPP (see Figure 1).
Function main begins by calling ProcessCommandLine, which parses the
command-line arguments to come up with the names of the two files to
compare. Assuming the command line is reasonable, function main creates
two instances of a class that I called PEExportList. Both PEExportList
class instances are passed separately to a helper function called
LoadExportInfo, which I'll describe later.
The PEExportList class is defined and implemented in PEExportList.H and PEExportList.CPP (see Figure 2).
Internally, the PEExportList class maintains a list (actually, an
array) of ExportedSymbolInfo structures. An ExportedSymbolInfo struct is
minimalist and really just associates an exported function name with a
set of flags:
struct ExportedSymbolInfo
{
    char *   m_pszName;
    unsigned m_flags;
};
The three methods of the PEExportList class all work with
ExportedSymbolInfo structures. The PEExportList::AddSymbol method adds a new entry to the end of the
list. The PEExportList::LookupSymbol method takes the name of an
exported function and returns a pointer to the ExportedSymbolInfo
previously created for it, or NULL if the symbol isn't found.
The PEExportList::GetNextSymbol method allows for list enumeration. To start
an enumeration, pass in a NULL pointer. The method returns a pointer to
the first ExportedSymbolInfo in the list. To continue the enumeration,
pass in the ExportedSymbolInfo pointer returned by the previous call. I
won't go into the class's implementation details any further, except to
say that the list is really an array that's dynamically reallocated if
the number of entries grows beyond the current limits. Perhaps not the
most elegant implementation, but it was fast and easy to write.
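As a rough illustration of the three methods just described, here is a minimal sketch of such a class. It is not the code from Figure 2: the private member names, the initial capacity of 4, the doubling growth policy, the linear search in LookupSymbol, and the choice to return NULL at the end of an enumeration are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Same layout as the struct shown in the article.
struct ExportedSymbolInfo
{
    char *   m_pszName;
    unsigned m_flags;
};

// Sketch only. Everything except the three public method names is hypothetical.
class PEExportList
{
public:
    PEExportList() : m_pSymbols(nullptr), m_count(0), m_capacity(0) {}

    ~PEExportList()
    {
        for (unsigned i = 0; i < m_count; i++)
            free(m_pSymbols[i].m_pszName);
        free(m_pSymbols);
    }

    PEExportList(const PEExportList &) = delete;
    PEExportList & operator=(const PEExportList &) = delete;

    // Appends a copy of pszName with no flags set.
    // Returns false if memory could not be allocated.
    bool AddSymbol(const char * pszName)
    {
        assert(pszName != nullptr);

        if (m_count == m_capacity)   // array is full: grow it
        {
            unsigned newCapacity = m_capacity ? m_capacity * 2 : 4;
            void * p = realloc(m_pSymbols,
                               newCapacity * sizeof(ExportedSymbolInfo));
            if (!p)
                return false;
            m_pSymbols = static_cast<ExportedSymbolInfo *>(p);
            m_capacity = newCapacity;
        }

        size_t cb = strlen(pszName) + 1;
        char * pszCopy = static_cast<char *>(malloc(cb));
        if (!pszCopy)
            return false;
        memcpy(pszCopy, pszName, cb);

        m_pSymbols[m_count].m_pszName = pszCopy;
        m_pSymbols[m_count].m_flags = 0;
        m_count++;
        return true;
    }

    // Brute-force linear search; returns nullptr if the name isn't found.
    ExportedSymbolInfo * LookupSymbol(const char * pszName)
    {
        for (unsigned i = 0; i < m_count; i++)
            if (strcmp(m_pSymbols[i].m_pszName, pszName) == 0)
                return &m_pSymbols[i];
        return nullptr;
    }

    // Pass nullptr to get the first entry, or the previous return value to
    // get the one after it. Returns nullptr when the list is exhausted.
    ExportedSymbolInfo * GetNextSymbol(ExportedSymbolInfo * pPrev)
    {
        if (m_count == 0)
            return nullptr;
        if (!pPrev)
            return &m_pSymbols[0];
        unsigned next = static_cast<unsigned>(pPrev - m_pSymbols) + 1;
        return next < m_count ? &m_pSymbols[next] : nullptr;
    }

private:
    ExportedSymbolInfo * m_pSymbols;  // dynamically reallocated array
    unsigned             m_count;     // entries in use
    unsigned             m_capacity;  // entries allocated
};
```

One consequence of the reallocating array is worth noting: a pointer returned by LookupSymbol or GetNextSymbol is only safe to hold while no further AddSymbol calls are made, since growing the array may move it.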
Now let's look at the LoadExportInfo code, which fills up a PEExportList
instance with API goodies to compare. In keeping with my fast,
brute-force mandate, I needed to enumerate the exported functions from a
DLL with minimum fuss. My first implementation used the IMAGEHLP
MapDebugInformation, SymLoadModule, and SymEnumerateSymbols APIs. The
underlying idea is that IMAGEHLP.DLL synthesizes a symbol table from the
exports of an image (in the absence of debug information). The flaw with this
approach is that if you have symbol tables lying around, SymLoadModule
won't bother loading the exports. An example of having a symbol table
lying around is if you installed the .DBG files that are available for
Windows NT system DLLs.
Using IMAGEHLP's symbol table routines ultimately didn't work out, but it did
get me thinking about other ways that I could use IMAGEHLP to get at the
exported API list. The approach I took was to read the export list
directly, but to use IMAGEHLP APIs to save me from messy low-level
details. I ended up using the MapAndLoad API to map the DLL into memory
and get a pointer to its PE header. From the PE header, it's easy to get
the relative virtual address (RVA) of the export table.
While knowing the RVA of the exports section is great, an RVA isn't a real
pointer. Luckily, IMAGEHLP has the ImageRvaToVa API. You pass in
information about the previously mapped-in DLL and the RVA you want to
translate. What comes out is a usable pointer. Once I had a usable
pointer to the exports table, it wasn't that hard to walk the array of
named exports. The exact format of an export table is described in
WINNT.H, so I won't give the details here. However, it's worth
mentioning that many of the fields in an export table are given as RVAs,
so I had to make use of the ImageRvaToVa API a few more times. As the
code loops through the entries in the export table, it calls the
PEExportList::AddSymbol method to add each API to the DLL's list of
APIs. Mission accomplished.
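To make the "an RVA isn't a real pointer" point concrete, here is a simplified, portable model of the kind of translation an RVA-to-pointer routine performs on a file that has been mapped into memory as-is. It is not IMAGEHLP's implementation. SectionInfo is a hypothetical, trimmed-down stand-in for the section header in WINNT.H, and RvaToPointer is a name invented for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a PE section header, reduced to the three
// fields needed for address translation.
struct SectionInfo
{
    uint32_t virtualAddress;    // RVA at which the section starts when loaded
    uint32_t sizeOfRawData;     // bytes the section occupies in the file
    uint32_t pointerToRawData;  // file offset of the section's first byte
};

// Translates an RVA into a pointer inside a file mapped at pBase.
// Finds the section whose RVA range contains 'rva', then applies that
// section's file offset. Returns nullptr if no section contains the RVA.
const void * RvaToPointer(const unsigned char * pBase,
                          const SectionInfo * sections,
                          size_t sectionCount,
                          uint32_t rva)
{
    for (size_t i = 0; i < sectionCount; i++)
    {
        const SectionInfo & s = sections[i];
        if (rva >= s.virtualAddress &&
            rva - s.virtualAddress < s.sizeOfRawData)
        {
            return pBase + s.pointerToRawData + (rva - s.virtualAddress);
        }
    }
    return nullptr;
}
```

The reason the lookup is needed at all is that sections are typically packed more tightly in the file than they are in a loaded image, so simply adding the RVA to the mapping's base address would usually land on the wrong bytes.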
After loading up the two PEExportList class instances, function main flags APIs that are in both DLLs. Using
the PEExportList::GetNextSymbol method, the code iterates through all the APIs in the first DLL. For each API,
the code checks to see if an identically named API is also in
the second DLL's list. If so, the EXPSYMINFO_MATCH flag is set in the ExportedSymbolInfo struct for that API
in both lists.
The last task for function main is to iterate through both lists again. For
each API that doesn't have the EXPSYMINFO_MATCH flag set, the code
prints the name of the API. Once again, simple brute force is used. To
make things easier, the code precedes both API enumerations by printing
out the name of the DLL for which it is about to start spewing out API
names.
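The flag-then-print logic described above can be sketched in a few lines. This is an illustration rather than the code from Figure 1: it uses std::vector and std::string in place of PEExportList, the Symbol struct and the function names are hypothetical, the value given to EXPSYMINFO_MATCH is assumed, and the output layout is a guess.

```cpp
#include <cstdio>
#include <string>
#include <vector>

const unsigned EXPSYMINFO_MATCH = 0x1;  // assumed value; the article doesn't give it

// Hypothetical stand-in for ExportedSymbolInfo.
struct Symbol
{
    std::string name;
    unsigned    flags;
};

typedef std::vector<Symbol> SymbolList;

// Pass 1: for each name in list1, look for the same name in list2 and
// set the match flag on both entries. Plain brute force, O(n * m).
void FlagMatches(SymbolList & list1, SymbolList & list2)
{
    for (Symbol & a : list1)
    {
        for (Symbol & b : list2)
        {
            if (a.name == b.name)
            {
                a.flags |= EXPSYMINFO_MATCH;
                b.flags |= EXPSYMINFO_MATCH;
                break;
            }
        }
    }
}

// Collects the names that were never flagged, i.e. those present in this
// list but absent from the other one.
std::vector<std::string> UnmatchedNames(const SymbolList & list)
{
    std::vector<std::string> result;
    for (const Symbol & s : list)
        if (!(s.flags & EXPSYMINFO_MATCH))
            result.push_back(s.name);
    return result;
}

// Pass 2 (run once per list): print the file name, then its unmatched names.
void PrintUnmatched(const char * pszFileName, const SymbolList & list)
{
    printf("%s\n", pszFileName);
    for (const std::string & name : UnmatchedNames(list))
        printf("  %s\n", name.c_str());
}
```

Because the flag is set on both entries during the first pass, the second list never needs its own lookup pass. Both print loops only have to test a bit.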
Using PEDIFF
The comprehensive online help for PEDIFF can be obtained by running PEDIFF
from the command line with no arguments. The normal use of PEDIFF is to
specify the names of both files to compare on the command line. For
example, on my system the beta copy of Windows NT 5.0
is on my E: drive, while Windows NT 4.0 is on my C: drive. Running
|