Here's a problem
nearly every C++ programmer has encountered. In your code, you've made
a call to a function in some DLL and the linker complains that it can't
find the symbol. It usually doesn't take too long to figure out that
you need to add another library (.LIB) file to the linker's command
line. The only problem is, which .LIB file?
From day one, certain formats have remained relatively constant in the Microsoft® Win32®
development tools, and many tools have sprung up around them. For
example, the Microsoft DUMPBIN utility can be used to display the
contents of both Portable Executable files and COFF (Common Object File
Format) .OBJ files. (For users of Visual Basic® 5.0, the command line
LINK /DUMP is functionally the same as DUMPBIN.) However, when it comes to
libraries, there seems to be a real dearth of tools that can
intelligently tell you the contents of a COFF format .LIB file. All
32-bit Microsoft tools use COFF.
Perhaps
you need to know if a function is imported by name versus ordinal
value. DUMPBIN isn't much help here. Sure, DUMPBIN has a few obscure
command options for .LIB files (/ARCHIVEMEMBERS and /LINKERMEMBER, for
example). But they just provide raw output of portions of the .LIB file.
A few gurus can cast the runes of DUMPBIN's output to figure out what
they're after. However, to really see what's in a .LIB file, you need
either a good understanding of .LIB file structures or a tool that
displays the .LIB contents in a meaningful manner. In this column, I'll
provide some relief on both counts.
While
mucking about inside .LIB files might appear forbidding, they're really
not complicated. Essentially, a .LIB file is just a collection of COFF
format .OBJ files strung sequentially together. A table of contents at
the beginning tells the linker where things are. Actually, there are two
tables of contents, but this detail isn't important for the ensuing
discussion.
In my July 1997
column, I described the basic principles of how a linker works. The
important factoid for this column is that a linker is responsible for
resolving symbols between compilation units. For example, if MyFile1.CPP
calls function FooBar in another source file, the linker has to locate
the .OBJ file containing FooBar's binary code and include it in the
finished executable. From the linker's perspective, a .LIB file is just a
collection of .OBJ files. The table of contents in a .LIB file is a
list of all the symbols from all the .OBJs contained in the library. For
each symbol, the table of contents also indicates which .OBJ file the
symbol came from. This mapping of a symbol name to an .OBJ file allows
the linker to quickly bring in just the .OBJ from the .LIB file that it
needs, while ignoring the rest of the library.
You
might be thinking, "What about import libraries? Aren't they special?"
Under the Win32 COFF format, the answer is no. The linker resolves calls
to DLL functions the same way as it does for internal (static)
functions. The only real difference is that when you call a DLL
function, the .OBJ file in the import library provides data for the
executable's import table rather than code for the actual function.
The
data that an import library provides for an imported API is kept in
several sections whose names all begin with .idata (for instance,
.idata$4, .idata$5, and .idata$6). The .idata$5 section contains a
single DWORD that, when the executable loads, contains the address of
the imported function. The .idata$6 section (if present) contains the
name of the imported function. When loading the executable into memory,
the Win32 loader uses this string to do the equivalent of calling
GetProcAddress on the imported function.
As I described in the July 1997
column, the linker lumps together sections that have the same name up
to, but not including, the $. The portion after the $ is used to order
the sections. Thus, all the .idata$4 sections are put in the executable
contiguously, followed by all the .idata$5
sections, and finishing with all the .idata$6 sections. The linker's
combining and sorting of sections is what builds
the import address table (IAT) and other parts of the imports table in a
finished executable. Not surprisingly, an executable's imports table is
usually in a section that is named .idata.
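The grouping rule can be sketched in a few lines of C++. This is only an illustration of the name-based grouping and ordering just described, not how any real linker is implemented. The SectionContribution type and the GroupName, OrderKey, and OrderContributions functions are hypothetical names, and sorting the groups themselves alphabetically is a simplification made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: one section contributed by one .OBJ file.
struct SectionContribution {
    std::string name;     // for example ".idata$5"
    std::string fromObj;  // which .OBJ it came from (illustrative only)
};

// The part of the name before the '$' (the whole name if there is no '$').
std::string GroupName(const std::string& name) {
    return name.substr(0, name.find('$'));
}

// The part of the name after the '$' (empty if there is no '$').
std::string OrderKey(const std::string& name) {
    std::string::size_type pos = name.find('$');
    return pos == std::string::npos ? std::string() : name.substr(pos + 1);
}

// Put contributions with the same group name next to each other, ordered by
// the text after the '$'. A stable sort keeps the input order among equals.
std::vector<SectionContribution> OrderContributions(std::vector<SectionContribution> v) {
    std::stable_sort(v.begin(), v.end(),
        [](const SectionContribution& a, const SectionContribution& b) {
            std::string ga = GroupName(a.name), gb = GroupName(b.name);
            if (ga != gb) return ga < gb;
            return OrderKey(a.name) < OrderKey(b.name);
        });
    return v;
}
```

Feeding this a mix of .idata$6, .idata$4, and .idata$5 contributions from several .OBJs yields all the $4 entries first, then the $5 entries, then the $6 entries, which is the contiguous layout the import tables depend on.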
If you've used OLE, COM, or ActiveX®,
you probably remember that there are also .LIB files that are used for
predefined class IDs (CLSIDs) and interface IDs (IIDs). Both CLSIDs and
IIDs are forms of GUIDs, which are 16-byte unique values. If you poke
around in one of these libraries (for instance, UUID.LIB), you'll
see that the GUID values are stored in a section called .rdata. The
linker takes all the referenced .rdata sections in the .LIB file and
creates the .rdata section in the executable. Put differently, every
GUID that you reference in your program reserves 16 bytes in the final
executable.
The COFF .LIB File Structure
Before
I explain how a tool can provide an intelligent display of a .LIB
file's contents, it's helpful to have a basic understanding of how COFF
.LIBs are constructed. The first thing you'll need to tuck away in your
memory banks is that in COFF the words "archive" and "library" are used
interchangeably. The second tidbit to remember is that components of a
.LIB file are referred to as members. Thus, a .LIB file is really just a
series of contiguous archive members. With two exceptions that I'll get
to momentarily, each archive member corresponds to an .OBJ file.
All
COFF .LIB files begin with an 8-byte header, which reads
"!<arch>\n" when viewed as ASCII text. You can see this in WINNT.H
as the #define for IMAGE_ARCHIVE_START. Following this header is the
first of potentially many archive members. Each archive member begins
with a structure called an IMAGE_ARCHIVE_MEMBER_HEADER, which is also
defined in WINNT.H. This structure contains
information such as the member's name and size. Interestingly, one of
the strings in an archive member header is in the octal number format.
Yes, these throwbacks to computing's infancy continue to rattle around
in today's supercharged barn-burners.
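To make the header layout concrete, here is a minimal sketch that checks the archive signature and decodes two of the text fields in a member header. The ArchiveMemberHeader struct is a stand-in that mirrors the field layout of IMAGE_ARCHIVE_MEMBER_HEADER so the example compiles without WINNT.H. The function names are made up for illustration. It assumes Mode is the octal field and that Size is stored as ASCII decimal.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Stand-in mirroring IMAGE_ARCHIVE_MEMBER_HEADER: 60 bytes of ASCII text fields.
struct ArchiveMemberHeader {
    char Name[16];
    char Date[12];
    char UserID[6];
    char GroupID[6];
    char Mode[8];       // ASCII octal
    char Size[10];      // ASCII decimal; member size, not counting this header
    char EndHeader[2];  // "`\n"
};
static_assert(sizeof(ArchiveMemberHeader) == 60, "unexpected padding");

// True if the buffer starts with the 8-byte "!<arch>\n" signature.
bool HasArchiveSignature(const unsigned char* data, std::size_t len) {
    return len >= 8 && std::memcmp(data, "!<arch>\n", 8) == 0;
}

// Header fields are space-padded and not null-terminated, so copy the field
// into a std::string before handing it to strtoul.
unsigned long ParseField(const char* field, std::size_t width, int base) {
    std::string text(field, width);
    return std::strtoul(text.c_str(), nullptr, base);
}

unsigned long MemberSize(const ArchiveMemberHeader& h) {
    return ParseField(h.Size, sizeof h.Size, 10);
}

unsigned long MemberMode(const ArchiveMemberHeader& h) {
    return ParseField(h.Mode, sizeof h.Mode, 8);
}
```

A tool walking the archive would read the signature once, then repeatedly read a header, use MemberSize to find where that member's data ends, and move on to the next header.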
The
first two archive members in a COFF .LIB file are special. Instead of
.OBJ files, they act
as a table of contents to the other archive members (that is, to the
.OBJs). These are called linker members (see the
IMAGE_ARCHIVE_LINKER_MEMBER #define in WINNT.H). These members map a
symbol name (for instance, _CreateProcessA@40) to the offset of the
archive member containing the code or data associated with that symbol.
The two special linker members both contain the same information. The
only difference is in how the symbol names are sorted.
Figure 1 shows the format of a names linker member. Following the
IMAGE_ARCHIVE_MEMBER_HEADER is a DWORD with the number of symbols in
the library. Next is an array of DWORD offsets to other archive members
in the library. Following the DWORD array is a series of null-terminated
symbol name strings. Each successive entry in the DWORD array
corresponds to the next string in the string table.
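The layout just described translates fairly directly into code. The sketch below parses the data that follows the member header into (symbol name, member offset) pairs. One detail not mentioned above but documented in the PE/COFF specification is that the first linker member stores its DWORDs in big-endian order, so the example reads them byte by byte. The function names are hypothetical, and the bounds checks are simply what seemed prudent for an example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Read a 32-bit big-endian value from four bytes.
std::uint32_t ReadBigEndian32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// data/len: the bytes of the names linker member that follow its header.
// Returns (symbol name, archive member offset) pairs, or an empty vector if
// the data is malformed.
std::vector<std::pair<std::string, std::uint32_t>>
ParseNamesLinkerMember(const unsigned char* data, std::size_t len) {
    std::vector<std::pair<std::string, std::uint32_t>> result;
    if (len < 4) return result;
    std::uint32_t count = ReadBigEndian32(data);
    if (count > (len - 4) / 4) return result;      // offset array would not fit
    const unsigned char* offsets = data + 4;
    std::size_t pos = 4 + std::size_t(count) * 4;  // start of the string table
    for (std::uint32_t i = 0; i < count; ++i) {
        std::size_t end = pos;
        while (end < len && data[end] != 0) ++end;
        if (end == len) {                          // string has no terminator
            result.clear();
            return result;
        }
        // The i-th string pairs with the i-th offset.
        result.emplace_back(
            std::string(reinterpret_cast<const char*>(data + pos), end - pos),
            ReadBigEndian32(offsets + 4 * i));
        pos = end + 1;
    }
    return result;
}
```

Each returned offset is a file offset to the archive member header of the .OBJ that defines the symbol, which is exactly the lookup the linker needs.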
Figure 1 Names Linker Member
Figure 2 Archive Member
The
format of the other non-names archive members is even simpler. It's
just an archive member header, followed by an .OBJ file. If you're not
familiar with the layout of an .OBJ file, it consists of an
IMAGE_FILE_HEADER followed by one or more IMAGE_SECTION_HEADER
structures, one for each code or data section. Next comes the raw code
and data for the sections. Bringing up the rear is the symbol table,
which correlates symbol names to specific locations in the .OBJ's code
and data. All of these data structures are the same as those used in
executable files, and are described
in WINNT.H. Figure 2 shows the layout of one of these .OBJ-based archive
members.
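As a rough illustration of that layout, the following sketch lists the section names in an .OBJ image held in memory. To stay independent of WINNT.H it reads fields by byte offset, relying on the fixed sizes of IMAGE_FILE_HEADER (20 bytes) and IMAGE_SECTION_HEADER (40 bytes). ListSectionNames is a hypothetical name. The sketch does not resolve long section names that are stored as a "/offset" reference into the string table.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const std::size_t kFileHeaderSize = 20;     // sizeof(IMAGE_FILE_HEADER)
const std::size_t kSectionHeaderSize = 40;  // sizeof(IMAGE_SECTION_HEADER)

// Read a 16-bit little-endian value from two bytes.
std::uint16_t ReadLittleEndian16(const unsigned char* p) {
    return std::uint16_t(p[0] | (p[1] << 8));
}

// obj/len: the bytes of one .OBJ (for example, the data of an archive member).
// Returns the section names in header order, or an empty vector if the
// buffer is too small to hold the headers it claims to have.
std::vector<std::string> ListSectionNames(const unsigned char* obj, std::size_t len) {
    std::vector<std::string> names;
    if (len < kFileHeaderSize) return names;
    std::size_t sectionCount = ReadLittleEndian16(obj + 2);   // NumberOfSections
    std::size_t optionalSize = ReadLittleEndian16(obj + 16);  // SizeOfOptionalHeader
    std::size_t pos = kFileHeaderSize + optionalSize;         // first section header
    if (pos > len || sectionCount > (len - pos) / kSectionHeaderSize) return names;
    for (std::size_t i = 0; i < sectionCount; ++i) {
        const char* name =
            reinterpret_cast<const char*>(obj + pos + i * kSectionHeaderSize);
        std::size_t n = 0;
        while (n < 8 && name[n] != '\0') ++n;  // Name is 8 bytes, null-padded
        names.emplace_back(name, n);
    }
    return names;
}
```

Run against a member of an import library, a routine like this would be expected to turn up the .idata$4, .idata$5, and .idata$6 sections discussed earlier.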
Inside LibDump
If
you really understand everything
I just described, you could use DUMPBIN with the /ALL option to figure
out anything you might want to know about a .LIB file. For example, if
you needed to know what the import ordinal for the CreateUpDownControl
API is, you'd run DUMPBIN /ALL on COMCTL32.LIB. In the beginning of
DUMPBIN's output, you'd find the string "CreateUpDownControl". On the
same line would be the offset of the matching .OBJ file. You'd then
search the dump output for the archive member at that file offset.
Somewhere within the information for that .OBJ, you'd locate the raw
data for .idata$5, which reads: