The IMAGE_SEPARATE_DEBUG_HEADER is a condensed version of fields found in
an executable file's headers. Figure 3 shows the fields of an
IMAGE_SEPARATE_DEBUG_HEADER and how they map to the executable's fields.
The first field (Signature) must contain the value 0x4944 to indicate
that the file is a .DBG file. Because the WORD is stored in little-endian
byte order, it appears in the file as the bytes 0x44 and 0x49, which read
as DI (Debug Information) in ASCII.
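
A minimal C sketch of the signature check follows, assuming the first
bytes of the candidate file are already in memory. The helper name
is_dbg_file and the macro DBG_SIGNATURE are hypothetical (winnt.h defines
the same value as IMAGE_SEPARATE_DEBUG_SIGNATURE). Assembling the WORD
from individual bytes keeps the check independent of the host's byte
order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the value winnt.h calls IMAGE_SEPARATE_DEBUG_SIGNATURE. */
#define DBG_SIGNATURE 0x4944u

/* Returns 1 if buf starts with the .DBG signature, 0 otherwise.
   The WORD is built from bytes in little-endian order, so the result
   is the same on any host. */
int is_dbg_file(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return 0;
    uint16_t sig = (uint16_t)(buf[0] | (buf[1] << 8));
    return sig == DBG_SIGNATURE;
}
```

A buffer beginning with the characters 'D' and 'I' passes; one beginning
with "MZ" (an executable) does not.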

Following the IMAGE_SEPARATE_DEBUG_HEADER is an exact copy of the
executable's section table. This is just an array of IMAGE_SECTION_HEADER
structures, with one structure for each code and data section in the
executable. Between the information in the .DBG file's section table and
the IMAGE_SEPARATE_DEBUG_HEADER, most debuggers have everything they
require without needing to locate and read the executable file.

Following the .DBG file header and section table is the debug directory.
This consists of an array of IMAGE_DEBUG_DIRECTORY structures, the same
layout used to describe debug information in executable files. Some of
the fields are meaningful, while others don't seem to be used. Figure 4
shows my interpretation of the IMAGE_DEBUG_DIRECTORY fields.
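
The header fields are enough to locate the other pieces of the file. The
sketch below assumes the winnt.h structure sizes (48 bytes for
IMAGE_SEPARATE_DEBUG_HEADER, 40 for IMAGE_SECTION_HEADER, 28 for
IMAGE_DEBUG_DIRECTORY), and it also assumes that any exported names
(ExportedNamesSize bytes) sit between the section table and the debug
directory. DbgLayout and dbg_layout are hypothetical names.

```c
#include <stdint.h>

/* On-disk structure sizes, assumed to match winnt.h. */
#define SEPARATE_DEBUG_HEADER_SIZE 48u  /* IMAGE_SEPARATE_DEBUG_HEADER */
#define SECTION_HEADER_SIZE        40u  /* IMAGE_SECTION_HEADER */
#define DEBUG_DIRECTORY_SIZE       28u  /* IMAGE_DEBUG_DIRECTORY */

/* Hypothetical helper type: where each piece of a .DBG file starts. */
typedef struct {
    uint32_t section_table;   /* file offset of the first section header */
    uint32_t debug_directory; /* file offset of the first debug directory entry */
    uint32_t debug_entries;   /* number of IMAGE_DEBUG_DIRECTORY entries */
} DbgLayout;

/* Computes the layout from three IMAGE_SEPARATE_DEBUG_HEADER fields.
   Assumption: exported names, if any, sit between the section table
   and the debug directory. */
DbgLayout dbg_layout(uint32_t number_of_sections,
                     uint32_t exported_names_size,
                     uint32_t debug_directory_size)
{
    DbgLayout l;
    l.section_table   = SEPARATE_DEBUG_HEADER_SIZE;
    l.debug_directory = l.section_table
                      + number_of_sections * SECTION_HEADER_SIZE
                      + exported_names_size;
    l.debug_entries   = debug_directory_size / DEBUG_DIRECTORY_SIZE;
    return l;
}
```

For example, a .DBG file with four sections, no exported names, and a
56-byte debug directory would have its section table at offset 48, its
debug directory at offset 208, and two directory entries.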

This ends my whirlwind tour of .DBG files. Generating a .DBG file
shouldn't be terribly hard, at least as far as creating the .DBG file
infrastructure goes. Creating the IMAGE_SEPARATE_DEBUG_HEADER and the
section table is really just a matter of copying data out of the
corresponding executable. I'll be generating only one type of debug
information, so I'll need to create and write only a single
IMAGE_DEBUG_DIRECTORY.
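
Writing that single entry amounts to filling in 28 bytes. Here is a
sketch assuming the IMAGE_DEBUG_DIRECTORY field order from winnt.h,
IMAGE_DEBUG_TYPE_CODEVIEW (2) as the type, and an AddressOfRawData of 0
on the assumption that .DBG contents are never mapped into memory.
write_cv_debug_dir, put16, and put32 are hypothetical helpers; fields are
written byte by byte so that struct padding and host endianness don't
matter.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_DEBUG_TYPE_CODEVIEW 2u   /* value from winnt.h */
#define DEBUG_DIRECTORY_SIZE      28u  /* sizeof(IMAGE_DEBUG_DIRECTORY) */

/* Little-endian writers. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Serializes one IMAGE_DEBUG_DIRECTORY entry describing a CodeView blob
   of cv_size bytes located at file offset cv_offset. */
void write_cv_debug_dir(uint8_t out[DEBUG_DIRECTORY_SIZE],
                        uint32_t time_date_stamp,
                        uint32_t cv_offset, uint32_t cv_size)
{
    memset(out, 0, DEBUG_DIRECTORY_SIZE);
    put32(out + 0,  0);                          /* Characteristics */
    put32(out + 4,  time_date_stamp);            /* TimeDateStamp */
    put16(out + 8,  0);                          /* MajorVersion */
    put16(out + 10, 0);                          /* MinorVersion */
    put32(out + 12, IMAGE_DEBUG_TYPE_CODEVIEW);  /* Type */
    put32(out + 16, cv_size);                    /* SizeOfData */
    put32(out + 20, 0);                          /* AddressOfRawData: not mapped (assumed) */
    put32(out + 24, cv_offset);                  /* PointerToRawData */
}
```

The time stamp passed in would presumably be copied from the executable,
like the other header fields, so a debugger can check that the two files
match.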

Things start to get messy when you create the debug information
representing symbol names and their associated addresses. Up to this
point, I've deferred deciding what debug format to generate. That
decision can't be put off any longer, so let's look at the issues and
decide.
Which Debug Format?
Recall that ADVAPI32.DBG had both COFF and CodeView symbols. Why two
overlapping forms of debug information? Some Microsoft tools, such as the
Working Set Tuner (WST), require COFF symbols, while other tools require
CodeView symbols. In the prehistory of Win32, COFF was the only game in
town, since the early tools were written by the Windows NT team.
Eventually, the Microsoft language folks turned their focus away from
16-bit products, and the CodeView format was extended for 32-bit
programming.

Of the three possible debug formats (COFF, CodeView, and PDB), the PDB
format can be eliminated immediately. There's no documented interface to
read .PDB files directly, much less write one. For the very basic symbol
table I want to generate from CoClassSyms.EXE, COFF symbols would be the
easiest to generate, since the format is relatively simple compared to
the CodeView format.

As I began writing CoClassSyms, my intention was to generate COFF
symbols. However, I quickly learned that the Microsoft debuggers (WinDbg
and the Visual Studio debugger) require CodeView format symbols. I
briefly flirted with the idea of writing COFF symbols and then converting
them to CodeView symbols. The Platform SDK contains the source code for a
DLL called SYMCVT.DLL, which reads COFF symbols and writes an equivalent
CodeView symbol table. (If you're curious, it's in the
\Examples\Sdktools\Image\Symcvt directory.) However, I didn't want to
rely on SYMCVT.DLL being present on the user's system. Facing this
self-imposed restriction, my only option was to create CodeView symbols.

If you just want to read symbols and don't care what format they're in,
consider using IMAGEHLP.DLL. It can read COFF, CodeView, and .PDB format
information. The IMAGEHLP APIs such as SymGetSymFromAddr provide a
common, abstracted layer over the different symbol table formats.

For those of you seeking enlightenment about the .PDB format, you won't
find it here. Microsoft doesn't document the format, and it has changed
over time. The IMAGEHLP APIs are the only supported means of accessing
.PDB information. It is interesting to note, however, that .PDB
information appears in the IMAGE_DEBUG_DIRECTORY as CodeView information,
but with the NB10 signature. Unlike regular CodeView symbols, an NB10
CodeView symbol table in an executable is essentially just a path to the
.PDB file, preceded by a few DWORDs that identify the matching .PDB.
Conceptually, this is similar to an .LNK shortcut.
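
To make the NB10 case concrete, here's a sketch that pulls the .PDB path
out of such a record. It assumes the commonly described NB10 layout: the
four signature bytes, a DWORD offset (normally 0), a DWORD signature that
must match the .PDB, a DWORD age, and then a NUL-terminated path.
nb10_pdb_path is a hypothetical helper, not an IMAGEHLP API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian DWORD reader. */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumed NB10 layout:
     offset 0  "NB10"
     offset 4  DWORD offset (normally 0)
     offset 8  DWORD signature matching the .PDB
     offset 12 DWORD age
     offset 16 NUL-terminated .PDB path
   Returns a pointer to the path inside buf, or NULL if buf is not an
   NB10 record or the path isn't terminated within len bytes. */
const char *nb10_pdb_path(const uint8_t *buf, size_t len, uint32_t *age)
{
    if (len < 17 || memcmp(buf, "NB10", 4) != 0)
        return NULL;
    if (memchr(buf + 16, '\0', len - 16) == NULL)
        return NULL;
    if (age != NULL)
        *age = get32(buf + 12);
    return (const char *)(buf + 16);
}
```

A debugger following this pointer would then open the named .PDB and,
presumably, compare its signature and age against the values stored here.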
The CodeView Way
As a rich symbol table format, CodeView symbols convey quite a bit of
information. Besides associating symbol names with addresses, CodeView
symbols also convey details such as user-defined types and source line
to address mappings. When pushed to its full capabilities, the CodeView
information produced by a compiler and linker is complex (to put it
mildly).

Part of the format's complexity stems from the fact that CodeView
information was originally designed to be as small as possible.
(Remember the carefree days of the 640KB MS-DOS® address space?)
Cramming information into every spare bit means more complexity.
CodeView information is also cumbersome because the format has evolved
over many iterations of compilers and linkers. Various tables and records
are no longer generated by today's tools, yet they remain part of the
specification and need to be dealt with properly when encountered.
Under the Specifications\Technologies and Languages node of the MSDN™
documentation, you'll find relatively up-to-date information on the
CodeView format published with recent editions of Visual C++®.
However, it's so full of details that it's hard to separate the basics
from the esoteric stuff. I'll go over just the basic pieces needed to
generate a minimal CodeView symbol table.

A CodeView symbol table always begins with a DWORD-sized signature, which
is interpreted as ASCII text. These days, you'll usually see signatures
of either NB09 or NB11. (An NB10 signature indicates that the symbol
table is just a path to a .PDB file containing the actual symbols. I'm
not concerned with .PDB files or the NB10 signature here.) The location
of this DWORD signature in the file is known as the lfaBase. All offsets
in the CodeView information are relative to the lfaBase value. This
makes it easy to move the CodeView information to another file entirely
(such as a .DBG file) without needing to recalculate all the file
offsets stored throughout the CodeView information.
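
Because every offset is relative to lfaBase, a reader only needs the file
offset of the NBxx signature to find everything else. The following
sketch reads the 8-byte header (signature plus subsection directory
offset) and converts an lfaBase-relative offset into an absolute file
offset. CvHeader, cv_read_header, and cv_to_file_offset are hypothetical
names, and the field name dir_offset is illustrative rather than taken
from CVEXEFMT.H.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parsed view of the 8-byte CodeView header found at lfaBase. */
typedef struct {
    char     sig[5];      /* "NB09", "NB11", ... plus a terminating NUL */
    uint32_t dir_offset;  /* subsection directory offset, relative to lfaBase */
} CvHeader;

/* Little-endian DWORD reader. */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* file: the whole file in memory; lfaBase: file offset of the signature.
   Returns 1 and fills *h on success, 0 if there's no NBxx header there. */
int cv_read_header(const uint8_t *file, size_t len, size_t lfaBase,
                   CvHeader *h)
{
    if (lfaBase > len || len - lfaBase < 8)
        return 0;
    const uint8_t *base = file + lfaBase;
    if (base[0] != 'N' || base[1] != 'B')
        return 0;
    memcpy(h->sig, base, 4);
    h->sig[4] = '\0';
    h->dir_offset = get32(base + 4);
    return 1;
}

/* Converting any CodeView offset to a file offset is a single addition;
   moving the blob to another file only changes lfaBase. */
size_t cv_to_file_offset(size_t lfaBase, uint32_t lfo)
{
    return lfaBase + lfo;
}
```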

Following the initial DWORD NBxx signature is another DWORD containing
the offset (relative to lfaBase) of the subsection directory. The
subsection directory is a table of contents for all the subsections found
in the symbol table. A subsection contains data such as source line
information and public symbols. The subsection directory consists of a
small header (an OMFDirHeader structure) followed by an array of
OMFDirEntry structures, one per subsection.
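
Here's a sketch of writing the subsection directory, assuming the
CVEXEFMT.H layouts: a 16-byte OMFDirHeader (cbDirHeader, cbDirEntry,
cDir, lfoNextDir, flags) followed by 12-byte OMFDirEntry records
(SubSection, iMod, lfo, cb). DirEntry and write_subsection_directory are
hypothetical, and each field is serialized explicitly rather than relying
on the compiler's struct packing.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory description of one directory entry (mirrors OMFDirEntry). */
typedef struct {
    uint16_t subsection;  /* sst* type code */
    uint16_t iMod;        /* module index */
    uint32_t lfo;         /* subsection offset, relative to lfaBase */
    uint32_t cb;          /* subsection size in bytes */
} DirEntry;

/* Little-endian writers. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Writes the 16-byte header and n 12-byte entries to out, which must
   hold at least 16 + 12 * n bytes. Returns the number of bytes written. */
size_t write_subsection_directory(uint8_t *out, const DirEntry *e, uint32_t n)
{
    put16(out + 0, 16);   /* cbDirHeader */
    put16(out + 2, 12);   /* cbDirEntry */
    put32(out + 4, n);    /* cDir */
    put32(out + 8, 0);    /* lfoNextDir: no chained directory */
    put32(out + 12, 0);   /* flags */
    uint8_t *p = out + 16;
    for (uint32_t i = 0; i < n; i++, p += 12) {
        put16(p + 0, e[i].subsection);
        put16(p + 2, e[i].iMod);
        put32(p + 4, e[i].lfo);
        put32(p + 8, e[i].cb);
    }
    return (size_t)(p - out);
}
```

In a real file, the lfo and cb values would come from wherever each
subsection was actually written, measured from lfaBase.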

The OMFDirEntry structure is defined in CVEXEFMT.H (along with most of
the other structures I'll mention from here on). You won't find
CVEXEFMT.H in any of the standard C++ compiler include directories.
Rather, on the most recent Platform SDK I found CVEXEFMT.H in the
\Samples\Sdktools\Image\Include directory. What's more interesting is
that the file is dated 9/7/1994. There are several other .H files in
that directory that relate to CodeView symbols. Be forewarned that these
.H files are old enough that they're missing many things described in
the MSDN documentation.

Returning to CodeView subsections, a variety of subsection types are
defined. Subsections define information such as compilation units
(sstModule), source line to address mappings (sstSrcModule), public
symbols (sstGlobalSym and sstGlobalPub), and user-defined types
(sstGlobalTypes). The subsections have a variety of formats, some of
which can be pretty contorted. Luckily, for the purpose of CoClassSyms,
you need just a few of the relatively simple subsections. Even within the
few subsections my code writes, it takes some shortcuts to keep things as
simple as possible.
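
When dumping a directory, it helps to map the subsection types named
above to their numeric codes. The table below assumes sstModule is 0x120
with the other codes following in the CVEXEFMT.H sequence; treat these
values as assumptions to be checked against the sst* definitions in that
header. SstType, sst_code, and sst_name are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subsection codes for the types mentioned in the text (assumed values;
   verify against CVEXEFMT.H). */
typedef struct {
    const char *name;
    uint16_t    code;
} SstType;

static const SstType kSstTypes[] = {
    { "sstModule",      0x120 },  /* compilation unit */
    { "sstSrcModule",   0x127 },  /* source line to address mappings */
    { "sstGlobalSym",   0x129 },  /* global symbols */
    { "sstGlobalPub",   0x12a },  /* public symbols */
    { "sstGlobalTypes", 0x12b },  /* user-defined types */
};

#define SST_COUNT (sizeof kSstTypes / sizeof kSstTypes[0])

/* Returns the numeric code for a subsection name, or 0 if unknown. */
uint16_t sst_code(const char *name)
{
    for (size_t i = 0; i < SST_COUNT; i++)
        if (strcmp(kSstTypes[i].name, name) == 0)
            return kSstTypes[i].code;
    return 0;
}

/* Returns the name for a subsection code, or NULL if unknown. */
const char *sst_name(uint16_t code)
{
    for (size_t i = 0; i < SST_COUNT; i++)
        if (kSstTypes[i].code == code)
            return kSstTypes[i].name;
    return NULL;
}
```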

When I first set out to write a symbol table, my thought was to create
just one CodeView subsection, an sstGlobalPub. This subsection would
contain nothing more than symbol names and their addresses: in other
words, the same thing you'd find in a .MAP file, albeit encoded in the
proper CodeView binary format. As it turned out, it was necessary to
create two other supporting subsections. However, the sstGlobalPub
subsection is at the heart of the bare-bones symbol table. The key point
is that I avoided creating complex subsections such as the types and
source line information.

In the sstGlobalPub subsection, the code writes a series of simple
records representing the symbol to address mappings created by
CoClassSyms.EXE. For each symbol name and address pair, the code emits an
S_PUB32 record. The resulting sstGlobalPub subsection is just a header
(an OMFSymHash structure) followed by a series of S_PUB32 records.
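
The following sketch builds such a bare-bones sstGlobalPub. Several
details are assumptions rather than facts stated here: S_PUB32 has the
value 0x0203; a PUBSYM32 record is reclen, rectyp, a 32-bit offset, a
16-bit segment, a 16-bit type index, and a length-prefixed name; reclen
excludes its own two bytes; each record is zero-padded to a 4-byte
boundary; and a 16-byte OMFSymHash header (symhash, addrhash, cbSymbol,
cbHSym, cbHAddr) with zero-length hash tables is acceptable.
write_pub32 and write_global_pub are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define S_PUB32 0x0203u   /* assumed value of S_PUB32 in CVINFO.H */

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Writes one S_PUB32 record at out and returns its padded size. */
size_t write_pub32(uint8_t *out, const char *name, uint32_t off, uint16_t seg)
{
    size_t n = strlen(name);
    if (n > 255)
        n = 255;                           /* length prefix is one byte */
    size_t size = 13 + n;                  /* 2+2+4+2+2 bytes + length byte + name */
    size_t padded = (size + 3) & ~(size_t)3;
    memset(out, 0, padded);
    put16(out + 0, (uint16_t)(padded - 2));  /* reclen: excludes itself */
    put16(out + 2, S_PUB32);                 /* rectyp */
    put32(out + 4, off);                     /* offset within the segment */
    put16(out + 8, seg);                     /* segment (section) number */
    put16(out + 10, 0);                      /* typind: no type information */
    out[12] = (uint8_t)n;
    memcpy(out + 13, name, n);
    return padded;
}

/* Writes an OMFSymHash header with no hash tables, followed by one
   S_PUB32 record per symbol. Returns the subsection's total size. */
size_t write_global_pub(uint8_t *out, const char *const *names,
                        const uint32_t *offs, const uint16_t *segs,
                        size_t count)
{
    uint8_t *p = out + 16;
    for (size_t i = 0; i < count; i++)
        p += write_pub32(p, names[i], offs[i], segs[i]);
    uint32_t cbSymbol = (uint32_t)(p - (out + 16));
    put16(out + 0, 0);         /* symhash: none */
    put16(out + 2, 0);         /* addrhash: none */
    put32(out + 4, cbSymbol);  /* cbSymbol: bytes of symbol records */
    put32(out + 8, 0);         /* cbHSym: no name hash table */
    put32(out + 12, 0);        /* cbHAddr: no address hash table */
    return 16 + cbSymbol;
}
```

Some debuggers may expect the hash tables to be present; leaving them
out is the kind of shortcut a minimal symbol table can try first and
revisit if a debugger rejects the result.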

The S_PUB32 record is interpreted as a PUBSYM32 struct defined in
CVINFO.H. (CVINFO.H is buried in the same sample directory as
CVEXEFMT.H.) Here's the layout of a PUBSYM32 record:
|