Where
do typelibs come from? Anybody who's endured the agony of using
Interface Definition Language (IDL) probably knows the answer. The
Microsoft® IDL (MIDL)
compiler provides all sorts of facilities to completely describe
classes, interfaces, methods, constants, and so on. The input IDL code
has to be precise enough for the Remote Procedure Call (RPC) and DCOM
code in Windows® to marshal
code and data across process and machine boundaries. For a good overview
of IDL, see the Platform SDK documentation, as well as Bill
Hludzinski's article, "Understanding Interface Definition Language: A Developer's Survival Guide," in the August 1998 issue of MSJ.
An
MIDL-generated typelib is really just a binary form of the IDL. In
fact, you can work backwards from a typelib to get IDL. As the
right-hand side of Figure 1 shows, OLEVIEW recreates the IDL (minus things like comments) for any typelib it displays.
In
your everyday work it's possible to avoid using typelibs explicitly,
but they're definitely around on your system. Files with a .TLB or .OLB
(Object Library) extension are probably typelibs and can be viewed with
OLEVIEW. Typelibs can also be embedded into executables. Prime examples
are the OCXs that seem to be taking over your hard drive. When embedded
in an executable, the typelib is stored as a named resource with the
name TYPELIB.
What's
really funkadelic about typelibs is that there's a relatively simple
mechanism to grovel around in them at any level of detail. Sure, you can
examine them with the Object Viewer or OLEVIEW. However, real
programmers know that it's only fun if you can do the same sort of thing
with your own code. As you might expect, the mechanism to access
typelibs is a group of COM interfaces. The binary format of typelib
files isn't documented, but in my experience the COM interfaces have
always been sufficient to extract all the information they contain.
For
typelib work, the two COM interfaces to stamp on your forehead are
ITypeInfo and ITypeLib. There are various articles out there describing
why you might need a type library or what to do with it, but there isn't
much written about ITypeInfo or ITypeLib other than the interface
references in the Platform SDK. Languages such as Visual Basic use these
interfaces to grok the nature of external COM objects. I'm going to use
these same interfaces for a different purpose: to create symbol tables
for use in a debugger.
ITypeLib, ITypeInfo, and Friends
As
a first-time user of the typelib interfaces, where do you start? Take a
look at the LoadTypeLib API exported by OLEAUT32.DLL. It takes the name
of a typelib file (or a file containing a typelib resource) and returns
an ITypeLib interface instance. The ITypeLib interface has various bits
of information that can be obtained via its methods. More importantly,
the ITypeLib interface is a container of sorts for ITypeInfo interface
instances (see Figure 2). As you'll see in more detail later, an
ITypeInfo is how you get at the names of interface methods, parameters,
enums, structs, and so on.
Figure 2 ITypeInfo Instances
Each
typelib (and its associated ITypeLib interface instance) has a unique
GUID assigned to it. This GUID is typically found in the registry under
the HKEY_CLASSES_ROOT\TypeLib key. The ITypeLib::GetLibAttr method
allocates and returns a pointer to a TLIBATTR structure that contains the
typelib's GUID and other assorted information. When you're done with the
TLIBATTR, you'll need to call ITypeLib::ReleaseTLibAttr to free the
memory for the TLIBATTR structure.
The
ITypeLib::GetDocumentation method retrieves things like the typelib
name, a description string, a help file lookup string, and the name of
the typelib's associated help file. These are the very same strings that
OLEVIEW uses when it generates IDL from a typelib. Passing -1 as the
first parameter to GetDocumentation gets information specific to the
typelib rather than for one of the ITypeInfo instances that it
encapsulates. In true COM fashion, the strings that GetDocumentation
returns are allocated internally, and you're responsible for freeing
them by calling SysFreeString. This is a good time to point out that all
strings used with typelibs are Unicode.
There's
a variety of other ITypeLib methods that you can explore on your own.
However, let's look at the single most important ITypeLib method,
GetTypeInfo. Each ITypeLib acts as a dispenser for ITypeInfo interface
instances. To get an ITypeInfo instance, simply call GetTypeInfo,
passing in the index of the desired ITypeInfo. What's this index thing?
Each ITypeInfo instance in an ITypeLib has an associated index value.
The index values are 0 through n-1, where n is the number of ITypeInfos.
The GetTypeInfoCount method returns the number of ITypeInfos in the
typelib. As you'll see shortly, calling GetTypeInfo repeatedly with a
monotonically increasing ITypeInfo index is all that's necessary to pick
apart a typelib.
After
getting your hands on an ITypeInfo instance, the real fun begins. Each
ITypeInfo represents something
such as an interface (TKIND_INTERFACE), a structure (TKIND_RECORD), or a
creatable object (TKIND_COCLASS). The complete list of items an ITypeInfo
can represent is given by the TKIND_XXX enums in OAIDL.H.
Like
ITypeLib, the ITypeInfo interface has a GetDocumentation method that
returns the name of the interface, a description string, a help file
lookup string, and the name of the associated help file. Passing -1 as
the memberid parameter returns information for the interface, structure,
enum, or whatever the ITypeInfo represents. Passing other values for
the memberid parameter retrieves the corresponding strings for the
desired interface's methods, an enum's string values, and so on. If
you've ever wondered how Visual Basic can call up context-sensitive
online help for third-party controls, here's your answer.
Much
more information about an ITypeInfo is obtained via the GetTypeAttr
method, which returns a pointer to a TYPEATTR structure containing all
sorts of goodies concerning the ITypeInfo. Included in this structure is
a GUID and a TYPEKIND (TKIND_XXX) indicating what the ITypeInfo
describes. If the ITypeInfo is of type TKIND_INTERFACE or
TKIND_DISPATCH, the GUID is the IID for the interface. This is the same
IID you'll see under the registry's HKEY_CLASSES_ROOT\Interface key.
Likewise, for TKIND_COCLASS ITypeInfos, the GUID is the CLSID. Most
CLSIDs are found in the registry under the HKEY_CLASSES_ROOT\CLSID key.
Note that TKIND_COCLASS ITypeInfos are
essentially what you think of as creatable objects (for instance, an
MSGraph.Chart object).
Some
TYPEATTR fields are meaningful only for specific TYPEKIND values. For
example, in a TKIND_DISPATCH the TYPEATTR.cFuncs element indicates how
many methods are associated with this interface, while a nonzero cImplTypes
element indicates that the interface is derived from another interface.
For a TKIND_COCLASS ITypeInfo, however, the cImplTypes field indicates
how many interfaces the creatable object exposes directly. Typically, a
TKIND_COCLASS exposes two interfaces. The first is the incoming
interface that you think of as the object's properties and methods. The
second is the outgoing interface that you associate with the object
firing events. (Note to COM geeks: think IConnectionPoint.) For
example, the Graph1_DblClick event handler subroutine you'd write in
Visual Basic is called through the outgoing interface.
Some
other notable fields in the TYPEATTR are cVars and cbSizeInstance. For a
TKIND_ENUM, cVars indicates the number of enum values. In a
TKIND_RECORD, the cVars field indicates how many structure elements are
present, and cbSizeInstance tells you the size of the structure.
Just the Facts, Please
While
I could go on for quite a while about the ins and outs of ITypeInfos
and TYPEATTRs, let's narrow the focus to what I'll use in creating
symbol tables from typelibs. In particular, I need the names of the
interface methods exposed by a TKIND_COCLASS object. By matching these
names with their executable file addresses, I have the essential
information to create a symbol table.
How
do you obtain the method names for interfaces described by a typelib?
Interfaces are described by TKIND_INTERFACE or TKIND_DISPATCH ITypeInfos.
To get information for a
particular method, use ITypeInfo::GetFuncDesc, passing the index of the
desired method. The index values range from 0 to cFuncs-1, where cFuncs
is the value from the TYPEATTR structure. The GetFuncDesc method returns
a pointer to a FUNCDESC structure. (If there's one thing typelibs
aren't lacking, it's structures with oddly truncated names.) When you're
done with the FUNCDESC, don't forget to release its memory by calling
ITypeInfo::ReleaseFuncDesc.
A
FUNCDESC is the root node of a whole bunch of information that
completely describes a COM method. Although it's called a FUNCDESC, in
most typelibs a FUNCDESC describes COM methods rather than normal
C-callable functions. However, in the relatively few cases where you see
an ITypeInfo of type TKIND_MODULE, the FUNCDESC describes a regular
function.
Figure 3
shows the elements in a FUNCDESC. The first element is a MEMBERID,
which, for the purpose of my description, can be thought of as an
Automation dispatch ID (DISPID). The memid value can be passed to
ITypeInfo::GetNames to get the name for this method, and optionally the
name of the method's parameters (if supplied in the original IDL file).
The
lprgelemdescParam element in a FUNCDESC points at an array of ELEMDESC
structures. Each ELEMDESC represents one of the method's parameters. The
vital component of an ELEMDESC is its TYPEDESC value. To make a long
story short, a TYPEDESC corresponds to a language data type. For
example, there are TYPEDESCs for two-byte integers, four-byte integers,
BSTRs, pointers to BSTRs, user-defined structures, and just about
anything else you can imagine. The complete list of fundamental TYPEDESC
values is the VT_XXX enums from WTYPES.H.
If
you've worked with type information in symbol tables, you'll notice
that TYPEDESCs are quite similar. In particular, there are provisions
for extending the primitive types with arrays, pointers, and
user-defined types. I won't delve further into this area since it's off
my main topic. Nonetheless, it can be quite enlightening to write code
that parses TYPEDESCs. Hint: think recursion.
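To make the recursion hint concrete, here's a minimal sketch. It deliberately avoids the real TYPEDESC, ARRAYDESC, and VT_XXX definitions and uses a simplified stand-in instead: TypeNode, Kind, and DescribeType are hypothetical names that don't appear in any Windows header. Treat it as a model of the idea (a type that can wrap another type), not as working typelib code.

```cpp
#include <string>

// Simplified stand-in for the VT_XXX values a TYPEDESC can carry.
enum class Kind { I2, I4, Bstr, Ptr, SafeArray, UserDefined };

// Simplified stand-in for TYPEDESC: a kind plus whatever it wraps.
struct TypeNode {
    Kind kind;
    const TypeNode* inner;  // pointee or element type (Ptr, SafeArray)
    const char* userName;   // name of a user-defined type (UserDefined)
};

// Turn a type description into readable text. Pointer and array kinds
// recurse into the type they wrap; primitive kinds end the recursion.
std::string DescribeType(const TypeNode& t) {
    switch (t.kind) {
    case Kind::I2:          return "short";
    case Kind::I4:          return "long";
    case Kind::Bstr:        return "BSTR";
    case Kind::Ptr:         return DescribeType(*t.inner) + "*";
    case Kind::SafeArray:   return "SAFEARRAY(" + DescribeType(*t.inner) + ")";
    case Kind::UserDefined: return t.userName ? t.userName : "<unnamed>";
    }
    return "<unknown>";
}
```

A pointer to a BSTR is then just a Ptr node whose inner node is a Bstr node, and DescribeType unwinds it into "BSTR*". Real code would walk the TYPEDESC's own pointer members in the same fashion.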
Moving
along in the FUNCDESC structure, you come to the INVOKEKIND element
that indicates if this is a regular method call, a property put, or a
property get. Experienced COM Automation programmers know that
properties are really just special types of method calls with an implied
effect (that is, setting or retrieving a property value). The cParams
element specifies how many parameters are in the method. Any optional
parameters are included in this count. The number of optional parameters
is given by the cParamsOpt element. Using the cParams and
lprgelemdescParam elements, you can enumerate through the ELEMDESC array
to find out everything about each of the method's parameters. Again,
this is off my main theme, so I'll leave it to the ambitious reader for
further exploration.
The
oVft element of a FUNCDESC contains the vtable offset of the method's
pointer. If vtable offsets don't make sense, it's time to hit the
remedial COM books. The oVft value is critical to the symbol table
generation code that I'll get to later.
The
elemdescFunc field indicates the return type of the method. This
ELEMDESC is treated just like the ELEMDESCs that describe the
parameters. The final FUNCDESC element, wFuncFlags, is a set of flags
indicating primarily if and how the method should be exposed. For
example, FUNCFLAG_FRESTRICTED means that this method shouldn't be
exposed by macro languages like VBScript. FUNCFLAG_FHIDDEN means that
this method shouldn't be shown in programs like the Visual Basic Object
Browser. (Of course, if you write your own typelib code, you're free to
ignore these flags.)
The COMTypeLibDump Program
Before
plunging ahead to the symbol table generation, let's take a look at a
basic program that uses the interfaces I've described. The code isn't
anywhere as fancy as OLEVIEW or the Visual Basic Object Browser.
However, I think you'll be quite surprised by just how much of a typelib
can be cracked open with a relatively tiny amount of code.
Figure 4
contains the code for COMTypeLibDump, a console program that does a
rudimentary display of type library information in a file. The file to
display is passed as the command-line argument and can be a .TLB file,
an .OLB file, or any DLL containing a type library. The first thing to
note about the COMTypeLibDump code is that it's a Unicode-enabled
program. There's no escaping Unicode when you work with the type library
interfaces.
Other
than necessary housekeeping (such as calling CoInitialize), the
important thing that function _tmain does is pass the file name to
DisplayTypeLib. DisplayTypeLib calls the LoadTypeLib API, which returns
an ITypeLib instance if a valid typelib file name was specified. After
passing the ITypeLib instance to the EnumTypeLib function, the code
releases the ITypeLib instance.
Inside
EnumTypeLib is where things get interesting. The function first calls
ITypeLib::GetTypeInfoCount to determine how many ITypeInfo instances can
be obtained from the typelib. Next, EnumTypeLib iterates through each
instance with a for loop. For each instance (starting with index 0), the
code calls ITypeLib::GetTypeInfo, which returns an ITypeInfo instance.
After each ITypeInfo is passed to DisplayTypeInfo, the loop code
releases the ITypeInfo instance. Each ITypeInfo created here represents
an interface, an enum, a CoClass, or one of the other TKIND_XXX types.
The
DisplayTypeInfo function begins by calling the GetDocumentation method
to retrieve the name of the interface, enum, CoClass, or whatever. Next,
the code uses GetTypeAttr to retrieve the TYPEATTR for the ITypeInfo.
This TYPEATTR is passed to EnumTypeInfoMembers, which enumerates through
each method and variable. After the call, the code releases the
TYPEATTR and the BSTR allocated by GetDocumentation.
Finally,
the EnumTypeInfoMembers function uses the cFuncs and cVars elements of
the TYPEATTR to enumerate through all the methods and variables in the
ITypeInfo. For each method, GetFuncDesc retrieves a FUNCDESC
structure. The equivalent for variables, GetVarDesc, retrieves a VARDESC
structure. In either case, the GetDocumentation method obtains the name
of the method or variable.
Figure 5
shows an abbreviated version of the results from running COMTypeLibDump
on the Visual Basic runtime DLL (MSVBVM60.DLL). Notice that the
TKIND_ENUM ITypeInfos describe Visual Basic constants (for instance,
vbNull). Visual Basic runtime library functions such as ChDir are in a
TKIND_MODULE, in this case named FileSystem. Finally, note the interface
definition for _ErrObject (a TKIND_INTERFACE), and the corresponding
CoClass, ErrObject (a TKIND_COCLASS). Within the _ErrObject interface,
observe that certain methods such as Source appear twice. Upon closer
examination, you'll see that one of these is an INVOKE_PROPERTYGET,
while the other is an INVOKE_PROPERTYPUT. This is proof that properties
are implemented as special-purpose method calls.
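A dump is easier to read if each member is tagged with its invocation kind. The sketch below shows one way to do that. The four constants are local copies of the INVOKEKIND values from OAIDL.H so that the fragment compiles without the Windows headers, and LabelMember is a hypothetical helper, not something taken from COMTypeLibDump.

```cpp
#include <string>

// Local copies of the INVOKEKIND values (see OAIDL.H).
constexpr int kInvokeFunc           = 1;  // INVOKE_FUNC
constexpr int kInvokePropertyGet    = 2;  // INVOKE_PROPERTYGET
constexpr int kInvokePropertyPut    = 4;  // INVOKE_PROPERTYPUT
constexpr int kInvokePropertyPutRef = 8;  // INVOKE_PROPERTYPUTREF

// Hypothetical helper: append the invocation kind to a member name so
// that, for example, the two "Source" entries can be told apart.
std::string LabelMember(const std::string& name, int invkind) {
    switch (invkind) {
    case kInvokeFunc:           return name + " [method]";
    case kInvokePropertyGet:    return name + " [property get]";
    case kInvokePropertyPut:    return name + " [property put]";
    case kInvokePropertyPutRef: return name + " [property put by reference]";
    default:                    return name + " [unknown]";
    }
}
```

In real code, the second argument would come from the invkind field of the FUNCDESC returned by GetFuncDesc.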
All
in all, the COMTypeLibDump program doesn't hold a candle to more
sophisticated viewers such as OLEVIEW. For example, it doesn't display
method parameters or the types of variables. On the other hand,
COMTypeLibDump shows the basics of accessing typelib information in a
small amount of code. More importantly, this code acts as a starting
point for the ultimate task at hand: matching up method names with
addresses to create a symbol table.
Getting Dirty
Now
that you can extract method names from a type library, the next hurdle
is to find the method addresses. Unlike a symbol table, a typelib
contains no addresses, so it's of no direct help. However, the oVft
field in a FUNCDESC provides a big hint of what you can do instead. The
oVft field contains the offset into a vtable where you can find the
method's address.
I
won't give a full account of vtables and COM interfaces since it's
covered in every COM fundamentals text. The important thing here is that
a vtable contains addresses of interface methods. To be strictly
accurate, the preceding statement is true only for COM interfaces within
the same apartment. For out-of-process servers, the vtable points to
stubs that marshal the data across apartment, process, or machine
boundaries. However, OCXs and most other executables for which you want
to create symbol tables will run in-process.
The
$1.28 question now is, "How do I find a vtable for a given interface?"
Your knowledge of COM basics comes to the rescue here. By definition,
the first thing an interface pointer points to is a pointer to the
vtable for the interface. Thus, if you have an interface instance
pointer, you can treat the first pointer-sized slot (a DWORD in Win32®) as a pointer to the vtable.
Rephrasing
the problem, how can you get an interface instance from which you can
then extract a vtable pointer? As fate would have it, the
CoCreateInstance API creates interface instances for you. Just tell
CoCreateInstance the IID and CLSID for the interface you'd like created.
As I've shown, type libraries are chock full of CLSIDs and IIDs. Recall
that TKIND_COCLASSs are thought of as creatable objects. This is where
all that type library grunginess becomes useful.
At
a high level, the heart of my typelib-to-symbol-table program is pretty
simple: just skim through a type library looking for TKIND_COCLASSs.
For each TKIND_COCLASS, use the CLSID and IIDs as parameters to
CoCreateInstance. The result is an interface instance. Next, enumerate
through the interface methods to get their vtable offsets. The value at
the appropriate offset in the vtable is a pointer to the method in
memory. I'll describe the code that does this, but there's a loose end
to deal with first.
All
addresses in a vtable are virtual addresses. That is, they're actual
addresses at which the CPU executes code. In a symbol table, you need a
logical address. Logical addresses are values relative to where the
containing executable loaded. For example, say a DLL loads at virtual
address 0x10000000. Within the loaded DLL is a method, Foo::Bar, at
virtual address 0x10002034. The logical address would be 0x2034. The
importance of logical addresses is that they don't change, regardless of
where the operating system loads the DLL.
There
are actually two forms of logical address. What I just described is
known as a Relative Virtual Address (RVA), and is used in COFF-format
symbol tables. The second form of a logical address uses two components:
the executable file section number containing the address and the
offset within the section.
Returning
to the previous DLL example, suppose its first section is a code section
that begins 0x1000 bytes into the executable and contains Foo::Bar.
Subtracting the section's start from the RVA (0x2034 - 0x1000) gives the
second form of the logical address: 1:0x00001034. That is, section 1,
offset 0x00001034. The section:offset form of logical addressing is used
in .MAP files and CodeView® debug information.
This digression into logical addresses is necessary because you can't
just slap virtual addresses from a vtable into the generated symbol
table.
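Both conversions are simple arithmetic, as the following sketch shows. SectionRange and LogicalAddress are hypothetical, pared-down structures invented for the example; a real implementation would read section starts and sizes from the executable's section headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, pared-down view of one executable section: where it
// starts (as an RVA) and how many bytes it covers.
struct SectionRange {
    uint32_t startRva;
    uint32_t size;
};

struct LogicalAddress {
    uint16_t section;  // 1-based section number
    uint32_t offset;   // offset from the start of that section
};

// Form 1: subtract the load address from the virtual address to get an RVA.
uint32_t ToRva(uint64_t virtualAddress, uint64_t loadAddress) {
    return static_cast<uint32_t>(virtualAddress - loadAddress);
}

// Form 2: find the section containing the RVA and express the address as
// section:offset. Returns false if no section contains the RVA.
bool ToSectionOffset(uint32_t rva,
                     const std::vector<SectionRange>& sections,
                     LogicalAddress& out) {
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const SectionRange& s = sections[i];
        if (rva >= s.startRva && rva - s.startRva < s.size) {
            out.section = static_cast<uint16_t>(i + 1);
            out.offset = rva - s.startRva;
            return true;
        }
    }
    return false;
}
```

Plugging in the numbers from the text, ToRva(0x10002034, 0x10000000) gives 0x2034. If the first section starts at RVA 0x1000 and is large enough to contain that address (the section size is an assumption here), ToSectionOffset reports section 1, offset 0x1034.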
What Shall You Write?
You've
come a long way, and now you face a tough choice before you
can proceed: what symbol table format do you want? The easiest thing to
do is to emit a .MAP file, which has a simple, well-defined format. An
added bonus is that .MAP files are human-readable text files. The
downside to .MAP files is that no debugger that I'm aware of reads them
directly. Thus, you'd need some way to translate the .MAP file into
something usable by a debugger such as WinDBG or the Visual Studio IDE
debugger.
Recent
excavations by archaeologists have shown that in ancient times
primitive programmers ran a program called MAPSYM that converted a .MAP
file into a .SYM file. Alas, there's not much support for .SYM files in
most 32-bit programming tools. I dug around in an old SDK and found some
IMAGEHLP.DLL source code that ostensibly converted .SYM files into
CodeView-format information. This code wasn't salvageable for my
purposes since it assumed 16-bit addresses, although .MAP files can
contain 32-bit addresses.
Knowing
that I'd need to go beyond simple .MAP files, I started looking closely
for the simplest symbol table format supported by Microsoft debuggers.
At first I thought that the COFF symbol format would be it—after all,
COFF symbols are somewhat documented in WINNT.H, and have a reasonably
simple format. However, an obscure Knowledge Base article, as well as
the IMAGEHLP sources, showed me that the Visual Studio debugger and
WinDBG don't work with COFF symbols directly.
For
my symbol-table-generation program, I didn't want to generate symbols
that would need to be converted to another format. I didn't see any good
solution short of generating CodeView information. The CodeView format
is an industry standard debug format, and is partially documented in
MSDN™. In fact, the .PDB
(Program Database) files that are used as symbol tables by Microsoft
compilers are somewhat based upon CodeView data structures.
If
you want to know more about .PDB files, let me tell you what I know.
The .PDB file format is not publicly documented and changes from version
to version of the Visual Studio tools. The APIs that Microsoft uses to
read and write .PDB files are private, and I can't answer questions
about them. The best suggestion I can offer is to use the most recent
version of IMAGEHLP.DLL since it knows how to use the private .PDB file
APIs.
Knowing
that I'd need to produce CodeView format symbols, the next question was
where to put the symbol information. I could either attach the symbols
to the appropriate executable or put them in a separate .DBG file. Since
I'm generally leery of altering a working executable, using a .DBG file
seemed the better choice, but it entails more work. It's necessary to
create the trappings of a .DBG file that encapsulate the CodeView
symbols in addition to creating the actual CodeView symbols.
Looking
at the code required to create both CodeView symbols and the
surrounding .DBG file framework, I decided that there was simply too
much material for a single article. Thus, I'm going to describe the
CodeView symbol table and .DBG file generation in this month's Under the Hood column. However, the code to create a .MAP file is much simpler, so I included it as part of this article's code.
The CoClassSyms Program
The
result of everything I've described so far is a program called
CoClassSyms. "CoClass" refers to the TKIND_COCLASS entries in the
typelib from which the symbols are created. CoClassSyms is a
command-line program that operates on executable files containing a type
library. This can be an .OCX or some other DLL such as MSHTML.DLL
(which is a core component of Microsoft Internet Explorer).
The
output from CoClassSyms is either a .MAP or .DBG file. The code
included with this article only supports .MAP file generation. However,
if you drop in the DLL from this month's Under the Hood column,
CoClassSyms generates a .DBG file instead. In either case, the output
file has the same root file name as the input executable. Thus, running
CoClassSyms on MSHTML.DLL creates MSHTML.MAP or MSHTML.DBG.
Regardless
of whether you make a .MAP or .DBG file, you'll no doubt want to get
the debugger to recognize and load the symbol information. If you
generate a .DBG file, make sure it is in the same directory as the
associated executable. In my experience, the Visual Studio 6.0 debugger
automatically loads the .DBG file as needed. Using WinDBG, I had to
explicitly load the .DBG file in the command window. I wasn't able to
get Visual Studio 5.0 to load the .DBG file, but I couldn't determine
the cause of the problem.
If
everything goes well and the debugger loads your generated .DBG file,
you should be able to set breakpoints by name on the methods. (Hint: you
may want to first generate a .MAP file to get an idea of the available
method names.) Of course, since you likely don't have source code for
the executable, you'll be in the assembly language view when the
breakpoints hit. You should also see method names in the call stack.
The CoClassSymsCallout API
To
allow for both .MAP file and .DBG file generation, the main executable,
CoClassSyms.EXE, doesn't write the output files. Instead, I defined a
set of three APIs that the CoClassSyms.EXE code calls at known points.
By implementing and exporting these APIs from DLLs, I achieved
modularity of code and allowed for enterprising readers to add support
for other symbol tables with a minimum amount of work.
The three APIs implemented by both the .MAP and .DBG file DLLs are defined in CoClassSymsCallouts.H (see Figure 6).
The first API, CoClassSymsBeginSymbolCallouts, is called once near the
beginning of the symbol table generation. Its single argument is the
name of the executable. Both .MAP files and CodeView information need to
include information such as the location and size of the code and data
sections in the executable. The implementation of
CoClassSymsBeginSymbolCallouts can use the executable name to open the
executable and read its header to get the information you want.
The
second API, CoClassSymsAddSymbol, is invoked for each symbol that is
matched up with a logical address. For this program, a symbol is just
the name of a COM method. Symbol names are of the format
Interface::MethodName (for example, IDispatch::Invoke). Note that the
only information passed to CoClassSymsAddSymbol is the symbol name and
address (in section:offset format). While I could theoretically extract
and pass along information about the parameter names and types, it would
make the code much more complex. Consider this an exercise for the
ambitious reader.
The
final callout API is CoClassSymsSymbolsFinished, which is invoked after
all symbols have been processed. This is the place where the symbol
table generation DLL can do any additional cleanup work, and presumably
close the file handles of the output file. For a .MAP file, this
includes appending the executable's entry point. In the case of the .DBG
generation DLL, my implementation writes all of the data structures
that can't be written until the number and size of the CodeView public
symbols are known.
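The shape of this three-call protocol can be sketched in a few lines of C++. Be warned that this is an illustration only: the real callouts are plain functions exported from a DLL and declared in CoClassSymsCallouts.H, and their exact signatures aren't reproduced here. SymbolSink, MemorySink, RecordedSymbol, and MakeSymbolName are hypothetical names invented for the sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical C++ rendering of the three callouts as one interface.
class SymbolSink {
public:
    virtual ~SymbolSink() = default;
    virtual bool BeginSymbols(const std::wstring& executablePath) = 0;
    virtual bool AddSymbol(uint16_t section, uint32_t offset,
                           const std::wstring& name) = 0;
    virtual bool SymbolsFinished() = 0;
};

struct RecordedSymbol {
    uint16_t section;
    uint32_t offset;
    std::wstring name;
};

// A sink that just remembers what it was given. A real sink would write
// a .MAP or .DBG file instead.
class MemorySink : public SymbolSink {
public:
    bool BeginSymbols(const std::wstring& executablePath) override {
        executable = executablePath;
        open = true;
        return true;
    }
    bool AddSymbol(uint16_t section, uint32_t offset,
                   const std::wstring& name) override {
        if (!open) return false;  // symbols are only accepted after Begin
        symbols.push_back({section, offset, name});
        return true;
    }
    bool SymbolsFinished() override {
        open = false;
        return true;
    }

    std::wstring executable;
    std::vector<RecordedSymbol> symbols;
    bool open = false;
};

// Symbol names take the form Interface::MethodName.
std::wstring MakeSymbolName(const std::wstring& iface,
                            const std::wstring& method) {
    return iface + L"::" + method;
}
```

The strings are wide because everything that comes out of a typelib is Unicode. Keeping the writer behind a narrow interface like this is what lets the same enumeration code feed either output format.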
The CoClassSyms Code
Figure 7
contains excerpts from the code for CoClassSyms.CPP. If you examine it
closely, you'll see that its structure is much the same as the earlier
COMTypeLibDump program. The main difference between the programs is that
while COMTypeLibDump displays a little about everything in a typelib,
CoClassSyms skims off just the interesting information, does a little
calculation, and ships the results off to the CoClassSymsCallout DLL.
The
first indication that CoClassSyms isn't just a typelib dumping program
is in the ProcessTypeInfo function. This function ignores every
ITypeInfo that isn't of type TKIND_COCLASS. Remember, a TKIND_COCLASS
corresponds roughly to a creatable
COM object. The key thing about TKIND_COCLASSs is that they usually just
contain references for the two primary interfaces that implement the
object. One is the incoming interface and the other is the outgoing or
event sink interface. The referenced interfaces that implement an object
are described by other ITypeInfos found elsewhere in the typelib. I
apologize if what I've said sounds a bit muddy, but that's the
terminology used to describe typelibs.
To
create a COM object instance from which you can get a vtable, you need
the IID of a referenced interface. This involves two steps. First, you
call ITypeInfo::GetRefTypeOfImplType, which returns an HREFTYPE (Handle
to REFerenced TYPE). You'll see this in the ProcessTypeInfo code.
Second, call ITypeInfo::GetRefTypeInfo, which takes the HREFTYPE as
input and returns the ITypeInfo for the referenced type as output. In my
code this occurs in the ProcessReferencedTypeInfo function. The
returned ITypeInfo will be either a TKIND_DISPATCH or a TKIND_INTERFACE.
At
this point in the code, I have two ITypeInfos: one for the TKIND_COCLASS,
the other for the interface that implements the object. Finally, the magic
can happen. The code calls the CoCreateInstance API to attempt to make an
instance of the desired interface for the specified TKIND_COCLASS object.
For the CLSID parameter I use the GUID for the TKIND_COCLASS. For the IID
parameter I use the IID retrieved from the TYPEATTR of the implementing
interface.
Because vtable addresses for interface instances that aren't in-process
will point to marshaling stubs, they're not of much practical use.
Therefore, I made the dwClsContext parameter specify only in-process
interface instances.
If
a COM interface instance is created successfully, the code in
EnumTypeInfoMembers performs the grungy work of creating symbol names,
matching them up with a logical address, and shipping them off to the
symbol table writing APIs. The first time the function is called, it
invokes the CoClassSymsBeginSymbolCallouts API.
At
this point, EnumTypeInfoMembers creates a pointer to the vtable by
dereferencing the interface pointer obtained via CoCreateInstance. It
then enters into a loop where it uses the ITypeInfo to enumerate the
methods of the designated interface. For each method, the code
constructs a symbol name such as IFoobar::MyMethod. It also reaches into
the vtable, grabs the virtual address for the method, and converts it
into a logical form. Finally, the loop's code sends the symbol name and
address off to the CoClassSymsAddSymbol API.
The CoClassSymsMapFile DLL
The code for generating .MAP files is isolated in CoClassSymsMapFile.DLL (see Figure 8).
Earlier, I described the CoClassSymsCallout API, which consists of
three functions. CoClassSymsMapFile.DLL is a very simple implementation
of these functions, partly because its output is simple text. The other
reason it's so simple is that the information passed to the DLL's
exported APIs is in the same order as the various sections of a .MAP
file.
The
CoClassSymsBeginSymbolCallouts function writes the section information
that begins a .MAP file. This information includes the section number,
its length, its name, and whether the section is code or data. To get
these details, the function uses MapAndLoad from IMAGEHLP.DLL.
MapAndLoad returns a structure that includes a pointer to the Portable
Executable section information. The final action of
CoClassSymsBeginSymbolCallouts is to write the Publics by Value line
that indicates the end of the section information and the beginning of
the symbol information.
The
second exported API, CoClassSymsAddSymbol, is called once for each
symbol. The implementation here simply appends each symbol's information
to the end of the file using fprintf. Technically, I should have sorted
the symbols by address before writing them out to the file. However, I
haven't noticed any adverse effects from unsorted symbols. Caveat
emptor!
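For anyone who'd rather not rely on unsorted output, buffering the symbols and sorting them before writing is only a few lines. The sketch below is an assumption-laden illustration rather than the CoClassSymsMapFile code: MapSymbol, SortByAddress, and FormatPublicLine are hypothetical names, and the column layout of the formatted line only approximates what a linker-generated .MAP file contains.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct MapSymbol {
    uint16_t section;
    uint32_t offset;
    std::string name;
};

// Order symbols by section number, then by offset within the section.
void SortByAddress(std::vector<MapSymbol>& symbols) {
    std::sort(symbols.begin(), symbols.end(),
              [](const MapSymbol& a, const MapSymbol& b) {
                  return a.section != b.section ? a.section < b.section
                                                : a.offset < b.offset;
              });
}

// Format one line as section:offset followed by the symbol name.
// The spacing between the columns is an approximation.
std::string FormatPublicLine(const MapSymbol& s) {
    char address[32];
    std::snprintf(address, sizeof(address), " %04X:%08X",
                  static_cast<unsigned>(s.section),
                  static_cast<unsigned>(s.offset));
    return std::string(address) + "       " + s.name;
}
```

The cost is that the symbols can no longer be appended to the file as they arrive. They would have to be collected in CoClassSymsAddSymbol and written out in one pass from CoClassSymsSymbolsFinished.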
The
final API, CoClassSymsSymbolsFinished, could get away with just closing
the file handle used for writing the .MAP file and calling
UnMapAndLoad. However, to keep MAPSYM happy, I added a small bit of code
that retrieves the executable's entry point, converts it to a logical
address, and appends it to the .MAP file. Without the entry point line,
MAPSYM complains about "no entry point" and announces that it's assuming
0000:0100. Why that address? Would you believe that it's the predefined
entry point for .COM files, circa 1981?
Wrap-up
While
I think CoClassSyms is a decent implementation of a pretty cool
concept, it has serious limitations. For starters, it only works with
typelibs that are embedded in an executable such as an .OCX. There are
many cases where typelibs are in separate files. A prime example of this
is the Visual Basic runtime DLL. The typelib for the standard Visual
Basic controls is called VB6.OLB, but the implementation of these
interfaces is in MSVBVM60.DLL. CoClassSyms requires the typelib to be in
the executable so that it can easily match the typelib to the
implementing executable. However, it wouldn't be hard to extend my code
to let you specify the typelib and executable files separately.
Another
shortcoming of CoClassSyms is that it only sees the top-level
interfaces for a COM object—that is, interfaces that are referenced in a
TKIND_COCLASS. Put another way, if you can't use CoCreateInstance to
make an interface of the desired IID, then CoClassSyms won't know how to
get a vtable pointer. This is an especially acute limitation where a
typelib describes a global or application object with methods that
return instance pointers for other interfaces. In this scenario,
CoClassSyms might generate a symbol for a method such as
global::DataSheet, but it surely won't generate symbols for the much
more interesting DataSheet interface methods.
CoClassSyms
doesn't pick up on the event sink interfaces of COM objects. To make
this work, you'd have to muck about with connection point code to make
an instance of the event sink interface. To keep the code small and
understandable, I didn't do this.
While
CoClassSyms is far from perfect, the ability to get any sort of symbol
table where previously there was nothing should be a big boost to your
debugging capabilities. How many times has some third-party control or
library faulted, leaving you with no idea of what call was at fault? The
symbols from CoClassSyms might be enough to give you a fighting chance.
Also, don't forget to check out this month's Under the Hood column,
where I describe creating .DBG files and CodeView information.
From the March 1999 issue of Microsoft Systems Journal.