TypeRefViewer Utility Shows TypeRefs and MemberRefs in One Convenient GUI
Matt Pietrek |
Download the code for this article: TypeRefViewer.zip 7/4/2003 UPDATE (52KB) / compatibility fix for VS.NET 7.0 RTM from wheaty.net
I love writing software tools. Whenever I'm learning about a new operating system or technology, I dig through its specifications to see what sort of details I can get at. I then write tools to process, view, dissect, or otherwise explore that information, and in doing so, I learn how the system works. One of the most interesting types of tools is one that shows you which external items a component relies on. This month, I'll describe just such a tool for Microsoft .NET. No, I'm not talking about the Microsoft Intermediate Language Disassembler (ILDasm). Instead, I've written a tool that shows you the moral equivalent of API imports for .NET binaries. Along the way, I'll describe one of the major functional differences between .NET's managed and unmanaged metadata APIs. Finally, I'll show how Visual C++ with Managed Extensions makes it relatively painless to mix hip, modern C# code with old-school C++ code that calls COM methods.
All About .NET Imports
In plain Win32®, an EXE or DLL that calls APIs from another DLL is said to import that DLL. .NET executables still contain regular Win32-style imports, since they're basically a superset of Win32 Portable Executable (PE) files. However, it's rare for a .NET executable to import anything more than a single API from MSCOREE.DLL.
The real action is in a .NET executable's metadata, where references to other assemblies are stored. A .NET assembly is composed of one or more modules, and in Win32 terminology, a .NET module is a DLL. It's very common for a single .NET DLL to be both a module and the assembly itself; however, multimodule assemblies are certainly possible. The key point is that normal Win32 executables import DLLs directly, while .NET executables import assemblies, which are containers for DLLs.
In Win32, all of an executable's imports are stored in a well-defined location in the executable. The data directory in the PE file header holds the relative virtual address (RVA) of the imports section. The imports section begins with an array of IMAGE_IMPORT_DESCRIPTOR structures, one for each imported DLL, terminated by an all-zero entry. For each imported DLL, there are data structures that tell exactly which APIs are imported from that DLL. (For you PE experts, these are the Import Address Table and Import Names Table.) You can see evidence of this information by running the executable through Depends (from the Platform SDK) or DUMPBIN /imports.
Fast forward to .NET and its vast library of classes that do practically everything but butter your toast. How would you see what classes and methods a given .NET executable is using? .NET has the System.Reflection classes, which not only let you examine the extremely detailed metadata of an executable, but also let you create new methods dynamically in memory. For our purposes, we'll stick to examining metadata. So what can you get from the reflection classes?
Burrowing into the reflection class hierarchy, it doesn't take long to find the Assembly.GetReferencedAssemblies method, which returns an array of AssemblyName instances. Unfortunately, the AssemblyName class doesn't contain any information about which types were actually used from the referenced assembly. Even worse, there's no information about which methods and properties of those types were used.
Surely this information must exist somewhere. Without it, using code from another module would be horribly expensive. Search all you want in the .NET classes, but you won't find it. Luckily, there's an escape hatch: the unmanaged metadata APIs. I discussed these APIs, as well as the System.Reflection classes, in my October 2000 article, "Avoiding DLL Hell: Introducing Application Metadata in the Microsoft .NET Framework". In a nutshell, the unmanaged metadata APIs are regular old COM interfaces. They're more work to use than the reflection classes, but as you'll see, they provide more information.
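The Win32 side of this comparison is easy to make concrete. The following is a minimal C++ sketch of walking the import descriptor array to list the imported DLLs. It mirrors the documented IMAGE_IMPORT_DESCRIPTOR layout with fixed-width types so it builds on any platform, and it assumes the image is already mapped into memory so that an RVA can be used directly as an offset into a byte buffer (true for a loaded module, not for the raw file on disk). The names ImportDescriptor, ReadCString, and ImportedDllNames are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Same field layout as the Win32 IMAGE_IMPORT_DESCRIPTOR (five 32-bit fields),
// declared with fixed-width types so this sketch compiles anywhere.
struct ImportDescriptor {
    std::uint32_t originalFirstThunk; // RVA of the Import Names Table
    std::uint32_t timeDateStamp;
    std::uint32_t forwarderChain;
    std::uint32_t name;               // RVA of the imported DLL's name
    std::uint32_t firstThunk;         // RVA of the Import Address Table
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor is 20 bytes");

// Read a NUL-terminated ASCII string at 'rva', stopping at the buffer's end.
std::string ReadCString(const std::vector<std::uint8_t>& image, std::uint32_t rva) {
    std::string s;
    for (std::size_t i = rva; i < image.size() && image[i] != 0; ++i)
        s.push_back(static_cast<char>(image[i]));
    return s;
}

// Walk the descriptor array starting at 'importsRva' and collect DLL names.
// Assumes a mapped image (RVA == offset) and a little-endian host, as on x86.
std::vector<std::string> ImportedDllNames(const std::vector<std::uint8_t>& image,
                                          std::uint32_t importsRva) {
    std::vector<std::string> names;
    for (std::size_t off = importsRva; off + sizeof(ImportDescriptor) <= image.size();
         off += sizeof(ImportDescriptor)) {
        ImportDescriptor d;
        std::memcpy(&d, image.data() + off, sizeof d);
        // An all-zero descriptor terminates the array.
        if (d.originalFirstThunk == 0 && d.timeDateStamp == 0 &&
            d.forwarderChain == 0 && d.name == 0 && d.firstThunk == 0)
            break;
        names.push_back(ReadCString(image, d.name));
    }
    return names;
}
```

For a typical .NET executable, this walk would turn up a single entry, mscoree.dll. Each descriptor's two thunk RVAs lead to the per-function lists that Depends and DUMPBIN /imports display; the sketch stops at the DLL names.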
Seeing .NET Imports with Existing Tools
ILDasm uses the unmanaged metadata APIs. Unfortunately, it doesn't show anything about which classes and methods are imported. There is another Microsoft tool that uses the unmanaged APIs and does tell you about imported classes and methods. Alas, Microsoft has done a good job of hiding this tool in the current Beta 2 release.
The tool is called MetaInfo, and its source code can be found in the Program Files\Microsoft.NET\FrameworkSDK\Tool Developers Guide\Samples\metainfo directory. MetaInfo is the tour de force of the unmanaged metadata APIs. Currently there's no binary supplied, so you have to compile it yourself. Hey, it's almost like going the open source route!
Let's say you've compiled MetaInfo and run it. How would the imported classes and methods appear? Figure 1 shows a snippet of MetaInfo output with two imported types and five imported methods. It's important to note that for each referenced class, there's something called a TypeRef. For each imported method from that class, there's a MemberRef. TypeRefs and MemberRefs are key components of the unmanaged metadata APIs, and I'll show later how I exposed them to my C# code.
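Under the covers, TypeRefs and MemberRefs are rows in metadata tables, and the unmanaged APIs identify them by 32-bit tokens: the high byte names the table (0x01 for TypeRef and 0x0A for MemberRef, matching mdtTypeRef and mdtMemberRef in cor.h) and the low 24 bits give the row number. The minimal C++ sketch below decodes tokens and groups MemberRefs under their parent TypeRef, which is the shape MetaInfo's output takes. It assumes each MemberRef's parent token has already been fetched (the real code gets this data from IMetaDataImport methods such as EnumTypeRefs and EnumMemberRefs); MemberRefRow and GroupByTypeRef are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// A metadata token: table number in the high byte, 1-based row (RID) below it.
typedef std::uint32_t mdToken;
const mdToken kTableTypeRef   = 0x01000000; // same value as mdtTypeRef in cor.h
const mdToken kTableMemberRef = 0x0A000000; // same value as mdtMemberRef in cor.h

mdToken TokenTable(mdToken tk) { return tk & 0xFF000000u; }
std::uint32_t TokenRid(mdToken tk) { return tk & 0x00FFFFFFu; }

// Hypothetical record: a MemberRef token plus its parent's token.
// For an imported method, the parent is normally a TypeRef.
struct MemberRefRow {
    mdToken memberRef;
    mdToken parent;
};

// Group MemberRefs under the TypeRef they belong to.
// Parents from other tables are skipped in this sketch.
std::map<mdToken, std::vector<mdToken>> GroupByTypeRef(const std::vector<MemberRefRow>& rows) {
    std::map<mdToken, std::vector<mdToken>> groups;
    for (const MemberRefRow& r : rows)
        if (TokenTable(r.parent) == kTableTypeRef)
            groups[r.parent].push_back(r.memberRef);
    return groups;
}
```

A TypeRef that no MemberRef points at still appears in the TypeRef table. This is why a class can be listed as imported with no methods underneath it.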
Figure 2 TypeRefViewer
While MetaInfo is functional, it only writes to stdout or a file. But people want GUIs! ILDasm is a GUI app, but doesn't show the TypeRefs and MemberRefs. MetaInfo does, but it's a text mode program. Sensing an opportunity, I wrote the TypeRefViewer program shown in Figure 2.
The TypeRefViewer Program
When writing TypeRefViewer, I wanted to use C#, WinForms, and the reflection classes as much as possible. On the other hand, I needed to call the unmanaged metadata APIs. Ordinarily this wouldn't be a big deal, since .NET makes it relatively easy to call existing COM methods. However, if you dig a little deeper, you'll find that the simplest form of .NET interop assumes you're calling a COM component that has a type library. The TLBIMP program takes the COM type library and translates it into an assembly with equivalent metadata.
In my case, I was out of luck: the unmanaged metadata APIs don't come with a type library. While I could muck around with IDL files to synthesize one, it's a pain, and oh so Twentieth Century. Or, if I were into pain and had lots of time to kill, I could create custom wrappers for the dozens of methods in the unmanaged API. Not something I felt like doing.
So I chose a third route. I had existing unmanaged C++ code (the Meta programs from the aforementioned article) that encapsulates the messy details of using the unmanaged APIs. This code readily compiles as managed .NET code: just add the /clr switch to Visual C++ and stir! It's then trivial to write a managed class that wraps the C++ code. Compile this into a .NET assembly, and the code is easily imported into a C# program.
The only real work was stuffing the data from the unmanaged APIs into managed types accessible to C# code. The beauty of metadata is that I simply declared the managed data structures in my C++ code, and the C# code implicitly knew about the types from the metadata. No need to declare the same structure in both the C++ and C# code!
The basic design is as follows. I wrote the main program (TypeRefViewer) in C#, with a WinForms-based GUI. When the user selects an assembly, the main program calls down to MetaDataHelper.DLL. This DLL is a full-fledged .NET assembly, written in Visual C++ with Managed Extensions. The MetaDataHelper code calls the unmanaged APIs and stores the results in managed data structures that are returned to the calling C# code. As a bonus, .NET garbage collection means that I don't have to worry about freeing the data, since it lives in the managed heap.
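The only contract between the two halves is the shape of the returned data. The plain, standard C++ sketch below shows that shape, based on what the article says each TypeRefInfo and MemberRefInfo holds. It is an analogue, not the Managed Extensions code from Figure 3. The field names and DumpTypeRefs are hypothetical, std::vector stands in for the managed arrays, and automatic destruction plays the role the garbage collector plays in the real program. DumpTypeRefs is only loosely in the spirit of the Export button; the real export format isn't reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Plain C++ analogues of the managed classes returned by MetaDataHelper.DLL.
// Field names are illustrative, not copied from Figure 3.
struct MemberRefInfo {
    std::string name;                        // imported method name
    std::vector<std::uint8_t> signatureBlob; // raw signature bytes
};

struct TypeRefInfo {
    std::string typeName;                    // namespace-qualified type name
    std::string assemblyName;                // assembly the type comes from
    std::vector<MemberRefInfo> members;      // one entry per imported method
};

// A consumer needs nothing but these types. This one produces a simple text dump.
std::string DumpTypeRefs(const std::vector<TypeRefInfo>& types) {
    std::ostringstream out;
    for (const TypeRefInfo& t : types) {
        out << '[' << t.assemblyName << "] " << t.typeName << '\n';
        for (const MemberRefInfo& m : t.members)
            out << "    " << m.name << '\n';
    }
    return out.str();
}
```

Because the caller receives values that own their memory, nothing has to be freed explicitly. That is the same convenience the managed heap gives the C# side.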
Using TypeRefViewer
Looking at Figure 2, you'll see that the TreeView control is the focal point. The treeview has three levels. The top-level nodes are the namespaces imported by the executable under examination. The children of the top-level nodes are the classes imported from each namespace. Finally, beneath each class node are that class's imported methods.
When you first start TypeRefViewer, you'll want to select an executable to view. This is done with the Browse button, which brings up the familiar Open dialog. You can also specify an executable name on the command line.
At the bottom of the TypeRefViewer window is the Export button. Clicking it causes TypeRefViewer to write a text mode version of the information to the file you specify. Also at the bottom is an assembly name, which changes to reflect whichever assembly the selected treeview node comes from.
TypeRefViewer Internals
Let's take a look at how TypeRefViewer is implemented, starting with MetaDataHelper.DLL, which calls the unmanaged metadata APIs. The primary source file for this DLL is MetaDataHelper.CPP (see Figure 3). Within this file, I put all the code and data structures in the Wheaty.UnmanagedMetaDataHelper namespace.
Right inside the namespace declaration are two class definitions: MemberRefInfo and TypeRefInfo. These are managed types (note the __gc modifier) that hold the information returned to the calling code. What's returned is an array of TypeRefInfo class instances, one for each imported type. A TypeRefInfo contains the name of the imported type and the assembly it comes from.
In addition, each TypeRefInfo contains an array of MemberRefInfo classes. Each MemberRefInfo represents one imported method. Along with the name of the imported method, the MemberRefInfo also holds the method's signature blob, a very compact binary encoding of the method's calling convention, return type, and parameters. This encoding includes only the parameter types, not their names.
It's worth noting that the signature blob is only exposed via the unmanaged metadata APIs. You won't see any mention of signature blobs in the reflection APIs, even though they use the blobs internally. I included the signature blobs in the MemberRefInfo as an aid when working with the reflection APIs in the C# portion of the code. The gist is that, because of method overloading, a method name by itself isn't unique. The signature blob can help narrow down the exact MethodInfo instance when using the reflection APIs. More on this later.
After the first two class definitions is the TypeRefInfoHelper class, which contains a single method: GetTypeRefInfo. This method takes a filename as a parameter and attempts to read the metadata for that file. The unmanaged metadata interfaces used are IMetaDataImport and IMetaDataAssemblyImport, defined in cor.h. Getting hold of these interface pointers is boilerplate code, which I encapsulated into yet another class, MetaDataImportWrapper, defined in MetaDataImportWrapper.cpp (see Figure 4).
After creating an instance of the CMetaDataImportHelper class, the code has IMetaDataImport and IMetaDataAssemblyImport interface instances. Using these, it calls a variety of methods (including EnumTypeRefs and EnumMemberRefs) to pull out the desired information. All relevant information is placed into managed MemberRefInfo and TypeRefInfo class instances.
If you examine the MetaDataHelper code, you'll see that it looks like fairly standard C++ code that uses COM. The only unusual spots are where I declared the managed data types, and the new syntax needed to allocate instances of those types. There are also a few lines of code that use System::Runtime::InteropServices::Marshal methods to convert the file name (passed as a .NET String) into a classic char * string.
With MetaDataHelper.DLL covered, let's turn to the main program, TypeRefViewer, shown in Figure 5.
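The main program's central job, walked through below, is to turn a flat array of TypeRefInfos into the three-level namespace/class/method tree. Along the way, it builds assembly-qualified names for Type.GetType. Here is a rough standard C++ sketch of those two steps. All names are hypothetical. The namespace is obtained by splitting the full type name at its last dot rather than by querying a reflection Type, and a std::map of class names stands in for the imported_namespaces dictionary of treeview nodes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Build the partially qualified name passed to Type.GetType:
// "TypeName, AssemblyName".
std::string AssemblyQualify(const std::string& typeName, const std::string& assembly) {
    return typeName + ", " + assembly;
}

// Split "System.Windows.Forms.TreeView" into ("System.Windows.Forms", "TreeView").
// Nested types and other special cases are deliberately ignored here.
std::pair<std::string, std::string> SplitTypeName(const std::string& fullName) {
    std::string::size_type dot = fullName.rfind('.');
    if (dot == std::string::npos) return std::make_pair(std::string(), fullName);
    return std::make_pair(fullName.substr(0, dot), fullName.substr(dot + 1));
}

// Namespace -> class names. Like the imported_namespaces dictionary, a single
// lookup decides whether a namespace entry already exists or must be created.
typedef std::map<std::string, std::vector<std::string>> NamespaceMap;

void AddImportedType(NamespaceMap& namespaces, const std::string& fullName) {
    std::pair<std::string, std::string> parts = SplitTypeName(fullName);
    namespaces[parts.first].push_back(parts.second); // creates the entry on first use
}
```

Without the map, every new type would require a linear scan of the existing namespace nodes. As a side effect, std::map also keeps the namespaces in sorted order.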
Most of the code at the top of the file is standard Windows Forms code that sets up the main window and its controls. After the user selects a file with the Browse button, the selected file name is passed to MetaDataHelper.DLL, and an array of TypeRefInfos is returned. The primary task of the main program is to process each TypeRefInfo and store the results in the treeview control.
Most of the logic for processing the TypeRefInfos is in the DisplayTypeRefsFromFile method. This routine attempts to obtain a Type instance for each TypeRefInfo. The Type class is the primary starting point for working with metadata via the System.Reflection classes. I used the reflection classes since they're easy and intuitive to work with. For example, it's almost trivial to create a formatted string representing a method's parameters.
To put it another way, I used the unmanaged APIs first to get metadata information not obtainable via the reflection classes. Then I switched to the reflection classes to finish the job. It's in this transition that things can get messy.
In earlier versions of .NET, you could pass just about any type name to Type.GetType, and the runtime would do a decent job of locating the assembly the type came from. Starting with Beta 2 of .NET, by default the runtime only searches the calling assembly and the system assembly (mscorlib.dll). Thus, if you call Type.GetType with just the string "System.Windows.Forms.TreeView", the runtime won't find the type, even though System.Windows.Forms.dll is in the path. To make this work properly, you need to qualify the type with the assembly name. You can take formal type qualification to elaborate lengths, as described in the .NET documentation. However, for our purposes, it's usually sufficient just to append a comma and the name of the enclosing assembly to the type name, as in "System.Windows.Forms.TreeView, System.Windows.Forms".
Where did I get the assembly name from? Back when MetaDataHelper.dll was building the TypeRefInfos for each type, it also queried for and stored away the assembly name. When running TypeRefViewer, you may occasionally run into a situation where Type.GetType can't find the assembly. When this happens, the treeview entry says "Error with .XXX", where "XXX" is the type name.
Once the Type instance is obtained, the next job is to figure out what namespace the type comes from. The unmanaged metadata APIs don't make this easy, nor do they keep the imported types sorted in any sort of namespace order. Thus, the TypeRefViewer code uses a dictionary class instance (called imported_namespaces) that maps a namespace to the treeview node representing that namespace. As each TypeRefInfo is processed, the Type information is either stored under an existing namespace node, or a new node is created as needed. The dictionary is just a convenience that avoids searching through all the treeview namespace nodes whenever a new Type is added.
Once the appropriate namespace node has been identified and a class node created under it, the next major task is to add all the imported members from the class. The TypeRefInfo contains an array of MemberRefInfos, each of which contains the name of an imported method. As a side note, you may sometimes see class nodes with no method nodes underneath them. This is to be expected: the unmanaged APIs are indicating that the class is imported, but that no methods from that class are actually used.
In the first versions of TypeRefViewer, each method name was simply inserted under the appropriate class node. It didn't take long before I became dissatisfied with this approach. The .NET classes make heavy use of overloading, so it's important to know exactly which method is being imported. The most direct way of showing the actual method is to append the method's parameters and return value to the method name. All that's needed is to get the appropriate System.Reflection.MethodInfo or System.Reflection.ConstructorInfo instance. From that, you can get a ParameterInfo array and format each element. The FormatParameterString method in TypeRefViewer.CS does that.
The tough part is getting hold of the correct MethodInfo or ConstructorInfo. (From here on, I'll use "methods" to mean both methods and constructors.) The normal way to get a MethodInfo is via Type.GetMethod, which is itself overloaded. The simplest form of Type.GetMethod takes just a method name, but an overloaded method has several matching MethodInfos, and in that case GetMethod throws an AmbiguousMatchException. Which one do we want? Other forms of Type.GetMethod let you be very specific about which overloaded method you want. However, calling them requires that you know in advance what all the parameter types are. Catch-22!
Sensing that there had to be a better way, I thought back to how the unmanaged metadata APIs provide a signature blob for each imported method. With lots of work, you could decode a signature blob, build the array of parameter Types needed to call Type.GetMethod, and get back the exact matching method. However, with just a little work, you can read the signature blob to figure out how many parameters the method takes. With this additional knowledge, you can call Type.GetMethods, which returns all of a type's methods, and filter out the ones that don't have the desired name and number of parameters. In many cases, this is sufficient to single out the exact MethodInfo we're looking for. If this hack doesn't succeed, TypeRefViewer punts and displays the method name with "(???)—Overloaded" after it.
The last loose end of TypeRefViewer is the TypeRefTreeNode class in TypeRefTreeNode.CS (see Figure 6).
This is a simple class derived from the .NET TreeNode class. It exists solely to associate a node with an assembly. If you select a node in the treeview, the bottom of the form updates to show the assembly name.
Wrap-up
TypeRefViewer is a utility I cooked up for my own use. It comes in handy when I want a quick idea of how some .NET component does its magic. Sure, I could run the component through ILDasm and sort through thousands of IL instructions to see which imported methods are called, but TypeRefViewer is a much simpler way to see what's being imported.
In writing TypeRefViewer, I've shown that the managed and unmanaged metadata APIs can be mixed in a reasonably simple manner. I've also demonstrated mixing C# code with a managed C++ DLL and passing significant amounts of data between the two. I was pleased once again to see how simple it is to use multiple languages in the same project.
TypeRefViewer packs a lot of functionality into a small amount of code. However, there are some easily added features that I'll leave to the ambitious reader. One would be a search capability that finds and highlights the tree nodes matching a search string. Another would be to use the signature blob information to find the matching MethodInfo more aggressively. If you do add something significant to the code, I'd love to hear about it!
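The parameter-count hack described earlier has two parts: reading the parameter count out of a signature blob, and filtering overload candidates by name and arity. Both are small enough to sketch, and the same decoder is a natural starting point for the more aggressive matching suggested above. The C++ below follows the compressed-integer encoding and method signature layout in Partition II of the ECMA-335 CLI specification. It is a sketch, not TypeRefViewer's C# code. Blob, Candidate, and PickOverload are hypothetical names, and Candidate stands in for a reflection MethodInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Blob;

// Decode an ECMA-335 compressed unsigned integer at blob[pos], advancing pos.
// Returns false for a truncated or invalid encoding.
bool ReadCompressedUInt(const Blob& blob, std::size_t& pos, std::uint32_t& value) {
    if (pos >= blob.size()) return false;
    const std::uint32_t b0 = blob[pos];
    if ((b0 & 0x80) == 0) {                       // 0xxxxxxx: 1 byte
        value = b0;
        pos += 1;
        return true;
    }
    if ((b0 & 0xC0) == 0x80) {                    // 10xxxxxx: 2 bytes
        if (pos + 2 > blob.size()) return false;
        value = ((b0 & 0x3F) << 8) | blob[pos + 1];
        pos += 2;
        return true;
    }
    if ((b0 & 0xE0) == 0xC0) {                    // 110xxxxx: 4 bytes
        if (pos + 4 > blob.size()) return false;
        value = ((b0 & 0x1F) << 24) | (std::uint32_t(blob[pos + 1]) << 16) |
                (std::uint32_t(blob[pos + 2]) << 8) | blob[pos + 3];
        pos += 4;
        return true;
    }
    return false;
}

// A method signature starts with a calling-convention byte, then (only for
// generic methods, flag 0x10, in later CLR versions) a generic parameter
// count, then ParamCount.
bool ReadParamCount(const Blob& sig, std::uint32_t& count) {
    if (sig.empty()) return false;
    std::size_t pos = 1;                          // skip the calling convention
    std::uint32_t genericCount = 0;
    if ((sig[0] & 0x10) != 0 && !ReadCompressedUInt(sig, pos, genericCount)) return false;
    return ReadCompressedUInt(sig, pos, count);
}

// Hypothetical stand-in for a reflection MethodInfo.
struct Candidate {
    std::string name;
    std::size_t paramCount;
};

// Index of the only candidate with 'name' and 'wanted' parameters, or -1 when
// none or several match (the case where TypeRefViewer shows "(???)").
int PickOverload(const std::vector<Candidate>& cands, const std::string& name,
                 std::uint32_t wanted) {
    int found = -1;
    for (std::size_t i = 0; i < cands.size(); ++i) {
        if (cands[i].name != name || cands[i].paramCount != wanted) continue;
        if (found != -1) return -1;               // still ambiguous
        found = static_cast<int>(i);
    }
    return found;
}
```

For example, the blob 0x00 0x02 0x0E 0x0E 0x0E describes a static method taking two strings and returning a string (0x0E is ELEMENT_TYPE_STRING), so ReadParamCount reports 2. A more aggressive matcher would go on to decode RetType and each parameter element and compare them against each candidate's parameter types.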
Send questions and comments for Matt to hood@microsoft.com.
Matt Pietrek
is an independent writer, consultant, and trainer. He was the lead
architect for Compuware/NuMega's Bounds Checker product line for eight
years and has authored three books on Windows system programming. His
Web site, at http://www.wheaty.net, has a FAQ page and information on previous columns and articles.