Recently I attended a customer council meeting at the lab where I work.
During the lunch break I
got into a conversation with two customers, both with huge
applications. They were bemoaning the fact that it's so difficult to
figure out where their program's memory is going.
In
the Mem Usage column of the Windows NT® Task Manager, you can see that
the memory used by a process may rise or fall drastically. A common
developer problem is that an application's memory usage as shown by
TaskMan goes up, but your typical memory-tracking tools don't show
corresponding memory or resource leaks. The underlying reason for this
discrepancy is that most tools focus on heap-allocated memory. This is a
very narrow view of the world!
It's
easy for people to forget that memory consumed by a process is much
more than just its calls to malloc or new. Generally speaking, almost
all the memory used by a process falls into one of these categories:
- Executable code in a loaded module
- Read-only data in a loaded module (including resources)
- Writable memory in a module (for example, the .data section)
- Win32® heaps (including the default heap)
- Suballocated heaps (for example, from the Visual C++® runtime library)
- VirtualAlloced memory
- Memory-mapped files
- Thread stacks
- Environment
- System data structures (including the Thread Information Block and page tables)
In its
normal course of execution, Windows® pages memory in and out of the
process address space. For instance, pages making up the code and data
areas of a loaded DLL don't use any physical memory until something
references them. When a reference occurs, only the touched page is
mapped in. Likewise, when system memory gets tight, Windows can swap out
pages of code and data. In this very dynamic situation, limiting
yourself to watching heap allocations quickly leads to frustration.
MemDiff: A First Stab
Pondering
this problem, it came to me that viewing the process memory space at
the page level is a more logical approach than monitoring heaps. Thus
was born the MemDiff library. MemDiff is a crude yet effective way to
see how your process's use of memory changes between two points in your
code. Although I would have liked to have made MemDiff work on Windows
95 and Windows 98, too many APIs I needed are only available in Windows
NT.
MemDiff
consists of three simple functions. The first is MDTakeSnapshot, which
takes a process handle and returns a snapshot handle. A snapshot is a
relatively compact representation of your process's address space at the
time you call it. In the simplest usage scenario, you'll make two calls
to MDTakeSnapshot, once before the target section of code, and again
after the target code has executed.
The
second function, MDCompareSnapshot, compares the two snapshots to
report how much and where your address space has changed. In the report,
logically related pages (for instance, adjacent pages in a heap) are
lumped together. In addition, it attempts to provide a meaningful
description of the pages.
MDCompareSnapshot
writes its output to a standard Win32 file handle, which you provide.
This gives you flexibility for where the report goes. MDCompareSnapshot
has an optional verbose parameter. If it's not specified, the default is
a nonverbose report. This mode is usually easier to work with. More on
this later.
The
third function is MDFreeSnapshot. After calling MDCompareSnapshot,
you'll want to pass the snapshot handle to MDFreeSnapshot. As you'd
expect, it frees the associated memory, which can be a nontrivial
amount. A snapshot is good for only one comparison. As I'll show later,
the act of comparing two snapshots destroys some of the data. I could
have worked around this at the expense of additional memory and code
complexity. I chose the easy way out, and put safeguards in the code to
prevent erroneous results from using a snapshot more than once.
To
have the least side effects on your code, the downloadable sources
build MemDiff to a static .LIB file that you link into your application
or DLL. The .LIB file is built against the static multithreaded C++ runtime
library, LIBCMT(D).LIB. If you want to use MemDiff from a language that doesn't
support static .LIBs (such as Visual Basic®), feel free to play with the
project settings to make the MemDiff project compile as a DLL. If you
do this, be sure to export the three functions mentioned previously.
Interpreting MemDiff Results
To create results for discussing here, I wrote a small sample program that exercises the MemDiff code. Figure 1
shows the code for MemDiffDemo.cpp, which uses all three MemDiff APIs.
Intermixed with the two MDTakeSnapshot calls, the program plays around
with heap memory, loads and unloads DLLs, VirtualAllocs memory, and
opens a memory-mapped file. The goal is to create an address space
scenario where MDCompareSnapshot has a variety of things to display.
For
output, MemDiffDemo writes to stdout. This lets me see the results in a
console window or redirect the output to a file with the >
redirection operator. In your own code, you'll probably want to use
CreateFile or some other API that returns a handle WriteFile can work
with. Heck, go nuts and use named pipes to see MemDiff results from a
process running on another machine. Who says I haven't embraced n-tier computing?
Figure 2
shows the normal, nonverbose output from running MemDiffDemo. The
first line shows the net difference in memory between the first and
second snapshots. A negative value means less memory was being used when
Snapshot2 was taken. The next line, Memory allocated, is how much newly
paged-in memory was in Snapshot2. Line three, Memory freed, tells how
much memory was present during Snapshot1, but not during Snapshot2.
The
fourth line breaks down the new memory in Snapshot2 into private and
shared amounts. Private memory is memory used solely by your process.
This includes heap memory and normal (non-shared) data sections in EXEs
or DLLs. Shared memory is potentially usable by more than just your
process. When looking for what's leaking in your application, the
private memory is usually more likely to be of concern.
Prime
examples of shared memory are code pages and read-only data sections.
Because these pages of memory don't change, the operating system can use
the CPU hardware to map the physical pages of RAM into multiple
address spaces. For example, every process uses code in NTDLL.DLL, but
the same physical pages of RAM holding NTDLL.DLL's code are shared
between all processes. Remember, though: just because memory can be
shared doesn't mean any other process is actually sharing it. Yours may be the only one using it.
After
the initial four lines, the remainder of the MemDiff output is an
accounting of memory that's unique to each snapshot. Although internally
MemDiff is working with a raw list of physically present pages, it
tries to coalesce related blocks. Under Snapshot2 in Figure 2,
you'll see a handful of memory blocks of different sizes. For example,
starting at address 0x00910000 are four pages of memory (16KB) that
belong to a Win32 heap. A bit later, you'll see 8KB used by USER32.DLL,
20KB used by GDI32.DLL, and so on. Stop and think about this. Various
amounts of memory are being reported for DLLs. The DLLs listed were all
loaded before Snapshot1 took place, yet Snapshot2 shows that they're
using additional memory. How can this be the case?
What
you're seeing is the demand page loading of Windows NT in action. The
code and data of an EXE or DLL aren't assigned to physical RAM until the
EXE or DLL is read from, written to, or executed. Whatever I did
between the two MemDiffDemo snapshots caused additional pages from
USER32, KERNEL32, GDI32, and NTDLL to be mapped in.
In
this case, the additional DLL pages were mapped in because
MemDiffDemo loaded WININET.DLL, and then unloaded it. Note that no
pages from WININET.DLL show up in the report, since it's completely gone
from memory. However, whatever WININET.DLL did in its DllMain caused
pages from other system DLLs to be mapped in.
This is a great example of how your reported memory usage can go up
without you doing anything wrong.
Figure 3
shows a snippet of the verbose variation of MemDiff's results. To get
this version of the output, set the final parameter of MDCompareSnapshot
to true. The primary difference in the report is that all pages in the
coalesced blocks must be contiguous and have exactly the same page
attributes.
Since
coalesced blocks in verbose mode must have the same attribute, you'll
often see multiple blocks for a given DLL (although not in the case of Figure 3).
For instance, in a single DLL you may see one block reported for the
code pages, another block for the resources, and a third block for the
writable data section. Since all pages in a verbose report block have
the same attributes, I included the attributes in the output. Refer to
the QueryWorkingSet documentation for a description of the possible page
attributes.
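To make the attribute bits less abstract, here is a minimal sketch of how one 32-bit working-set entry can be picked apart. It assumes the 32-bit layout documented for PSAPI_WORKING_SET_BLOCK (5 protection bits, 3 share-count bits, 1 shared bit, 3 reserved bits, then a 20-bit virtual page number). The PageInfo struct and DecodeWorkingSetEntry name are hypothetical and are not taken from the MemDiff sources.

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of one 32-bit working-set entry. The struct and its field
// names are illustrative; only the bit positions come from the documented
// PSAPI_WORKING_SET_BLOCK layout for 32-bit Windows.
struct PageInfo {
    std::uint32_t address;     // page-aligned virtual address
    std::uint32_t protection;  // 5-bit protection code (bits 0-4)
    std::uint32_t shareCount;  // 3-bit share count (bits 5-7), maximum 7
    bool          shared;      // bit 8: page is shareable
};

PageInfo DecodeWorkingSetEntry(std::uint32_t entry)
{
    PageInfo info;
    info.address    = entry & 0xFFFFF000u;          // upper 20 bits: page number
    info.protection = entry & 0x1Fu;                // bits 0-4
    info.shareCount = (entry >> 5) & 0x7u;          // bits 5-7
    info.shared     = ((entry >> 8) & 0x1u) != 0;   // bit 8
    assert((info.address & 0xFFFu) == 0);           // always page-aligned
    return info;
}
```

For example, decoding 0x00910004 gives a page at 0x00910000 with protection code 4 and the shared bit clear.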
The MemDiff Source
Figure 4
shows the primary guts of MemDiff. MDTakeSnapshot uses QueryWorkingSet
from PSAPI.DLL, which I've included in the download files (Nov99Hood.exe
(31KB)) in case it's not on your system. QueryWorkingSet returns an
array of DWORDs, with each DWORD representing the address and page
attributes of a page mapped into the address space. Since I don't know
in advance how much memory QueryWorkingSet needs for the entire address
space, I call VirtualAlloc in a loop until I get enough memory to
hold all the page DWORDs.
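The retry loop described above might look roughly like the following sketch. To keep it portable, a caller-supplied QueryFn stands in for the real QueryWorkingSet call and a std::vector stands in for VirtualAlloc. Both substitutions, the doubling growth policy, and the function names are assumptions for illustration, not the code in Figure 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for a QueryWorkingSet-style call (hypothetical signature): fills
// 'buffer' with page entries and sets 'written', or returns false if
// 'capacity' entries were not enough to hold the whole working set.
using QueryFn = std::function<bool(std::uint32_t* buffer, std::size_t capacity,
                                   std::size_t& written)>;

std::vector<std::uint32_t> QueryWithGrowingBuffer(const QueryFn& query,
                                                  std::size_t initialCapacity = 1024)
{
    assert(initialCapacity > 0);
    std::vector<std::uint32_t> buffer(initialCapacity);
    for (;;) {
        std::size_t written = 0;
        if (query(buffer.data(), buffer.size(), written)) {
            buffer.resize(written);           // keep only the entries returned
            return buffer;
        }
        buffer.assign(buffer.size() * 2, 0);  // too small: double it and retry
    }
}
```

A production version would also want an upper bound on the buffer size so that a query that keeps failing for some other reason can't loop forever.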
The
snapshot handle that MDTakeSnapshot returns is just a pointer to the
VirtualAlloced memory. At the beginning of the snapshot memory is a
MEMDIFF_SNAPSHOT structure. Immediately following the MEMDIFF_SNAPSHOT
is the array of DWORDs that QueryWorkingSet fills in. The
MEMDIFF_SNAPSHOT members let me verify the validity of snapshots passed
to the other MemDiff functions, determine how many pages are in the
snapshot, and perform other housekeeping duties.
The
meat of the code is called from the MDCompareSnapshot function. The
code first verifies that both snapshot handle parameters are valid.
Next, the FilterOutCommonPages function throws out all pages that
shouldn't be in the reports that follow. It does this by sorting and
comparing both snapshots. A page that's in both snapshots is
automatically thrown out. In addition, the pages holding the snapshot
data itself are thrown out. Throwing out a page means setting its value
in the snapshot array to 0, which is why a snapshot is only good for one
comparison. Feel free to improve on this quick-and-dirty algorithm.
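As a rough illustration of the sort-and-compare idea, here is a portable sketch. It assumes each entry is a 32-bit value with the page address in the upper 20 bits, and that two entries refer to the same page when those address bits match. The name FilterCommonPages is hypothetical, and the real FilterOutCommonPages in Figure 4 may differ in its details (for instance, this sketch doesn't handle the pages that hold the snapshot data itself).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kPageAddressMask = 0xFFFFF000u;

// Zero out every page that appears in both snapshots, matching on page
// address. Zero marks a "thrown out" entry, so the inputs must not already
// contain zeros (that is, each snapshot can be filtered only once).
void FilterCommonPages(std::vector<std::uint32_t>& snap1,
                       std::vector<std::uint32_t>& snap2)
{
    assert(std::find(snap1.begin(), snap1.end(), 0u) == snap1.end());
    assert(std::find(snap2.begin(), snap2.end(), 0u) == snap2.end());

    // The address lives in the high bits, so sorting by value sorts by address.
    std::sort(snap1.begin(), snap1.end());
    std::sort(snap2.begin(), snap2.end());

    std::size_t i = 0, j = 0;
    while (i < snap1.size() && j < snap2.size()) {
        const std::uint32_t a = snap1[i] & kPageAddressMask;
        const std::uint32_t b = snap2[j] & kPageAddressMask;
        if (a < b)      ++i;                            // page only in snapshot 1
        else if (b < a) ++j;                            // page only in snapshot 2
        else { snap1[i++] = 0; snap2[j++] = 0; }        // common page: discard
    }
}
```

After the call, the nonzero entries left in each vector are the pages unique to that snapshot, which is exactly what the reports need.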
After
filtering down the snapshot data, MDCompareSnapshot then calls the
SummaryReport and DetailedReport functions. Both functions write their
output to the file handle you designate. The summary report is the
simple stuff at the beginning of the output. It simply spins through
both snapshots, counting the different types of pages as it goes. A
little math and voilà, your summary results!
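The "little math" can be sketched as follows. This version assumes the snapshots have already been filtered so that zero means "thrown out," that pages are 4KB, and that bit 8 of an entry is the shared flag (per the 32-bit PSAPI_WORKING_SET_BLOCK layout). The Summary struct and Summarize function are hypothetical names, not the ones in the MemDiff sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The four figures at the top of the report, in bytes.
struct Summary {
    long long allocatedBytes;  // pages present only in snapshot 2
    long long freedBytes;      // pages present only in snapshot 1
    long long netBytes;        // allocated - freed; negative means snapshot 2 used less
    long long privateBytes;    // part of allocatedBytes not marked shared
    long long sharedBytes;     // part of allocatedBytes marked shared
};

Summary Summarize(const std::vector<std::uint32_t>& only1,
                  const std::vector<std::uint32_t>& only2,
                  std::uint32_t pageSize = 4096)
{
    assert(pageSize > 0);
    Summary s{};
    for (std::uint32_t entry : only1) {
        if (entry != 0) s.freedBytes += pageSize;       // skip thrown-out pages
    }
    for (std::uint32_t entry : only2) {
        if (entry == 0) continue;                       // skip thrown-out pages
        s.allocatedBytes += pageSize;
        if (entry & 0x100u) s.sharedBytes  += pageSize; // bit 8: shared flag
        else                s.privateBytes += pageSize;
    }
    s.netBytes = s.allocatedBytes - s.freedBytes;
    return s;
}
```

With one surviving page in the first snapshot and three in the second (one of them shared), this yields 12,288 bytes allocated, 4,096 freed, a net gain of 8,192, and an 8,192/4,096 private/shared split.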
The
DetailedReportHelper function is much more complicated since it has the
onerous task of identifying where a block of pages comes from. First,
though, the function has to coalesce related ranges of pages. This is
where the fVerbose parameter to MDCompareSnapshot comes in. With
fVerbose set to false, the code lumps together all pages that have the
same allocation base as reported by VirtualQuery. With fVerbose set
to true, all pages in a coalesced block must be contiguous, have the
same attributes as reported in the QueryWorkingSet bits, and must
share the same allocation base.
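The two coalescing rules can be sketched like this. To stay portable, the sketch takes the allocation base as a field on each page rather than calling VirtualQuery, and it assumes the pages arrive sorted by address. The Page and Block structs and the CoalescePages name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Page {
    std::uint32_t address;         // page-aligned address
    std::uint32_t attributes;      // low 12 bits of the working-set entry
    std::uint32_t allocationBase;  // would come from VirtualQuery in real code
};

struct Block {
    std::uint32_t firstAddress;
    std::uint32_t allocationBase;
    std::size_t   pageCount;
};

// Group pages into report blocks. Nonverbose: same allocation base.
// Verbose: same allocation base, same attributes, and contiguous addresses.
std::vector<Block> CoalescePages(const std::vector<Page>& pages, bool verbose,
                                 std::uint32_t pageSize = 4096)
{
    assert(std::is_sorted(pages.begin(), pages.end(),
        [](const Page& a, const Page& b) { return a.address < b.address; }));

    std::vector<Block> blocks;
    const Page* prev = nullptr;
    for (const Page& page : pages) {
        bool sameBlock = prev && page.allocationBase == prev->allocationBase;
        if (sameBlock && verbose) {
            sameBlock = page.attributes == prev->attributes &&
                        page.address == prev->address + pageSize;
        }
        if (sameBlock) ++blocks.back().pageCount;
        else blocks.push_back(Block{page.address, page.allocationBase, std::size_t{1}});
        prev = &page;
    }
    return blocks;
}
```

Because pages from one allocation are adjacent once sorted, comparing each page against its predecessor is enough. The verbose rule simply splits a block wherever there is a gap or an attribute change, which is why a single DLL can produce several verbose blocks.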
Once
coalesced, DetailedReportHelper first tries to identify the block by
calling GetModuleFileName. This easily finds pages that belong to the
memory image of a loaded EXE or DLL. If the page isn't in that category,
I next check to see if the block is in one of the process Win32 heaps.
This includes the default heap (GetProcessHeap) and any heaps created by
HeapCreate. Suballocator-style heaps, which use VirtualAlloc and
partition the memory themselves, won't be detected. The current Visual
C++ new and malloc fall into this category.
The
heap identification code is written as a class (CProcessHeaps), and
implemented in ProcessHeaps.cpp and ProcessHeaps.h. This code, which can
be found at the link at the top of this article, was written to be fast
rather than excruciatingly accurate. It's possible for some blocks to
slip by and not be identified. Feel free to fix my implementation to use
the HeapWalk API at the expense of CPU time. While you're at it, you
could add code to search whatever suballocator-style heaps are
present. Happy hunting!
Meanwhile
back at the block type identification code, Marlin has a few words on
insurance. Sorry, wrong storyline! If the block isn't identified by
GetModuleFileName, or by the heap code, the options are dwindling.
PSAPI.DLL has the GetMappedFileNameA API, which tells you if the page is
from a memory-mapped file. If it's not a memory-mapped file, you're out
of luck (at least in this episode). The memory could be plain old
VirtualAlloced memory, additional stack pages, or who knows what. If you
have a good algorithm for identifying what an arbitrary page of memory
is, by all means try it out. If it works well, let me know.
MemDiff Caveats
I've
already mentioned some of the restrictions on the MemDiff library as
currently written. It's not easy to use from Visual Basic or other
languages that don't support static .LIBs. The solution is to rebuild
the code as a DLL and be aware of the side effects. And of course,
snapshots are only valid for one comparison.
Beyond
these usage restrictions, darker things lie. The big one is the
potential for misidentifying a block of memory, particularly in the
detailed report for the first snapshot. When a snapshot is taken,
MemDiff doesn't attempt to identify each block in the snapshot. To do so
would take extra time and memory. When the snapshots are compared, the
identification code only has the process state at the time of the
comparison to lean on. Since the process state can change dramatically
between snapshot and comparison time, there's plenty of opportunity for
error.
You
can easily construct scenarios where a page of memory used for one
thing at the time of a snapshot is used for something else entirely at
the time of identification (inside MDCompareSnapshot). The simplest
example is a DLL in place during Snapshot1 that unloads, then another
DLL loads in its place at the time of Snapshot2. The block
identification code is likely to identify Snapshot1 pages as belonging
to the wrong DLL. Alternatively, pages may go away entirely between
snapshots. In this case, the identification code fails completely and
reports <unknown>.
A
final note on strange results from MemDiff. You may occasionally see
pages with addresses above 2GB. Usually these addresses are something
like 0xC017F000. These are pages that Windows NT is using for storing
the process memory map. If the process memory space grows enough, the
kernel-mode code that manages the process memory space needs to allocate
another page of memory to hold page table entries. Note that the
discussion here assumes a "normal" Windows NT-based system with a
maximum user-mode address of 2GB. It's possible to boot Windows NT in
the /3GB mode where specially marked processes can use addresses up to
0xC0000000.
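If you're curious which part of the address space one of these high pages describes, the arithmetic is simple. The sketch below assumes a non-PAE x86 system, where page tables are mapped starting at 0xC0000000, each page table entry is 4 bytes, and pages are 4KB. Those constants and the function name are assumptions of this example and aren't part of MemDiff.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPteBase  = 0xC0000000u;  // assumption: x86, non-PAE
constexpr std::uint32_t kPteEnd   = 0xC0400000u;  // 4MB of page tables
constexpr std::uint32_t kPageSize = 4096;
constexpr std::uint32_t kPteSize  = 4;            // bytes per page table entry

struct AddressRange {
    std::uint32_t first;  // first virtual address described
    std::uint32_t last;   // last virtual address described (inclusive)
};

// Given the address of a page-table page, return the range of virtual
// addresses whose page table entries live in that page.
AddressRange RangeCoveredByPageTablePage(std::uint32_t pageTableAddress)
{
    assert(pageTableAddress >= kPteBase && pageTableAddress < kPteEnd);
    assert(pageTableAddress % kPageSize == 0);

    const std::uint32_t firstPteIndex = (pageTableAddress - kPteBase) / kPteSize;
    const std::uint32_t pagesPerTable = kPageSize / kPteSize;  // 1024 entries
    AddressRange range;
    range.first = firstPteIndex * kPageSize;
    range.last  = range.first + pagesPerTable * kPageSize - 1; // 4MB span
    return range;
}
```

Under these assumptions, the page at 0xC017F000 holds the entries for the 4MB region from 0x5FC00000 through 0x5FFFFFFF, so its appearance in a report suggests the process touched that region for the first time between snapshots.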
Some Final Words
The
MemDiff library has the potential to help in many circumstances.
However, the code is still pretty rough. Given the restrictions of time
and space, I leaned toward simplicity. I'd expect MemDiff to be tweaked
and modified by others to better report on their specific scenarios.
Among
ideas for improvement, you could identify and tag memory pages at the
time of a snapshot, look for and include compiler runtime heaps, and
identify thread stack memory. (Remember, there might be multiple threads
in the process.) If your code VirtualAllocs memory and has identifiable
patterns to the data, it's easy enough to add code to look for your
specific data.
Finally,
a big thank you to Osiris Pedroso at Autodesk for helping me test
MemDiff. Aside from just making suggestions, Osiris ran MemDiff on
AutoCAD itself. The output files he sent back made me knuckle down to
come up with better algorithms than I originally wrote. MemDiff is
significantly better for his help.