<-- Visual C++ 7.1 PDB Handle Leak Bug - Overview / Visual C++ 7.1 (2003) PDB Handle Leak Bug - Patch #1 Details 
1
Visual C++ 7.1 (2003) PDB Handle Leak Bug - Patch #1 Details

Date: Jul 4, 2016

DETAIL SUMMARY:
I found that Visual Studio stored the project's .PDB handle in a COM object that is refcounted many times (up to 47) when initializing the debugging session, at least in one of my test projects. The majority of the refcount increments and decrements were occurring as a result of inbound Debugging Events. Trying to figure out which refcount was the bad one or which release wasn't occurring was too monstrous a task without the actual source code. Instead, I found that when debugging under XP, there was a particular call to the object's Release() that only occurred once near the final destruction of the leaked object, causing the object's refcount to decrement from 2 to 1. Under Windows 7, the refcount in this same function was being decremented from 3 to 2. I found that if I artificially altered the refcount so that it was 1 upon exit from this function (simulating the XP behavior), Visual Studio properly released ALL handles to the objects it opened during the debugging session, including the project's .PDB file. Problem solved.

The underlying cause of the bug may be as simple as a change in the way the operating system sends debugging events back to the debugger (via the Debugging API). Perhaps a debugging event Visual Studio relied upon to release a reference to this object is no longer being sent in versions of Windows greater than XP? Who knows? In the case of a COM object leak, the underlying refcount is all that matters.

The fix I decided on was to invoke the COM object's Release() method twice if the refcount wasn't the expected value at a specific location during the teardown of the debugging session. While the fix doesn't address which AddRef/Release event was mismatched, it is certainly the next best fix. Because this patch "fixes" the refcount to match that of XP at the same point in the code, there shouldn't be any strange side effects elsewhere such as the object getting freed too early. If you apply this patch to Visual Studio running under XP or other version of Windows that didn't experience the bug, the patch has a check to leave the refcount alone (if it was already "correct"). The fix happens to be in the NatDbgDE.dll module which can be found in Visual Studio's "Common7\Packages\Debugger" directory. Simply replace the original version of this module with the patched version:

Download Patch #1 with instructions (327 k)

To ensure this patch is compatible with your installation, please ensure that you have already installed the Visual Studio.NET 7.1 (2003) Service Pack 1 and that the original NatDbgDE.dll has the following attributes:

size  708,608 bytes (692 k)
date03/19/2003 02:59 AM
version7.10.3077.0
md587f2402cae3d54478c63fe8fc6f8f72f

The patch DLL has the following attributes:

size  708,608 bytes (692 k)
md57ed7b7b74fa774b756163d9b3888a55b

MANUAL PATCHING:
If you'd like to perform the patch manually on your own version of NatDbgDE.dll, download the bytepatch tool and use the following commands to patch the two locations necessary for the fix:
bytepatch -pa 0x5473DA94 natdbgde.dll E907DB050090
bytepatch -pa 0x5479B5A0 natdbgde.dll 5589E583EC08894DFC8B018945F88B45F88B4DFC51FF50089C83F802750C9D8B45F88B4DFC51FF50089C9D89EC5DC3
NOTE: The opcodes used in the patch are described in detail below.

After manually patching, you'll probably want to update the PE checksum with peupdate using "peupdate -f NATDBGDE.DLL" or using your favorite checksum calculation/replacement tool. Since the patched bytes indirectly alter the checksum of the DLL, future versions of Windows may abort loading this DLL. Windows 7 doesn't care about a checksum mismatch so this step isn't technically necessary. If you choose to download the patched DLL, the checksum has already been updated so none of these steps are necessary.

PATCH DETAILS:
When I set out to find the culprit of this file locking behavior, I was originally looking for a strategic location to explicitly close the handle to the file within the denenv.exe process. Finding this location meant I should still try to to understand where the handle was being opened and where it was [supposed to be] being closed for optimum stability. Armed with OllyDbg, IDA and the public symbols for Visual Studio 7.1 which are still available on Microsoft's symbol servers (at least Microsoft helped in that regard), I arrived at a solution that is probably the next best thing to an official fix from Microsoft.

My investigation began with debugging under XP, to get a baseline for the behavior that was supposed to happen. I placed a breakpoint on KERNEL32.CreateFileW() to find out who was opening the handle to the PDB file that would never be closed. I found the handle was being opened at MSPDB71.103186CF which corresponded to the IStreamCRTFile::Create() function. I then set a breakpoint on all subsequent calls to CloseHandle(). When I found the call that was being passed the PDB file handle, I stepped my way back up the stack to find that it was being executed as a result of a call to the CNativeProcess::UninitSymbolHandler() function. The call stack from there to the CloseHandle() function is shown from my notes below:
CNativeProcess::Destroy()
NATDBGDE:5473D547  CALL NatDbgDE.5473DA9B ;CNativeProcess::UninitSymbolHandler() / called under Win7

    NATDBGDE:5473DAC0  CALL NatDbgDE.5473921A ;dbg::MRefcountedObject::Release() / called under Win7
    
        NATDBGDE:5473BB42  CALL DWORD PTR DS:[EAX] ;CEnc::`scalar deleting destructor'() / called under Win7
        
            NATDBGDE:5473DB1C  CALL NatDbgDE.5473DACB ;CEnc::~CEnc() / called under Win7
            
                NATDBGDE:5473DAF5  CALL DWORD PTR DS:[ECX+8] ;CSymbolHandlerX::Release() / called under Win7
                
                    ;CSymbolHandlerX::Release() - THIS FUNCTION DECREMENTS REFCOUNT (also called under Win7)
                        this is called once before debugging starts, and many times after debugging ends.
                    
                    NATDBGDE:5473D296  CALL NatDbgDE.5473D262; CSymbolHandlerX::~CSymbolHandlerX()
                    
                        NATDBGDE:5473D27B  CALL NatDbgDE.5473D1D7; SHDeleteProcess()
                        
                            (SHDeleteProcess() function NEVER called on Win7 because refcount is too high!)
                            5473D200  PUSH  EBX
                            5473D201  CALL  NatDbgDE.54738250
                            5473D206  MOV   EDI, EAX
                            5473D208  PUSH  DWORD PTR DS:[EDI+4]
                            5473D20B  CALL  NatDbgDE.5473D1AE                  ;invokes CloseHandle()
                            5473D210  AND   DWORD PTR DS:[EDI+4], 00000000
                            5473D214  PUSH  EBX
                            5473D215  PUSH  DWORD PTR DS:[ESI]
                            5473D217  CALL  NatDbgDE.5473C68F                  ;invokes CloseHandle(PDB)
                            5473D21C  PUSH  0                 
                            5473D21E  PUSH  DWORD PTR DS:[ESI]
                            5473D220  CALL  NatDbgDE.5473866C 
                            5473D225  MOV   EBX, EAX
                            5473D227  TEST  EBX, EBX
                            5473D229  JNZ   SHORT NatDbgDE.5473D200
                        
                            NATDBGDE:5473D217  CALL NatDbgDE.5473C68F ;LLFDeleteHlleFromLl()
                            
                                NATDBGDE:5473C6CC  CALL NatDbgDE.5473C6E2 ;LLDeleteHlle()
                                
                                    NATDBGDE:5473C710  CALL EAX ;call to private function
                                    
                                        NATDBGDE:5473CE92  CALL NatDbgDE.5473CD02 ;OLUnloadOmf()
                                        
                                            NATDBGDE:5473D3EE  CALL DWORD PTR DS:[NatDbgDE.5479C078] ;PDBClose()
                                            
                                                MSENC71:546D8EAD  CALL DWORD PTR DS:[EAX+28]
                                                
                                                    MSPDB71:1031CD69  CALL DWORD PTR DS:[EAX+34]
                                                    
                                                        MSPDB71:10318D1E  CALL mspdb71.103170C3
                                                        
                                                            MSPDB71:103170CF  CALL DWORD PTR DS:[ECX+8]
                                                            
                                                                MSPDB71:103176B5  CALL mspdb71.10316CD3
                                                                
                                                                    MSPDB71:10316CF3  CALL DWORD PTR DS:[EAX+0C]
                                                                    
                                                                        MSPDB71:10317B5B
                                                                        
                                                                            MSVCR71._close
                                                                            
                                                                                MSVCR71(private function 7C37421D) -> CloseHandle(PDB file handle)

                                                                                    7C374257  CALL DWORD PTR DS:[<&KERNEL32.CloseHandle>]
                                                                            
                                                                                    OLE32 -> CloseHandle(PDB file handle)
                                                                                    
                                                                                        7752AA9E  CALL DWORD PTR DS:[<&KERNEL32.CloseHandle>]

                                                                                    returns 1

                                                                                returns 1
                                                                            EAX 0
                                                                        EAX 0
                                                                    EAX 04AF0FD8
                                                                EAX 0
                                                            EAX 0
                                                        EAX 0
                                                    EAX 1
                                                EAX 1
                                            EAX 1
                                        EAX 1
                                    EAX 0
                                EAX 1
                            EAX 1
                        EAX 1
                    EAX 0
                EAX 0
            EAX 0
        EAX 40E2470
    EAX 0
EAX 0
If the refcount finally became 0 by the final call to CSymbolHandlerX::Release() in the CNativeProcess::UninitSymbolHandler() function, the SHDeleteProcess() would get executed in the call-stack shown above which ultimately released the .PDB handle. Under Windows 7, the refcount at this point in the code was 1, so SHDeleteProcess() never got executed.

Within the CNativeProcess::UninitSymbolHandler(), there were actually two calls to CSymbolHandlerX::Release(). The one shown in the stack-map above was the last and final. I decided to patch the first call instead because it was in a simpler function (actually within CSymbolHandler::DeInitialize()). At the end of this particular call to CSymbolHandlerX::Release(), the refcount needed to be 1 so that the final CSymbolHandlerX::Release() call would become zero, freeing the object. The original CSymbolHandler::DeInitialize() was a simple function that only called CSymbolHandlerX::Release():

5473DA94 8B01 MOV EAX, DWORD PTR DS:[ECX] ;EAX = dereferenced object "this" pointer 5473DA96 51 PUSH ECX ;pass "this" pointer to called function 5473DA97 FF50 08 CALL DWORD PTR DS:[EAX+8] ;call vtable function 0x54739EEA / CSymbolHandlerX::Release() 5473DA9A C3 RETN ;on XP EAX(refcount)==1 -- on Win7 EAX(refcount)==2

For the fix, we want to decrement the refcount twice (i.e. execute the CALL two times instead of one) and add a sanity check to skip the second CALL if the refcount is already correct such as if running under XP for example or if the Windows 7 behavior happens to be intermittent.

This patch involves adding code instead just replacing it. Since you can't easily increase the size of a function without altering the surrounding code (thus breaking the executable image addressing), I replaced the function's code with a single JMP instruction and referenced some unused padding bytes at the end of the PE image's .text section (mapped to 0x5479B5A0) where I built the replacement function. This technique is known as a code cave. I could have used any spot within the PE image that would be mapped to a predetermined address, however the .text section padding was ideal as we know its memory pages are already marked as executable, allowing the fix to be as simple as possible. The original function would be altered as follows to jump to the replacement function:

5473DA94 E9 07DB0500 JMP NatDbgDE.5479B5A0 ;jump to new function 5473DA99 90 NOP ;instructions below no longer execute 5473DA9A C3 RETN

The replacement function (jump target) can now be added to the unused padding at the end of the .text section:

5479B5A0 55 PUSH EBP ;create a stack frame so we can store 2 local variables 5479B5A1 89E5 MOV EBP, ESP ; and keep remaining registers pristine, because callers DO depend on them! 5479B5A3 83EC 08 SUB ESP, 8 ;allocate 2 local 32-bit variables (object ptr and dereferenced object pointer) 5479B5A6 894D FC MOV DWORD PTR SS:[EBP-4], ECX ;store object pointer as 1st local 5479B5A9 8B01 MOV EAX, DWORD PTR DS:[ECX] 5479B5AB 8945 F8 MOV DWORD PTR SS:[EBP-8], EAX ;store dereferenced object pointer as 2nd local 5479B5AE 8B45 F8 MOV EAX, DWORD PTR SS:[EBP-8] ;prepare for first call to Release() 5479B5B1 8B4D FC MOV ECX, DWORD PTR SS:[EBP-4] ; by simulating exact contents of registers 5479B5B4 51 PUSH ECX ;pass argument to Release() - this is the same as the original function's code 5479B5B5 FF50 08 CALL DWORD PTR DS:[EAX+8] ;call CSymbolHandlerX::Release() from vtable to decrement the refcount 5479B5B8 9C PUSHFD ;save post-function flags just in case a caller uses them because they'll be 5479B5B9 83F8 02 CMP EAX, 2 ; destroyed by our CMP call (safety check); EAX contains new object refcount 5479B5BC 75 0C JNE SHORT NatDbgDE.5479B5CA ;if refcount is not 2, SKIP the second Release(), such as if we are running under XP! 5479B5BE 9D POPFD ;dump the flags we saved for the first call to Release() because we don't need them 5479B5BF 8B45 F8 MOV EAX, DWORD PTR SS:[EBP-8] ;prepare for a second call to Release() 5479B5C2 8B4D FC MOV ECX, DWORD PTR SS:[EBP-4] 5479B5C5 51 PUSH ECX 5479B5C6 FF50 08 CALL DWORD PTR DS:[EAX+8] ;call CSymbolHandlerX::Release() a second time 5479B5C9 9C PUSHFD ;store the flags so the end of this function has something to pop 5479B5CA 9D POPFD ;pop the flags from the first or second calls to Release 5479B5CB 89EC MOV ESP, EBP ;restore original stack frame 5479B5CD 5D POP EBP 5479B5CE C3 RETN ;return to caller - object refcount should always be 1 by the time we get here

It would have been less code to simply decrement the refcount directly, but I wanted to leave the registers in the exact state they would have been after calling CSymbolHandlerX::Release() to minimize unintended side effects. More importantly, CSymbolHandlerX::Release() made use of a CriticalSection to ensure the decrement was threadsafe. Calling the function twice to maintain thread safety was clearly the most ideal solution. Also, the two jump instructions in the patch are relative jumps, so there should be no problem even in the instance the operating system relocates the DLL.

In summary, the changes above appear to fix the bug in all of the tests I've tried under Windows 7 and XP.
Please inform me if you have problems or questions.

 1:1