Shellcode Part 1: How to Swallow Exceptions in Win32 Assembly
Date: Aug 26, 2017
Last-Modified: Feb 14, 2018
This is the first article in a two-part series aimed at beginner to intermediate level reverse-engineers or any
programmer who wants to understand how to perform code branching using Win32's Structured Exception Handling
(SEH) facilities without relying on the Standard C Runtime library or any external functions. More precisely, we
will use pure x86 assembly instructions to illustrate bare-metal exception handling code that does only what the
operating system requires, while ignoring the rest of the complexities associated with the
equivalent exception-handling functionality produced by a C/C++ compiler.
If you need to understand or create your own anti-debugging/anti-disassembly tricks involving exceptions,
the information in this article may come in handy.
You will need some Win32 x86 assembly language experience to follow along.
Also note that these techniques will work in any 32-bit process running in either 32-bit or 64-bit versions of Windows.
SEH allows the __try/__except/__finally block features present in Microsoft's Standard C Runtime library to
capture exceptions and branch appropriately without allowing those exceptions to be seen outside of the
function blocks they are declared in. In Part Two of this series, we'll apply this knowledge towards one method
of discovering the randomly loaded location of KERNEL32.DLL so that we may dynamically access any API in the
system whether or not the function(s) have been formally imported into the current process. This will
bypass the Address Space Layout Randomization (ASLR) security feature present on post-XP versions of Windows
but will also work on any of the older NT-Based versions of Windows.
A firm grasp on how to encapsulate exceptions produced by your own code is necessary
before we move on to Part Two. Code must be able to recover
from Access Violation exceptions as regions of memory that are not mapped into the current process are probed.
In this article, we will also go to the effort of writing position-independent code that doesn't rely on static external dependencies
or hardcoded addresses of any kind. Also known as "shellcode", this is the type of code security professionals and malware
authors are the most interested in because it can be the most powerful due to its portability.
This will allow it to work whether compiled directly or injected into a process as it will make no assumptions about where it may be loaded in memory.
Although exception handling and writing shellcode are two separate things, we need this flexibility
for Part Two of this series where it will be used as a building-block in an injectable bootstrapping routine.
While the concepts of low-level Win32 exception handling are not new and have been discussed in several hacking
and security-related tutorials and publications over the past decade, most sample code only shows how to insert
and remove stack exception handler frames. There is little information on how that handler, once called by the
operating system, might get back to the same context the program was in prior to the exception.
This would be the assembly equivalent of branching into a C/C++ __except block
followed by cleanly exiting the block and proceeding with the remainder of the function normally. I wrote this
article not only to fill that information gap, but to share some of the things I've discovered along the way.
Almost all articles that deal with the low-level mechanics of Win32 SEH refer readers to Matt Pietrek's
legendary 1997 article,
A Crash Course on the Depths of Win32 Structured Exception Handling.
If you've never read it, I highly recommend reading at least everything prior to the "Compiler-level SEH" section.
If you have the time though, it's worth reading the whole thing.
PART 1: READING ARBITRARY MEMORY LOCATIONS WITHOUT CRASHING:
We want the capability to "poke" at any location in memory that may or may not be mapped into the current process.
If an exception is generated because we've attempted to access an invalid memory location, we
don't want the hosting process to crash.
The first thing we need is a function that tells us whether or not we can READ memory at any given 32-bit address.
It would seem lucky for us that the KERNEL32.DLL function IsBadReadPtr() has the exact functionality we need.
Its prototype is:
    BOOL IsBadReadPtr(
        CONST VOID *lp,  // address of memory block
        UINT ucb         // size of block
    );
This function has been
present in all versions of Windows NT since the early days, simply returning NONZERO if the passed memory range cannot be read,
or ZERO otherwise.
Although all of KERNEL32.DLL's functions are accessible to all Win32 processes,
shellcode can't predict where these functions will reside in memory because each time Windows boots (Vista and
up), the base load addresses for all system DLLs are randomized for security.
So we must write our own version of IsBadReadPtr() from scratch and it must be able to handle
exceptions resulting from accessing "bad" memory addresses.
The easiest way to determine if memory is accessible or not is by simply trying to read from that memory location.
If an Access Violation exception is generated, we know that memory is not accessible
and we return 1, otherwise return 0.
This is the same infamous exception that is generated anytime code attempts to access values through
a NULL pointer. Nothing could be simpler than the following C implementation using a __try/__except block:
    BOOL CustomIsBadReadPtr(const DWORD* p)
    {
        //assume all memory is readable
        BOOL bBad = 0;
        __try {
            DWORD dwDummy = *p;
            //if we got here, memory was readable!
        }
        __except (EXCEPTION_EXECUTE_HANDLER) {
            //safe to assume we got an ACCESS VIOLATION exception
            bBad = 1;
        }
        return bBad;
    }
The KERNEL32.DLL implementation of IsBadReadPtr() was also built using __try/__except blocks and is not
much different than my version above, except it supports reading a range of bytes.
For our purposes we'll just test that we can
read a single DWORD at the specified memory location.
While the function above seems simple enough, the assembly code generated by the Visual C++ 7.1 compiler is not.
This is no fault of the compiler; it's just that C/C++ needs to be able to support
features like nested __try/__except blocks, object cleanup, etc. so the majority of the code
here is to comply with the Visual C++ implementation of SEH.
Besides the size of 97 bytes, the worst thing about the generated code is that it depends on
the C Runtime Library function _except_handler3(). Shellcode can't have external static dependencies for reasons already discussed, so
the generated code is unusable for our purposes in its current state.
Instead we're going to create a version of the same function that will have no external dependencies.
Because we will also remove the code specific to the Visual C++ exception handling semantics, the resulting
code will be only about half the size!
The minimum we must do is register an exception handling frame within the current thread and then just perform the read operation
from the pointer passed.
To simplify things, we can embed the exception handling block directly within the function no different than any conditional jump branch,
which just sets the return value to 1. The remainder of the function resets the exception handler back to its original state
and returns 0. You might write the initial version of the function like this:
    00111236  55                 PUSH EBP                          ;set up function's stack frame
    00111237  8BEC               MOV EBP, ESP
    00111239  68 55121100        PUSH exception_handler (0x111255) ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
    0011123E  64:FF35 00000000   PUSH DWORD PTR FS:[0]             ; followed by previous handler in chain
    00111245  64:8925 00000000   MOV DWORD PTR FS:[0], ESP         ;install our exception_frame
    0011124C  8B45 08            MOV EAX, DWORD PTR SS:[EBP+8]     ;eax = pointer passed as argument #1
    0011124F  8B00               MOV EAX, DWORD PTR DS:[EAX]       ;eax = *eax / dereference pointer - can we read DWORD memory?
    00111251  33C0               XOR EAX, EAX                      ;if we got here, no exception occurred, memory is readable, return zero
    00111253  EB 03              JMP SHORT cleanup (0x111258)      ;skip past exception_handler code to cleanup and exit
    00111255  33C0               XOR EAX, EAX                      ;exception_handler entry point
    00111257  40                 INC EAX                           ;return 1
    00111258  64:8F05 00000000   POP DWORD PTR FS:[0]              ;cleanup; restore previous handler
    0011125F  83C4 04            ADD ESP, 4                        ;remove exception_handler from stack
    00111262  5D                 POP EBP                           ;restore caller's stack frame
    00111263  C2 0400            RETN 4                            ;remove STDCALL argument and return from function
The 48-byte code above starts by setting up a normal function stack frame, builds an EXCEPTION_REGISTRATION
structure on the stack, and then installs that frame as the current exception handler. Under
Win32, reading FS:[0] returns the first DWORD of the
Thread Environment Block (TEB), which points to the first exception
frame of the current thread's exception handler chain. We set our frame to be first in this list,
chaining on to the original frame, which is used if our handler
chooses not to handle the exception. After all of the setup code, we load the memory location we are
testing and attempt to dereference that memory at instruction 0x11124F. If nothing happens, we simply drop to
the instruction following the dereference, which zeroes EAX (our return value) and jumps past the exception_handler code to
the cleanup code. Cleanup consists of assigning the previous exception handling frame back to the TEB and removing our
exception frame from the stack. In the case an exception does occur at 0x11124F, which we
can safely assume will be a first-chance Access Violation exception 0xC0000005, the operating system first
notices the exception from an interrupt within kernel-mode. Kernel-mode then propagates the exception into user-mode
where our thread's EIP is changed from where the exception occurred into NTDLL.DLL code that ultimately walks
our thread's exception handler chain to search for someone to handle it. Since our handler happens to be first
in the list, our exception handling block is the first to gain control.
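The chain walk described above can be modelled in portable C. This is only a sketch of the idea, not how NTDLL implements it: the names Frame, chain_head, install_frame, remove_frame and dispatch are hypothetical, the handler takes a single argument instead of the real four, and the chain ends in NULL rather than the 0xFFFFFFFF marker Windows uses. The two return values use the same numbering as the handler dispositions discussed later in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Values a handler returns to the dispatcher (Win32 numbering). */
enum { ExceptionContinueExecution = 0, ExceptionContinueSearch = 1 };

/* Simplified EXCEPTION_REGISTRATION: previous frame first, handler second. */
typedef struct Frame {
    struct Frame *prev;
    int (*handler)(unsigned long code);
} Frame;

/* Hypothetical stand-in for the chain head the OS keeps at FS:[0].
   Real chains end with 0xFFFFFFFF; NULL is used here for simplicity. */
static Frame *chain_head = NULL;

/* Mirrors PUSH handler / PUSH FS:[0] / MOV FS:[0], ESP. */
static void install_frame(Frame *f, int (*handler)(unsigned long)) {
    f->handler = handler;
    f->prev = chain_head;
    chain_head = f;
}

/* Mirrors POP FS:[0] / ADD ESP, 4. */
static void remove_frame(Frame *f) {
    chain_head = f->prev;
}

/* Walk from the newest frame to the oldest. Returns the position of the
   frame that handled the exception (0 = newest), or -1 if none did. */
static int dispatch(unsigned long code) {
    int depth = 0;
    for (Frame *f = chain_head; f != NULL; f = f->prev, ++depth) {
        if (f->handler(code) == ExceptionContinueExecution)
            return depth;
    }
    return -1;
}

/* Two example handlers: one always declines, one accepts access violations. */
static int decline_all(unsigned long code) { (void)code; return ExceptionContinueSearch; }
static int accept_av(unsigned long code) {
    return code == 0xC0000005UL ? ExceptionContinueExecution : ExceptionContinueSearch;
}
```

Installing a frame with accept_av and then a newer one with decline_all makes dispatch(0xC0000005) report the second frame in the chain, which is the "try the next handler" behaviour in miniature.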
If we passed a readable memory address to the function above, it would return zero without any problems.
If an invalid address was passed, we'd run into a couple of problems after our exception handler gained control.
The first hint that something is wrong would be returning to a seemingly random location in memory after the final RET instruction.
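The first of our problems is that Win32 expects our handler to be a function
whose signature is formally:
    EXCEPTION_DISPOSITION __cdecl _except_handler(
        struct _EXCEPTION_RECORD *ExceptionRecord,
        void *EstablisherFrame,
        struct _CONTEXT *ContextRecord,
        void *DispatcherContext);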
Within an exception handler, the stack is set up so that a simple RET (without operands) will take
you back to the operating system's NTDLL.ExecuteHandler(), not the caller of the function we were
in that generated the exception.
Essentially the stack is in a different state than it was prior to the exception.
Regardless of its proximity to where the exception was generated, an exception handler will be running in a different stack context
which allows the operating system's SEH semantics to kick in. This includes providing the handler with all sorts of information
about the exception and even giving it a choice to "fix-things" and resume execution where the exception occurred.
In other words, the operating system pushed a bunch of stuff on the stack after the exception and our original function return
address is no longer aligned with the caller of our function.
Since our function's primary purpose is to set a flag that an exception was hit and return to our caller,
how can we safely break out of the handler?
HOW TO EXIT AN EXCEPTION HANDLER:
A Win32 exception handler must handle an exception in one of three ways, two of which require the handler to return a value through EAX back to the operating system's NTDLL.ExecuteHandler():
1. Return 0 (ExceptionContinueExecution): This tells the OS to retry executing the excepting instruction (with or without modifications to the passed CONTEXT structure).
2. Return 1 (ExceptionContinueSearch): This tells the OS "I'm not handling that exception, try the next handler in the chain."
3. DON'T RETURN: Just keep executing from the handler, usually to terminate the process.
It is possible to accomplish our goal with choice #1 by resuming execution at a different branch within the
function (requires modification of the CONTEXT structure),
however, this method has already had a lot of discussion and isn't as lightweight. We definitely don't want choice #2 because
our code will lose control usually resulting in process termination. For our purposes, we want choice #3 so
you can see how to manually clean up all the stuff the operating system placed on the stack
and avoid returning back to the operating system. There may be other names for unwinding the stack inside an
exception handler, but I call it swallowing the exception.
Although this method is perfectly safe and super-elegant, it's not exactly going to be endorsed by Microsoft.
Besides being unlikely to break between versions of Windows, its other side effects include smaller code and
faster execution; what's not to love?!
Swallowing an exception from the context of the handler requires a minimum of restoring the original value of
the ESP register. As long as you don't make assumptions about the contents of the other registers, you should be able to
pick up right where your function left off. Let's first discuss three reliable ways to restore ESP:
GRAB IT FROM THE CONTEXT STRUCTURE:
The handler can access a CONTEXT
structure the operating system placed on the stack as argument #3.
This structure contains the original register values as they existed prior to the exception.
The CONTEXT pointer will be at [ESP+0x0C] and the original value of the ESP register is at offset 0xC4 into this structure.
MARK THE STACK WITH A SENTINEL VALUE:
PUSH any "unnaturally" occurring DWORD pattern (e.g.: 0xBAADBEEF, 0xBADC0FEE, etc.) on the stack after
the local exception handler frame has been established. The exception handler can then unwind the stack
by POPing values off in a loop until the sentinel is encountered.
REFERENCE THE CURRENT EXCEPTION-HANDLER FRAME:
If we are in the handler for a "leaf" function (a function that doesn't call any other functions) we can
take advantage of the fact that FS:[0] will always point at the frame that belongs to the current
handler. A side effect of this is that FS:[0] also happens to be the value
of ESP after the exception frame was established, which is usually the value of ESP we want to restore.
Avoid this method if your function calls other functions within your established exception frame as
these functions could set up their own frames and defer back to your handler. In other words, if your
handler could ever catch a nested function's exception because the nested function's handler chose not
to handle it, your function will get the nested function's stack context and you'll surely crash.
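The sentinel method (the second option above) can be modelled in portable C as a rough sketch. The array stands in for stack memory, with index 0 as the lowest address; the name unwind_to_sentinel and the array layout are assumptions made for illustration, not code from a real handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENTINEL 0xBAADBEEFu

/* stack[] models memory with index 0 as the lowest address (top of stack).
   Pops DWORDs starting at sp until the sentinel has been popped, and returns
   the new stack index, which is the slot just above the sentinel.
   Returns -1 if the sentinel is not found before `limit`. */
static ptrdiff_t unwind_to_sentinel(const uint32_t *stack, size_t sp, size_t limit) {
    while (sp < limit) {
        if (stack[sp++] == SENTINEL)
            return (ptrdiff_t)sp;
    }
    return -1;
}
```

In the assembly version the same loop would be a POP followed by a compare and a conditional jump. The returned position corresponds to ESP pointing at the exception frame that was established just before the sentinel was pushed.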
I'm going with choice #3 because it results in the smallest, most elegant solution for our simple function.
It avoids hardcoding any values or structure offsets and also serves to teach
another point about how the operating system calls into your exception handler, which we'll get into below.
Referring back to address 0x111245 in the code shown above, notice the instruction that installed our exception
handler frame: MOV FS:[0], ESP.
Since that was the last thing we placed on the stack prior
to the exception, FS:[0] will still contain the value of ESP we need to restore.
Therefore, all we need is:
    MOV ESP, FS:[0]
However there is one catch to this method that illustrates another important point about how the operating
system calls exception handlers. Just prior to invoking your handler, NTDLL.ExecuteHandler() will have installed
yet another exception handling frame in front of your frame to catch what is known as a nested exception.
This handler exists to prevent an infinite loop should an exception occur within your exception handler.
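Therefore, for our code to work, we must dereference the current frame's "previous" pointer to get back to our
frame, which will now be second in the chain. The code should instead look like this:
    MOV ESP, DWORD PTR FS:[0]     ;ESP = frame installed by NTDLL.ExecuteHandler()
    MOV ESP, DWORD PTR SS:[ESP]   ;ESP = that frame's "previous" pointer = our frame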
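As a rough C model of that extra hop, assume a simplified two-field registration record (the names Registration and frame_to_restore are hypothetical). The value read from FS:[0] inside the handler is the nested-exception frame, and its first field leads back to our own frame, whose address is the ESP value we want.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Registration {
    struct Registration *prev;   /* first DWORD: previous frame in the chain */
    void *handler;               /* second DWORD: handler address */
} Registration;

/* fs0 models the value read from FS:[0] inside the handler. */
static Registration *frame_to_restore(Registration *fs0) {
    return fs0->prev;   /* MOV ESP, FS:[0] followed by MOV ESP, [ESP] */
}
```

Because the "previous" pointer is the first member of the record, dereferencing the frame address directly is enough, which is why the assembly needs no offset.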
Note that EBP will still not be within the context of our function but since this simple function doesn't use EBP after the exception,
we'll just allow the caller's EBP to get restored as it normally does in the function's epilogue (cleanup-code).
More complex functions that need EBP to access variables after the exception handling block completes might just restore EBP from
a saved position on the stack after both ESP is restored and the exception frame is removed. Or you might just
simplify your life by pulling both EBP and ESP from the
CONTEXT structure to kill two birds with one stone
and not bother referencing FS:[0] at all.
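To see where those two registers sit, here is a sketch that reproduces the 32-bit x86 CONTEXT layout with fixed-width types so the offsets can be checked on any compiler. The type names Context32 and FloatSave32 and the helper saved_stack_registers are hypothetical; the field order follows the x86 definition in winnt.h as I understand it, which places Ebp at offset 0xB4 and Esp at offset 0xC4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the 32-bit x86 floating-point save area (112 bytes). */
typedef struct {
    uint32_t ControlWord, StatusWord, TagWord;
    uint32_t ErrorOffset, ErrorSelector, DataOffset, DataSelector;
    uint8_t  RegisterArea[80];
    uint32_t Cr0NpxState;
} FloatSave32;

/* Hypothetical name: a fixed-width copy of the 32-bit x86 CONTEXT layout. */
typedef struct {
    uint32_t ContextFlags;
    uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;
    FloatSave32 FloatSave;
    uint32_t SegGs, SegFs, SegEs, SegDs;
    uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;
    uint32_t Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    uint8_t  ExtendedRegisters[512];
} Context32;

/* Copy out the two registers a handler would need to resume the function. */
static void saved_stack_registers(const Context32 *ctx, uint32_t *esp, uint32_t *ebp) {
    *esp = ctx->Esp;
    *ebp = ctx->Ebp;
}
```

In assembly this amounts to loading the CONTEXT pointer from [ESP+0x0C] inside the handler and then reading the DWORDs at offsets 0xB4 and 0xC4 from it.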
THE FIXED FUNCTION:
Incorporating the changes above, the fixed version of our function is now 63 bytes and it can fully encapsulate
an exception while maintaining stack integrity:
    00AF1270  55                 PUSH EBP                            ;set up function's stack frame
    00AF1271  8BEC               MOV EBP, ESP
    00AF1273  E8 00000000        CALL $+5 (next_instruction)         ;get EIP of next instruction (after 5-byte CALL) - MASM-syntax
    00AF1278  58                 POP EAX                             ;EAX now equals whatever memory location THIS instruction is loaded at
    00AF1279  83C0 1C            ADD EAX, 01Ch                       ;apply relative offset so EAX now points to our exception-handler entry point
    00AF127C  50                 PUSH EAX                            ;build EXCEPTION_REGISTRATION structure on stack, first with our handler
    00AF127D  64:FF35 00000000   PUSH DWORD PTR FS:[0]               ; followed by previous handler in chain
    00AF1284  64:8925 00000000   MOV DWORD PTR FS:[0], ESP           ;install our exception_frame to be first in chain
    00AF128B  8B45 08            MOV EAX, DWORD PTR SS:[EBP+8]       ;eax = pointer passed as argument #1
    00AF128E  8B00               MOV EAX, DWORD PTR DS:[EAX]         ;eax = *eax / dereference pointer - can we read DWORD memory?
    00AF1290  33C0               XOR EAX, EAX                        ;if we got here, no exception occurred, memory is readable, return zero (eax=0)
    00AF1292  EB 0D              JMP SHORT CLEANUP_label (0000000Fh) ;skip past exception_handler code
    00AF1294  33C0               XOR EAX, EAX                        ;exception_handler entry point
    00AF1296  40                 INC EAX                             ;return 1
    00AF1297  64:8B25 00000000   MOV ESP, DWORD PTR FS:[0]           ; **** SWALLOW EXCEPTION BY ****
    00AF129E  8B2424             MOV ESP, DWORD PTR SS:[ESP]         ; **** RESTORING ESP ****
    00AF12A1  64:8F05 00000000   POP DWORD PTR FS:[0]                ;CLEANUP_label - restore previous exception handler
    00AF12A8  83C4 04            ADD ESP, 4                          ;remove our exception_handler from stack
    00AF12AB  5D                 POP EBP                             ;restore caller's stack frame
    00AF12AC  C2 0400            RETN 4                              ;remove STDCALL argument and return from function
The code above is also a little larger because we've changed the 3rd instruction from pushing a hardcoded
exception handler address to dynamically-calculating the address relative to where these instructions
are currently executing in memory. Most shellcode that needs to reference
itself in a portable manner must employ some technique to find where it is loaded into memory.
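The relative-address arithmetic in the listing can be checked with a few lines of C. This is just a worked calculation using the addresses shown above; the helper names call_plus5_return and jmp_short_target are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Address that CALL $+5 pushes: the instruction following the 5-byte CALL. */
static uint32_t call_plus5_return(uint32_t call_addr) {
    return call_addr + 5;
}

/* Target of a 2-byte JMP SHORT: the address of the next instruction
   plus the signed 8-bit displacement stored in the second byte. */
static uint32_t jmp_short_target(uint32_t jmp_addr, int8_t disp) {
    return jmp_addr + 2 + (uint32_t)(int32_t)disp;
}
```

With the CALL at 0x00AF1273, the POP EAX lands at 0x00AF1278, and adding 0x1C gives 0x00AF1294, the handler entry point. The same sums hold at any load address because only differences between addresses are baked into the instruction bytes.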
So now we have a shellcode function that will safely indicate whether a particular memory address is accessible.
We'll use this function as a building-block for Part Two
of this series where we search memory for the location of KERNEL32.DLL.
If any VEH handlers (Vectored Exception Handling) were previously registered by the process,
note that a VEH chain receives exception notifications before the thread's SEH chain.
Available in Windows XP and up, VEH is primarily used for debugging purposes.
How a VEH handler chooses to react to an exception may prevent a normal exception handler from gaining control.
Please refer to
Matt Pietrek's VEH article
for more information.
A final note worth mentioning is that while swallowing exceptions is perfect for exceptions occurring in
small "leaf" functions, you probably shouldn't do this when your exception handler could catch
exceptions originating in nested functions whose own handlers chose not to handle them.
Swallowing these exceptions short-circuits the operating system's normal unwind semantics
that take place after a handler volunteers to handle the exception by returning back to NTDLL.ExecuteHandler().
Because we don't return, we prevent a nested handler's cleanup code (destructors and __finally blocks) from executing.