Size: 3.36 MB (3,523,584 bytes)
Type: XP/Win7 32-bit Kernel-mode Driver
Original FLARE Author: Tom Bennett
Tools used:
- IDA (disassembler)
- Exe2Aut (AutoIt3 decompiler)
- AutoIt (AutoIt interpreter)
- VMWare Workstation (guest copy of Windows XP)
- Debugging Tools for Windows (WinDbg)
- VirtualKD (kernel debug virtualization accelerator)
This challenge will test your ability to step through a driver by debugging Windows in kernel mode. You will soon
see why a debugger is necessary to reveal the solution.
A hex dump of the first few bytes of the supplied file reveals that it is a Windows PE executable image. When
it is renamed with an .exe extension and run, however, nothing seems to happen. If the program is run under a debugger, an
IsDebuggerPresent() check at 403D57 is encountered early on, resulting in a message box (which identifies the program
as a compiled AutoIt script) before the program terminates.
You can bypass this rudimentary anti-debug check by patching the executable, by adjusting the contents of EAX on return
from IsDebuggerPresent(), or by manually adjusting EIP to follow the false branch, and then figure out what is going
on, provided you are willing to spend a good deal of time in the debugger. However, since the application has so nicely
informed us that it is a compiled AutoIt script, we might save a lot of time and hassle by trying our
luck with an AutoIt decompiler. If successful, the decompiler would give us access to the script that
the executable runs, so we can see what was done to the system.
Although, as of 2015, there appeared to be no actively maintained decompiler for current-version AutoIt executables,
the Exe2Aut tool had no problems with this particular executable.
From the Exe2Aut directory, I ran "Exe2Aut.exe -nogui loader.exe", which extracted 4 files into the current directory:
07/23/2015 10:51 AM 46,080 ioctl.exe
07/28/2015 04:42 PM 2,688,640 challenge-xp.sys
07/28/2015 04:42 PM 2,689,536 challenge-7.sys
06/06/2016 02:17 PM 12,444 loader_.au3
Besides the author of Exe2Aut being a fan of Robin Hood, we can guess we will be dealing with some drivers
due to the presence of the .sys files.
The AutoIt script is the file with the .AU3 extension. An initial examination of the file in a text editor like
Notepad or the free AutoIt editor shows that it begins with some constant definitions, followed
by a mini-library of functions for maintaining Windows services, such as _startservice() and _stopservice().
It isn't until we get to the last screenful that we see the script's entry point, where the processor
architecture and operating system version are first checked. The script requires a 32-bit version of either
Windows XP or Windows 7. On a supported system, it drops the corresponding driver file as challenge.sys into the
Windows system32 directory; otherwise it exits with an "Unsupported OS" error. Next, an unknown executable named
ioctl.exe is dropped into the same directory.
Things start to get interesting from line #277 onward. The last three actions performed by the script appear to be encrypted:
gibberish-looking strings are passed to the function dothis(). We can probably guess that a driver service is
installed, but we need to know for sure. Each dothis() call is also passed the string "flarebearstare",
which is probably the decryption key. Because dothis() must decrypt the commands before AutoIt can run them
(via Execute()), we shouldn't have any problem discovering those commands as long as we output each string
after decrypt() has already done the work for us.
Although we can sidestep the decryption algorithm entirely, I thought it was neat that decrypt() builds its
decryption routine as shellcode in a buffer, using AutoIt's buffer-creation facilities, and runs it through the
CallWindowProc() API, presumably because a routine written in native AutoIt script would have been easier to reverse-engineer.
The decryption shellcode is shown here for educational purposes. It was obtained by pasting the
$opcode string (line #285) into your favorite hex editor and running the resulting file through a
disassembler like IDA. Alternatively, I used a disassembler that accepts the hex opcode string pasted directly
on the command line (without the 0x prefix):
00000000 C8 1001 00 enter 272, 0
00000004 6A 00 push 0
00000006 6A 00 push 0
00000008 53 push ebx
00000009 56 push esi
0000000A 57 push edi
0000000B 8B55 10 mov edx, dword ptr [ebp+16]
0000000E 31C9 xor ecx, ecx
00000010 89C8 mov eax, ecx
00000012 49 dec ecx
00000013 89D7 mov edi, edx
00000015 F2:AE repne scasb
00000017 48 dec eax
00000018 48 dec eax
00000019 29C8 sub eax, ecx
0000001B 8945 F0 mov dword ptr [ebp-16], eax
0000001E 85C0 test eax, eax
00000020 0F84 DC000000 je 00000102h
00000026 B9 00010000 mov ecx, 256
0000002B 88C8 mov al, cl
0000002D 2C 01 sub al, 1
0000002F 88840D EFFEFFFF mov byte ptr [ecx+ebp-273], al
00000036 E2 F3 loop short 0000002bh
00000038 8365 F4 00 and dword ptr [ebp-12], 00000000h
0000003C 8365 FC 00 and dword ptr [ebp-4], 00000000h
00000040 817D FC 00010000 cmp dword ptr [ebp-4], 256
00000047 7D 47 jge short 00000090h
00000049 8B45 FC mov eax, dword ptr [ebp-4]
0000004C 31D2 xor edx, edx
0000004E F775 F0 div dword ptr [ebp-16]
00000051 92 xchg eax, edx
00000052 0345 10 add eax, dword ptr [ebp+16]
00000055 0FB600 movzx eax, byte ptr [eax]
00000058 8B4D FC mov ecx, dword ptr [ebp-4]
0000005B 0FB68C0D F0FEFFFF movzx ecx, byte ptr [ecx+ebp-272]
00000063 01C8 add eax, ecx
00000065 0345 F4 add eax, dword ptr [ebp-12]
00000068 25 FF000000 and eax, 000000ffh
0000006D 8945 F4 mov dword ptr [ebp-12], eax
00000070 8B75 FC mov esi, dword ptr [ebp-4]
00000073 8A8435 F0FEFFFF mov al, byte ptr [esi+ebp-272]
0000007A 8B7D F4 mov edi, dword ptr [ebp-12]
0000007D 86843D F0FEFFFF xchg byte ptr [edi+ebp-272], al
00000084 888435 F0FEFFFF mov byte ptr [esi+ebp-272], al
0000008B FF45 FC inc dword ptr [ebp-4]
0000008E EB B0 jmp short 00000040h
00000090 8D9D F0FEFFFF lea ebx, [ebp-272]
00000096 31FF xor edi, edi
00000098 89FA mov edx, edi
0000009A 3955 0C cmp dword ptr [ebp+12], edx
0000009D 76 63 jbe short 00000102h
0000009F 8B85 ECFEFFFF mov eax, dword ptr [ebp-276]
000000A5 40 inc eax
000000A6 25 FF000000 and eax, 000000ffh
000000AB 8985 ECFEFFFF mov dword ptr [ebp-276], eax
000000B1 89D8 mov eax, ebx
000000B3 0385 ECFEFFFF add eax, dword ptr [ebp-276]
000000B9 0FB600 movzx eax, byte ptr [eax]
000000BC 0385 E8FEFFFF add eax, dword ptr [ebp-280]
000000C2 25 FF000000 and eax, 000000ffh
000000C7 8985 E8FEFFFF mov dword ptr [ebp-280], eax
000000CD 89DE mov esi, ebx
000000CF 03B5 ECFEFFFF add esi, dword ptr [ebp-276]
000000D5 8A06 mov al, byte ptr [esi]
000000D7 89DF mov edi, ebx
000000D9 03BD E8FEFFFF add edi, dword ptr [ebp-280]
000000DF 8607 xchg byte ptr [edi], al
000000E1 8806 mov byte ptr [esi], al
000000E3 0FB60E movzx ecx, byte ptr [esi]
000000E6 0FB607 movzx eax, byte ptr [edi]
000000E9 01C1 add ecx, eax
000000EB 81E1 FF000000 and ecx, 000000ffh
000000F1 8A840D F0FEFFFF mov al, byte ptr [ecx+ebp-272]
000000F8 8B75 08 mov esi, dword ptr [ebp+8]
000000FB 01D6 add esi, edx
000000FD 3006 xor byte ptr [esi], al
000000FF 42 inc edx
00000100 EB 98 jmp short 0000009ah
00000102 5F pop edi
00000103 5E pop esi
00000104 5B pop ebx
00000105 C9 leave
00000106 C2 1000 retn 16
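Reading the listing, the shellcode appears to be a textbook RC4 routine: a strlen of the key (0x0B-0x19), the key-scheduling loop (0x26-0x8E), and a keystream loop that XORs each data byte in place (0x90-0x100). Treating the stdcall arguments at [ebp+8], [ebp+12] and [ebp+16] as (buffer, length, key) is my reading of the listing rather than something the script states. A C sketch that mirrors it, under that assumption (the name rc4_crypt is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C equivalent of the decrypt() shellcode (RC4, in place).
 * data: buffer to transform, len: byte count, key: NUL-terminated key. */
void rc4_crypt(unsigned char *data, size_t len, const char *key)
{
    unsigned char S[256], t;
    size_t keylen = strlen(key);          /* repne scasb at 0x15 */
    unsigned int i, j = 0;
    size_t k;

    if (keylen == 0)                      /* je 0x102: empty key, leave data alone */
        return;

    /* Key-scheduling algorithm (0x26-0x8E) */
    for (i = 0; i < 256; i++)
        S[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + (unsigned char)key[i % keylen]) & 0xFF;
        t = S[i]; S[i] = S[j]; S[j] = t;
    }

    /* Keystream generation, XORed into the buffer (0x90-0x100) */
    i = j = 0;
    for (k = 0; k < len; k++) {
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        t = S[i]; S[i] = S[j]; S[j] = t;
        data[k] ^= S[(S[i] + S[j]) & 0xFF];
    }
}
```

Since RC4 just XORs a keystream into the buffer, applying it twice with the same key restores the original bytes.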
As mentioned above, the script already knows how to decrypt the command strings.
We just need them written to the console instead of executed, so we can see what was done to the system.
After researching AutoIt's BASIC-like syntax, I made the following modifications:
- Get rid of all the constants and service functions at the top, as they aren't necessary for the decryption
(they actually conflicted with the WinAPI.au3 file we needed to include for the console functionality).
Remove everything from line #1, which reads:
Global $standard_rights_required = 983040
through the line that reads:
FileInstall("ioctl.exe", @SystemDir & "\ioctl.exe")
- At top of the file, add this chunk of code to initialize the console so we have a place to
dump the decrypted strings:
#include <WinAPI.au3>

If Not _WinAPI_AttachConsole() Then
    $ret = DllCall("Kernel32.dll", "long", "AllocConsole")
    If @error Or $ret[0] = 0 Then Exit MsgBox(0, 'EXIT', "No Console allocated!")
EndIf
$hConsole = _WinAPI_GetStdHandle(1)
$hConsoleIn = _WinAPI_GetStdHandle(0)
If $hConsole = -1 Then
    MsgBox(0, "Error", "GetStdHandle failed")
    Exit
EndIf
- Modify the dothis() function, replacing the last line "Return Execute($exe)" with the following:
_WinAPI_WriteConsole($hConsole, $exe & @CRLF)
Run the modified script through the separately installed AutoIt interpreter from the command prompt.
The 3 decrypted AutoIt commands should appear on the console:
_CreateService("", "challenge", "challenge", @SystemDir & "\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START)
ShellExecute(@SystemDir & "\ioctl.exe", "22E0DC")
You can gather that a driver-level service called "challenge" is being created and started, which is no surprise:
a driver has to be installed in the system before it can run. Once the driver is installed, the
mysterious ioctl.exe is executed with the argument "22E0DC".
Refer back to the original loader_.au3 script for the definitions of the service functions if needed; they appear
to do exactly what their names suggest. Although the challenge service doesn't appear in the Services MMC console
snap-in, you can verify that it is installed via its registry entries (under HKLM\SYSTEM\CurrentControlSet\Services\challenge)
and that it can be stopped and started from the command line.
If you pull up ioctl.exe in IDA, you'll find a nice little program that appears to be
unmodified from a Microsoft Visual Studio release build, as indicated by the lack of
obfuscation or alterations to impede analysis as well as the embedded PDB path
"C:\Users\Me\documents\visual studio 2010\Projects\ioctl\Release\ioctl.pdb".
As we might guess, this program probably has something to do with the challenge.sys driver.
This program appears to:
- Convert the first command-line argument, a numeric hex (base-16) string, to a DWORD. If you don't supply
at least one argument, the program crashes because it doesn't check argc first.
- Create an unnamed manual-reset event, initially nonsignaled, with CreateEvent().
- Open a handle to the driver with CreateFile() (shown with numeric constants converted to the
corresponding named constants):
"\\.\challenge", //object path to driver (string in quotes as-is without backslashes escaped)
NULL, //default security descriptor
OPEN_EXISTING, //creation disposition (value 3; dispositions are enumerated values, not combinable flags)
FILE_FLAG_OVERLAPPED, //flags and attributes
NULL //no template
- Call DeviceIoControl() with the DWORD from the first command-line argument as dwIoControlCode and no
input buffer. The event created earlier is also passed (via the OVERLAPPED structure).
- Finally, wait with GetOverlappedResult() until the driver's handler returns, then exit. The program
does not look at or output any returned buffer.
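For comparison, here is a hedged sketch of the argument handling ioctl.exe appears to perform, with the missing argc check added. Using strtoul() with base 16 is an assumption (we don't know which conversion routine the binary actually calls), and parse_ioctl_arg is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: converts argv[1] (e.g. "22E0DC") to the control
 * code, but checks argc first, which ioctl.exe does not.
 * Returns 0 on success, -1 if the argument is missing or not valid hex. */
int parse_ioctl_arg(int argc, char **argv, unsigned long *code)
{
    char *end;

    if (argc < 2)                 /* ioctl.exe crashes here instead */
        return -1;
    *code = strtoul(argv[1], &end, 16);
    if (end == argv[1] || *end != '\0')
        return -1;                /* empty string or trailing garbage */
    return 0;
}
```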
At this point, we can gather that ioctl.exe is used to communicate with the driver via numeric codes. It is unclear
what the 22E0DC code (passed to the driver from the AutoIt script) means, or whether ioctl.exe should be
reading a response back. Might this challenge involve fixing ioctl.exe so we can read back, for instance,
whether our "code" succeeded or failed?
Because of the name of the supplemental program (ioctl.exe), we might also guess that challenge.sys is an IOCTL driver.
If you download the Windows Driver Kit (WDK) 7.1.0, you can read about IOCTL drivers and view the sample located under
the installation directory at "src\general\ioctl\wdm".
The next logical step is to look inside the driver.
A driver's entry point is the same as its PE entry point. This routine, also known as DriverEntry(), is
called when Windows loads the driver; its primary responsibility is to fill in a structure of pointers to
callback functions that Windows will use to communicate with the loaded driver. If we can locate the code that
populates these callback addresses, we should be able to find the address of the IOCTL handler.
This tutorial deals with the analysis of the Windows XP version of the driver (challenge-xp.sys) instead of the one for Windows 7.
I assume the Windows 7 version of the driver conformed to the
driver model changes needed to run on Windows 7, but was otherwise identical in functionality.
When we load challenge-xp.sys in IDA, we can easily locate DriverEntry() at address 29EDBE from the "Exports" tab.
The DriverEntry() code happens to reference two addresses. The first sets some "BugCheck" values in a couple of global
variables; these look like the compiler's /GS security cookie, which is used to verify the integrity of the stack
before a function returns (a mismatch triggers a bug check), so this part is boilerplate and uninteresting.
The second code reference in DriverEntry() jumps to 29CC90.
Within 29CC90, there are two calls to RtlInitUnicodeString(), followed by a call to a function IDA named sub_29D9E4().
Although IDA didn't recognize this function, I gathered that it was probably IoCreateDeviceSecure(), based on the
arguments supplied and the call to IoDeleteDevice() if the subsequent call to IoCreateSymbolicLink() failed. This is
a common sequence for a driver initialization routine. (IoCreateDeviceSecure() is linked statically from wdmsec.lib
rather than imported, which would explain why IDA shows it as an anonymous function.) The call is shown below (again,
with numeric constants converted to the corresponding named constants):
"\Device\challenge", //device name (string in quotes as-is without backslashes escaped)
1, //device is Exclusive (only one handle to the device can be open at a time)
L"68", //default SDDL string (security permissions)
&guid, //registry guid 0xDDEEAAFF to override DefaultSDDLString, DeviceType, DeviceCharacteristics, and Exclusive parameters
What we care about, however, are the registered callback functions. If we look at the top of the current routine,
even before any APIs are called, an array of 27 function pointers is filled in with the same address: 29C1A0.
The DriverUnload() routine, a separate field of the DRIVER_OBJECT, is then set to a different address after the loop exits.
.text:0029CCEB 8B 55 E8 mov edx, [ebp+var_18] ;this block of code populates the registered callback functions
.text:0029CCEE 83 C2 01 add edx, 1
.text:0029CCF1 89 55 E8 mov [ebp+var_18], edx
.text:0029CCF4 83 7D E8 1B cmp [ebp+var_18], 1Bh ;loop control index >= 27?
.text:0029CCF8 7D 10 jge short loc_29CD0A ;if so, exit loop
.text:0029CCFA 8B 45 E8 mov eax, [ebp+var_18]
.text:0029CCFD 8B 4D 08 mov ecx, [ebp+arg_0]
.text:0029CD00 C7 44 81 38 A0 C1+ mov dword ptr [ecx+eax*4+38h], offset sub_29C1A0 ;initialize all DRIVER_OBJECT.MajorFunction (offset 0x38) entries to function 29C1A0
.text:0029CD08 EB E1 jmp short loc_29CCEB ;loop until done
.text:0029CD0A 8B 55 08 mov edx, [ebp+arg_0]
.text:0029CD0D C7 42 34 C0 B5 29+ mov dword ptr [edx+34h], offset sub_29B5C0 ;initialize DRIVER_OBJECT.DriverUnload (struct offset 0x34) to a different callback
You'll find the DriverUnload() routine is as boilerplate as it is uninteresting, so we'll move on to the registered callback
that handles everything else.
We are now relatively sure that when a code is passed on the command line to ioctl.exe, the function at 29C1A0
will be called. When we look at this routine in IDA, we see a simple function that utilizes a rather large switch statement.
The function can be distilled down to this "massaged" C++ representation, which is similar to the boilerplate callbacks seen in the DDK samples. So far,
there is nothing that jumps out as out of the ordinary, except the large number of cases in the switch statement.
//challenge.sys' generic handler
int sub_29C1A0(DEVICE_OBJECT* pDevice, IRP* pIrp)
{
    pIrp->IoStatus.Status = 0;
    pIrp->IoStatus.Information = 0;
    IO_STACK_LOCATION* pIrpStackLoc = IoGetCurrentIrpStackLocation(pIrp);
    //only handle DeviceIoControl communication
    if (pIrpStackLoc->MajorFunction == IRP_MJ_DEVICE_CONTROL)
    {
        //rebase the control code to the lowest case value
        DWORD dwSwitch = pIrpStackLoc->Parameters.DeviceIoControl.IoControlCode - 0x22E004;
        switch (dwSwitch)
        {
        case 216: sub_29B620(0x41); break; //IOCTL 0x22E0DC, the code used by the AutoIt script
        //remaining 99 cases go here
        }
    }
}
The switch statement contains a total of 100 cases, each corresponding to an IO control code supported by the
driver; the driver is supposed to do something in response to each one.
The subtraction of the constant 0x22E004, the lowest case value, is part of the compiler-generated jump table
for the switch statement: it rebases the incoming control code so that the lowest one becomes index 0. Note that
the compiler-generated jump table contains 400 entries, because it spans the whole range of case values, gaps included.
Nevertheless, there are still only 100 cases. The gaps between case values reflect how IO control codes are structured;
they are usually created with the CTL_CODE macro found in the DDK, which shifts the function number left by 2 bits,
so consecutive function numbers produce codes 4 apart:
#define CTL_CODE(DeviceType,Function,Method,Access) (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))
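Plugging numbers into the macro makes the 4-apart spacing concrete. The function numbers used below (0x801 for the lowest case value 0x22E004, and 0x837 for 0x22E0DC) are my own back-calculation, with the custom bit folded into the Function argument as the macro allows; the helper challenge_code is hypothetical. The access value is FILE_READ_DATA | FILE_WRITE_DATA (1 | 2 = 3).

```c
#include <stdint.h>

/* Same definition as the DDK macro quoted above. */
#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

#define FILE_DEVICE_UNKNOWN 0x22
#define METHOD_BUFFERED     0
#define FILE_READ_DATA      0x0001
#define FILE_WRITE_DATA     0x0002

/* Hypothetical helper: control code for a function number, using the
 * device type, method and access bits shared by challenge.sys's codes.
 * Function numbers >= 0x800 set the "custom" bit (bit 13). */
uint32_t challenge_code(uint32_t function)
{
    return CTL_CODE(FILE_DEVICE_UNKNOWN, function, METHOD_BUFFERED,
                    FILE_READ_DATA | FILE_WRITE_DATA);
}
```

challenge_code(0x801) gives 0x22E004 and challenge_code(0x837) gives 0x22E0DC. Since 0x837 - 0x801 = 0x36 = 54, and 54 * 4 = 216, this matches the case index IDA shows for 0x22E0DC.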
More importantly, the bits of a control code encode more than a unique application-defined
number. As an example, here is a breakdown of code 0x22E0DC, which the AutoIt
installer script passed to ioctl.exe:
0 000000000100010 11 1 00000110111 00
| | | | | |__transfer type (METHOD_BUFFERED)
| | | | |____function code (in this case, 0x37)
| | | |________________custom
| | |__________________required access (FILE_READ_DATA|FILE_WRITE_DATA)
| |_____________________device type 0x22 (FILE_DEVICE_UNKNOWN)
|_____________________________________common bit (vendor-assigned)
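To double-check a breakdown like this one, a few shifts and masks split a control code along the same boundaries as the diagram. The struct and function names are hypothetical; the bit positions follow the diagram and the CTL_CODE macro above.

```c
#include <stdint.h>

/* Fields of an I/O control code, at the bit positions in the diagram. */
struct ioctl_fields {
    unsigned common;       /* bit 31 */
    unsigned device_type;  /* bits 16-30 */
    unsigned access;       /* bits 14-15 */
    unsigned custom;       /* bit 13 */
    unsigned function;     /* bits 2-12 */
    unsigned method;       /* bits 0-1 */
};

/* Hypothetical helper: split a control code into its fields. */
struct ioctl_fields decode_ioctl(uint32_t code)
{
    struct ioctl_fields f;
    f.common      = (code >> 31) & 0x1;
    f.device_type = (code >> 16) & 0x7FFF;
    f.access      = (code >> 14) & 0x3;
    f.custom      = (code >> 13) & 0x1;
    f.function    = (code >> 2) & 0x7FF;
    f.method      = code & 0x3;
    return f;
}
```

decode_ioctl(0x22E0DC) yields device type 0x22, access 3, custom 1, function 0x37 and method 0 (METHOD_BUFFERED), matching the diagram.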
For our purposes, the exact IO control codes don't matter as long as we can easily figure out
which branch of the switch statement each one reaches. For this, we just need to add 0x22E004 to the case
value (IDA shows case values in decimal, so convert to hex first) and pass the result as a hex value to ioctl.exe.
From now on, I generally refer to the case branches by the case value reported by IDA (not the full-blown IO control code),
otherwise known as the index into the switch table.
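The conversion is simple enough to sketch. The constant comes from the handler's subtraction; the helper names are hypothetical, and case indexes are taken as the decimal values IDA displays.

```c
#include <stdint.h>

/* Lowest case value, subtracted by the handler before the jump table. */
#define SWITCH_BASE 0x22E004u

/* Hypothetical helpers: IDA case index (decimal) <-> IO control code. */
uint32_t case_to_ioctl(uint32_t case_index) { return SWITCH_BASE + case_index; }
uint32_t ioctl_to_case(uint32_t code)       { return code - SWITCH_BASE; }
```

For example, the branches #100, #260 and #352 map to 22E068, 22E108 and 22E164 respectively, the hex arguments to give ioctl.exe.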
Let's see what IDA can tell us about case 216, which corresponds to IO control code 22E0DC (0x22E0DC - 0x22E004 = 0xD8 = 216)
invoked by the AutoIt installer script. When we view the routine called by this branch (sub_29B620), we are
faced with a giant function made of sections of "junk" instructions glued together with jumps.
It's clear from the first few pages that it probably does nothing useful. IDA's C pseudocode decompiler collapses
the entire function down to the equivalent of "return 0". Although it's good practice not to trust what the IDA
decompiler reports, we can assume it is correct for now and go back to the main handler's switch statement.
I didn't really expect the installer script to hand us the correct code so easily.
If you study the switch statement a little further, you'll notice a pattern. The majority of the
functions called by the branches, including #216 (the one we just looked at), are passed the value 0x41 (the ASCII
character "A"). Only three case branches out of 100 call their function without any arguments, so they stand out:
branches #100, #260 and #352. Branch #100 leads to a very large and complex function; in IDA's graph view of this
function alone (zoomed to fit), each connector line indicates a conditional jump.
The number of conditional jumps in this function is so abnormally large that I concluded the author must have
generated it automatically. Unlike the obfuscated code we encountered with case #216, this function calls more
functions just like itself! Buried in their obfuscated code, these sub-functions have one thing the main function
lacks: they all seem to write different bytes to different positions within a portion of memory in the .data section.
These functions have dozens and sometimes hundreds of references to other parts of the program's code, so you can
gather that the order in which they are called is
A quick glance at the other two branches from the main switch statement (#260 and #352) reveals similar
obfuscated code that calls sub-functions to write different bytes to the same portion of memory in the .data section.
The rest of the case branches appear to only call the sub-functions just described, each of which writes at most
one byte to this special target buffer. Unfortunately, the "big 3" case branches (#100, #260 and #352) are too large
to analyze further, at least not without employing some more tools.
We are going to improve our odds of understanding "the big 3" by watching these branches run under a debugger.
The idea is that we can step through the obfuscated code faster than we can
understand it, at least until we arrive at code that looks more interesting. We can also run large sections
at a time. Because IDA can use WinDbg as its debugger backend, we can retain IDA's features while using the
facilities of a debugger.
Another benefit: static analysis can get hung up on overlapping instructions, where a jump lands in the middle of
an already-decoded instruction so the same bytes belong to more than one instruction. Under a debugger this is no
longer a problem, because the disassembly is always kept in sync with the current EIP, even when the previous
instruction jumped out of sync.
Debugging a driver is a little more involved than using a user-mode debugger like OllyDbg or Visual Studio.
Since drivers run in kernel mode, we must debug the entire operating system using a kernel debugger.
Traditional kernel debugging uses two separate machines (one for the debugger and one for the debuggee), but
luckily, with virtualization software like VMWare and VirtualBox, we can run the kernel debugger and a virtualized
operating system simultaneously on the same machine, in separate processes. The two then communicate via a named pipe.
This communication is rather slow, so I recommend the free VirtualKD performance-boosting tool,
which lets you step through code as fast as if you were debugging a user-mode application. Speed
is important, especially when IDA downloads the initial kernel memory snapshot.
The instructions that follow happen to use a VMWare "guest" (the virtualized OS) running Windows XP SP3.
The "guest" OS is where our challenge.sys driver will be run, while we'll kernel debug it on the "host" OS.
I also used a Windows host operating system as I don't think VirtualKD runs on Linux.
If you haven't done so already, install the Debugging Tools for Windows on your
host and guest operating systems so both machines have the WinDbg suite of tools.
During the FLARE challenge, the latest version of VirtualKD was 2.8, so that is the version used here.
Install VirtualKD on your host operating system by running the self-extracting download from their website.
There is no installation wizard, so choose a final destination folder when prompted.
After the files have been extracted to the destination, copy the contents of the "target" directory to the guest operating system and run
"vminstall.exe" on the guest.
NOTE: There are a number of ways to get files to the guest operating system which is beyond the scope of this
tutorial. If you don't know how, search the internet for a guide.
vminstall.exe ultimately creates a boot.ini entry in your guest operating system like the one shown below:
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="XP Pro SP3 [VirtualKD]" /DEBUG /DEBUGPORT=bazis /fastdetect /noexecute=optin
Also note the debugger connection string used (as you'll need to plug this in to IDA later).
Finally reboot when prompted and you'll soon find your guest at
the boot options menu.
Select the new VirtualKD option, BUT DON'T PRESS <ENTER> YET.
Press your VMWare hotkey (such as the Windows-Key) to "ungrab" input focus from the guest, and run VirtualKD on your host
(i.e. vmmon.exe). Unless you want to use WinDbg without IDA (such as if you don't have IDA), ensure "Start debugger
automatically" is unchecked.
Once vmmon looks like it has initialized (only a couple seconds), switch back to your guest
and hit <ENTER> to begin the boot process. Launch IDA after VirtualKD indicates "yes" for the "OS"
column. NOTE: The boot process will wait indefinitely until you attach a kernel debugger, so you can take your
time setting up IDA (below).
Within IDA, select "[Go] Work on your own" and navigate to the menu "Debugger" -> "Attach" -> "WinDbg debugger"
to bring up the debugger connection dialog.
Enter the connection string used by the virtual guest OS that you took note of above.
NOTE: The reconnect option is not necessary, but you can add it to provide a more stable debugger connection. For example,
if the guest OS goes into suspend-mode, IDA will be able to automatically and seamlessly reconnect to the kernel
debugging session. Otherwise your session will be "frozen" and you'll have to restart the guest OS, VirtualKD
and IDA to restart kernel debugging.
You can set various debug options in the next two dialogs as you see fit, but for Kernel Debugging,
you must at least select one of the following from "Debug options" -> "Set specific options":
- "Kernel mode debugging"
- "Kernel mode debugging with reconnect and initial break"
Select the latter if you used the "reconnect" option in your connection string or you want to set a breakpoint (such as an unresolved one)
before the operating system boots. Finally click "OK".
You might see the message box "The current debugger backend (windbg) does not provide memory information to IDA...".
Despite this warning, I had no problems loading and viewing memory under the debugger, so you can safely ignore it.
A "Choose process to attach to" dialog should then appear with only one entry: select "0 <Kernel>" and click "OK".
If you see the popup "Searching for crypto constants...", just click the "Cancel" button. I never did figure out what
this meant, but it takes a long time and never seems to finish. At this point, your guest operating system will begin
to load. If you chose "Kernel mode debugging with reconnect and initial break", the debugger will break even before
you see the OS boot logo. Press F9 to resume loading Windows.
To stop debugging, use one of the following methods:
- From a breakpoint, open the "Debugger" menu and select "Detach from process". NOTE: You can then re-attach via
"Debugger" -> "Attach to process" and selecting <Kernel>, or by restarting IDA and choosing "Debugger" -> "Attach" -> "WinDbg debugger".
- With the debugger running, shut down the guest operating system, then click the "Suspend" button in IDA's popup
window (you may need to click the button twice and choose to forcibly close the connection).
Once the guest OS is shut down or detached, the IDA and VirtualKD windows can be closed.
With all the kernel debugging setup out of the way, we can finally debug this thing!
If you have not yet copied and run loader.exe on the guest operating system (to install the driver), you must do
that before continuing. If the driver has already been installed, and you've just booted, you'll need to issue
the following command so the kernel loads challenge.sys as it is not automatically loaded upon startup:
net start challenge
If you get an error, the driver is either already running or wasn't installed
(for example, if you are running an unsupported version of Windows). Otherwise, you might notice
a debug message in the kernel debugger's log window that reads "Challenge Driver Loaded..". This
message originates from the challenge driver itself, via a DbgPrint call after IoCreateSymbolicLink() succeeds.
On your host
machine, click the "Suspend" button in IDA to break into the debugger. Locate the "Modules" window (usually in
the right pane) and select "challenge". Note its base address (more on this later). Right-click your selection
and choose "Analyze module".
IDA pops up a confirmation dialog, which is annoying because the condition it warns about is exactly what the
module analysis you just selected addresses.
NOTE: What IDA really means is "are you sure you want to download kernel memory?" (which may take a while over a
slow debugger connection), and it reminds you to re-analyze the module if the driver is reloaded
(such as by stopping and starting the associated service), since the driver's location in memory may change.
The first time you analyze a module in a kernel debugging session, IDA will download the contents of memory from the guest
OS. This only takes about 20-30 seconds with VirtualKD and you'll see bytes being transferred in the lower left
corner of the status bar. Wait until you see "idle" in the same area of the status bar and you'll know the
download and analysis is complete. If you are re-analyzing a
module within the same debugging session after it has been reloaded (such as by restarting the service), IDA
won't download the guest's memory again. It will update its internal addressing to reflect the new
location of the previously analyzed module.
During static analysis, the address of the function we wanted to debug (the one with the giant switch statement)
was at address 29C1A0.
Static analysis sets up the memory pointers assuming the image loads at its default/preferred base address.
When it comes to drivers, Windows rarely loads them at their preferred base address, so our address 0x29C1A0
needs to be converted. First, you must subtract the default base address (0x10000) to obtain the
pure offset (known as an RVA in the PE specification). Then simply add the current base address shown
in the debugger to this
offset and that's the pointer we can use. The formula is:
<static_analysis_address> - <default_base> + <current_base>
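The same formula as a small C sketch. The default base 0x10000 is the one stated above for challenge-xp.sys; the runtime base 0xF7A00000 used in the check is made up, so substitute the base address shown in IDA's Modules window. The helper names are hypothetical.

```c
#include <stdint.h>

#define DEFAULT_BASE 0x10000u   /* preferred base of challenge-xp.sys, per the text */

/* Hypothetical helpers: static (IDA) address -> RVA -> runtime address. */
uint32_t static_to_rva(uint32_t static_addr)
{
    return static_addr - DEFAULT_BASE;
}

uint32_t rva_to_runtime(uint32_t rva, uint32_t current_base)
{
    return rva + current_base;
}
```

For the handler at 0x29C1A0, the RVA is 0x28C1A0, which is the offset that goes after "challenge+" in WinDbg.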
One nice thing about WinDbg syntax is that we can mostly use RVAs combined with the module's name, in the form
challenge+<RVA>, and let the debugger do the calculation for us.
While the WinDbg command window supports this syntax, unfortunately IDA does not, which is a little odd
because we're using both at the same time. If you want to jump to an address in the disassembly or
hex-dump windows (using the "g" key, for example), you must manually calculate the address, always
taking into account the module's current base address.
Because the RVA offsets don't change between driver reloads, this is the address syntax we'll use when possible
for the remainder of the tutorial.
Another important thing to remember is that IDA doesn't interpret numbers as base-16 (hex) unless they are prefixed
with "0x", whereas WinDbg defaults to base-16. It's a good idea to get in the habit of prefixing all numbers
and addresses you intend as hex with "0x", so they are interpreted properly regardless of which
window you are typing in.
Using the WinDbg command window (lower left corner), set a breakpoint on our callback handler (found above)
using the simplified module+RVA offset syntax.
Now run the debugger by hitting the F9 key. The guest machine should "unfreeze" and become usable once
again. On the guest machine, open a command prompt if one is not already open and navigate to the system32
directory where the installer script placed ioctl.exe. Let's simulate the same command the AutoIt installer
script passed to the driver by running ioctl.exe with the argument 22E0DC.
The debugger should immediately break on the driver's main callback function. Switch to it and begin
stepping over instructions using the F8 key. Since this handler is used for all of the driver's events,
it will get called on each of these events:
IRP_MJ_DEVICE_CONTROL (0e) <--- this event is the one we want
The check that ensures the handler is dealing with an IRP_MJ_DEVICE_CONTROL event is the
"cmp [ebp+var_1C], 0Eh" instruction a couple of lines above the jmp. Just hit F9 to run past the events you don't care about.
When finally arriving in the case branch for #216, step-into (F7) the call at challenge+0x28C605.
Stepping through this function
happens to be a lot quicker than statically analyzing it, as it ends after only a couple of jumps. The debugger
helps us quickly see how the constants are moved around and used in the conditional jumps, which turn out to be
unconditional. The majority of the function is never used. The conclusion is that this function doesn't do anything useful, which matches what IDA's
pseudocode reported to us previously.
We'll now focus our attention on switch case #100, as it is the first of the "big 3"
character-building sequence branches. We'll set a breakpoint on
challenge+0x28C3DA and run the debugger. We can then hit the breakpoint by running "ioctl.exe 22E068" on the
guest. Stepping over most of the obfuscated code in the main function and into a few of the sub-functions,
we can see that bytes are being built in a section of memory between challenge+0x28D840 and challenge+0x28D8B8
(0x78 bytes) in no particular order. There are gaps that are never filled in, and certain locations
where different bytes are repeatedly set.
The sub-functions setting these bytes are significantly shorter; however, they also contain a large amount of
obfuscated code. It's not until we step through a whopping 1391 bytes
of instructions in the main function that we arrive at the first conditional jump (which is in fact
unconditional due to its semantics).
After the jump we step through another large
section of obfuscated code that shares the same traits as the last.
It's not until the 9th conditional jump (where the previous conditional jumps were either unconditionally taken
or unconditionally skipped) that
we hit a jump-sled of 4 consecutive jmp's, finally dumping us into a small chunk of critical-looking code before returning. By the time we get here, we
are already viewing (in one of IDA's hex windows) the section of memory that was built by the
various sub-functions. When we step over the CALL
at challenge+0x9D0B1, the 40 bytes between challenge+0x28D890 and challenge+0x28D8B8 are transformed into other random-looking
characters. One could guess that if the right characters were already in the target memory section,
this routine might properly decrypt them.
challenge:B1F9D0A4 8D 4D D0 lea ecx, [ebp+var_30]
challenge:B1F9D0A7 51 push ecx
challenge:B1F9D0A8 8D 55 C4 lea edx, [ebp+var_3C]
challenge:B1F9D0AB 52 push edx
challenge:B1F9D0AC 68 90 D8 18 B2 push offset byte_B218D890 ;pointer to start of memory to be modified
challenge:B1F9D0B1 E8 BA 34 F6 FF call sub_B1F00570 ;decryption routine?
challenge:B1F9D0B6 8B E5 mov esp, ebp
challenge:B1F9D0B8 5D pop ebp
challenge:B1F9D0B9 C3 retn
Altering the parameters passed to the routine to process a larger section of memory didn't produce a meaningful result.
A similar investigation into the two other switch case branches (#260 and #352) shows that they are like #100: large blocks of
obfuscated code separated by conditional-looking jumps, with interspersed calls to sub-functions that set bytes
in the same target memory section. Both branches end with a jump-sled to a code block that calls the same
decryption routine, but on a different portion of the same section of memory. Also like case #100, these decryption
calls do not produce meaningful results. Here is a "massaged" C++ representation of what the decryption routine is doing:
#include <cstdint>
typedef uint8_t  BYTE;   //Windows-style aliases so the snippet compiles outside the WDK
typedef uint32_t DWORD;
DWORD funcCrypto(BYTE* pInOutBuffer, DWORD* pDwBufSize, char* pszHexValues)
{
    //loop count is buffersize/8 because we're doing 8 byte chunks per iteration
    DWORD uChunksToProcess = *pDwBufSize >> 3;
    for (DWORD i = 0; i < uChunksToProcess; ++i)
    {
        DWORD* pdw8ByteChunk = (DWORD*)(pInOutBuffer + (i * 8));
        //CALLed inner function (inlined here)
        DWORD dw1 = *pdw8ByteChunk;        //get first dword
        DWORD dw2 = *(pdw8ByteChunk + 1);  //get next dword
        DWORD runVal = 0xC6EF3720;         //unsigned, so the running value wraps at 2^32
        for (DWORD j = 0; j < 0x20; ++j)   //loop 32 times (own counter, doesn't shadow i)
        {
            dw2 -= (*(DWORD*)(pszHexValues + 12) + (dw1 >> 5)) ^ (runVal + dw1) ^ (*(DWORD*)(pszHexValues + 8) + 16 * dw1);
            dw1 -= (*(DWORD*)(pszHexValues + 4) + (dw2 >> 5)) ^ (runVal + dw2) ^ (*(DWORD*)pszHexValues + 16 * dw2);
            runVal += 0x61C88647;
        }
        *pdw8ByteChunk = dw1;              //store modified dwords back to memory
        *(pdw8ByteChunk + 1) = dw2;
    }
    //last dword processed is passed back as a running variable
    *pDwBufSize = *(DWORD*)(pInOutBuffer + 8 * uChunksToProcess - 4);
    return *pDwBufSize;                    //assumed: the actual return value isn't shown in the listing
}
The functions called from the remaining 96 switch cases were the same sub-functions referenced dozens (and
sometimes hundreds) of times by the giant #100, #260 and #352 sequencing branches that
ultimately attempted a decryption of the buffer at the end. This led me to believe I might need to call
some or all of these switch cases prior to one (or all) of the "big 3" branches.
One interesting property I noticed is that the contents of the memory section being built by the various
branches were being retained between ioctl.exe calls as long as the driver remained loaded.
The memory would be reset to
zeros once the service was restarted. That got me thinking that I could restart the service between
tests and try any number of switch case combinations, as long as each one ended on one of the "big 3"
branches responsible for calling the decryption routine.
I wrote my own beefed-up version of ioctl.exe to call the various case branches in different
combinations. My version was also modified to pass an input buffer and output any results the driver
might send back (in case the supplied ioctl.exe was intentionally broken). Because there were too many switch branches to try all possible combinations,
I figured I'd try the common ones, such as all combinations of #100, #260 and #352. Then I tried all of the branches in
sequence, reversed, all excluding the "big 3" (then jumping to the decryption block), then that reversed, and so on.
Since the installer-run case branch (#216) was one of the first subroutines determined not to modify
the target memory section, I used it as a breakpoint branch where I didn't already have a convenient place to
break into the debugger at the end of the sequence.
I'll spare you the gory details of the wild goose chase I pursued for a couple of days. That memory section just
didn't seem to want to decrypt to anything meaningful. One of the biggest problems I noticed
is that the "big 3" branches destroyed most of what was placed into the buffer by the other switch cases by the
time the decryption routine was executed.
I shifted my focus to the individual sub-functions that were responsible for writing bytes to the target memory section
using IDA's handy cross-referencing feature (CTRL+x).
Clearly many of these function calls were buried in branches that would never execute, as the
.text section for this driver is a whopping 2.5 MB of mostly garbage instructions and fake conditional branches.
Many byte positions were only modified by one sub-function, so those were easy. The other positions were either
unmodified by any sub-function or modified by two or more.
Around the time I was thinking about how to determine the correct byte for those memory locations that were
modified by multiple sub-functions, I decided to focus back on the "big 3", thinking that the correct sequence
must already be there. Maybe if I were to force a certain critical branch to execute that otherwise would not
have, that memory section might get initialized with the proper byte sequence.
Starting with case branch #100, I forced the first branch at challenge+0x1CCCF to skip the jump that would have
always been taken. This is done by pressing CTRL-n while highlighting the instruction you want to set EIP to. I
then ran the remainder of the normal code paths down to the decryption routine. Nothing meaningful was
decrypted. Following this technique, I forced the condition just described, and additionally forced the next
deeper jump condition (challenge+0x1D0EE made a jump that would have otherwise never been taken). To do this
you double-click on the jump address to bring the target in view and press CTRL-n to set the EIP. Still no
dice. Not long after, I found that if the first 4 jumps are allowed to execute normally, but you force the
opposite condition on the 5th jump, 8 characters of a partial word appeared in memory! This is
after the decryption routine processed the characters that were placed in the buffer as a result of the altered
branch paths. This small chunk of text surely wasn't random, so I knew I had to explore the technique further.
I started building a "map" (a
separate text file) of the addresses I had been to, the nesting level and the Jxx instruction information so I
could systematically try paths into these code sections until I arrived at the decryption block or a dead end (the function just returns), at which
point I would work my way back up the nesting levels. This is your basic binary tree traversal, with
the jump condition at each point reversed to see how the output is affected. The
portions of text that did properly decrypt did so in 8-byte blocks, and each correct branch section was capable of building
only one of these blocks (at least on the branches I tried).
After mapping all of #100, #260 and parts of #352, I wasn't able to decrypt anything prior
to the sequence: "email@example.com".
I spent 2 more days trying to decrypt the portion I thought would come before the word "unconditional" as I was
sure my solution was incomplete. The range of memory referenced by the various sub-functions led me to believe
there was more to decrypt. A solid month of doing these FLARE challenges had clearly fried my brain. At the
point of nearly giving up, I decided to fire off an e-mail to the decrypted portion of the address I did have. I figured
worst case, I get no replies and go back to the drawing board. So I e-mailed ... FLARE responded ... [and I
slapped my forehead]
The sequence I used to build the solution was based on traversing the case #100 branch.
The final 8-character block (comprising "e-on.com") was found by cross-referencing the sub-functions that wrote to
that "missing" block of memory. I called this block4.
I might not have gone to the trouble to decode this block had I realized that the first part of the decoded e-mail
address was already complete: "unconditional_conditions@flar". The rest of the e-mail address could have been
guessed. Follow the case #100 branch:
[challenge+0x5D207] manually take the jump
[challenge+0x7D81C] manually skip the jump
[challenge+0x82126] manually take the jump
[challenge+0x85166] manually take the jump (builds block3)
[challenge+0x859A5] break here, then manually set EIP to:
[challenge+0x9C04D] (builds block4)
The official solution
differs in a number of ways from my approach.
I felt the method I used to have the AutoIt script "echo" the already-decrypted result was far easier
and quicker than recognizing and implementing the RC4 cipher in a Python script to perform the decryption
separately. The author's purpose was likely to shed light on the inner workings of the shellcode, which I
I missed the clue for the installer-run case #216 branch (IOCTL code 22E0DC). Although I was correct that the function did
nothing execution-wise, the function wasn't a complete red herring after all.
If I had read through the function and translated each occurrence of the JZ and
JNZ instructions as 0 and 1 codes, I would have decoded the string "try this ioctl: 22E068". This
refers to case branch #100 described above, which I did ultimately use to derive the solution. The clue may
have helped me avoid trying all of the other useless IOCTL codes in different combinations.
The author then offered this information about the case #100 branch:
"Shortly before each test operation, the variable being
tested is set to zero. After checking a few branches, it becomes apparent that the branches filling the array
that we care about are never taken with the code in its current state."
The thing that consumed a large amount of time near the end of the challenge was manually traversing the
conditional branch "tree" until I got a decrypted result. My trial-and-error method wasn't exactly ideal but
it was all I had at the time. The author's ideal solution consisted of a massive opcode search and replace on
the case #100 branch:
00000000 C645 9E 00 mov byte ptr [ebp-98], 0 ;replace this instruction
00000000 C645 9E 01 mov byte ptr [ebp-98], 1 ;with this one
Wherever the top instruction appeared in the code, the 0 constant resulted in the wrong branch being taken
100% of the time. Maybe I should have paid more attention to the location of the constant that was responsible
for the chosen branch and I might have seen the pattern. The author then described a clever method to patch the function in place by
using a .writemem, patch, and .readmem sequence.
I felt really silly after reading internet posts about how other contestants arrived at the solution without
using a kernel debugger and probably didn't waste as much time as I did. In one case, the contestant zeroed in on
the target buffer and case #100 in IDA (in static analysis mode) by tracing the IOCTL code path. Then
memory from the target buffer was cross-referenced to find all of the locations that were written to by *one*
function (as opposed to multiple). Most of these happened to fall in the last block, the one that contained the key,
which gave them the encrypted byte values. Then they ran the decryption function in isolation on that block and, bingo.
I was too stuck on the idea that the whole range of memory was to be decrypted, an assumption that cost me days.
Response from firstname.lastname@example.org:
Subject: FLARE-On Challenge #10 Completed!
Date: Tue, 01 Sep 2015 13:27:06 -0400
One day they recite the great minds of history, Newton, Shakespeare, Galileo, Ramanujan, Curie, Flare-On Contestant #743, etc. You are building a lasting legacy for yourself.
I have attached another file. You can either spend all weekend working on this challenge or just go on living your life, your call. The password to the zip archive is "flare" again.