FLARE-On 2015 Challenge #10

Date: Sep 1, 2015


filename:    loader
size:    3.36 MB (3,523,584 bytes)
type:    XP/Win7 32-bit Kernel-mode Driver
Original FLARE Author:    Tom Bennett
tool:    IDA / Disassembler
tool:    Exe2Aut / AutoIt3 Decompiler
tool:    AutoIt / AutoIt Interpreter
tool:    VMWare Workstation / Guest copy of Windows XP
tool:    Debugging Tools for Windows / WinDbg
tool:    VirtualKD / Kernel Debug Virtualization Accelerator

This challenge will test your ability to step through a driver by debugging Windows in kernel mode. You will soon see why a debugger is necessary to reveal the solution.

A hex dump of the first few bytes of the supplied file reveals that it is a Windows PE executable image. When it is renamed with an .exe extension and run, however, nothing seems to happen. If this program is run under a debugger, an IsDebuggerPresent() check at 403D57 is encountered early on, resulting in a message box that appears just before the program terminates early:

Flare-On 2015 Challenge #10 - AutoIt IsDebuggerPresent() check

If you choose to bypass this rudimentary anti-debug check either by patching the executable, adjusting the contents of EAX upon return from IsDebuggerPresent(), or manually adjusting EIP to follow the false branch, you can figure out what is going on provided you want to spend a good deal of time in the debugger. However, since the application has so nicely informed us that it is a compiled AutoIt script, we might be able to save a lot of time and hassle by trying our luck with an AutoIt decompiler. If successful, the decompiler would give us access to the script that was previously executed so we can see what was done to the system.

Although there did not seem to be an actively maintained decompiler for current-version AutoIt executables as of 2015, the Exe2Aut tool had no problems with this particular executable. From the Exe2Aut directory, I ran the command "Exe2Aut.exe -nogui loader.exe", which extracted 4 files into the current directory:
07/23/2015  10:51 AM            46,080 ioctl.exe
07/28/2015  04:42 PM         2,688,640 challenge-xp.sys
07/28/2015  04:42 PM         2,689,536 challenge-7.sys
06/06/2016  02:17 PM            12,444 loader_.au3
Flare-On 2015 Challenge #10 - Using Exe2Aut AutoIt Decompiler

Besides the author of Exe2Aut being a fan of Robin Hood, we can guess we will be dealing with some drivers due to the presence of the .sys files.

The AutoIt script is the file with the .AU3 extension. An initial examination of the file in a text editor like Notepad or the free AutoIt editor is shown below. The file begins with some constant definitions followed by a mini-library of functions to maintain Windows services, such as _startservice(), _stopservice(), _createservice(), etc. It isn't until we get to the last screenful that we see the script's entry point, where the processor architecture and operating system version are first checked. The script requires a 32-bit version of either Windows XP or Windows 7. The script then drops the corresponding driver file as challenge.sys into the Windows system32 directory, or exits with the "Unsupported OS" error. Next, an unknown executable named ioctl.exe is dropped in the same directory.

Things start to get interesting on lines #277 and up. The last three actions performed by the script appear to be encrypted, where gibberish-looking strings are passed to the function dothis(). We can probably guess a driver-service is installed, but we need to know for sure. Each dothis() call is passed the additional string "flarebearstare", so we can guess this is probably the decryption key. Because dothis() must decrypt the commands before AutoIt can run them (via Execute()), we shouldn't have any problems discovering those commands as long as we output the string after decrypt() has already done the work for us.

Flare-On 2015 Challenge #10 - Bottom portion of loader_.au3 script

Although we can bypass the decryption algorithm, I thought it was neat how decrypt() injected shellcode into the CallWindowProc() API, with the help of AutoIt's buffer creation facilities, as the basis for the decryption function. Native AutoIt script would have been easier to reverse-engineer. The decryption function shellcode is shown here for educational purposes. It can be derived by pasting the $opcode string (line #285) into your favorite hex editor and running the resulting file through a disassembler like IDA. I used disasmdump by just pasting the hex opcode string directly on the command line (without the 0x prefix):

00000000 C8 1001 00          enter 272, 0
00000004 6A 00               push 0
00000006 6A 00               push 0
00000008 53                  push ebx
00000009 56                  push esi
0000000A 57                  push edi
0000000B 8B55 10             mov edx, dword ptr [ebp+16]
0000000E 31C9                xor ecx, ecx
00000010 89C8                mov eax, ecx
00000012 49                  dec ecx
00000013 89D7                mov edi, edx
00000015 F2:AE               repne scasb
00000017 48                  dec eax
00000018 48                  dec eax
00000019 29C8                sub eax, ecx
0000001B 8945 F0             mov dword ptr [ebp-16], eax
0000001E 85C0                test eax, eax
00000020 0F84 DC000000       je 00000102h
00000026 B9 00010000         mov ecx, 256
0000002B 88C8                mov al, cl
0000002D 2C 01               sub al, 1
0000002F 88840D EFFEFFFF     mov byte ptr [ecx+ebp-273], al
00000036 E2 F3               loop short 0000002bh
00000038 8365 F4 00          and dword ptr [ebp-12], 00000000h
0000003C 8365 FC 00          and dword ptr [ebp-4], 00000000h
00000040 817D FC 00010000    cmp dword ptr [ebp-4], 256
00000047 7D 47               jge short 00000090h
00000049 8B45 FC             mov eax, dword ptr [ebp-4]
0000004C 31D2                xor edx, edx
0000004E F775 F0             div dword ptr [ebp-16]
00000051 92                  xchg eax, edx
00000052 0345 10             add eax, dword ptr [ebp+16]
00000055 0FB600              movzx eax, byte ptr [eax]
00000058 8B4D FC             mov ecx, dword ptr [ebp-4]
0000005B 0FB68C0D F0FEFFFF   movzx ecx, byte ptr [ecx+ebp-272]
00000063 01C8                add eax, ecx
00000065 0345 F4             add eax, dword ptr [ebp-12]
00000068 25 FF000000         and eax, 000000ffh
0000006D 8945 F4             mov dword ptr [ebp-12], eax
00000070 8B75 FC             mov esi, dword ptr [ebp-4]
00000073 8A8435 F0FEFFFF     mov al, byte ptr [esi+ebp-272]
0000007A 8B7D F4             mov edi, dword ptr [ebp-12]
0000007D 86843D F0FEFFFF     xchg byte ptr [edi+ebp-272], al
00000084 888435 F0FEFFFF     mov byte ptr [esi+ebp-272], al
0000008B FF45 FC             inc dword ptr [ebp-4]
0000008E EB B0               jmp short 00000040h
00000090 8D9D F0FEFFFF       lea ebx, [ebp-272]
00000096 31FF                xor edi, edi
00000098 89FA                mov edx, edi
0000009A 3955 0C             cmp dword ptr [ebp+12], edx
0000009D 76 63               jbe short 00000102h
0000009F 8B85 ECFEFFFF       mov eax, dword ptr [ebp-276]
000000A5 40                  inc eax
000000A6 25 FF000000         and eax, 000000ffh
000000AB 8985 ECFEFFFF       mov dword ptr [ebp-276], eax
000000B1 89D8                mov eax, ebx
000000B3 0385 ECFEFFFF       add eax, dword ptr [ebp-276]
000000B9 0FB600              movzx eax, byte ptr [eax]
000000BC 0385 E8FEFFFF       add eax, dword ptr [ebp-280]
000000C2 25 FF000000         and eax, 000000ffh
000000C7 8985 E8FEFFFF       mov dword ptr [ebp-280], eax
000000CD 89DE                mov esi, ebx
000000CF 03B5 ECFEFFFF       add esi, dword ptr [ebp-276]
000000D5 8A06                mov al, byte ptr [esi]
000000D7 89DF                mov edi, ebx
000000D9 03BD E8FEFFFF       add edi, dword ptr [ebp-280]
000000DF 8607                xchg byte ptr [edi], al
000000E1 8806                mov byte ptr [esi], al
000000E3 0FB60E              movzx ecx, byte ptr [esi]
000000E6 0FB607              movzx eax, byte ptr [edi]
000000E9 01C1                add ecx, eax
000000EB 81E1 FF000000       and ecx, 000000ffh
000000F1 8A840D F0FEFFFF     mov al, byte ptr [ecx+ebp-272]
000000F8 8B75 08             mov esi, dword ptr [ebp+8]
000000FB 01D6                add esi, edx
000000FD 3006                xor byte ptr [esi], al
000000FF 42                  inc edx
00000100 EB 98               jmp short 0000009ah
00000102 5F                  pop edi
00000103 5E                  pop esi
00000104 5B                  pop ebx
00000105 C9                  leave
00000106 C2 1000             retn 16
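Reading the listing, the shellcode looks like textbook RC4: a 256-byte identity table, a key-mixing pass that indexes the key modulo its length, and a keystream loop that XORs the buffer in place. That identification is my own reading of the disassembly, not something the script states. A minimal Python sketch of the same steps follows; the function name rc4 and its signature are hypothetical.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4, mirroring the structure of the shellcode listing."""
    if not key:
        # The shellcode jumps straight to its epilogue when the key length is 0
        return bytes(data)

    # Key scheduling (0x26..0x8E): identity table, then mix in the key
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Keystream generation (0x9A..0x100): XOR each data byte with the stream
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)
```

Because RC4 is symmetric, calling rc4() twice with the same key returns the original bytes. If the reading above is right, feeding it the key "flarebearstare" and the raw bytes of one of the encrypted strings should give the same text as the modified script described next.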

As mentioned above, the script already knows how to decrypt the command strings. We just need them written to the console instead of executing them, so we know what was done to the system. After researching the AutoIt basic-like syntax, I made the following modifications:
  • Get rid of all the constants and service functions at the top as they aren't necessary for the decryption (they actually conflicted with the WinAPI.au3 file we needed to include for the console functionality). Proceed to remove everything starting from and including line #1 thru line #276.
    Global $standard_rights_required = 983040
    FileInstall("ioctl.exe", @SystemDir & "\ioctl.exe")
  • At top of the file, add this chunk of code to initialize the console so we have a place to dump the decrypted strings:
        If Not _WinAPI_AttachConsole() Then
            $ret = DllCall("Kernel32.dll", "long", "AllocConsole")
            If $ret = 0 Then Exit MsgBox(0, 'EXIT', "No Console allocated!")
        EndIf
        $hConsole = _WinAPI_GetStdHandle(1)
        $hConsoleIn = _WinAPI_GetStdHandle(0)
        If $hConsole = -1  Then
            MsgBox(0, "Error", "GetStdHandle failed")
        EndIf
        _WinAPI_WriteConsole($hConsole,  @CRLF)
  • Modify the dothis() function, replacing the last line "Return Execute($exe)" with the following:
    _WinAPI_WriteConsole($hConsole,  $exe & @CRLF)
    Return 1
You should end up with a modified script that looks similar to the one below:

Flare-On 2015 Challenge #10 - Modified loader_.au3 to dump decrypted commands

Run the modified script through the [separately installed] AutoIt interpreter via the command-prompt. I.e.:
"%PROGRAMFILES%\AutoIt3\AutoIt3.exe" <modified_script_name>
The 3 decrypted AutoIt commands should appear on the console:
_CreateService("", "challenge", "challenge", @SystemDir & "\challenge.sys", "", "", $SERVICE_KERNEL_DRIVER, $SERVICE_DEMAND_START)
_StartService("", "challenge")
ShellExecute(@SystemDir & "\ioctl.exe", "22E0DC")
You can gather that a driver-level service called "challenge" is being created and started which is no surprise. A driver has to be installed in the system before it can be run. Once installed, this mysterious ioctl.exe is being executed with the argument "22E0DC". Refer back to the original loader_.au3 script with the definitions of the service functions if needed. These functions did appear to do what they seemed like they would do. Although the challenge service doesn't appear in the Services MMC console snap-in, you can verify it is installed via registry entries and that it can be stopped and started from the command line.

If you pull up ioctl.exe in IDA, you'll find a nice little program that appears to be unmodified from a Microsoft Visual Studio release build, as indicated by the lack of obfuscation or alterations to impede analysis as well as the embedded PDB path "C:\Users\Me\documents\visual studio 2010\Projects\ioctl\Release\ioctl.pdb". As we might guess, this program probably has something to do with the challenge.sys driver. This program appears to:
  • Convert the first command-line argument, a numeric hex (base-16) string, to a DWORD. If you don't supply at least one argument, the program crashes because it doesn't check argc first.
  • Create an unnamed manual-reset event with CreateEvent(), initialized to nonsignaled.
  • A handle to the driver is opened with CreateFile() (shown with numeric constants converted to the corresponding named constants):
        "\\.\challenge",                    //object path to driver (string in quotes as-is without backslashes escaped)
        GENERIC_WRITE|GENERIC_READ,         //access
        NULL,                               //default security descriptor
        OPEN_EXISTING,                      //creation disposition (value 3; dispositions are an enumeration, not bit flags)
        FILE_FLAG_OVERLAPPED,               //flags and attributes
        NULL                                //no template
  • DeviceIoControl() is called with the DWORD of the first command line argument passed as dwIoControlCode, with no input buffer arguments. The event created earlier is also passed.
  • Finally, the program uses GetOverlappedResult() to wait until the driver's handler returns before exiting. This program does not look at or output the returned buffer.
At this point, we can gather that ioctl.exe is used to communicate with the driver via numeric code. It is unclear what the 22E0DC code means (passed to the driver from the AutoIt script) or if ioctl.exe should be reading a response value back. Might this challenge involve fixing ioctl.exe so we can read back, for instance whether or not our "code" was a success or failure?

Because of the name of the supplemental program (ioctl.exe), we might also guess that challenge.sys is an IOCTL driver. If you download the Windows 7.1 DDK, you can read about IOCTL drivers and view the sample, located off of the installation directory at "src\general\ioctl\wdm".

The next logical step is to look inside the driver. The entry point for a driver is the same as its PE entry point. This address, also known as DriverEntry(), is called when Windows loads the driver, and its primary responsibility is to fill in a structure of pointers to callback functions that Windows will use to communicate with the loaded driver. If we can locate the code that populates these callback addresses, we should be able to find the address of the IOCTL code handler.

This tutorial deals with the analysis of the Windows XP version of the driver (challenge-xp.sys) instead of the one for Windows 7. I assume the Windows 7 version of the driver conformed to the driver model changes needed to run on Windows 7, but was otherwise identical in functionality.

When we load challenge-xp.sys in IDA, we can easily locate the DriverEntry() from the "Exports" tab at address 29EDBE. The DriverEntry() code happens to reference two addresses. The first sets some "BugCheck" values to a couple global variables but is otherwise uninteresting. These are usually used to verify the integrity of the stack before the function returns to the OS, but otherwise look boilerplate. The 2nd code reference in DriverEntry() jumps to 29CC90. Within 29CC90, there are two calls to RtlInitUnicodeString() followed by a call to a function IDA named sub_29D9E4(). Although IDA didn't recognize this function, I gathered that it was probably a call to the IoCreateDeviceSecure() API. This was due to the arguments supplied and the call to IoDeleteDevice() if the subsequent call to IoCreateSymbolicLink() failed. This is a common sequence for a driver initialization routine. The call is shown below (again, with numeric constants converted to the corresponding named constants):
    0,                          //DeviceExtensionSize
    "\Device\challenge",        //device name (string in quotes as-is without backslashes escaped)
    FILE_DEVICE_UNKNOWN,        //0x22
    1,                          //device is Exclusive (only one handle to the device can be open at a time)
    L"68",                      //default SDDL string (security permissions)
    &guid,                      //registry guid 0xDDEEAAFF to override DefaultSDDLString, DeviceType, DeviceCharacteristics, and Exclusive parameters
What we care about, however, are the registered callback functions. If we look to the top of the current routine, even before any APIs are called, an array of 27 function pointers is filled in using the same address: 29C1A0. The exception is the DriverUnload() routine, which is filled in with a different address after the loop exits.

.text:0029CCEB 8B 55 E8             mov edx, [ebp+var_18]                         ;this block of code populates the registered callback functions
.text:0029CCEE 83 C2 01             add edx, 1
.text:0029CCF1 89 55 E8             mov [ebp+var_18], edx
.text:0029CCF4 83 7D E8 1B          cmp [ebp+var_18], 1Bh                         ;loop control index >= 27?
.text:0029CCF8 7D 10                jge short loc_29CD0A                          ;if so, exit loop
.text:0029CCFA 8B 45 E8             mov eax, [ebp+var_18]
.text:0029CCFD 8B 4D 08             mov ecx, [ebp+arg_0]
.text:0029CD00 C7 44 81 38 A0 C1+   mov dword ptr [ecx+eax*4+38h], offset sub_29C1A0 ;initialize all DRIVER_OBJECT.MajorFunction (offset 0x38) entries to function 29C1A0
.text:0029CD08 EB E1                jmp short loc_29CCEB                          ;loop until done
.text:0029CD0A 8B 55 08             mov edx, [ebp+arg_0]
.text:0029CD0D C7 42 34 C0 B5 29+   mov dword ptr [edx+34h], offset sub_29B5C0    ;initialize DRIVER_OBJECT.DriverUnload (struct offset 0x34) to a different callback

You'll find the DriverUnload() routine is as boilerplate as it is uninteresting, so we'll move on to the registered callback that handles everything else. We are now relatively sure that when a code is passed on the command line to ioctl.exe, the function at 29C1A0 will be called. When we look at this routine in IDA, we see a simple function that utilizes a rather large switch statement. The function can be distilled down to this "massaged" C++ representation, which is similar to the boilerplate callbacks seen in the DDK samples. So far, there is nothing that jumps out as out of the ordinary, except the large number of cases in the switch statement.

//challenge.sys' generic handler
int sub_29C1A0(DEVICE_OBJECT* pDevice, IRP* pIrp)
{
    pIrp->IoStatus.Status = 0;
    pIrp->IoStatus.Information = 0;
    IO_STACK_LOCATION* pIrpStackLoc = IoGetCurrentIrpStackLocation(pIrp);

    //only handle DeviceIoControl communication
    if (pIrpStackLoc->MajorFunction == IRP_MJ_DEVICE_CONTROL)
    {
        //the IO control code lives in the DeviceIoControl parameters of the stack location
        DWORD dwSwitch = pIrpStackLoc->Parameters.DeviceIoControl.IoControlCode - 0x22E004;
        switch (dwSwitch)
        {
            case 0: sub_xxx(); break;
            ...  //remaining 99 cases go here
        } //switch
    }
    IofCompleteRequest(pIrp, 0);
    return 0;
}

The switch statement contains a total of 100 cases, each corresponding to a specific IO control code supported by the driver. The driver is supposed to do something in response to an IO control code. The subtraction of the constant 0x22E004 is actually part of a compiler-generated jump table for the switch statement and reflects the lowest case value. It is important to note that the compiler-generated jump table contains 400 entries, because jump tables are sized to cover the gaps between the case values. Nevertheless, there are still only 100 cases. The gap between each case value reflects how IO control codes are structured; they are usually created with the CTL_CODE macro found in the DDK:

#define CTL_CODE(DeviceType,Function,Method,Access) (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))

More importantly, the bits of the control codes have more embedded within them than a unique application-defined code. As an example, here is a breakdown of the internal meaning of code 0x22E0DC that the AutoIt installer script passed to ioctl.exe:
        0 000000000100010 11 1 00000110111 00
        |               |  | |           | |__transfer type (METHOD_BUFFERED)
        |               |  | |           |____function code (in this case, 0x37)
        |               |  | |________________custom
        |               |  |__________________required access (FILE_READ_DATA|FILE_WRITE_DATA)
        |               |_____________________device type 0x22 (FILE_DEVICE_UNKNOWN)
        |_____________________________________common bit (vendor-assigned)
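The same breakdown can be reproduced by reversing the shifts in the CTL_CODE macro. The sketch below is only an illustration in Python; the names IoctlFields, ctl_code and decode_ioctl are hypothetical helpers, not part of the DDK.

```python
from typing import NamedTuple


class IoctlFields(NamedTuple):
    device_type: int  # bits 31..16 (top bit is the "common" bit)
    access: int       # bits 15..14
    function: int     # bits 13..2 (top bit is the "custom" bit)
    method: int       # bits 1..0


def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Python equivalent of the CTL_CODE macro shown above."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code: int) -> IoctlFields:
    """Split an IO control code back into its CTL_CODE fields."""
    return IoctlFields(
        device_type=(code >> 16) & 0xFFFF,
        access=(code >> 14) & 0x3,
        function=(code >> 2) & 0xFFF,
        method=code & 0x3,
    )


fields = decode_ioctl(0x22E0DC)
# The 12-bit function field is 0x837: custom bit 0x800 plus function code 0x37
custom_bit = fields.function >> 11
function_code = fields.function & 0x7FF
```

Round-tripping the decoded fields through ctl_code() gives back 0x22E0DC, which is a quick way to check that the bit positions in the diagram were read correctly.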
For our purposes, the exact IO control codes don't matter as long as we can easily figure out how they correlate to a particular branch in the switch statement. For this, we just need to add 0x22E004 to the value of the case code and ensure we pass the result as a hex value when using ioctl.exe. From now on, I generally refer to the case branches by the case code as reported by IDA (not the full-blown IO control codes), otherwise known as the index into the switch table.
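As a small convenience, the conversion between IDA's case index and the argument to type for ioctl.exe can be sketched in Python. The helper names here are hypothetical; only the 0x22E004 constant and the 22E0DC example come from the analysis above.

```python
SWITCH_BASE = 0x22E004  # constant the handler subtracts before the switch


def case_to_ioctl(case_index: int) -> int:
    """IO control code that lands on a given switch case index."""
    return SWITCH_BASE + case_index


def ioctl_to_case(code: int) -> int:
    """Switch case index reached by a given IO control code."""
    return code - SWITCH_BASE


def ioctl_arg(case_index: int) -> str:
    """Hex string to pass on the ioctl.exe command line (no 0x prefix)."""
    return format(case_to_ioctl(case_index), "X")


# The code used by the installer script maps to case 216, and back again
installer_case = ioctl_to_case(0x22E0DC)
installer_arg = ioctl_arg(installer_case)
```

The same helper works for any other branch of the switch: pass the case index reported by IDA and type the resulting string after ioctl.exe.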

Let's see what IDA can tell us about case 216, which is represented by IO control code 22E0DC (0x22E0DC - 0x22E004) invoked by the AutoIt installer script. When we view the routine called by this branch (sub_29B620), we are faced with a giant function composed of sections of "junk" instructions glued together with jumps. It's clear from the first few pages that it probably does nothing useful. IDA's C-pseudocode decompiler collapses the entire function down to the equivalent of a "return 0". Although it's good practice not to trust what the IDA decompiler reports, we can assume it is correct for now and go back to the main handler's switch statement. I didn't imagine the installer script would give us the correct code so easily.

If you study the switch statement a little further, you'll notice a pattern. The majority of the functions called by each branch, including #216 (the one we just looked at), are passed the value 0x41 (the ASCII character "A"). Only three case branches out of 100 are not passed any arguments, so they stand out. Those branches are 100, 260 and 352. The function behind branch 100 turned out to be very large and complex. IDA's graph for this function alone is shown below (zoomed to fit), where each connector line indicates a conditional jump:

Flare-On 2015 Challenge #10 - Case 100 Zoomed Flow Graph

The number of conditional jumps in this function is so abnormally large that I conclude the author must have auto-generated it somehow. Unlike the obfuscated code we encountered with case #216, this one embeds calls to more functions just like it! Amongst the obfuscated code inside these sub-functions, there is one difference not present in the main function. These sub-functions all seem to write different bytes to different positions within some portion of memory in the .data section. These functions have dozens and sometimes hundreds of references to other parts of the program's code, so you can gather that the order in which they are called is probably important.

A quick glance at the other two branches from the main switch statement (#260 and #352) resulted in similar obfuscated code that call sub-functions to write different bytes to the same portion of memory in the .data section. The rest of the case branches appear to only call the sub-functions just described, which are at most only responsible for writing one byte to this special target buffer. Unfortunately, the "big 3" case branches (#100, #260 and #352) are too large to analyze further, at least not without employing some more tools.

We are going to improve our odds of understanding "the big 3" by watching these branches run under a debugger. This is based on the idea that we can step through the obfuscated code faster than we can understand it, at least until we arrive at code that looks more interesting. We can also run large sections at a time. Because IDA can use WinDbg as a backend, we retain the features of IDA while using the facilities of a debugger. Another benefit is that static analysis can get hung up on opcode bytes that are shared by multiple instructions, should we encounter any (i.e. jumps to bytes out of sync with the currently analyzed opcode flow). When using a debugger, this is no longer a problem, as the debugger always keeps the disassembler in sync with the current EIP, even when the previous instruction goes out of sync.

Debugging a driver is a little more involved than using a user-mode debugger like OllyDbg or Visual Studio. Since drivers run at the operating system level in kernel mode, we must debug the entire operating system using a kernel debugger.

Traditional kernel debugging uses two separate machines (one for the debugger and one for the debuggee), but luckily, with the advent of virtualization software like VMWare and VirtualBox, we can use the same machine to simultaneously run the kernel debugger and a virtualized operating system in separate processes. The two then communicate via a named pipe. The communication between the debugger and the virtualized operating system is rather slow, so I recommend using the free VirtualKD performance-boosting tool, which allows you to step through code as fast as if you were debugging a user-mode application. Speed is important, especially when IDA downloads the initial kernel memory snapshot.

The instructions that follow use a VMWare "guest" (the virtualized OS) running Windows XP SP3. The "guest" OS is where our challenge.sys driver will be run, while we'll kernel debug it from the "host" OS. I also used a Windows host operating system, as I don't think VirtualKD runs on Linux. If you haven't done so already, install the Debugging Tools for Windows on your host and guest operating systems so both machines have the WinDbg suite of tools.

During the FLARE challenge, the latest version of VirtualKD was 2.8, so that is the version used here. Install VirtualKD on your host operating system by running the self-extracting download from their website. There is no installation wizard, so choose a final destination folder when prompted. After the files have been extracted to the destination, copy the contents of the "target" directory to the guest operating system and run "vminstall.exe" on the guest. NOTE: There are a number of ways to get files to the guest operating system, which are beyond the scope of this tutorial. If you don't know how, search the internet for a guide.

vminstall.exe ultimately creates a boot.ini entry in your guest operating system like the one shown below:
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="XP Pro SP3 [VirtualKD]" /DEBUG /DEBUGPORT=bazis /fastdetect  /noexecute=optin
Also note the debugger connection string used (as you'll need to plug this in to IDA later). Finally reboot when prompted and you'll soon find your guest at the boot options menu. Select the new VirtualKD option, BUT DON'T PRESS <ENTER> YET. Press your VMWare hotkey (such as the Windows-Key) to "ungrab" input focus from the guest, and run VirtualKD on your host (i.e. vmmon.exe). Unless you want to use WinDbg without IDA (such as if you don't have IDA), ensure "Start debugger automatically" is unchecked.

Flare-On 2015 Challenge #10 - VirtualKD Waiting on OS boot start

Once vmmon looks like it has initialized (only a couple seconds), switch back to your guest and hit <ENTER> to begin the boot process. Launch IDA after VirtualKD indicates "yes" for the "OS" column. NOTE: The boot process will wait indefinitely until you attach a kernel debugger, so you can take your time setting up IDA (below).

Within IDA, select "[Go] Work on your own" and navigate to the menu "Debugger" -> "Attach" -> "WinDbg debugger" to bring up the debugger connection dialog. Enter the connection string used by the virtual guest OS that you took note of above, such as:
NOTE: The reconnect option is not necessary, but you can add it to provide a more stable debugger connection. For example, if the guest OS goes into suspend-mode, IDA will be able to automatically and seamlessly reconnect to the kernel debugging session. Otherwise your session will be "frozen" and you'll have to restart the guest OS, VirtualKD and IDA to restart kernel debugging.

You can set various debug options in the next two dialogs as you see fit, but for Kernel Debugging, you must at least select one of the following from "Debug options" -> "Set specific options":
  • "Kernel mode debugging"
  • *OR*
  • "Kernel mode debugging with reconnect and initial break"
Select the latter if you used the "reconnect" option in your connection string or you want to set a breakpoint (such as an unresolved one) before the operating system boots. Finally click "OK".

Flare-On 2015 Challenge #10 - Setting up WinDbg options in IDA

You might see the messagebox "The current debugger backend (windbg) does not provide memory information to IDA...". Despite getting this warning, I had no problems loading and viewing memory under the debugger, so you can safely ignore it.

Flare-On 2015 Challenge #10 - IDA memory information warning

A "Choose process to attach to" dialog should appear with only one entry to select from: "0  <Kernel>". Select it and click "OK". If you see the popup "Searching for crypto constants...", just click the "Cancel" button. I never did figure out what this meant, but it takes a long time and never seems to finish. At this point, your guest operating system will begin to load. If you chose "Kernel mode debugging with reconnect and initial break", the debugger will break even before you see the OS boot logo. Press F9 to resume loading Windows.

Flare-On 2015 Challenge #10 - Booting with Kernel Debugger attached

To stop debugging, use one of the following methods:
  • From a breakpoint, navigate to the "Debugger" menu and select "Detach from process". NOTE: You can then re-attach by going to "Debugger" -> "Attach to process", selecting <Kernel>, etc. or restarting IDA followed by "Debugger" -> "Attach" -> "WinDbg debugger".
  • With the debugger running, shut down the guest operating system. Then click the "Suspend" button in IDA's popup window (you may need to click the button twice and choose to forcibly close the connection).
Once the guest OS is shut down or detached, the IDA and VirtualKD windows can be closed.

With all the kernel debugging setup out of the way, we can finally debug this thing! If you have not yet copied and run loader.exe on the guest operating system (to install the driver), you must do that before continuing. If the driver has already been installed, and you've just booted, you'll need to issue the following command so the kernel loads challenge.sys as it is not automatically loaded upon startup:
net start challenge
If you get an error, the driver is either already running or wasn't installed (such as if you are running an unsupported version of Windows). Otherwise you might notice a debug message appears in the kernel debugger log window that reads "Challenge Driver Loaded..". This message originates from the challenge driver itself via the DbgPrint call after IoCreateSymbolicLink() succeeds.

On your host machine, click the "Suspend" button in IDA to break into the debugger. Locate the "Modules" window (usually in the right pane) and select "challenge". Note its base address (more on this later). Right-click your selection and choose "Analyze module".

Flare-On 2015 Challenge #10 - Analyze Challenge Module

IDA pops up a confirmation dialog, which is annoying because the very condition it is trying to warn about is being addressed by the very module analysis you have just selected.

Flare-On 2015 Challenge #10 - Analyze

NOTE: What IDA really means is "are you sure you want to download kernel memory" (may take a while under a slow debugger connection) and for you to not forget to re-analyze the module if the driver is reloaded (such as by starting and stopping the associated service) as the driver's location in memory may change.

The first time you analyze a module in a kernel debugging session, IDA will download the contents of memory from the guest OS. This only takes about 20-30 seconds with VirtualKD, and you'll see bytes being transferred in the lower left corner of the status bar. Wait until you see "idle" in the same area of the status bar and you'll know the download and analysis are complete. If you are re-analyzing a module within the same debugging session after it has been reloaded (such as by restarting the service), IDA won't download the guest's memory again. It will update its internal addressing to reflect the new location of the previously analyzed module.

During static analysis, the address of the function we wanted to debug (the one with the giant switch statement) was at address 29C1A0. Static analysis sets up the memory pointers assuming the image loads at its default/preferred base address. When it comes to drivers, Windows rarely loads them at their preferred base address, so our address 0x29C1A0 needs to be converted. First, you must subtract the default base address (0x10000) to obtain the pure offset (known as an RVA in the PE specification). Then simply add the current base address shown in the debugger to this offset and that's the pointer we can use. The formula is:
<static_analysis_address> - <default_base> + <current_base>
One nice thing about WinDbg syntax is that we can mostly use RVAs combined with the module's name and let the debugger do the calculation for us, for example "challenge+0x28C1A0" for the handler at static address 29C1A0.
While the WinDbg command window supports this syntax, unfortunately IDA does not, which is a little weird because we're using both at the same time. If you want to jump to an address in the disassembly or hex-dump windows using the "g" key, for example, you must manually calculate the address by always taking into account the current module's base address. Because the RVA offsets don't change between driver reloads, the module+RVA form is the address syntax we'll use when possible for the remainder of the tutorial.
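For the cases where IDA needs an absolute address, the manual calculation is easy to script. This is a minimal Python sketch of the formula above; the helper names are hypothetical, and the current base used in the example is a made-up value standing in for whatever the "Modules" window shows in your session.

```python
DEFAULT_BASE = 0x10000  # preferred image base of challenge-xp.sys, per the text


def to_rva(static_addr: int, default_base: int = DEFAULT_BASE) -> int:
    """Offset of a static-analysis address from the start of the image."""
    return static_addr - default_base


def rebase(static_addr: int, current_base: int,
           default_base: int = DEFAULT_BASE) -> int:
    """<static_analysis_address> - <default_base> + <current_base>"""
    return to_rva(static_addr, default_base) + current_base


handler_rva = to_rva(0x29C1A0)           # the switch-statement handler
example_base = 0xF8A5C000                # hypothetical load address, yours will differ
handler_live = rebase(0x29C1A0, example_base)
print(f"challenge+0x{handler_rva:X} -> 0x{handler_live:X}")
```

The RVA printed on the left is what you type in the WinDbg command window, and the absolute address on the right is what you paste into IDA's jump dialog, keeping the 0x prefix.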

Another important thing to remember is that IDA doesn't interpret base-16 (hex) numbers unless prefixed with "0x", whereas WinDbg defaults to base-16 numbers. It's a good idea to get in the habit of prefixing all of the numbers and addresses that you intend to be hex with "0x" to ensure they are interpreted properly regardless of which window you are typing in.

Using the WinDbg command window (lower left corner), we'll set a breakpoint using the simplified RVA module offset syntax to specify our callback handler (gathered above):
bp challenge+0x28C1A0
Now run the debugger by hitting the F9 key. The guest machine should "unfreeze" and become usable once again. On the guest machine, open a command prompt if one is not already open and navigate to the system32 directory where ioctl.exe was placed by the installer script. Let's simulate the same command the AutoIt installer script passed to the driver by running:
ioctl.exe 22E0DC
The debugger should immediately break on the driver's main callback function. Switch to it, and begin stepping over instructions using the F8 key. Since this handler is used for all of the driver's events, the handler will get called on each of these events:
IRP_MJ_CREATE           (00)
IRP_MJ_DEVICE_CONTROL   (0e) <--- this event is the one we want
IRP_MJ_CLEANUP          (12)
IRP_MJ_CLOSE            (02)
The check to ensure that the handler is dealing with an IRP_MJ_DEVICE_CONTROL event is the "cmp [ebp+var_1C], 0Eh" instruction a couple lines above the jmp. Just hit F9 to run past the events you don't care about.
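As a side note on the value passed to ioctl.exe: Windows IOCTL codes are packed bit fields as laid out by the CTL_CODE macro (device type in bits 31-16, required access in bits 15-14, function number in bits 13-2, transfer method in bits 1-0). The sketch below, with a made-up helper name, splits 22E0DC into those fields. It assumes the argument given to ioctl.exe is the raw IOCTL code in hex.

```python
def decode_ioctl(code):
    """Split a 32-bit IOCTL code into its four CTL_CODE bit fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,  # 0x22 is FILE_DEVICE_UNKNOWN
        "access": (code >> 14) & 0x3,          # 3 is read access | write access
        "function": (code >> 2) & 0xFFF,       # driver-defined function number
        "method": code & 0x3,                  # 0 is METHOD_BUFFERED
    }


fields = decode_ioctl(0x22E0DC)
for name, value in fields.items():
    print(f"{name:12s} {value:#x}")
```

For 22E0DC this gives device type 0x22, access 3, function 0x837 and method 0, which is a typical shape for a custom driver's control code.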

Flare-On 2015 Challenge #10 - Break on Challenge main driver callback

When finally arriving in the case branch for #216, step into (F7) the call at challenge+0x28C605. Stepping through this function happens to be a lot quicker than statically analyzing it, as it ends after only a couple of jumps. The debugger helps us to quickly see how the constants are moved around and used in the conditional jumps that turn out to be unconditional. The majority of the function is never used. The conclusion is that this function doesn't do anything useful, the same as what IDA's pseudocode reported to us previously.

We'll now focus our attention on switch case #100, as it is the first of the "big 3" character-building sequence branches. We'll set a breakpoint on challenge+0x28C3DA and run the debugger. We can then hit the breakpoint by running "ioctl.exe 22E068" on the guest. Stepping over most of the obfuscated code in the main function and into a few of the sub-functions, we can see that bytes are being built in a section of memory between challenge+0x28D840 and challenge+0x28D8B8 (0x78 bytes) in no particular order. There are gaps that are never filled in, and certain locations where different bytes are repeatedly set. The sub-functions setting these bytes are significantly shorter, but they too consist largely of obfuscated code. It's not until we step through a whopping 1391 bytes of instructions in the main function that we arrive at the first conditional jump (which is in fact unconditional due to its semantics). After the jump we step through another large section of obfuscated code that shares the same traits as the last. It's not until the 9th conditional jump (where the previous conditional jumps were either unconditionally taken or unconditionally skipped) that we hit a jump-sled of 4 consecutive jmp's, finally dumping us into a small chunk of critical-looking code before returning. By the time we get here, we are already viewing (in one of IDA's hex windows) the section of memory that was built by the various sub-functions. When we step over the CALL at challenge+0x9D0B1, a chunk of 40 bytes between challenge+0x28D890 and challenge+0x28D8B8 is transformed into other random-looking characters. One could guess that if the right characters were already in the target memory section, this routine might properly decrypt them.

Flare-On 2015 Challenge #10 - Results of running 'stock' case 100 branch

challenge:B1F9D0A4 8D 4D D0          lea     ecx, [ebp+var_30]
challenge:B1F9D0A7 51                push    ecx
challenge:B1F9D0A8 8D 55 C4          lea     edx, [ebp+var_3C]
challenge:B1F9D0AB 52                push    edx
challenge:B1F9D0AC 68 90 D8 18 B2    push    offset byte_B218D890 ;pointer to start of memory to be modified
challenge:B1F9D0B1 E8 BA 34 F6 FF    call    sub_B1F00570         ;decryption routine?
challenge:B1F9D0B6 8B E5             mov     esp, ebp
challenge:B1F9D0B8 5D                pop     ebp
challenge:B1F9D0B9 C3                retn

Altering the parameters passed to the routine to process a larger section of memory didn't produce a meaningful result.

A similar investigation showed that the two other switch case branches (#260 and #352) are much like #100: large blocks of obfuscated code separated by conditional-looking jumps, with interspersed calls to sub-functions that set bytes in the same target memory section. Both of the branches end with a jump-sled to a code block that calls the same decryption routine, but on a different portion of the same section of memory. Also like case #100, these decryption calls do not produce meaningful results. Here is a "massaged" C++ representation of what the decryption routine is doing:

DWORD funcCrypto(BYTE* pInOutBuffer, DWORD* pDwBufSize, char* pszHexValues)
{
    // loop count is buffersize/8 because we're doing 8 byte chunks per iteration
    DWORD uChunksToProcess = *pDwBufSize >> 3;
    for (DWORD i = 0; i < uChunksToProcess; ++i)
    {
        DWORD* pdw8ByteChunk = (DWORD*)(pInOutBuffer + (i * 8));
        // CALLed inner function (inlined here)
        DWORD dw1 = *pdw8ByteChunk;        // get first dword
        DWORD dw2 = *(pdw8ByteChunk + 1);  // get next dword
        DWORD runVal = 0xC6EF3720;         // unsigned, so the arithmetic wraps at 32 bits
        for (DWORD round = 0; round < 0x20; ++round)  // loop 32 times (renamed so it no longer shadows the outer i)
        {
            dw2 -= (*(DWORD*)(pszHexValues + 12) + (dw1 >> 5)) ^ (runVal + dw1) ^ (*(DWORD*)(pszHexValues + 8) + 16 * dw1);
            dw1 -= (*(DWORD*)(pszHexValues + 4) + (dw2 >> 5)) ^ (runVal + dw2) ^ (*(DWORD*)pszHexValues + 16 * dw2);
            runVal += 0x61C88647;
        }
        *pdw8ByteChunk = dw1;              // store modified dwords back to memory
        *(pdw8ByteChunk + 1) = dw2;
    } // for
    // last dword processed is passed back as running variable
    *pDwBufSize = *(DWORD*)(pInOutBuffer + 8 * uChunksToProcess - 4);
    return (*pDwBufSize);
}
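For experimenting with the routine outside the debugger, here is a rough Python port of the C++ above. The constants line up with the TEA block cipher's decryption direction (0xC6EF3720 is 32 times the TEA delta 0x9E3779B9 modulo 2^32, and adding 0x61C88647 is the same as subtracting that delta), so the sketch also includes the standard TEA encryption direction purely as a round-trip check. That encryption function, the helper names, and the test key are my additions and do not come from the driver. Little-endian DWORDs are assumed, as on x86.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9      # TEA delta; 0x61C88647 is its negation modulo 2**32
START_SUM = 0xC6EF3720  # starting value of runVal in the routine above


def decrypt_block(v0, v1, k):
    """Mirror of the inner 32-round loop (dw1 is v0, dw2 is v1)."""
    total = START_SUM
    for _ in range(32):
        v1 = (v1 - ((k[3] + (v0 >> 5)) ^ (total + v0) ^ (k[2] + (v0 << 4)))) & MASK
        v0 = (v0 - ((k[1] + (v1 >> 5)) ^ (total + v1) ^ (k[0] + (v1 << 4)))) & MASK
        total = (total + 0x61C88647) & MASK
    return v0, v1


def encrypt_block(v0, v1, k):
    """Standard TEA encryption; an assumption used only to test the port."""
    total = 0
    for _ in range(32):
        total = (total + DELTA) & MASK
        v0 = (v0 + ((k[0] + (v1 << 4)) ^ (total + v1) ^ (k[1] + (v1 >> 5)))) & MASK
        v1 = (v1 + ((k[2] + (v0 << 4)) ^ (total + v0) ^ (k[3] + (v0 >> 5)))) & MASK
    return v0, v1


def crypt_buffer(data, key, block_fn):
    """Run block_fn over each whole 8-byte chunk; trailing bytes are left alone."""
    k = struct.unpack("<4I", key)  # 16-byte key read as four little-endian DWORDs
    out = bytearray(data)
    for off in range(0, len(out) - len(out) % 8, 8):
        v0, v1 = struct.unpack_from("<2I", out, off)
        struct.pack_into("<2I", out, off, *block_fn(v0, v1, k))
    return bytes(out)


# Round trip with an arbitrary test key (not the driver's key).
test_key = bytes(range(16))
sample = b"0123456789abcdefXYZ"  # 2 whole blocks plus 3 untouched trailing bytes
scrambled = crypt_buffer(sample, test_key, encrypt_block)
print(crypt_buffer(scrambled, test_key, decrypt_block) == sample)
```

With the real 16-byte key and a dump of the target buffer, crypt_buffer(dump, key, decrypt_block) should reproduce what the CALL does to each 8-byte block, which makes it easy to test candidate byte sequences without restarting the service.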

The functions called from the remaining 96 switch cases were the same sub-functions referenced dozens (and sometimes hundreds) of times by the giant #100, #260 and #352 sequencing branches that ultimately attempted a decryption of the buffer at the end. This led me to believe I might need to call some or all of these switch cases prior to one (or all) of the "big 3" branches.

One interesting property I noticed is that the contents of the memory section being built by the various branches were retained between ioctl.exe calls as long as the driver remained loaded. The memory would be reset to zeros once the service was restarted, so it got me thinking that I could restart the service in between any number of switch case combinations as long as I ended on one of the "big 3" branches responsible for calling the decryption routine.

I wrote my own beefed up version of ioctl.exe to call the various case branches in different combinations. My version was also modified to pass an input buffer and output any results the driver might send back (in case the supplied ioctl.exe was intentionally broken). Because there were too many switch branches to try all possible combinations, I figured I'd try the common ones, such as all combinations of #100, #260 and #352. Then I tried all of the branches in sequence, reversed, all excluding the "big 3" (then jumping to the decryption block), then that reversed, and so on. Since the installer-run case branch (#216) was one of the first subroutines determined not to modify the target memory section, I used it as a breakpoint branch where I didn't already have a convenient place to break into the debugger at the end of the sequence.
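To give a feel for the size of even the smallest search space, here is one way to enumerate every ordering of the "big 3" branches. This is a sketch of one reading of "all combinations", not the tool I actually used. The case-to-IOCTL mapping in it is inferred from only the two pairs mentioned in this write-up (22E0DC for #216 and 22E068 for #100, both of which equal 0x22E004 plus the case number), so treat the codes it produces for #260 and #352 as unverified.

```python
from itertools import permutations

BIG3 = (100, 260, 352)

# Inferred offset: 0x22E0DC - 216 == 0x22E068 - 100 == 0x22E004.
# Only those two data points support it; other cases are an extrapolation.
IOCTL_BASE = 0x22E004


def case_to_ioctl(case):
    """Map a switch case number to an IOCTL code (inferred, see above)."""
    return IOCTL_BASE + case


def big3_sequences():
    """Yield every ordering of every non-empty subset of the big 3."""
    for length in range(1, len(BIG3) + 1):
        yield from permutations(BIG3, length)


for sequence in big3_sequences():
    commands = [f"ioctl.exe {case_to_ioctl(case):X}" for case in sequence]
    print(" && ".join(commands))
```

Three branches already give 15 sequences. Adding the other 96 cases in arbitrary orders makes exhaustive testing hopeless, which is why only a handful of "common" patterns were tried.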

I'll spare you the gory details of the wild goose chase I pursued for a couple of days. That memory section just didn't seem to want to decrypt to anything meaningful. One of the biggest problems I noticed is that the "big 3" branches destroyed most of what was placed into the buffer by the other switch cases by the time the decryption routine was executed.

I shifted my focus to the individual sub-functions that were responsible for writing bytes to the target memory section using IDA's handy cross-referencing feature (CTRL+x). Clearly many of these function calls were buried in branches that would never execute, as the .text section for this driver is a whopping 2.5 MB of mostly garbage instructions and fake conditional branches. Many byte positions were only modified by one sub-function, so those were easy. The other positions were either unmodified by any sub-function or modified by two or more. Around the time I was thinking about how to determine the correct byte for those memory locations that were modified by multiple sub-functions, I decided to focus back on the "big 3", thinking that the correct sequence must already be there. Maybe if I were to force a certain critical branch to execute that otherwise would not have, that memory section might get initialized with the proper byte sequence.

Starting with case branch #100, I forced the first branch at challenge+0x1CCCF to skip the jump that would have always been taken. This is done by pressing CTRL-n while highlighting the instruction you want to set EIP to. I then ran the remainder of the normal code paths down to the decryption routine. Nothing meaningful was decrypted. Following this technique, I forced the condition just described, and additionally forced the next deeper jump condition (challenge+0x1D0EE made a jump that would have otherwise never been taken). To do this you double-click on the jump address to bring the target in view and press CTRL-n to set the EIP. Still no dice. Not long after, I found that if the first 4 jumps are allowed to execute normally, but you force the opposite condition on the 5th jump, 8 characters of a partial word appeared in memory! This is after the decryption routine processed the characters that were placed in the buffer as a result of the altered branch paths. This small chunk of text surely wasn't random, so I knew I had to explore the technique further.

Flare-On 2015 Challenge #10 - Possible piece of the solution?

I started building a "map" (a separate text file) of the addresses I had been to, the nesting level and the Jxx instruction information so I could systematically try paths into these code sections until I arrived at the decryption block or a dead end (the function just returns), at which point I started working my way back up the nesting levels. This is your basic binary tree traversal, with the jump condition at each point reversed to see how the output is affected. The portions of text that did properly decrypt did so in 8-byte blocks, and each correct branch section was capable of building only one of these blocks (at least on the branches I tried). After mapping all of #100, #260 and parts of #352, I wasn't able to decrypt anything prior to the sequence: "unconditional_conditions@flare-on.com".
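The bookkeeping in that map amounts to a depth-first walk over a binary tree of jump decisions. The toy model below shows the idea only: the tree, the node labels and the outcomes are hypothetical placeholders, not real offsets from the driver, and in practice each step was done by hand in the debugger.

```python
def explore(node, path=()):
    """Depth-first walk that yields (path, outcome) for every leaf.

    A leaf is a string such as "decrypt" or "dead end"; an inner node is a
    tuple (jump_label, subtree_if_skipped, subtree_if_taken).
    """
    if isinstance(node, str):
        yield path, node
        return
    label, if_skipped, if_taken = node
    yield from explore(if_skipped, path + ((label, "skip"),))
    yield from explore(if_taken, path + ((label, "take"),))


# Hypothetical two-level tree; "jcc_A" and "jcc_B" stand in for Jxx addresses.
TREE = ("jcc_A", "dead end", ("jcc_B", "decrypt", "dead end"))

for path, outcome in explore(TREE):
    steps = ", ".join(f"{label}:{choice}" for label, choice in path)
    print(f"{outcome:8s} <- {steps}")
```

Each printed line corresponds to one row of the text-file map: the sequence of forced decisions and whether it ended at the decryption block or at a plain return.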

Flare-On 2015 Challenge #10 - Whole solution?

I spent 2 more days trying to decrypt the portion I thought would come before the word "unconditional" as I was sure my solution was incomplete. The range of memory referenced by the various sub-functions led me to believe there was more to decrypt. A solid month of doing these FLARE challenges had clearly fried my brain. At the point of nearly giving up, I decided to fire off an e-mail to the decrypted portion of the address I did have. I figured worst case, I get no replies and go back to the drawing board. So I e-mailed ... FLARE responded ... [and I slapped my forehead]

The sequence I used to build the solution was based on traversing the case #100 branch. The final 8-character block (comprising "e-on.com") was found by cross-referencing the sub-functions that wrote to that "missing" block of memory. I called this block4. I might not have gone to the trouble of decoding this block had I realized the first part of the decoded e-mail address was already complete: "unconditional_conditions@flar"; the rest of the e-mail address could have been guessed.

follow case #100 branch:
    [challenge+0x5D207] manually take the jump
    [challenge+0x7D81C] manually skip jump
    [challenge+0x82126] manually take the jump
    [challenge+0x85166] manually take the jump (builds block3)
    [challenge+0x859A5] break here, then manually set EIP to:
        [challenge+0x9C04D] (builds block4)
Flare-On 2015 Challenge #10 - Memory Decryption of Solution
The official solution differs in a number of ways from my approach.

I felt the method I used to have the AutoIt script "echo" the already-decrypted result was far easier and quicker than recognizing and implementing the RC4 cipher in a Python script to perform the decryption separately. The author's purpose was likely to shed light on the inner workings of the shellcode, which I appreciate.

I missed the clue for the installer-run case #216 branch (IOCTL code 22E0DC). Although I was correct that the function did nothing execution-wise, the function wasn't a complete red herring after all. If I had read through the function and translated each occurrence of the JZ and JNZ instructions as 0 and 1 codes, I would have decoded the string "try this ioctl: 22E068". This refers to case branch #100 described above, which I did ultimately use to derive the solution. The clue may have helped me avoid trying all of the other useless IOCTL codes in different combinations.
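The decoding idea can be sketched in a few lines. Only the JZ = 0 / JNZ = 1 mapping comes from the description above. The bit order within each byte (most significant bit first), the 8-bit ASCII grouping, and reading the instructions in address order are assumptions on my part, and the helper names are made up. Since I don't have the real mnemonic sequence at hand, the example builds one from the known clue string and decodes it back.

```python
def encode(text):
    """Turn ASCII text into a JZ/JNZ mnemonic list (JZ is 0, JNZ is 1)."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    return ["JNZ" if bit == "1" else "JZ" for bit in bits]


def decode(mnemonics):
    """Read JZ/JNZ mnemonics as bits, MSB first, and group them into bytes."""
    bits = "".join("1" if m.upper() == "JNZ" else "0" for m in mnemonics)
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8)).decode("ascii")


clue = "try this ioctl: 22E068"
jumps = encode(clue)
print(len(jumps), "conditional jumps")
print(decode(jumps))
```

In a real run the list passed to decode() would come from the disassembly of the case #216 function, filtered down to its JZ and JNZ instructions.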

The author then offered this information about the case #100 branch:

"Shortly before each test operation, the variable being tested is set to zero. After checking a few branches, it becomes apparent that the branches filling the array that we care about are never taken with the code in its current state."

The thing that consumed a large amount of time near the end of the challenge was manually traversing the conditional branch "tree" until I got a decrypted result. My trial-and-error method wasn't exactly ideal but it was all I had at the time. The author's ideal solution consisted of a massive opcode search and replace on the case #100 branch:

00000000 C645 9E 00    mov byte ptr [ebp-98], 0    ;replace this instruction
00000000 C645 9E 01    mov byte ptr [ebp-98], 1    ;with this one

Wherever the top instruction appeared in the code, the 0 constant resulted in the wrong branch being taken 100% of the time. Maybe I should have paid more attention to the location of the constant that was responsible for the chosen branch and I might have seen the pattern. The author then described a clever method to patch the function in place by using a .writemem, patch, and .readmem sequence.
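The search-and-replace step itself is simple once the function's bytes are in a file. Here is a minimal sketch that works on a raw byte dump, for example one saved with .writemem; the helper name and the sample bytes are invented for illustration. Be aware that a blind byte match can also hit sequences that are not real instruction starts, which is a genuine risk in a function padded with junk code, so the match count is worth a sanity check.

```python
OLD = bytes.fromhex("C6459E00")  # mov byte ptr [ebp-98], 0
NEW = bytes.fromhex("C6459E01")  # mov byte ptr [ebp-98], 1


def patch(code):
    """Return (patched_bytes, number_of_replacements) for a raw code dump."""
    return code.replace(OLD, NEW), code.count(OLD)


# Made-up sample: nop, the mov, a short jz, the mov again, ret.
sample = b"\x90" + OLD + b"\x74\x05" + OLD + b"\xC3"
patched, hits = patch(sample)
print(hits, patched.hex(" "))
```

Because both instructions are the same length, the patched buffer can be written back over the original function without shifting any jump targets.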

I felt really silly after reading internet posts about how other contestants arrived at the solution without using a kernel debugger and probably didn't waste as much time as I did. In one case, the contestant zeroed in on the target buffer and case #100 in IDA (in static analysis mode) by tracing the IOCTL code path. Then memory from the target buffer was cross-referenced to find all of the locations that were written to by *one* function (as opposed to multiple). The majority of these happened to fall in that last block that contained the key, which gave them the encrypted byte values. Then they ran the decryption function in isolation on that block and bingo. I was too stuck on the idea that the whole range of memory was to be decrypted, an assumption that cost me days.

Response from unconditional_conditions@flare-on.com:
Subject: FLARE-On Challenge #10 Completed!
From: unconditional_conditions@flare-on.com
To: <HIDDEN>
Date: Tue, 01 Sep 2015 13:27:06 -0400

One day they recite the great minds of history, Newton, Shakespeare, Galileo, Ramanujan, Curie, Flare-On Contestant #743, etc. You are building a lasting legacy for yourself. I have attached another file. You can either spend all weekend working on this challenge or just go on living your life, your call. The password to the zip archive is "flare" again.

-FLARE

attachment_filename="42634F3F5FAF28306EB07675274AA6B6.zip"
