Apparently, there's quite a bit more interest in
Win32® assembly language than I had originally
thought. After the February 1998 issue of MSJ hit the stands, I
received quite a bit of positive email and favorable comments from folks
at trade shows. Many readers said, "Have you also thought about
covering...?"
My February 1998 column
could have been called "Just Enough Assembly Language to Get By." Since
it was such a hit, it's time for the sequel: "Just Enough Assembly
Language to Get By, Part II." I'll look at additional instructions and
instruction sequences that come up often. I'll also describe some of the
most common scenarios when an instruction faults, and what to look for.
Before JMPing into the details, make sure you're at least familiar with the Intel x86
registers and instruction addressing modes. I covered both subjects in
my February column. Also note that none of the instructions mentioned
in my February column—and none of the ones I'll mention here—require
anything more than an 80386 system because the subset of instructions
that compilers typically use was standardized at least 12 years ago.
Common Instructions
Instructions INC value, DEC value
Purpose Increments or decrements integer value by 1
Example INC ESI
INC [EBP-8]
DEC [EAX+4]
The INC and DEC instructions are used to increment and decrement values kept
in memory or registers. As you might imagine, these instructions map
precisely to the ++ and -- operators in C++ for standard integer
operations.
You could use the ADD or SUB instructions to achieve the same effect as INC
and DEC, although it would be more expensive in terms of size. Since
they are so commonly used, the smallest versions of the INC/DEC
instructions take only a single byte. Looking at the Intel opcode map,
you'll see that there's an opcode for each of the eight general-purpose
registers that INC can be used against (EAX, EBX, ECX, EDX, ESI, EDI,
ESP, and EBP). Another eight opcodes are used for the DEC instruction
and the same set of registers.
Instructions MUL value, value
DIV value, value
Purpose Multiplication and division
Example MUL EAX,EDX
MUL AL,BYTE PTR [EBP-14h]
DIV EAX,EBX
I didn't cover the ADD and SUB instructions in my February column since
their operation is straightforward. However, the MUL and DIV
instructions have some quirks that make them difficult to read and
downright awkward to write. Throughout this column, when I mention (E)AX,
I'm referring to AL, AX, or EAX. Likewise, when I mention (E)DX, I'm
referring to DL, DX, or EDX.
Both MUL and DIV treat their operands as unsigned values. The operands can't
be immediate values (such as 3); rather, they must be in registers or
memory. You may have noticed that the destination value (the first
argument) always seems to be (E)AX. This is by design. The use of the
(E)AX register is an implicit part of the instruction. Beyond the
implicit use of (E)AX, the (E)DX register is also silently involved. The
high bits of the MUL result end up in (E)DX. Likewise, for the DIV
instruction, (E)DX holds the remainder and (E)AX holds the quotient.
(Byte-sized operands are the exception: there, AH rather than DL holds
the high bits of the product or the remainder.)
If you write any assembler code, MUL and DIV get even weirder. The assembler (both MASM and the Visual C++®
inline assembler) won't let you specify the (E)AX operand. Thus, if you
want the instruction MUL EAX,ECX, you would write MUL ECX—just another
example of the intuitive language syntax that's made assembly language
wildly popular in recent years.
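To make the implicit registers concrete, here's a rough C++ model of the 32-bit forms of MUL and DIV. The struct and function names (MulResult, DivResult, mul32, div32) are my own inventions for illustration. The sketch also relies on a detail not spelled out above: the 32-bit DIV treats EDX:EAX as a single 64-bit dividend. The asserts are stand-ins for the cases where the real instruction would fault.

```cpp
#include <cassert>
#include <cstdint>

// Result of a 32-bit MUL: the 64-bit product, split the way the CPU splits it.
struct MulResult {
    std::uint32_t edx; // high 32 bits of the product
    std::uint32_t eax; // low 32 bits of the product
};

// Result of a 32-bit DIV.
struct DivResult {
    std::uint32_t eax; // quotient
    std::uint32_t edx; // remainder
};

// Models "MUL operand": EDX:EAX = EAX * operand, all unsigned.
MulResult mul32(std::uint32_t eax, std::uint32_t operand) {
    std::uint64_t product = static_cast<std::uint64_t>(eax) * operand;
    return { static_cast<std::uint32_t>(product >> 32),
             static_cast<std::uint32_t>(product) };
}

// Models "DIV operand": divides the 64-bit value EDX:EAX by operand.
// The asserts mark the inputs this model does not handle: a zero divisor,
// or a quotient too large to fit in EAX.
DivResult div32(std::uint32_t edx, std::uint32_t eax, std::uint32_t operand) {
    assert(operand != 0);
    std::uint64_t dividend = (static_cast<std::uint64_t>(edx) << 32) | eax;
    std::uint64_t quotient = dividend / operand;
    assert(quotient <= 0xFFFFFFFFu);
    return { static_cast<std::uint32_t>(quotient),
             static_cast<std::uint32_t>(dividend % operand) };
}
```

Calling mul32(0x80000000, 4) leaves 2 in the edx field and 0 in eax, which is why code that ignores EDX after a MUL silently loses the overflow.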
Instructions IMUL value, value
IDIV value, value
Purpose Signed multiplication and division
Example IMUL WORD PTR [EBP+8]
IMUL EDX,ECX,8
IDIV EAX,DWORD PTR [EDX]
The IMUL and IDIV instructions treat the operands as signed values. Contrast
this to MUL and DIV, which work on unsigned values. IDIV uses (E)AX as
the implicit first operand, just as DIV does. Also, like its DIV
counterpart, IDIV only works with register or memory values. IMUL, on
the other hand, doesn't fit the general patterns of MUL, DIV, and IDIV.
It can work with immediate values and it can have a non-(E)AX register
as the destination. There's even a form of the IMUL instruction that
takes three operands, as in IMUL EDX,ECX,8, which multiplies ECX by 8
and puts the result in EDX. Very few instructions in the Intel opcode
set take three operands (SHLD and SHRD, which appear later in this
column, are the others you're likely to run into).
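The signed versus unsigned distinction matters because the same 32 bits can produce very different answers. Here's a small C++ sketch of that idea. The function names divide_unsigned and divide_signed are hypothetical, and the C++ / operator is only standing in for what DIV and IDIV do with the quotient.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned divide: the bits are interpreted the way DIV interprets them.
std::uint32_t divide_unsigned(std::uint32_t value, std::uint32_t divisor) {
    assert(divisor != 0);
    return value / divisor;
}

// Signed divide: the same bits are interpreted the way IDIV interprets them.
std::int32_t divide_signed(std::int32_t value, std::int32_t divisor) {
    assert(divisor != 0);
    // The most negative value divided by -1 does not fit in 32 bits.
    assert(!(value == INT32_MIN && divisor == -1));
    return value / divisor;
}
```

The bit pattern 0xFFFFFFF8 is -8 as a signed value. divide_signed(-8, 2) returns -4, while divide_unsigned(0xFFFFFFF8, 2) returns 0x7FFFFFFC. If you see a DIV where you expected an IDIV (or vice versa), check whether the C++ variable was declared unsigned.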
Instructions PUSHAD, POPAD
Purpose Saves or restores all general-purpose registers via the stack
PUSHAD pushes EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI onto the
stack, in that order. POPAD pops them in the reverse order (the saved
ESP value is discarded rather than loaded). These instructions are used
in situations where many registers may be modified and the programmer
wants the registers restored afterward, leaving no trace that the code
ever ran. Although interrupt handlers are
passé for most programmers, they're a perfect example of where PUSHAD
and POPAD come in handy. Besides taking fewer opcode bytes than eight
individual PUSH instructions, they also execute faster (five clock
cycles on a Pentium).
Instructions PUSHFD, POPFD
Purpose Push or pop the EFLAGS register
In
some cases, it's inconvenient to use the flags set by a prior operation
immediately. Alternatively, you may want to make sure that some
operation you're about to execute won't change the current flag values.
For these situations, PUSHFD and POPFD are the easiest methods to save
and restore those bits.
PUSHFD is one of the atomic components of an interrupt. When an interrupt or
an exception occurs, the following code effectively executes:
PUSHFD, PUSH CS, PUSH EIP.
Following the three pushes, the EIP register changes to the interrupt
handler address contained in the appropriate slot in the Interrupt
Descriptor Table (IDT). Likewise, the IRETD instruction effectively does
a POPFD as part of returning from an interrupt.
Instructions SHL, SHR, SHLD, SHRD
Purpose Shift bits to the left or right
Example SHL EBX,3
SHR EBX,CL
SHLD EDX,ECX,4
SHRD ESI,EDI,CL
The SHL and SHR instructions are logically equivalent to the C++ <<
and >> operators. Many of you probably recall that bitwise
shifting is a quick way to perform multiplication and division by powers
of 2. For example, the SHL EBX,3 instruction has the same effect as
multiplying EBX by 8 (2 to the 3rd power is 8). Indeed, if you write C++
code that multiplies or divides an unsigned value by 2, 4, 8, 16, and so
on, it will most likely compile to a SHL or SHR instruction.
When
shifting left, the low-order bits are filled with zeroes. The final
high-order bit that's "shifted out" is moved to the carry flag (CF). In
other words, the carry flag is like a virtual 33rd bit. When shifting
right, the high-order bits are filled with zeroes, and the last bit
shifted out moves to the carry flag.
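Here's a minimal C++ sketch of the shift behavior just described, including the carry flag as the "33rd bit." The ShiftResult struct and the shl32 and shr32 functions are names I made up for this illustration. The model only covers shift counts from 1 through 31.

```cpp
#include <cassert>
#include <cstdint>

struct ShiftResult {
    std::uint32_t value; // the shifted 32-bit result
    bool carry;          // the last bit shifted out (the carry flag)
};

// Models SHL reg,count for counts 1 through 31.
ShiftResult shl32(std::uint32_t value, unsigned count) {
    assert(count >= 1 && count <= 31);
    // The last bit to fall off the top is bit (32 - count) of the original.
    bool carry = ((value >> (32 - count)) & 1u) != 0;
    return { value << count, carry };
}

// Models SHR reg,count for counts 1 through 31.
ShiftResult shr32(std::uint32_t value, unsigned count) {
    assert(count >= 1 && count <= 31);
    // The last bit to fall off the bottom is bit (count - 1) of the original.
    bool carry = ((value >> (count - 1)) & 1u) != 0;
    return { value >> count, carry };
}
```

With these, shl32(5, 3) gives a value of 40 (5 times 8) with the carry clear, while shl32(0x80000000, 1) gives a value of 0 with the carry set, since the top bit was pushed out.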
Instruction ADD [EAX],AL
Purpose None
You
may see a lot of this particular instruction, and you'll probably see it
repeated. However, ADD [EAX],AL has no special significance. The opcode
bytes for this instruction are 00 00. In other words, it's what you'll
see if you're viewing a series of data bytes that all contain the value
0. Nothing to see here. You can all go home now.
Instruction CLD
Purpose Clears the direction flag
In my February 1998 column,
I described the string instructions LODSx, SCASx, STOSx, and MOVSx.
Each of these instructions uses the ESI or EDI register to point at the
memory to be read or written to. These instructions are typically used
in conjunction with the REP, REPE, or REPNE prefixes, which cause the
string instruction to execute several times until some specific
condition is met.
After
each REPx-induced iteration, the CPU changes the ESI or EDI register to
point to an adjacent memory location. The direction in which the
registers move is given by the direction flag. If the direction flag is
clear, ESI or EDI is incremented after each instruction (thus causing
the next higher memory location to be referenced in the next iteration).
When the direction flag is set, ESI or EDI decrements after each
iteration.
Most of the time it's easiest to work moving forward in memory (toward
higher addresses), so the direction flag is usually clear. However,
it's generally not safe to assume that the flag is clear. Thus, you'll
often see the CLD instruction somewhere before a string operation such
as REP MOVSB.
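If the role of the direction flag still seems abstract, this C++ sketch models REP MOVSB over a plain byte array. Everything here is an assumption made for illustration: "memory" is just a std::vector, ESI and EDI are indexes into it rather than real addresses, and the function name rep_movsb is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models REP MOVSB. esi and edi are indexes into memory, ecx is the
// repeat count. direction_flag == false corresponds to the state after CLD.
void rep_movsb(std::vector<std::uint8_t>& memory,
               std::size_t esi, std::size_t edi, std::size_t ecx,
               bool direction_flag) {
    while (ecx != 0) {
        assert(esi < memory.size() && edi < memory.size());
        memory[edi] = memory[esi];   // MOVSB: copy one byte
        if (direction_flag) {        // flag set: walk toward lower addresses
            --esi;
            --edi;
        } else {                     // flag clear: walk toward higher addresses
            ++esi;
            ++edi;
        }
        --ecx;                       // REP: count down until ECX reaches 0
    }
}
```

Run the same overlapping copy both ways and you get different memory contents, which is why code that assumes a clear flag can go badly wrong if something earlier left it set.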
Instructions NOT value, NEG value
Purpose Negation of values
Example NOT DWORD PTR [EBP-8]
NEG EDX
The
NOT instruction does ones-complement negation. That is, it applies the
NOT operation to each bit in the operand. An initial value of 0 will
become 0xFFFFFFFF after a NOT instruction. The C++ ~ operator is
typically implemented via the NOT instruction.
The
NEG instruction does twos-complement negation. (If you're not 100
percent up on ones versus twos-complement negation, don't feel bad. I
learned this stuff 10 years ago in college, and I've completely
forgotten it!) An easier way to think of the NEG instruction is that it
puts a - sign in front of the value. Thus, using NEG on -3 yields 3,
while NEG applied to 4 yields -4. To summarize, you can think of NOT as
affecting individual bits, while NEG operates on the entire value.
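For those who want the refresher after all, twos-complement negation is just "flip every bit, then add 1." This short C++ sketch shows both operations on 32-bit values. The function names not32 and neg32 are hypothetical, and unsigned types are used so the bit patterns are easy to see.

```cpp
#include <cassert>
#include <cstdint>

// Models NOT: flips every bit (ones-complement), same as the C++ ~ operator.
std::uint32_t not32(std::uint32_t value) {
    return ~value;
}

// Models NEG: twos-complement negation, which is NOT followed by adding 1.
std::uint32_t neg32(std::uint32_t value) {
    std::uint32_t result = ~value + 1u;
    assert(result == 0u - value); // identical to subtracting the value from 0
    return result;
}
```

The bit pattern for -3 is 0xFFFFFFFD. Flipping the bits gives 2, and adding 1 gives 3, so neg32(0xFFFFFFFD) returns 3.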
Instruction NOP
Purpose No operation
The
NOP instruction does nothing and affects nothing. It's a single-byte
opcode that executes in one clock cycle and is primarily used to pad
code. For example, a compiler might want the beginning of a procedure to
start on a 16-byte boundary. The compiler/linker would insert enough
NOP instructions between the end of one procedure and the beginning of
the next procedure to create the desired alignment.
If
you're confident in your assembler abilities, the NOP instruction can be
applied to code in memory or in the executable file. You might know
that some instruction you're about to execute will cause a fault in a
debugger. If you want to skip that instruction, use the debugger to
write enough NOP opcodes (0x90) to eliminate the instruction. This is
useful to squash hardcoded INT 3 breakpoint instructions while you're
running under the debugger, effectively not stopping at the breakpoint.
Really advanced users can use NOP instructions to obliterate
entire regions of code in an executable. (Warning! Harder than it
looks.)
Another
advanced use of the NOP instruction is when you want to make it easy to
patch or hook into your code. At the beginning of a procedure or block
of code, put in enough NOP instructions for the desired goal. Subsequent
patching or hooking code can write JMPs, CALLs, or whatever into the
NOP area.
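As a rough illustration of "NOPing out" an instruction, here's a C++ sketch that works on a copy of some code bytes held in a std::vector. The nop_out function is a hypothetical helper, not a debugger or Win32 API. Patching live code in another process involves considerably more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kNopOpcode = 0x90; // the single-byte NOP

// Overwrites `length` bytes starting at `offset` with NOP opcodes.
// The caller must cover the whole instruction being removed. Leaving
// part of a multibyte instruction behind produces garbage code.
void nop_out(std::vector<std::uint8_t>& code,
             std::size_t offset, std::size_t length) {
    assert(offset <= code.size() && length <= code.size() - offset);
    for (std::size_t i = 0; i < length; ++i) {
        code[offset + i] = kNopOpcode;
    }
}
```

For example, given the bytes 55 CC 8B EC (PUSH EBP, a hardcoded INT 3, then MOV EBP,ESP), calling nop_out(code, 1, 1) turns the sequence into 55 90 8B EC, and execution sails past the former breakpoint.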
Instruction INT 3
Purpose Debugger interrupt
INT
3 has two uses—one intended by the original CPU designers, the other
accidental. The INT 3 instruction is the standard method to suspend a
program and transfer control to a debugger. In normal use, programs
don't include INT 3 instructions in their code. Rather, when you set a
traditional breakpoint with a debugger, it temporarily overwrites the
target instruction with an INT 3 instruction. (The LODPRF32 program from
my July 1995 column illustrates this.) Note that an INT 3 instruction
is the heart of the DebugBreak API for Intel CPUs.
The
other offbeat use of the INT 3 instruction is as a paranoid NOP. In
those cases where a NOP would be used for padding (and theoretically
never executed), an INT 3 can be used instead. Like NOP, an INT 3
instruction is only a single byte. The key difference is that if a bug
crept in and you executed the INT 3 instruction, you'd pop into the
debugger. In the same scenario, the CPU would blithely sail through NOP
instructions and wreak havoc someplace farther away from the original
error.
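To show what INT 3 padding amounts to, here's a C++ sketch that pads a buffer of code bytes out to an alignment boundary using the INT 3 opcode (0xCC). The helper names padding_needed and pad_with_int3 are made up for this example. This is only a model of the idea, not how any particular linker is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kInt3Opcode = 0xCC; // the single-byte encoding of INT 3

// Returns how many padding bytes are needed so that the next procedure
// starts on a multiple of `alignment`.
std::size_t padding_needed(std::size_t end_offset, std::size_t alignment) {
    assert(alignment != 0);
    return (alignment - end_offset % alignment) % alignment;
}

// Appends INT 3 bytes until the buffer size is a multiple of `alignment`.
void pad_with_int3(std::vector<std::uint8_t>& code, std::size_t alignment) {
    code.insert(code.end(), padding_needed(code.size(), alignment), kInt3Opcode);
}
```

A procedure ending at offset 0x23 needs 13 bytes of padding to reach the next 16-byte boundary at 0x30. In a disassembly, those bytes show up as a run of INT 3 instructions between procedures.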
The Microsoft®
linker uses INT 3s as paranoid NOPs when creating padding for
incremental linking. The linker also uses them as padding between
procedures it wants to align on a particular memory boundary. Usually
this alignment is on a multiple of 16 bytes unless you have the
"optimize for size" compiler option set. Figure 1 shows a section of code from CALC.EXE that illustrates INT 3 padding in action.
Instruction LOCK
Purpose This instruction locks the memory bus during the next instruction
Example LOCK INC DWORD PTR [EDX+04]
Technically
speaking, LOCK is an instruction prefix rather than an instruction in
its own right. In a multiprocessor environment, multiple processors
could access the same memory location at the same time. The LOCK prefix
ensures that the instruction associated with it will have exclusive
access to the destination memory location.
If
you've ever examined the EnterCriticalSection API, you'll see that if
the critical section isn't currently held, the code essentially just
increments a counter. A LOCK prefix is used with an INC instruction to
guarantee that one thread won't increment the counter while another
thread on another CPU is reading it. You'll also see the LOCK
prefix used with multiprocessor synchronization APIs such as
InterlockedExchange and InterlockedIncrement.
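In portable C++, the closest analog to a LOCK-prefixed increment is an atomic read-modify-write on a std::atomic variable. The sketch below is only that, an analog: the function name locked_increment is hypothetical, and whether a given compiler actually emits a LOCK-prefixed instruction for it is an assumption you'd want to verify in the disassembly.

```cpp
#include <atomic>
#include <cassert>
#include <climits>

// Atomically increments the counter and returns the new value, similar in
// spirit to InterlockedIncrement. No other thread can observe or modify
// the counter in the middle of the fetch_add.
long locked_increment(std::atomic<long>& counter) {
    long previous = counter.fetch_add(1); // one indivisible read-modify-write
    assert(previous != LONG_MAX);         // guard for this sketch: no overflow
    return previous + 1;
}
```

Contrast this with a plain ++ on an ordinary long shared between threads, where two CPUs can each read the old value, each add 1, and each write back the same result, losing an increment.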
A
final thought on the LOCK prefix: you may recall a bug on older Pentium
CPUs where a particular instruction sequence could cause the CPU to
freeze up. (See the February 1998 Editor's Note if you need a
refresher.) That instruction sequence isn't a valid sequence, and the
LOCK prefix plays a vital role in the ensuing CPU meltdown.
Common Instruction Sequences
Sequence CMP register_X, immediate_value_A
JE XXXXXXXX
CMP register_X, immediate_value_B
JE XXXXXXXX
Purpose C++ switch statement
Example CMP EAX,1
JE 00400248
CMP EAX,3
JE 0040026E
CMP EAX,7
JE 004002A0
This
sequence (compare and JMP if equal) is the most straightforward
encoding of a C++ switch statement that I've seen. It's also very easy
to pick out when you encounter it in a debugger. In the example code,
the switch statement would look something like this: