The history of calling conventions, part 4: ia64

Frederik Slijkerman says:

January 13, 2004 at 11:20 am

What is the size of the ‘red zone’ under Win32? I’ve often used this in assembly functions without even considering that it might be illegal.

Raymond Chen says:

January 13, 2004 at 11:24 am

The "red zone" exists only on ia64. Don’t try it on any other platform or you’ll corrupt the stack!

Mike says:

January 13, 2004 at 11:55 am

What is the "red zone?"

Raymond Chen says:

January 13, 2004 at 12:08 pm

"One curious detail of the stack convention is that the first 16 bytes on the stack (the first two quadwords) are always scratch. (Peter Lund calls it a "red zone".)"

Frederik Slijkerman says:

January 13, 2004 at 1:52 pm

Raymond, just to make sure we’re talking about the same thing. Is this illegal:

function entry:

push ebx

push esi

mov [esp-4], eax ; save eax temporarily

; do some stuff

mov eax, [esp-4]

; do some more stuff

pop esi

pop ebx

I believe that this should work as long as Windows saves more of the stack than just up to and including the stack pointer when doing context switches. Am I wrong?

Raymond Chen says:

January 13, 2004 at 1:56 pm

Accessing memory at negative offsets from ESP is illegal. (The "red zone" for ia64 lives at positive offsets relative to the stack pointer.)

If an exception is raised in "do some stuff", you will likely find that your secret hiding place for EAX got overwritten by the exception handler.
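To make the failure mode concrete, here is a minimal sketch in C of a toy downward-growing stack. Everything in it (ToyStack, toy_exception, the 0xDEAD0000 filler values) is hypothetical and only models the idea that exception dispatch pushes its own state starting at the current stack pointer; it is not how Windows actually dispatches exceptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names are invented for illustration. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t sp;                  /* index of the current top-of-stack word */
} ToyStack;

void toy_init(ToyStack *s) {
    for (size_t i = 0; i < STACK_WORDS; i++) s->mem[i] = 0;
    s->sp = STACK_WORDS;        /* empty: sp sits just past the highest word */
}

void toy_push(ToyStack *s, uint32_t v) {
    assert(s->sp > 0);
    s->mem[--s->sp] = v;        /* like "push": decrement, then store */
}

/* Stand-in for exception dispatch: something pushes its own state at the
   current stack pointer, then unwinds before execution resumes. */
void toy_exception(ToyStack *s) {
    size_t saved = s->sp;
    for (uint32_t i = 0; i < 4; i++) toy_push(s, 0xDEAD0000u + i);
    s->sp = saved;
}

/* Stash a value below the stack pointer, take an exception, read it back. */
uint32_t stash_below_sp_then_fault(uint32_t eax) {
    ToyStack s;
    toy_init(&s);
    toy_push(&s, 1);            /* push ebx */
    toy_push(&s, 2);            /* push esi */
    s.mem[s.sp - 1] = eax;      /* mov [esp-4], eax */
    toy_exception(&s);          /* exception inside "do some stuff" */
    return s.mem[s.sp - 1];     /* mov eax, [esp-4] */
}

/* Same sequence, but the slot is reserved first, so it lives above sp. */
uint32_t reserve_slot_then_fault(uint32_t eax) {
    ToyStack s;
    toy_init(&s);
    toy_push(&s, 1);
    toy_push(&s, 2);
    toy_push(&s, eax);          /* slot is now at [esp] */
    toy_exception(&s);
    return s.mem[s.sp];         /* mov eax, [esp] */
}
```

In this model the value stashed at [esp-4] comes back as the first word the toy exception pushed, while the value stored in a reserved slot survives.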

Frederik Slijkerman says:

January 13, 2004 at 2:23 pm

I see. Of course I wouldn’t expect an exception handler (or any function call for that matter) to preserve this space, but neither would I set up an exception handler in an assembly function like this, so that hardly matters.

But, before I rush off to rewrite all my assembler stuff ;-), does Windows currently preserve an area below the stack pointer when doing context switches (which is the only problem I can think of)?

Raymond Chen says:

January 13, 2004 at 2:34 pm

Perhaps you won’t set up an exception handler, but your caller might have one, and the caller might decide to fix the exception and then return EXCEPTION_CONTINUE_EXECUTION. (For example, accessing [esp-4] might trigger a guard page exception, which is handled by the kernel.) Execution then resumes and your stack is corrupted.

Context is processor state (registers, memory map), not memory values. A context switch changes the processor’s view of the world, but the world doesn’t change.

asdf says:

January 13, 2004 at 2:46 pm

If a function has a float parameter, is that one still passed via the integer registers too? (yeah I know I really should read the itanium manual)

David Dunham says:

January 13, 2004 at 2:53 pm

I didn’t think it was boring. (BTW, the PPC, assuming you mean PowerPC, *is* a RISC machine.)

Raymond Chen says:

January 13, 2004 at 2:55 pm

All architectures have separate rules for floats, which I have been ignoring throughout this series since they aren’t really relevant to my point. (I have a point?)

When I said, "It’s common on RISC machines. I believe the PPC used it, too." I meant "It’s common on RISC machines. For example, the PPC is a RISC machine and it uses this method too." Sorry about that.

Phaeron says:

January 13, 2004 at 10:39 pm

If you’re really crazy you can write Win32 x86 assembly code with a "4GB red zone," by temporarily recycling ESP as an eighth general purpose register for an inner loop that doesn’t trip exceptions or access the stack. You can even stash ESP in the structured exception handling chain to make the routine reentrant. Uh, not that I’ve ever written code that did this….

Raymond Chen says:

January 13, 2004 at 10:47 pm

Yup, I’ve seen people do this (use ESP as a general purpose register) – you’re playing with fire with this trick, since (as noted) the slightest mis-step and you’re toast. I’ve only seen it in intense graphics code which is trying to squeak that last cycle out of an image processing algorithm. Not for the faint of heart!

Frederik Slijkerman says:

January 14, 2004 at 2:34 am

Thanks for your comments, Raymond. I’ve used the [esp-x] trick in an arbitrary precision floating-point library that I’ve developed that needed to be as fast as possible. So using EBP as a general purpose register was important here.

When I get around to it, I will correct the code to adjust ESP before storing temp variables so they’re stored in a ‘safe’ area.

Henk Devos says:

January 14, 2004 at 10:08 am

"This has the neat side-effect that a buffer overflow of a stack variable cannot overwrite a return address since the return address isn’t kept on the stack in the first place."

So this means the end of buffer overflow attacks?

I have wondered for a long time why the stack is still upside down nowadays. In the olden days this made sense: the data area and the stack area grow towards each other. But with today’s virtual memory I don’t see any sense in this anymore.

If I remember it correctly, the Intel processors now have a flag to indicate the direction of the stack.

Why has this not been changed yet?

I know this would cause some compatibility issues, but wouldn’t it be possible to reverse the stack direction only for new programs?

Raymond Chen says:

January 14, 2004 at 10:31 am

"some compatibility issues" is quite an understatement.

Stack reversal requires that ALL code in the program (including DLLs that may be outside your control) be aware of the reversal. Changing the stack direction is a fundamental change to the ABI, since parameters are now at positive offsets, the "sub esp, nn" needs to change to "add esp, nn" etc.

This means among other things that

1. there would need to be two copies of every DLL in the OS, one compiled with stack-up (for stack-up programs to use) and one with stack-down (for stack-down programs to use).

2. you couldn’t inject a stack-up DLL into a stack-down process or vice versa

3. a program that hosts plug-ins (like Explorer or IE) would have to choose between being stack-up (new style – but old shell extensions will no longer work) or stack-down (old style – new shell extensions will not work), and once it chose it would be restricted to shell extensions that were compiled with a compatible stack direction.

4. I can’t actually find this "stack-up" bit in my Intel docs.

Henk Devos says:

January 14, 2004 at 4:42 pm

These problems could be easily overcome.

I’m sure you know how Intel code is executed on Alpha machines by converting on the fly, and then caching the resulting code.

It would be easy to do the same.

However, the real problem is: Is it worth the effort? The answer to that is probably "no".

I just looked it up, the bit I remembered from long ago is the "expansion direction" bit in a segment descriptor (bit 2). This also means that a solution could be switching to a different stack when changing between stack up and stack down. This could be implemented in the memory manager by putting them in a different segment and generating a page fault that triggers a switch.

Just an idea, I know it will never happen. But this could have solved the buffer overflow problem long ago.

But it’s also very well possible that I misunderstood the meaning of this bit completely.

Raymond Chen says:

January 14, 2004 at 4:52 pm

Um, expand-down and expand-up selectors are irrelevant here since Win32 is flat model, not segmented. They do not affect the meaning of the "push" instruction. They indicate whether the limit value is treated as a lower limit or an upper limit.

E.g., if you specify that selector 013F has a limit of 0x10000 and it is expand-up, then valid addresses are 013F:00000000 through 013F:00010000. Whereas if you mark it as expand-down, the limit itself is excluded, so valid addresses are 013F:00010001 through 013F:FFFFFFFF.

But the "push" instruction always decrements ESP regardless of whether SS is an expand-up or expand-down selector.
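A minimal sketch in C of how the expansion-direction bit changes the limit check and nothing else. The function names are hypothetical, and the sketch assumes a 32-bit data segment in which, per my reading of the Intel manuals, an expand-down segment excludes the limit value itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is `offset` inside a 32-bit data segment with the
   given limit and expansion direction? */
bool offset_in_segment(uint32_t limit, bool expand_down, uint32_t offset) {
    if (expand_down) {
        /* The limit acts as a lower bound; valid offsets lie above it. */
        return offset > limit;
    }
    /* Expand-up: the limit acts as an inclusive upper bound. */
    return offset <= limit;
}

/* "push" moves ESP downward by 4 no matter how SS is marked; the
   expansion bit is deliberately ignored here. */
uint32_t esp_after_push(uint32_t esp, bool expand_down) {
    (void)expand_down;
    return esp - 4;
}
```

With a limit of 0x10000, offset 0x10000 is valid only in the expand-up case, and esp_after_push returns the same value for both settings.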

Recompiling code on the fly doesn’t help. The actual stack layout changed. Consider the following code fragment:

push eax

mov eax, esp

mov eax, [eax+4]

mov eax, [eax+4]

this would have to be translated into

add esp, 4 ! mov [esp], eax

mov eax, esp

mov eax, [eax-4]

mov eax, [eax+4]

Notice that the last two instructions – even though completely identical – had to be translated differently, because in the first case, eax points into the stack in an attempt to walk "up" it, but in the second case, eax is now a pointer to a structure (not a stack walk) so no inversion occurs.

So any converter would have to figure out whether any particular memory reference was an attempt to walk up the stack or whether it is just a structure member access. This is semantic information that is not available in raw binaries.

And all this to accomplish what?

Henk Devos says:

January 15, 2004 at 12:10 am

OK, you’re right, I got this all wrong. Sorry for that. I had noticed this bit long ago when I was more involved in low-level programming, and the assumption that it would actually influence instructions like push and ret was most likely wishful thinking.

Florian says:

January 15, 2004 at 9:53 pm

How does this calling convention treat variadic functions?

Raymond Chen says:

January 15, 2004 at 11:42 pm

I didn’t mention that the Win32 calling convention for ia64 passes only the first eight parameters in registers; the rest go on the stack. If a function is variadic, you call it like a normal function, but the function itself spills the first eight input registers (r32 through r39) onto the stack next to any possible parameters 9 and upwards, and then it treats the parameters as one giant array. This spilling needs to be done carefully to avoid a problem that I will discuss on Monday.
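The spill-then-index idea can be sketched in C as follows. This is a toy model with hypothetical names (ToyFrame, spill_to_array): on real hardware the registers are stored into memory adjacent to the stack parameters, whereas this sketch simply copies both into one array to show the resulting view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_PARAMS = 8, MAX_PARAMS = 32 };

/* Toy model of what a callee sees on entry. */
typedef struct {
    uint64_t in_regs[REG_PARAMS];   /* stand-ins for r32..r39 */
    const uint64_t *stack_params;   /* parameters 9 and upwards */
    size_t stack_count;
} ToyFrame;

/* Lay the register parameters down in front of the stack parameters so
   the whole list can be walked as one array. Returns the total count. */
size_t spill_to_array(const ToyFrame *f, uint64_t out[MAX_PARAMS]) {
    assert(f->stack_count <= MAX_PARAMS - REG_PARAMS);
    size_t n = 0;
    for (size_t i = 0; i < REG_PARAMS; i++) out[n++] = f->in_regs[i];
    for (size_t i = 0; i < f->stack_count; i++) out[n++] = f->stack_params[i];
    return n;
}

/* A variadic-style consumer: sums the first `count` parameters. */
uint64_t sum_params(const ToyFrame *f, size_t count) {
    uint64_t all[MAX_PARAMS];
    size_t n = spill_to_array(f, all);
    assert(count <= n);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) total += all[i];
    return total;
}

/* Demo: parameters 1..8 arrive in registers, 9 and 10 on the stack. */
uint64_t demo_sum_first_n(size_t count) {
    static const uint64_t on_stack[] = { 9, 10 };
    ToyFrame f = { { 1, 2, 3, 4, 5, 6, 7, 8 }, on_stack, 2 };
    return sum_params(&f, count);
}
```

The consumer never needs to know which parameters started out in registers; after the spill it only indexes into the array.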

Thiago says:

January 23, 2004 at 9:32 am

As for floating point parameters, you should go read the Itanium manuals. They explain a lot better than I can.

Basically, when you have a floating-point parameter, the calling convention specifies that you should use registers f8, f9, … through f15, in order. At the same time, a "hole" is left in the integer registers.

So, if you have a function with 4 parameters, in which the third is a floating-point one, parameters 1, 2 and 4 will be in registers out0, out1 and out3, whereas the third parameter will be in f8. out2 will be left with garbage.

However, if the compiler cannot be completely sure that the called function expects a floating-point parameter (see the other article on misdeclaring symbols), it’s supposed to pass the parameter in BOTH registers.

If, however, the 8-integer-register limit has been reached, then even floating points will go through the stack, even if there are free floating point registers.

PS: there are worse things possible, like an 80-bit extended floating-point value having to be passed in integer registers, where it would take up 2 of them.
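The slot rules described in this comment can be sketched in C as below. The types and function names are hypothetical, and the sketch covers only the basic rules stated above: each parameter owns one slot, a floating-point parameter in the first eight slots takes the next floating-point register and leaves its integer slot as a hole, and slots past the eighth go on the stack. The pass-in-both-registers case and multi-slot values are not modeled.

```c
#include <assert.h>

enum { SLOTS_IN_REGS = 8, FIRST_FP_REG = 8 };

typedef enum { PARAM_INT, PARAM_FLOAT } ParamKind;
typedef enum { LOC_INT_REG, LOC_FP_REG, LOC_STACK } LocKind;

typedef struct {
    LocKind kind;
    int number;   /* outN, fN, or position among the stack parameters */
} Location;

/* Where does parameter `index` (0-based) of the given list go? */
Location locate_param(const ParamKind *kinds, int index) {
    assert(index >= 0);
    Location loc;
    if (index >= SLOTS_IN_REGS) {
        /* Past the eighth slot everything goes on the stack, even
           floating-point values with FP registers still free. */
        loc.kind = LOC_STACK;
        loc.number = index - SLOTS_IN_REGS;
        return loc;
    }
    if (kinds[index] == PARAM_FLOAT) {
        /* FP registers are handed out in order, starting at f8. */
        int fp_before = 0;
        for (int i = 0; i < index; i++) {
            if (kinds[i] == PARAM_FLOAT) fp_before++;
        }
        loc.kind = LOC_FP_REG;
        loc.number = FIRST_FP_REG + fp_before;
        return loc;
    }
    /* Integer parameters keep their slot number, so a hole stays a hole. */
    loc.kind = LOC_INT_REG;
    loc.number = index;
    return loc;
}

/* Demo: four parameters, the third one floating-point. */
Location demo_third_is_float(int index) {
    static const ParamKind kinds[] = {
        PARAM_INT, PARAM_INT, PARAM_FLOAT, PARAM_INT
    };
    return locate_param(kinds, index);
}
```

For the four-parameter example, the sketch places the parameters in out0, out1, f8 and out3, leaving out2 unused.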

Josh says:

January 31, 2004 at 6:19 pm

Making the stack go the other direction wouldn’t eliminate buffer overflow attacks anyway. When you call something like strcpy to fill a buffer on the stack, there’s a return address on either side of it.
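A small C simulation of that point, assuming a hypothetical stack that grows upward and keeps return addresses in memory. The layout and the marker values are invented for illustration: the function owning the buffer has its return address below the buffer, and the copy routine it calls has its return address pushed just above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MEM_WORDS = 32, BUF_WORDS = 4 };

#define RET_INTO_CALLER 0xAAAA0001u  /* return address of the buffer's owner */
#define RET_INTO_OWNER  0xBBBB0002u  /* return address of the copy routine */
#define FILL_WORD       0x41414141u  /* attacker-controlled data */

typedef struct {
    uint32_t owner_ret;   /* value found where the owner's return address was */
    uint32_t copy_ret;    /* value found where the copy's return address was */
} RetAddrs;

/* Build the frames on an upward-growing stack, copy `words_copied` words
   into the buffer, and report what the two return-address slots hold. */
RetAddrs overflow_on_upward_stack(size_t words_copied) {
    uint32_t mem[MEM_WORDS] = { 0 };
    size_t sp = 0;                        /* next free word; grows upward */

    size_t owner_ret_at = sp;
    mem[sp++] = RET_INTO_CALLER;          /* pushed when the owner was called */
    size_t buf_at = sp;
    sp += BUF_WORDS;                      /* the owner's local buffer */
    size_t copy_ret_at = sp;
    mem[sp++] = RET_INTO_OWNER;           /* pushed when the owner calls copy */

    assert(buf_at + words_copied <= MEM_WORDS);
    for (size_t i = 0; i < words_copied; i++) {
        mem[buf_at + i] = FILL_WORD;      /* unchecked copy runs upward */
    }

    RetAddrs r = { mem[owner_ret_at], mem[copy_ret_at] };
    return r;
}
```

Copying exactly BUF_WORDS words leaves both return addresses intact. Copying one word more overwrites the copy routine's own return address, so the overrun still redirects control, just one return earlier.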

Raymond Chen says:

January 31, 2004 at 6:26 pm

Excellent point, Josh. It isn’t the fact that the return address stack grows in the opposite direction that protects the ia64 from stack overflow return address smashing attacks. It’s the fact that the return addresses aren’t kept on the stack in the first place.

Peter Lund says:

February 2, 2004 at 9:14 am

Who says you need to stick to just one stack?

return addresses could go on one stack, small parameters/local variables (bools, chars, ints, pointers) could go on another, and big parameters/local variables (buffers/structs) could go on a third or be heap allocated.

(okay, this is sort of what the IA64 does with the hardware register stack – but you don’t need the IA64 to implement something like this)

Raymond Chen says:

February 2, 2004 at 9:44 pm

Peter: Yes, you could use multiple stacks but there are some serious problems with this.

1. Having to recompile ALL code to conform to this new stack scheme. You couldn’t mix an "oldstyle stack" caller with a "newstyle stack" callee. It would probably be unreasonable to ship a version of Windows that was 100% incompatible with the previous version of Windows… (See previous discussion with Henk.)

2. The paucity of registers on the x86 makes it a hard sell to lose one of its precious few registers as an "alternate stack register", much less TWO of them!

The Old New Thing says:

April 21, 2005 at 9:00 am

The Itanium has two stacks, so don’t assume that there’s only one.

Date:	January 13, 2004 / year-entry #16
Tags:	history
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20040113-00/?p=41073
Comments:	35
Summary:	The ia-64 architecture (Itanium) and the AMD64 architecture (AMD64) are comparatively new, so it is unlikely that many of you have had to deal with their calling conventions, but I include them in this series because, who knows, you may end up buying one someday. Intel provides the Intel® Itanium® Architecture Software Developer's Manual which...