Date: | September 27, 2006 / year-entry #330 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20060927-07/?p=29563 |
Comments: | 81 |
Summary: | Often I'll see code that tries to "protect" against invalid pointer parameters. This is usually done by calling a function like IsBadWritePtr. But this is a bad idea. IsBadWritePtr should really be called CrashProgramRandomly. The documentation for the IsBadXxxPtr functions presents the technical reasons why, but I'm going to dig a little deeper. For one... |
Often I'll see code that tries to "protect" against invalid pointer parameters. This is usually done by calling a function like IsBadWritePtr. But this is a bad idea. IsBadWritePtr should really be called CrashProgramRandomly. The documentation for the IsBadXxxPtr functions presents the technical reasons why, but I'm going to dig a little deeper.

For one thing, if the pointer you are probing happens to point into a guard page, the probe raises the guard page exception and clears the page's guard status; the IsBadXxxPtr function swallows that exception, so whoever was relying on the guard page never finds out it was tripped. Alternatively, it's possible that your function was called by some code that intentionally passed a pointer to a guard page (or a page with similar special access semantics) and was expecting to handle the resulting exception itself.

"Yeah, but my code doesn't use guard pages or play games with special page protections." Well, first, how do you know that no other code in your process does? And second, your program does use guard pages; you just don't realize it. The dynamic growth of the stack is performed via guard pages: Just past the last valid page on the stack is a guard page. When the stack grows into the guard page, a guard page exception is raised, which the default exception handler handles by committing a new stack page and setting the next page to be a guard page. (I suspect this design was chosen in order to avoid having to commit the entire memory necessary for all thread stacks. Since the default thread stack size is a megabyte, this would have meant that a program with ten threads would commit ten megabytes of memory, even though each thread probably uses only 24KB of that commitment. When you have a small pagefile or are running without a pagefile entirely, you don't want to waste 97% of your commit limit on unused stack memory.)

"But what should I do, then, if somebody passes me a bad pointer?" You should crash. No, really. In the Win32 programming model, exceptions are truly exceptional. As a general rule, you shouldn't try to catch them. And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more. Trying to intercept the invalid pointer and returning an error code creates nondeterministic behavior. Where do invalid pointers come from? Typically they are caused by programming errors. Using memory after freeing it, using uninitialized memory, that sort of thing.
Consequently, an invalid pointer might actually point to valid memory, if for example the heap page that used to contain the memory has not been decommitted, or if the uninitialized memory contains a value that, when reinterpreted as a pointer, just happens to be a pointer to memory that is valid right now. On the other hand, it might point to truly invalid memory. If you use IsBadWritePtr to "validate" your pointers before writing to them, then in the case where the pointer happens to point to memory that is valid, you end up corrupting memory (since the pointer is "valid" and you therefore decide to write to it). In other words, the check converts what would have been an immediate, easily debugged crash into either a quietly rejected call or silent memory corruption, depending on the state of the address space at that instant. Many teams at Microsoft have rediscovered that the IsBadXxxPtr functions cause more problems than they solve and have banned them from their code. There is a subtlety to this advice that you should just crash when given invalid input, which I'll take up next time. |
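The failure mode described above can be made concrete with a small, platform-neutral sketch (plain C++; the toy allocator and all names here are invented for illustration, not Windows code). A stale pointer into a suballocated region still points at committed memory, so a page-level validity check in the spirit of IsBadWritePtr waves it through, and writing through it corrupts whatever lives at that address now:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy bump allocator standing in for a heap page (names invented).
static char pool[4096];          // one "committed page" we suballocate from
static size_t pool_used = 0;

char* pool_alloc(size_t n) { char* p = pool + pool_used; pool_used += n; return p; }
void  pool_reset() { pool_used = 0; }   // "free" everything at once

// A page-level check in the spirit of IsBadWritePtr: it can only ask
// whether the memory is committed, not whether the allocation is live.
bool looks_writable(const char* p) { return p >= pool && p < pool + sizeof(pool); }
```

In this model, a pointer freed by `pool_reset` still passes `looks_writable`, so a caller who "validates" it and then writes through it silently clobbers the next allocation that reused those bytes.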
Comments (81)
Comments are closed. |
I’ll admit it: I used IsBadWritePtr(), but only when compiled in a debug build. Honest!
I’m glad you wrote this article. I knew that the stack grew dynamically, and I knew about guard pages, but I didn’t know the stack implementation used guard pages. Of course, since you’ve said it, it’s obvious. I simply never put two and two together.
On a side note, I’m sure there are people out there who will think that this is obvious and that Raymond is wasting his time writing about this. I know, because I’m guilty of thinking that too, particularly when he writes about user interface caveats. But we would do well to remember that everybody has different experience levels, and that lessons must be constantly repeated so that new people can learn them, too.
Thanks, Raymond.
(Sorry if this is a duplicate, but the server errored when I originally hit submit.)
I’ve always regarded IsBadXxxx as debugging functions, only relevant for debug builds to catch invalid arguments being passed into functions so that the program can panic earlier, not later. Surely that is an appropriate use?
I personally don’t use the IsBadXXX functions. Although I have read advice against using them in the MSDN documentation and in Microsoft presentations, I hadn’t read the justification anywhere; your explanation in the context of guard pages makes sense. Also, I believe the IsBadXXX functions will be time consuming, because they have to validate whether memory is accessible, slowing down the process. That was one of the reasons I never used them.
You said:
“On the other hand, it might point to truly invalid memory. If you use IsBadWritePtr to “validate” your pointers before writing to them, then in the case where it happens to point to memory that is valid, you end up corrupting memory (since the pointer is “valid” and you therefore decide to write to it).”
Regarding the above: the program is going to crash anyway when it accesses memory it shouldn’t, so what is the issue with double-checking the pointer before using it?
@Nate: As Raymond suggests, it can have an effect on components within the system that expect to catch exceptions or on guard pages. Using IsBadWritePtr() in debug mode means it now operates differently than in release mode. Effectively, you’re no longer debugging the same application.
These IsBadXXXPtr functions certainly hint that one of the syscall implementers thought that checking for validity of the pointers might be useful in userspace, too.
In an application I’m currently developing, I use VirtualQueryEx to determine the accessibility of a given memory area (in a different process – that’s why I use the ‘Ex’ variant). Now….do the various flavours of IsBadXxxPtr do anything useful that couldn’t be done by reimplementing with VirtualQueryEx or VirtualQuery? Or do VirtualQuery/Ex have the same drawbacks as IsBadXxxPtr (wouldn’t have thought so). Or is there a big performance hit (seems possible)?
Actually I dimly remember reading a third hand account of a study (i.e. possibly entirely bogus) that found that if you silently succeeded on all "uncaught" read and write access violation type crashes (by returning 0 in the read case, and doing nothing in the write case) the resulting zombie application was in most cases able to continue stably.
Which got me wondering how true that was, but implementing a SEH crashhandler that can cause x86 code to continue regardless seems a little non trivial for my current skillset.
“And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more.”
This is something I agree with, as it has bitten us time and time again when we blindly catch everything we can. But we’re doing so just on the inside of a COM method boundary, where I’ve been led to believe we have to squash exceptions. Is there a good way to handle the intersection of both these needs?
Wow, I’m impressed!
I have been using this function heavily, hoping to protect my "clients" (callers of our library) from their programming errors. As you realize, if our library crashed on the spot when garbage is fed to our functions, then we would have tremendous support headaches. Now they get back a well documented error code and go figure it out themselves.
What about modifying IsBadXxxPtr on the checked build, to popup or log an assert message when it encounters an address in a guard page? That would be a small consolation for us devs and some help to our QA departments.
Dimitris Staikos
So if IsBadXxxPtr is really that bad, how about making it throw a compiler warning?
It only takes a few #warnings…
MSDN also makes no mention of any problems; how about putting a nice big bold warning in MSDN?
Developers won’t stop using “bad” functions unless it’s made absolutely clear that they are bad.
"the resulting zombie application was in most cases able to continue stably"
I think that kind of depends on what your definition of "stably" is. I also think that this is a generalization which could be dangerous. For example, what if you had a program that read a file, ignored any corruption, and then allowed the user to save their data? That could lead to a disaster that a crash would have avoided.
A couple of years ago, I was responsible for an API that had been originally implemented without any pointer checks. We got a new requirement that we check for NULL pointers and do something "reasonable" in those cases. In the debug version, NULL pointers caused assert failures, which is not a bad idea, but in the release version it quietly accepted these and returned an error. I have a hard time believing that a programmer who isn’t checking for "garbage in" is going to consistently check for "garbage out".
I feel obliged to point to:http://blogs.msdn.com/larryosterman/archive/2004/05/18/134471.aspx
"When the stack grows into the guard page, a guard page exception is raised,"
Technically, calling IsBad*Ptr() on the stack is safe because the guard page exceptions are handled entirely within the kernel. User-mode exception handlers cannot see or intercept them (contrary to what Larry Osterman’s article claims).
The MSDN page for IsBadWritePtr is here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/isbadwriteptr.asp.
It mentions multithreading problems, but no mention of the guard page problem.
It also doesn’t mention that the pointer being “valid” is no guarantee that you can use it. Maybe a better definition of “valid” is needed.
My take is that the real problem is that the API itself has side effects that it really ought not to have. This is not the fault of developers who use the API; it is the fault of the implementation ("oh, you mean you wanted to USE it?").
What’s the point of an API called IsBadxxxPtr if calling it causes more problems than not calling it? I also disagree with the notion of "so what if it’s bad, let ‘er crash" …there are too many systems that are mission critical where that’s not a valid option.
I do agree with the point that bugs should be caught early and often, and that the easiest way to find this is to let it crash, but once the system ships it is no longer a developer’s machine, it is a customer’s; maybe it can’t go on running, but that should be a policy decision, not a default, and a graceful shutdown is preferable to slamming into a wall.
Not sure what this says about either Unix or Windows, but I’m not aware of any Unix variant ever having introduced an equivalent of IsBad*Ptr(), despite it having had longer to do so.
DavidE, are you sure that wasn’t because in debug builds pointers are initialized to NULL but are not initialized at all in release builds?
Raymond Chen recently blogged that IsBadWritePtr, IsBadCodePtr, IsBadHugeReadPtr, IsBadHugeWritePtr,
Another practice that leads to the similar problems as described in the article is enclosing the part of the program in the
try {
} catch ( … ) { }
where the three dot catch just swallows everything, including any chance to detect where the problem happened. And they even nest such things.
Actually, most C++ programmers are clueless about exceptions anyway; they just regularly shoot themselves in the foot.
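The swallow-everything pattern described above can be illustrated in portable C++ (the error string is made up; Win32 structured exceptions are a separate mechanism, but the diagnostic loss is the same):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// What a bare catch (...) does to diagnostics: the handler cannot say
// what failed or where.
std::string swallow_all() {
    try {
        throw std::runtime_error("lookup failed: key 'timeout' missing");
    } catch (...) {
        return "unknown error";                 // every detail lost
    }
}

// Catching a specific type keeps the information.
std::string catch_specific() {
    try {
        throw std::runtime_error("lookup failed: key 'timeout' missing");
    } catch (const std::exception& e) {
        return e.what();                        // detail preserved
    }
}
```

The first handler also catches exceptions it was never designed for, which is exactly the "catch exactly what you want and no more" problem from the article.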
What about device drivers?
Is it safe to use IsBadWritePtr() there to check if user supplied output buffer is valid and avoid writing to it if it isn’t?
>The dynamic growth of the stack is performed
>via guard pages
I have one serious question about this for you Raymond because I haven’t found any documented way of doing this properly:
I recently had to allocate three int arrays of 2048 elements each on the stack in a __declspec(naked) function written completely in assembler. I wrote prologue and epilogue code to handle alignment and allocation and the code was OK, but it crashed on writing to said arrays on the stack.
By debugging it I realized that I had only 4KB of committed stack space. It wasn’t easy to come to this conclusion even though I knew about the /STACK:reserve,commit thing, because one actually had to look at the stack segment limit to figure this out.
Now for the question — how to change this limit programmatically from assembler code? Is it possible at all to change it during runtime?
Here’s a story that I said
a long time ago that I was
going to tell you all, and then promptly…
There are times when crashing is simply not an option. The debugger shouldn’t cause my app to crash, no matter what I do. Neither should other debugging tools.
This is code in Kernel32 we’re talking about, not some 3rd party DLL. Microsoft controls the vertical & horizontal here. Why doesn’t IsBad*Ptr catch the STATUS_GUARD_PAGE and either note that we’re dealing with the stack and extend it, or reset the PAGE_GUARD setting?
Or else implement it by calling VirtualQuery to check what access is allowed, instead of using a condition handler. Yeah, that’ll probably slow down the function, but the current implementation is fatally flawed.
This explains why I was having a hard time finding out the problem:
“Dereferencing potentially invalid pointers can disable stack expansion in other threads. A thread exhausting its stack, when stack expansion has been disabled, results in the immediate termination of the parent process, with no pop-up error window or diagnostic information.”
So instead of crashing with access violation and showing me the address which triggered it the OS terminated the application I was testing “with no pop-up error window or diagnostic information.” Neat feature indeed.
Sorry if this is off-topic, but I couldn’t find an email address for you, so I’ll have to post this in a comment.
May I suggest a future topic in your “Why Windows does it that way” series? Specifically, why is it so difficult to go back and forth between domains?
Here’s the scenario: I get a new laptop at work. IT delivers it already joined to the office domain. I select workdomain.local from the login dropdown and log in.
That night I bring the laptop home and join the laptop to my network at home (no problem — I’m the network admin at home). Select homedomain.local from the dropdown & login.
The next day, I bring it back to the office, and expect to just select workdomain.local from the dropdown and log in — but it’s not in the dropdown! I’m now expected to REJOIN workdomain.local (an action which requires a special privilege on the domain) — even though workdomain.local is COMPLETELY UNAWARE that I ever left it. Why can’t I just log on to each domain (at separate times) after I’ve joined each?
Wine uses an exception:
Igor-
There is one case where you are expected to eat all EXCEPTION_ACCESS_VIOLATION exceptions: in kernel mode when accessing user mode memory. An access violation raises the same exception in kernel mode as it does in user mode. An unhandled exception causes a bugcheck, so every time you touch user mode memory, it must be enclosed in an exception handler.
Note that before touching a user-supplied pointer, you must first ensure that the pointer in question actually resides in user memory by calling ProbeForRead/Write. Getting an access violation in kernel mode for the kernel’s address space results in an automatic bugcheck– no option to handle an exception.
The entire evilness of having an IsBadXxxPtr is that you don’t know where bad pointers (other than null) came from, what they once represented, and what arbitrary things they represent now. If the bad pointer has an arbitrary value, it could just as easily point to data allocated for something else, just waiting to be corrupted. Even if IsBadXxxPtr didn’t have bad side effects, it still can’t distinguish between unallocated pages and data belonging to something else. Pointers need to have a value of null or point to valid data. Anything else is a logic error.
Accessing user buffers from kernel mode:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kernel_d/hh/Kernel_d/iputoput_c6e8f2ab-7282-4282-b149-63300a7d97d4.xml.asp
Handling exceptions in kernel mode:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kernel_d/hh/Kernel_d/other_f7259c51-1557-42e6-8264-ac8dae0177e3.xml.asp
Mike, Nice to see the wine source.
Does someone with access to Windows source code want to show the corresponding algorithm or Is reverse engineering the way to go?
When I first saw these, my first instinct was to put them everywhere. Glad I didn’t go with that! :)
A lot of code uses this; even the DirectShow base classes used to. The code is still there, but it is ifdefed out, and all it says now is "The IsBadXXXX APIs are banned."
Are there any significant cases where IsBadXxxPtr() would actually work better than a simple check for NULL?
Most of the time in my own code bad pointers are either NULL or point to deallocated memory.
Well, "deallocated" is not necessarily the same as "will cause an exception if you try to read or write it". Certainly not if you use malloc and free (although there are other issues with those on Windows, with its plethora of C runtimes). There may even be issues with the Heap* functions, I’m not sure.
Anyway, my point is, it may be possible to pass a deallocated pointer to IsBadReadPtr and get false back, even in cases where the pointer is invalid.
"In the Win32 programming model, exceptions are truly exceptional. As a general rule, you shouldn’t try to catch them. And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more."
YES.
I was working on a large "enterprise application" (as they are called these days) written in VC6. Somebody, somewhere did "catch (…)."
I learned a lesson that day. Never, ever, ever catch (…). When it crashed for me in my DLL, this exception handler caught it. My machine took many minutes to unwind the stack and I wound up in never-never land without much of a clue (I was also a newbie then).
JamesCurran: Being in two domains would be a fundamental security hole. Domain Admins would be completely unable to enforce any settings at all because anyone could just join their computer to a different domain and ‘hey presto’ they have unrestricted access to the entire box.
Back to the topic: what I don’t understand here is that if the IsBadXxxPtr functions are so flawed, why not just replace them with stub functions that just return 0 (or perhaps do minimal NULL pointer checking)? Document them as such in MSDN and just move on.
Hmmm – well, I’m just using VirtualQueryEx to determine what access there is to memory in a remote process – I actually package up memory read/write requests into a file mapping and do use proper IPC to ask the remote process to access memory. Maybe I should ask the remote process to do the access check as well.
I guess I shouldn’t mention that I’m using Jeffrey Richter’s DLL injection technique to get the remote end of the IPC link started….not dodgy at all :-)
"Well, "deallocated" is not necessarily the same as "will cause an exception if you try to read or write it"."
This was my point. The majority of the time (in my experience) the bad pointer is either NULL or invalid in such a way that IsBad*Ptr() won’t catch it.
This "catch (…) { }" phenomenon seems to be the rule rather than the exception. My three most recent projects has them all over the place–unfortunately.
Why doesn’t the kernel use its own stack to display an error message when it notices that the application stack is corrupted?
I haven’t given it much thought so if someone has any ideas why that would be a bad thing to do then please elaborate on the subject.
Stack trace logic within custom unhandled exception handlers might be using these functions a lot.
Matt Pietrek’s old article (http://www.microsoft.com/msj/0497/hood/hood0497.aspx) uses IsBadReadPtr (http://www.microsoft.com/msj/0497/hood/hoodtextfigs.htm#fig1).
I’ve based my custom unhandled exception handler on the above article – and now I’m investigating alternatives.
With respect to using VirtualQuery, I’d have to look at it a bit closer; but is VirtualQuery guaranteed to return the same protection value on a guarded page after the STATUS_GUARD_PAGE_VIOLATION, once the page is "loaded"?
e.g. if VirtualQuery returns PAGE_READWRITE|PAGE_GUARD for a page, will it then be just PAGE_READWRITE after you try and read the first byte?
@Jan: Even good programmers sometimes use functions that aren’t safe.
…of course VirtualQuery doesn’t get around the PAGE_NOACCESS (for Me/9x apps) problem if there is an exception handler waiting around for first access to that page in order to populate it.
We had a huge development problem with IsBadReadPtr!
We had a very large application and team, and some of the key developers would put IsBadxxxxPtr in almost every function they wrote, in the debug build (ASSERT()).
Here are the two problems:
1) the functions are extremely slow. So the debug build ended up being extremely slow. Slower than a normal debug build is expected to be.
The result is that programmers gave up thinking about performance, since it was so different between ship and debug. At some point the software is so slow that you don’t realize you’ve just added a half-second delay to an operation.
It’s very important IMHO that the ship and debug builds’ performance are not *completely* at opposite ends of the spectrum, so that a programmer can tell while debugging if he’s just done a big boo boo that slows down some part by a factor of two.
2) The functions WERE USELESS. The devs were using this to check if structures or pointers passed in were valid ‘read’ or ‘write’ memory.
But in normal application development, almost everything is valid except a NULL pointer! These functions can’t tell you a stack buffer is not large enough, and a heap region can be entirely read/writable without being the right size – or even allocated!
I explained to these guys: it only checks if the pages are read/writable! A freed page on the heap will still pass that test! A malloc block too small will generally pass that test too! Plus pages are rounded to a certain increment (512 bytes, is it?)
So what I did was #define these functions to only a NULL pointer check. The difference in performance was incredible and the team couldn’t believe the difference in productivity. (Example: file load time was something like 4 times faster.)
There was a huge flame war internally as the old gang lectured that we should never look at performance in debug, but they lost.
In the meantime, not calling the ‘real’ IsBadxxxPtr function in debug has never caused anyone to miss a bug or coding error. These functions do not have that power; it’s not what they do.
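The replacement the commenter describes might look something like this sketch (the macro and function names are hypothetical, not the actual project's code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical debug-build shim: instead of the slow, page-probing
// IsBadWritePtr-style ASSERT, validate only what can be checked cheaply
// and reliably -- that the pointer is not NULL.
#define ASSERT_VALID_PTR(p) assert((p) != NULL)

// Example callee using the cheap check.
int sum(const int* a, size_t n) {
    ASSERT_VALID_PTR(a);           // catches the common NULL case at near-zero cost
    int s = 0;
    for (size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

As the comment notes, this catches the same real-world bugs (NULL pointers) without the per-call cost of probing page protections.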
Wednesday, September 27, 2006 2:27 PM by Barry Tannenbaum
> Why doesn’t IsBad*Ptr catch the
> STATUS_GUARD_PAGE and either note that we’re
> dealing with the stack and extend it, or
> reset the PAGE_GUARD setting?
> Or else implement it by calling VirtualQuery
I second that.
> [Perhaps it could have been implemented
> differently originally, but that’s water
> under the bridge. What’s done is done and
> you can’t change it after the fact. -Raymond]
Um, OK, thank you for explaining why Microsoft so fiercely resists the idea of fixing bugs. But if this were 100% true then there wouldn’t have been any service packs at all.
Mr. Chen and Mr. Osterman gave excellent descriptions of this bug, and I thank both of you. But I really don’t see why it can’t be fixed. The only break with backwards compatibility is that applications which shouldn’t have crashed will crash less often. (Of course it won’t stop invalid applications from corrupting themselves, such as writing to memory which was freed but still on the heap.)
Wednesday, September 27, 2006 2:27 PM by Igor
> So instead of crashing with access violation
> and showing me the address which triggered
> it the OS terminated the application I was
> testing “with no pop-up error window or
> diagnostic information.”
>
> [You lost your stack. There’s nothing that
> can be done. Displaying an error window
> requires a stack. -Raymond]
So that’s why some crashes of Visual Studio 2005 don’t result in offers to send crash dumps to Microsoft. Maybe an emergency exception stack would be a useful thing to add after all.
> [Windows used to have such an “emergency
> error dialog”. Remember the UAE? People
> hated it. -Raymond]
What does UAE stand for? Is there a link to explain it to people who were doing 32-bit programming before Windows was 32-bits (or before Microsoft existed)?
If it’s the black and white screens of death that Windows 95 sometimes displayed, then people hated it because Windows wouldn’t even say which application was dying, and in fact all applications were dying when only one should have been dying, and people lost all of their work in progress. People hated those for the same reason we hate BSODs in Vista.
"A" wrote: "Technically, calling IsBad*Ptr() on the stack is safe because the guard page exceptions are handled entirely within the kernel."
Yes and no. What seems to happen is:
One of your threads uses IsBadWritePtr to write to a stack guard page. The kernel catches the STATUS_GUARD_PAGE_VIOLATION exception and checks if you are writing to the guard page *for the current thread*. If so, the kernel extends the stack *for the current thread*.
So if you write to the guard page for a different thread’s stack, the kernel doesn’t catch the STATUS_GUARD_PAGE_VIOLATION exception and it appears in user-mode, where IsBadWritePtr discards it. When that other thread writes beyond its now useless guard page you get an access violation (STATUS_ACCESS_VIOLATION).
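The mechanism described in this thread can be modeled in a few lines of platform-neutral C++ (a sketch of the semantics, not real Win32 code): touching a guard page clears its guard status and raises a one-shot notification; the system's handler responds by making the next page the new guard, whereas an IsBadWritePtr-style probe swallows the notification and leaves the growth chain broken.

```cpp
#include <cassert>

// Platform-neutral sketch of guard-page stack growth (not real Win32).
enum State { Uncommitted, Guard, Committed };

struct Stack {
    // Page 0 is committed, page 1 is the guard, the rest are reserved only.
    State page[4] = {Committed, Guard, Uncommitted, Uncommitted};
};

// Touch page i. Returns false on a fatal access violation. 'probe' models
// an IsBadWritePtr-style caller that swallows the one-shot guard
// notification instead of letting the system handler advance the guard.
bool touch(Stack& s, int i, bool probe) {
    if (s.page[i] == Uncommitted) return false;         // access violation
    if (s.page[i] == Guard) {
        s.page[i] = Committed;                          // guard status is one-shot
        if (!probe && i + 1 < 4) s.page[i + 1] = Guard; // handler grows the stack
    }
    return true;
}
```

In this model, growing the stack page by page works, but after a swallowed probe the next growth attempt faults, matching the behavior the commenters observed.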
If you want to feel this from a user perspective then download:
http://www.mozilla-x86-64.com/firefox/firefox-3.0a1.en-US.win64.20060518.zip
Unpack and run regxpcom.exe then Firefox.exe on Windows XP Professional x64 Edition patched up to date. Into the search box type UAE and hit enter. Poof, Firefox will just vanish. No clues, no way to send a debug report, no trace of it ever being started. IMO that is extremely unfriendly to both users and developers.
<WiseRaymond Joking=";-)">
"But what should I do, then, if somebody passes me a bad pointer?"
You should crash.
</WiseRaymond>
+1 +1 +1 +1. This should be taught at school and we should have a "tax" as Raymond puts it that deals with crashes in production code (pdb’s for release builds, map files, whatever is needed to get close to the error after the fact in order to fix it).
Trying to deal with bugs using functions such as IsBad… is a cheapo get-out.
catch (…) does weird things on my system when you open a common file dialog… instead of listing the files instantly they take seconds to appear…
Oh, and the point of the catch (…) is to stop buggy explorer extensions from crashing the app. Unless anyone has any better ideas?
“Just look at all the apps that are actually relying on IsBadXxxPtr doing something”
Yes, but by your analysis above this is already broken, the very act of calling IsBadXxxPtr means their code is already not going to do what it is expected to do. It’s going to crash randomly.
Normally, I agree with you that it’s worth retaining a buggy function if code that is dependent upon that bug is in the wild, but that’s not the case here, is it? As it stands, if you call IsBadXxxPtr, your application will crash – so wouldn’t it be better to either make the function work or at the very least make it crash more cleanly by acting as if it were never called?
This is exactly what I expected. The OS is incapable of determining which of my process’ pointers are invalid. I guess it is very slow too.
Is there any nonbuggy BOOL Will_My_Process_Crash_If_I_Touch_What_This_Pointer_Points_To() ?
Just for completeness, I’ll answer this: "DavidE, are you sure that wasn’t because in debug builds pointers are initialized to NULL but are not initialized at all in release builds?"
Yes, I’m sure that’s not the reason. The reason was that my team spent a couple of days going into every API and adding checks for NULL pointers. I still felt that this was the wrong thing to do, since the people using the API should have been checking for NULL pointers and handling them correctly. All we did was add more code to slow things down (OK, so it wasn’t that much).
The funny thing was that we later started getting bug reports from the testers that their tests would crash in asserts when they sent in NULL pointers. Apparently, they preferred to use the debug version of the DLL…
Raymond thank you very much for trying to explain things to me, here is my original problem with stack exception:
push ebp
mov ebp, esp
and esp, 0FFFFFFC0h
sub esp, 6000h ; more than the committed 4KB
… ; do some other stuff
mov [esp], eax ; Poof! Your code is gone.
When I executed it in a debugger I got the following message:
The instruction at 0x402989 referenced memory at 0x129E40. The memory could not be written (0x00402989 -> 00129E40).
Why can the debugger show that message when the OS can’t? That is what I still don’t understand.
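Incidentally, the assembly above fails because it moves esp past the guard page in one jump. Compilers avoid this with a stack-probe helper (MSVC's __chkstk) that touches the new stack area one page at a time, in descending address order, so each guard page is hit before any page beyond it. A rough portable sketch of the probing idea, assuming 4KB pages (an illustration, not the actual helper):

```cpp
#include <cassert>
#include <cstddef>

// Probe 'size' bytes of stack space below 'top' one page at a time, top
// down, so that each guard page is faulted in order and the system can
// keep moving the guard downward. The write stands in for a stack touch.
void probe_stack(volatile char* top, size_t size) {
    const size_t kPage = 4096;
    for (size_t off = kPage; off <= size; off += kPage)
        top[-(ptrdiff_t)off] = 0;   // touch one byte per page, descending
}
```

After probing, the whole frame is committed and can be addressed in any order; this is why large local arrays work in compiler-generated code but not in the hand-written prologue above.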
> But there’s also no sense in the DLL proceeding halfway through its
> normal operations and passing bad parameters down the call stack
> to where they’ll eventually cause an undebuggable crash.
The sort of "problem" that IsBad*Ptr might catch is perfectly debuggable, since you’d have a nice call stack to walk back up. It’s the invalid pointers that IsBad*Ptr WOULDN’T catch that cause "undebuggable" problems, since you can still write to them, but you’ll just be corrupting your memory.
So IsBad*Ptr really has no good points, but only bad points. "Fixing" it would be pointless, since even if it worked, it doesn’t do anything for you.
Just don’t use it, and your app will be fine.
> If you want to play word games, "changing" bad APIs instead of "fixing"
> them, then I’ll agree that "changing" can be a move from the frying pan
> to the fire. But I seconded a suggestion for "fixing".
How can you "fix" something without "changing" it?
Anyway, I still learned a few things here:
I was getting the ACCESS_VIOLATION exception because I wasn’t probing the stack pages in the right order — I touched the pages beyond the GUARD_PAGE, so the OS wasn’t able to grow the stack on demand.
What I am not sure is whether this really counts as a non-reportable exception?
I haven’t really exhausted the stack, I was still within its reserved part.
What is really interesting is why there is only one GUARD_PAGE at the TOS?
What is the reason that we don’t have each reserved stack page set as the GUARD_PAGE so that you can commit them in any order?
About warning exception, you are right, there is an _XCPT_UNABLE_TO_GROW_STACK which you can get as a warning but in this case I wasn’t getting it because I got access violation.
GregM, yes, I meant caller, thanks for correcting me.
I believe that it is the ZwTerminateProcess() that kills it after "stack exhausted" exception is caught.
But what confuses me is why the process gets terminated without error message in response to "access violation" exception which happens on the stack which is actually not exhausted?
>The only way to check for invalid pointers
>is to (1) check for NULL, and then (2) try
>indirecting through the pointer, and crash
>if it’s invalid.
Then why even bother checking for NULL? Why not just access it and crash if it is invalid?
IMO, it should really be the responsibility of the _callee_ to check what they are passing to your function.
If someone calling your function writes their code like this:
BOOL s;
char *p = malloc(1024);
s = YourFunc(p);
if (s) {
OtherFunc(p);
}
Now you have to check p in your func and to signal an error:
BOOL YourFunc(char *p)
{
if (p == NULL) {
return FALSE;
}
… // do stuff with p
return TRUE;
}
So you are fine, your code doesn’t crash and it signals an error. But if they are really lousy programmers they won’t check the return value of YourFunc() either:
char *p = malloc(1024);
YourFunc(p);
OtherFunc(p);
So instead of crashing in YourFunc() immediately after getting a NULL pointer, they will crash in OtherFunc() which would be harder to debug because it is further away from the cause.
On the other hand, if they bother to check everything, you get redundant NULL and error checks, first in their code:
BOOL s;
char *p = malloc(260);
if (p != NULL) {
    s = YourFunc(p);
    if (s) {
        OtherFunc(p);
    }
} else {
    … // handle the error
}
And then in your code:
BOOL YourFunc(char *p)
{
    if (p == NULL) {
        return FALSE;
    }
    … // do stuff with p
    return TRUE;
}
Here is what it should really look like:
char *p = malloc(260);
if (p != NULL) {
    YourFunc(p);
    OtherFunc(p);
} else {
    … // handle the error
}
And:
void YourFunc(char *p)
{
    … // do stuff with p
}
As I said, you have to draw the line somewhere.
Thursday, September 28, 2006 5:36 AM by Dean Harding
>> If you want to play word games, "changing"
>> bad APIs instead of "fixing" them, then I’ll
>> agree that "changing" can be a move from the
>> frying pan to the fire. But I seconded a
>> suggestion for "fixing".
>
> How can you "fix" something without
> "changing" it?
Oh neat, two people playing word games now. To repeat, I seconded a suggestion for "fixing". When one person misquoted me as saying "changing", the reason I pointed that out is that it is indeed possible to "change" something without "fixing" it, and those are the kinds of "changes" which I did not second.
Igor, you mean *caller*, not *callee*. The callee is the function being called by the caller.
"Why the debugger can show that message and the OS can’t?" – Igor"
"Because the debugger is a separate process that hasn’t crashed. -Raymond"
Isn’t the OS also a separate process that hasn’t crashed, unless this is a blue screen? I guess what would clarify things is what exactly kills the process in this situation? Is it something in the dying process that kills it, or is it the OS itself that kills it?
Tom wrote:
> @Nate: As Raymond suggests, it can have an
> effect on components within the system that
> expect to catch exceptions or on guard pages.
If it is a pre-condition for my function that it is not passed bad pointers, it’s reasonable for me to assert(!IsBadReadPtr(ptr)) at the start. IsBadReadPtr only causes problems if the pointer is bad, and in that case we already have a bug so it doesn’t matter if behaviour afterwards changes.
> Effectively, you’re no longer debugging the
> same application.
By this argument, DEBUG builds should never be used.
BryanK wrote:
> Now if that other thread needs to grow its
> stack past the guard page, it’ll just
> crash. (As will your whole process, I
> think.) You have screwed up another
> (possibly completely unrelated) thread
> by using IsBadReadPtr.
That’s not a problem, because the assert will terminate the app anyway. We are just making it crash earlier and with an indication of where the problem lies.
> If some other random thread is crashing at
> some other random time, the caller has
> no way to know (a) that it crashed, and
> (b) why.
The fact that one thread is sitting in a failing assert should be a clue. Ignore assertion failures at your peril.
But the caller can’t catch a failed assert like they can catch an exception. All they get is a dialog "assertion failed: IsBadReadPtr(x)", which really doesn’t help narrow anything down. Yes, the process dies (and I must have gotten the return value semantics of IsBadReadPtr messed up when I originally wrote that), but there’s very little debugging information available. At least if you use the MS compiler’s assert.
But with an exception, they can get a stack trace under a debugger (or from the drwtsn32.log file), and with symbols, they can see where in the stack control left their code. That’s actually useful information.
Throwing up an "assertion failed" dialog and dying is marginally better than just dying, but it’s still not as good as giving information on where in the code the failure happened.
(And even if failed-assert dialogs show line number information now — I don’t believe they do, but I’m not sure — that still doesn’t help much. That’s just the line number of the assert call, in the library’s code. What the programmer needs to know is which of the library’s APIs they called from their code, and when. The failed-assert dialog would have to show the entire stack trace to be of any help.)
Dave:
> IsBadReadPtr only causes problems if the pointer is bad, and in that case we already have a bug so it doesn’t matter if behaviour afterwards changes.
No.
The problem is, if someone passes you a bad pointer that’s pointing at another thread’s stack guard page, and you pass that to IsBadReadPtr, you’ve just made that other thread’s stack non-growable (because the PAGE_GUARD bit is removed), where before it was growable.
Now if that other thread needs to grow its stack past the guard page, it’ll just crash. (As will your whole process, I think.) You have screwed up another (possibly completely unrelated) thread by using IsBadReadPtr.
Whereas if you’d just used the pointer and crashed, the caller would at least have had a *chance* to catch the AV. If some other random thread is crashing at some other random time, the caller has no way to know (a) that it crashed, and (b) why.
> By this argument, DEBUG builds should never be used.
I would not agree with that either. I would agree if you said "by this argument, you should not change your code’s behavior in a DEBUG build", though. Make the differences as small as possible.
Antti Huovilainen wrote:
> Are there any significant cases where
> IsBadXxxPtr() would actually work better
> than simple check for NULL?
A fairly common case is when someone passes an offset from a null pointer.
struct Data { int a, b; };
Data *p = NULL;
// …
int *pi = &p->b;
This will probably set pi to 0x00000004, which is not NULL, is invalid, and should be caught by IsBadXxxPtr.
Raymond, I’m confused by this paragraph:
“Swallowing the exception in IsBadXxxPtr means that the caller’s exception handler doesn’t get a chance to run, which means that your code rejected a pointer that would actually have been okay, if only you had let the exception handler do its thing.”
Think about that for a minute. Since the caller’s exception handler has to run for the pointer to be valid INSIDE the callee, that means that the caller’s exception handler must be set up to handle and stop the exception without the callee even knowing an exception was raised (which is indeed how guard page exceptions are handled). So exception swallowing is NOT the problem!
As a couple other commenters have noted, the real problem is that these guard page handlers are tied to a specific thread. If you access the guard page on a different thread, the handler won’t get called, and problems will arise.
So it seems that in a single threaded situation (like the scenario mentioned in that paragraph), IsBadXxxPtr will not cause guard page problems.
Am I missing something?
I agree with the arguments against using IsBadxxxPtr. It just always seemed like a good way to check pre-conditions.
I see MFC 8 (Visual Studio 2005) changed the AfxIsValidAddress, AfxIsValidString to no longer make the IsBadReadPtr/IsBadWritePtr calls. They now simply check for a null pointer. No string length checks, etc.
I haven’t checked if the VS2003 SP1 made the same changes, but I guess I would expect that it did.
I’m surprised that these functions can’t be removed from the API. Just put an #if (!Vista) before them. Any program written for XP will not only continue to run, but can still be compiled. Only programs written or updated for Vista will be impacted.
That’s a good thing for Microsoft too; it improves applications built for Vista only, and thus is an upgrade reason for consumers.
BryanK wrote:
> They shouldn’t be trying to catch it. They should let the program terminate.
That depends on which assert you use and what your debugging tools are. Would it help if I wrote ASSERT instead of assert? With my IDE, an assertion failure gives the option to halt the program at the offending line, with the stack and variables available for inspection. I can switch to other threads and inspect them, too.
Remember we are talking about DEBUG builds, which normally means it is being run by a programmer from an IDE like Visual Studio. An end-user would be using a RELEASE build and not have the IsBadReadPtr check at all.
Er, wait, the above comment makes no sense. If you get a bad pointer in a release build, you’ll just crash, same as if you never used IsBad*Ptr.
So never mind. (And sorry about the duplicate comment; it made sense to me at the time…)
But I still don’t see the point. You won’t necessarily catch all bad pointers (see the race condition), and you’ll catch some pointers as bad that aren’t actually.
> from an IDE
So you expect you’ll *never* get an invalid pointer when your release build is running?
This seems way too optimistic to me.
The assertion won’t catch all bad pointers even without race conditions. For example, some might point to memory which has been free()d (and hence shouldn’t be used) but which is still part of the CRT heap (and hence still owned by the app as far as IsBadXxxPtr is concerned). It can still be useful to catch at least some errors, e.g. those of the 0x00000004 form I mentioned earlier, or uninitialised pointer variables, etc.
The false positives are a more serious issue. The real problem with IsBadXxxPtr for assert purposes is not that it has unwanted side effects on the (potentially random) memory page, but that it lies: it can return true for memory which can, in fact, be read or written to.
However, this is unlikely. It’s unlikely to happen with stack frames, because the memory should come from the stack frame of a routine higher up the stack, and that memory ought to be committed by the time our assert is hit. So we’re left with the people playing games with PAGE_NOACCESS. Don’t use the assert if you want to play those games.
Generally it would be better to just use the memory. The only times I’d even consider adding the assert are (a) as documentation; and (b) when the routine only used the memory some of the time, and I wanted the caller to make it available all of the time. The same applies to ASSERT( p != NULL ). Assertions for things which the hardware will detect anyway are often unnecessary clutter.
Not really. It says, "Dereferencing potentially invalid pointers can disable stack expansion in other threads" but it doesn’t say that IsBadXxxPtr dereferences the pointer. In fact I’d read it as saying the opposite: that you should use IsBadXxxPtr in order to avoid the stack expansion problem, because IsBadXxxPtr tells you whether the dereference would succeed without actually dereferencing it, by accessing the memory manager hardware directly. That’s what I’d expect, and the documentation doesn’t disillusion me.
re: IsBadReadPtr and IsBadWritePtr APIs banned – Windows Vista
PingBack from http://computinglife.wordpress.com/2007/06/23/how-do-you-chose-your-apis/