Exposing undefined behavior when trying to port code to another platform

Date: December 22, 2017 / year-entry #280
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20171222-00/?p=97635
Comments: 38
Summary: Oops, that wasn't allowed after all.

A developer was porting some old Visual Studio code to another platform and found that the code was behaving strangely. Here's a simplified version of the code:

class refarray
{
public:
    refarray(int length)
    {
        m_array = new int*[length];
        for (int i = 0; i < length; i++) {
            m_array[i] = NULL;
        }
    }

    int& operator[](int i)
    {
        return *m_array[i];
    }

    ... other members not relevant here...

private:
    int** m_array;
};

This class is an array of references to integers. Each slot starts out uninitialized, but you can use methods (not shown here) to make each slot point to a particular integer, and you use the array indexing operator to access the referenced integer. (You can tell this is old code because it's not using unique_ptr or reference_wrapper or nullptr.)
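For concreteness, one of those unshown members might look something like this sketch (my guess at its shape, not code from the original class):

    // Hypothetical slot-setter for refarray: make slot i refer to an integer
    // that the caller owns and keeps alive.
    void set(int i, int& value)
    {
        m_array[i] = &value;
    }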

Here's some typical code that didn't work:

refarray frameCounts(NUM_RENDERERS);

void refresh(int* frameCount)
{
    .. a bunch of refresh code ..
    if (frameCount != NULL) ++*frameCount;
}

void refresh_and_count(int i)
{
    refresh(&frameCounts[i]);
}

The refresh function performs a refresh and if the pointer is non-null, it assumes it's a frame count and increments it. The refresh_and_count function uses the refresh function to perform an update and then increment the optional frame counter stored in the frameCounts object.

The developer found that if the slot was not set, the code crashed with a null pointer access violation at the ++*frameCount, despite the code explicitly checking if (frameCount != NULL) immediately prior.

Further investigation showed that the code worked fine with optimization disabled, but once they started turning on optimizations, the null pointer check stopped working.

The developer fell into the trap of the null reference, or more generally, the fact that undefined behavior can have strange effects.

In the C++ language, there is no such thing as a null reference. All references are to valid objects. The expression frameCounts[i] produces a reference, and therefore the expression &frameCounts[i] can never legally produce a null pointer. The compiler optimized out the null test because it could prove that the resulting pointer could never legally be null.

The code worked on very old versions of Visual Studio because those very old compilers did not implement this optimization. They generated the pointer and redundantly tested it against null, even though the only way to generate such a null pointer was to break one of the rules of the language.

The new compiler on that other platform took advantage of the optimization: After one level of inlining, the compiler noticed that the pointer could not be null, so it removed the test.
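To see why, here is roughly what the optimizer was looking at after that inlining (a reconstruction for illustration, not actual compiler output, and ignoring access control):

void refresh_and_count(int i)
{
    // operator[] inlined: frameCount is the address of a dereferenced slot,
    // so the compiler may assume it is not null...
    int* frameCount = &*frameCounts.m_array[i];
    .. a bunch of refresh code ..
    // ...which makes this test provably true; it is removed, and the
    // increment runs unconditionally, crashing when the slot really is null.
    if (frameCount != NULL) ++*frameCount;
}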

The fix is to repair the code so it doesn't generate null references.
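One way to do that (a sketch of mine, not necessarily the fix the developer chose) is to stop handing out references for slots that may be empty, and expose the stored pointer instead:

    // Inside refarray: return the stored pointer, which may legitimately be
    // null, instead of a reference that must not be.
    int* get(int i)
    {
        return m_array[i];
    }

// The caller then passes the possibly-null pointer straight through:
void refresh_and_count(int i)
{
    refresh(frameCounts.get(i));
}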

I know that people will complain that the compiler should not be removing redundant tests, because the person who wrote the code presumably wrote the redundant tests for a reason. Or at least if the compiler removed the redundant test, it should emit a warning: "Removing provably false test."

But on the other hand, surely you would want the compiler to optimize out the test when you call it like this:

int counter;
void something()
{
    refresh(&counter);
}

This is another case where the pointer passed to refresh is provably non-null. Do you want the compiler to generate the test anyway? If not, then it would presumably generate the "Removing provably false test" warning. Your code would probably generate tons of instances of this warning, and none of your options look appealing.

One option is to duplicate the refresh function into one version that supports a null pointer (and performs the test), and another version that requires a non-null pointer (and doesn't perform the test). This sort of change can quickly infect your entire code, because callers of refresh might in turn need to split into two versions, and pretty soon you have two versions of half of your program.
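Concretely, that split might look like this (function names invented for the sketch):

// Supports an optional counter: keeps the null test.
void refresh_checked(int* frameCount)
{
    .. a bunch of refresh code ..
    if (frameCount != NULL) ++*frameCount;
}

// Requires a counter: no test; callers must guarantee a non-null pointer.
void refresh_required(int* frameCount)
{
    .. a bunch of refresh code ..
    ++*frameCount;
}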

The other option is to suppress the warning.

In practice, you're probably going to go for the second option.

But there's clearly no point in the compiler team implementing a warning that everybody suppresses.


Comments (45)
  1. ErikF says:

    Diagnostics that only show up at higher warning levels seem alright to me: there are already lots of those now (ANSI compliance warnings come to mind). If the warning is usually going to be suppressed *and* is annoying or difficult to implement, then I agree that it might not pass the -100 test.

  2. Other compilers have been performing this optimization for a while, as we can see from the blog post “Finding Undefined Behavior Bugs by Finding Dead Code”, which talks about the infamous Linux null pointer check removal: https://blog.regehr.org/archives/970 . It is interesting to see that MSVC is finally starting to exploit some of these undefined behaviors as well.

    The assumption that there is no undefined behavior can lead to plenty of interesting and non-intuitive optimizations, such as turning a finite loop infinite: https://stackoverflow.com/q/32506643/1708801 . I know a lot of people find aggressive loop optimization to be too aggressive: https://stackoverflow.com/questions/24296571/why-does-this-loop-produce-warning-iteration-3u-invokes-undefined-behavior-an?noredirect=1&lq=1#comment37567459_24297811

    I feel like the community's response here has been building better tools to catch these problems, such as static analysis and dynamic analysis tools like UBSan: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html and ASan: https://clang.llvm.org/docs/AddressSanitizer.html . I feel like these tools make a large difference. Dynamic analysis, though, assumes you have testing and good coverage, which is usually problematic for legacy code.

    1. Ben Voigt says:

      I see nothing in Raymond’s post that says that new Visual C++ compilers optimize in the presence of UB. There’s a statement that (a) old Visual C++ compilers did not, and (b) an unnamed compiler on another platform does.

      1. You may be correct. I read the statement “very old Visual Studio compilers did not implement this optimization” as implying that the newer versions did, but rereading it, I guess it does seem ambiguous.

        1. I was also thinking about this blog post: https://blogs.msdn.microsoft.com/vcblog/2016/05/04/new-code-optimizer/ where they said Visual C++ would start taking advantage of signed overflow undefined behavior, and I remembered a more general statement about exploiting undefined behavior.

    2. Dave says:

      Let me guess, the other compiler would be gcc? The compiler where the developers seem to delight in demonstrating how much smarter they are than you in terms of using valid but highly unlikely interpretations of the C standard to break your code? For example the fact that your code could be running on a one’s-complement machine like a CDC-6600 from 1965, and therefore the semantics of two’s-complement arithmetic don’t apply, so they can remove checks and break your code’s arithmetic because if you squint at the C standard just right, they’re allowed to do it.

    1. Peter Doubleday says:

      That is a really valuable link, particularly for those of us who think we are smart enough to know all the gotchas.

      At least, I thought I was (more or less) smart enough, but I now realise that I just got lucky when programming with C. And yes, I “took care” over integer overflow, dereferencing pointers, array bounds, etc, but I still think I got lucky.

      One interesting observation is that you don’t even need to change platforms to get bit. A simple update to the order in which your compiler of choice performs optimisations is quite capable of exposing a very nasty bug …

      1. smf says:

        >One interesting observation is that you don’t even need to change platforms to get bit. A simple update to the order in which your compiler of choice performs optimisations is quite capable of exposing a very nasty bug …

        I use latest gcc/clang and msvc on a project which started as C before being ported to C++ and is supported on Windows/MacOS/Linux (it has been ported to practically all operating systems at one time or another).

        Keeping all of those compilers happy means you have to fix a lot of your code early.

  3. Somewhat related: there may be rare cases where you want the compiler to assume it is safe to dereference null pointers, such as in embedded systems where 0 may be a valid address. gcc offers -fno-delete-null-pointer-checks for this: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fdelete-null-pointer-checks

    See this twitter thread for some interesting details: https://twitter.com/myrrlyn/status/940365445957279744

    This is interesting since we see that what may be considered valid optimizations can get in the way of what is valid code on some architectures, and working around it is really painful. clang does not respect -fno-delete-null-pointer-checks, so you either have to use volatile, which works but may not keep working in the future, or do pointer laundering using __attribute__((optnone)): https://twitter.com/shafikyaghmour/status/940451354631290880 . Neither is documented, but both seem to work.

    It is a lot of hoops to jump through to work around the optimizer :-(
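    For what it’s worth, the volatile workaround being described looks roughly like this sketch (my reading of the linked threads, assuming an embedded target where address 0 is a real register; strictly speaking the standard still calls this undefined):

    #include <cstdint>

    // The volatile access forces the load at address 0 to be emitted, so the
    // optimizer cannot use "dereferenced pointers are never null" against it.
    std::uint32_t read_register_at_zero()
    {
        volatile std::uint32_t* reg =
            reinterpret_cast<volatile std::uint32_t*>(0u);
        return *reg;
    }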

    1. MiiNiPaa says:

      In cases when 0 is a valid address, the null pointer might be represented by another byte sequence. For example, address 0xFFFFFFFF might be used to denote the null pointer. In this case foo == 0 (or the implementation-specific foo == reinterpret_cast<void*>(-1)) would check for the null pointer (the one with value 0xFFFFFFFF), and foo == reinterpret_cast<void*>(0) would check for a pointer with value 0x00.

      In short, a pointer with integer representation 0 is not guaranteed to be a null pointer; likewise, void* foo = 0; is not guaranteed to give you a pointer with an integer representation of 0.

      1. Joshua says:

        Don’t use non-zero as NULL. You will be really upset when memset() or calloc() doesn’t set pointers to NULL. People don’t code defensively against this one anymore.

        Also, 0xFFFFFFFF might be a good address as well.
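        A tiny illustration of the distinction both comments are circling (my sketch, not theirs):

        #include <cstring>

        void null_representation_demo()
        {
            int* p = 0;                   // source-level 0: a null pointer, whatever bits the ABI uses
            int* q;
            std::memset(&q, 0, sizeof q); // all-bits-zero: null only by convention
            // On mainstream platforms p and q compare equal; on a platform
            // where null is 0xFFFFFFFF they would not, which is the trap
            // being described here.
        }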

        1. MiiNiPaa says:

          Well, there are other approaches, like storing all addresses offset by some value. This way the null pointer 0x0 might have physical address 0xFFFFF000, and to access physical address 0x0 you need pointer value 0x1000. Of course you will have to pay the cost of a single addition for each pointer dereference (without optimisations)…

      2. Cesar says:

        AFAIK, it’s mandated by POSIX that all-bits-zero be treated as a NULL pointer.

        1. Ben Voigt says:

          But this comment thread is specifically talking about embedded systems which have something located at address zero. For these systems, the POSIX rules do not apply, only the C (or C++) language specification.

      3. smf says:

        >In cases when 0 is a valid address, null pointer might be represented by another byte sequence.

        With NULL (and nullptr, when comparing to a pointer) you can’t do that now, and even back when you could, nobody did.

        There are no invalid addresses, so you have to pick one. Why not pick the one that

        1. everyone else has picked previously, so you have free compatibility.
        2. has fast checks built into the CPU (zero/non-zero).

  4. M Hotchin says:

    Seems to me like a better place to check would be in operator[], with an ASSERT say. As you say, a NULL reference is illegal anyway, so only hitting the ASSERT in DEBUG seems like a reasonable thing. The devs *do* run their debug builds, right?
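    Something like this, presumably (the assert placement is my sketch of the suggestion; assumes <cassert>):

    int& operator[](int i)
    {
        // A null slot here means the caller is about to manufacture an
        // illegal null reference; catch it in debug builds, before the
        // optimizer can erase the evidence.
        assert(m_array[i] != NULL);
        return *m_array[i];
    }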

    1. Harry Johnston says:

      I suspect that the original programmer was deliberately returning null references, not realizing that they were illegal.

      I was kind of hoping Raymond would explain how the code should be corrected, though I suppose it depends too much on the context. Could you return a reference to a class that behaves as a nullable int?
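      One hedged sketch of that idea (entirely illustrative): have the indexer hand back a small proxy that may be empty, so the “is there a counter here?” test happens on the proxy rather than on a forbidden null reference.

      class intref
      {
      public:
          explicit intref(int* p) : m_p(p) {}
          explicit operator bool() const { return m_p != nullptr; }
          int& get() { return *m_p; }   // only valid after checking
      private:
          int* m_p;
      };

      // refarray::operator[] would return intref(m_array[i]), and callers
      // would write: if (auto r = frameCounts[i]) ++r.get();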

  5. HiTechHiTouch says:

    I’d put in a strong vote for a warning message anytime optimization does something “unusual”.

    Yes, I may suppress it 99% of the time, but I occasionally do run with every message enabled and wade through the listing looking for gold. More than once this has led me to a problem I would have blown off as “doesn’t occur” without the compiler’s help in understanding my code.

  6. Alex Guteniev says:

    Warnings about omitted null checks are really impractical if you also consider multiple inheritance.

    The compiler has to adjust the pointer when casting from a derived class to a non-first base, or vice versa.
    This adjustment has to be skipped if the pointer is null.

    So there are a lot of such implicit null pointer checks.
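    A minimal illustration of that adjustment (types invented for this sketch):

    struct Base1 { int a; };
    struct Base2 { int b; };
    struct Derived : Base1, Base2 {};

    Base2* as_base2(Derived* p)
    {
        // Base2 does not sit at offset 0 inside Derived, so the conversion
        // adds an offset, but only after an implicit "is p null?" test so
        // that a null Derived* stays a null Base2*.
        return p;
    }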

    They can be skipped, however, if you cast a reference, or for “this” when you call a base class method.
    Most compilers would skip the checks for references; for “this”, the checks would be skipped by all compilers.

    Having warnings here would really make a lot of noise; you won’t be digging through them anyway.

    A warning that is really useful here is a warning about dereferencing a null pointer. Compilers typically don’t emit one unless the dereference is directly visible, but most static analysis tools would.

  7. I didn’t understand the provability part.

    1. Peter Doubleday says:

      I think the “provably not null” part is actually the whole point of the post. I’m sure I’ll get this wrong, but here goes.

      To take the second case first, it’s clear that passing the address of counter will not result in dereferencing a null pointer, because counter is declared in global space/on the stack. Therefore the address is provably not null. In which case any warning at the point of invoking refresh will be otiose.

      … which is probably not the part of “provability” that you take issue with. So, returning to the first part of the post, we note that the “provability” only occurs with optimisation switched on. And, critically, it shows up, in Raymond’s words, “with one level of inlining.” Actually I think it would show up with more levels, but it’s far more obvious when we limit inlining to a single level.

      I’m not sure, but I suspect the single level here inlines the refresh function. (See “one option … another option.”) Now, imagine you are (one particular optimisation phase of!) the compiler. Imagine you are performing this optimisation. Do you have any other information? Sure you do! You’re passing the address of an int-reference that is returned by operator[]. Now, you can’t take the address of a null int-reference, because there’s no such thing as a null int-reference: that’s undefined behavior. So the compiler is free to deduce that it isn’t passing a null pointer. And since it’s inlining, it isn’t crossing a function boundary, and since it therefore isn’t changing scope, it is free to elide the null check, because that null check is “provably” unnecessary within the scope defined by the inline.

      The key here is that you need that single level of inlining to elide the scopes of the “caller” and the “called” functions. Within the scope of the combined, inlined, function, and this scope is all that the optimiser will see, it is provably the case that you will not get a null pointer, in which case why bother with the check?

      1. Peter Doubleday says:

        Assuming I’m mostly right about that (I don’t have to be completely right), the “provability” issue resolves to the following assertion:

        If any part of your program (including external libraries!) can possibly result in “undefined behavior,” then every single bit of your program can result in “undefined behavior.” And, frankly, because undefined behavior is undefined, it definitionally does not matter when that behavior manifests itself, and there is therefore nothing to be gained by tracking the precise line of code down.

        “Unfortunately,” this assertion is baked in to C (and C++). I use scare quotes, but I shouldn’t. C purposefully eschews all the run-time checks that Java, C#, etc, implement, because it was purposefully designed to run fast on very limited hardware. In this individual (C++) case, the compiler could warn about returning an int-reference from an array that might include a null … but that’s not really the point. It’s just shifting the blame for whatever undefined behavior ensues. There is literally an unbounded set of examples where the same behavior can be “optimised in,” because there’s a practical limit to how far the compiler can go when trying to detect the original source of the problem.

        In this case it’s the design error of initializing all the elements of the array to the sentinel NULL, and then not performing the relevant check within the class. But it could be just about anything. Imagine that the class allows a caller to mutate the elements of the array. Boom! Same issue. Try tracking that one down. Casts, volatile variables, signed int overflows … not in this case, but in general, the overhead would be enormous, and not really detectable across library boundaries, and generally to no good purpose at all.

        So, you’re left with C, which is (almost) a practical necessity when writing operating systems. Or C++ with STL and RAII, which mitigates a lot of these issues (because you wouldn’t idiomatically return a reference to something; you’d either copy it or move it. Or more idiomatically still, use an iterator — at least, in this case, you could return end() if you hit a null).

        Or, for embedded systems, I am increasingly coming to believe that C is a nightmare both for robustness and for security (because, again, “undefined behavior”). I would tentatively suggest Rust as a better alternative. But the market says “C,” so we’re stuck with it.

        1. Joshua says:

          I watch Rust. The day may come when C waxes old and Rust supplants it.

          1. Peter Doubleday says:

            Rust never sleeps …

        2. smf says:

          >If any part of your program (including external libraries!) can possibly result in “undefined behavior,” then every single bit of
          > your program can result in “undefined behavior.” And, frankly, because undefined behavior is undefined, it definitionally does
          > not matter when that behavior manifests itself, and there is therefore nothing to be gained by tracking the precise line of code down.

          That is very passive aggressive. It’s also not consistently applied, C++ doesn’t (or didn’t) mandate 2’s complement so relying on bit patterns in numbers is (was) undefined behaviour.

          Pretty much all software would be converted to int main() { return 0; } if all undefined behaviour caused code to be removed.

          Clever people can be dumb.

        3. not important says:

          Hello –
          Thanks DoubleDay for taking the time to write these lengthy explanations – some programmers write comments on their Christmas vacation, and other programmers read comments on their Christmas vacation :)

          I am still confused (dense)…. Part of your comment expresses my confusion: “the compiler could warn about returning an int-reference from an array that might include a null … but that’s not really the point.” Well, if returning an int-reference might include a null, doesn’t this mean that the “provably” part of the compiler optimization is not “provable”? And that it was a mistake to remove the check? Still grappling with this one…

          Hope you are having a great holiday!

          1. Harry Johnston says:

            In this context, a statement (such as “pointer x will never be null”) is considered “provable” if the compiler can prove that if the program is valid, then the statement is true. In this particular case the programmer is obliged by the language specification to ensure that a null pointer is never returned via a reference, and if the programmer has failed to do so, then the program is invalid; this is traditionally considered to be the programmer’s fault, not the compiler’s fault.

            The key point I think is that C++ lacks the design features of more modern languages such as Rust, so it isn’t possible for the compiler to prove whether or not the program is valid. Nor can the compiler perform important optimizations without having to depend on the assumption that the program is in fact valid, so there’s an unavoidable trade-off between ease of programming and performance. I believe there is still a fairly broad consensus that the emphasis should mostly be on performance, though there is certainly room for debate and different compilers do vary in just how aggressively they optimize.

            Ideally, as Peter suggests, we would drop C and C++ entirely in favour of Rust or some other modern language(s).

      2. Thanks for taking the time to write all this. I appreciate it.

        You appear to have understood what I have: The code defines a scattered array (aptly named “refarray”) on the heap, using pointer logic, and perhaps memory allocation commands that are deliberately not shown here. The scattered array is memory-managed using a packed array of pointers. But refarray also redefines its own bracket operator, so that a[1] means what *a[1] normally would. (Wouldn’t it have been better to define a packed array on the heap with only one pointer to it?) Next, another round of referencing and dereferencing occurs when the array field is passed to the refresh function.

        What really tripped me up was the “provably” word. It appears this word is used in a localized Microsoft sense, not the global sense. (There are lots of terms that Microsoft has redefined for its own use, despite their different global meaning. My favorite example is “boot volume”, which usually ends up not containing the bootloader, and “system volume”, which usually does not contain the systemroot.) In a global sense, when you try to prove something using, e.g., Euclidean geometry principles, it isn’t called a proof when your problem is a non-Euclidean geometry space. It is called a mistake. Likewise, if the compiler tries to prove something using the wrong bracket operators (from a different operator space), it is no longer a proof; it is a mistake. Add to that the fact that the compiler is not trying to prove anything; it is compiling an app, not a theorem. Hence, what it is doing is optimization, albeit with wrong information.

        1. Richard says:

          This is not specific to Microsoft.

          Optimizing means removing code that is not required for the program to run as written.

          All optimizing compilers use proofs that certain code structures must be unnecessary.
          – If it cannot be proven that a certain piece of code is unnecessary, then the code is necessary, and thus cannot be optimised.

          The proofs were worked out by the compiler designer and they implemented checks that find the situations where they apply.

          A core tenet of optimization is that undefined behaviour is not permitted to happen in any program, therefore the compiler can assume the programmer has arranged the reality external to the current code unit to ensure that any path that would be undefined can’t happen.

          E.g. if the call would be undefined for value zero, value zero cannot ever be passed. Thus any further checks for “Is it zero?” aren’t necessary and can be removed – they’ve already been done.
          When called in a different context the zero-check is necessary, so the compiler leaves the check in that time.
          It can do that by inlining (perhaps only inlining the check) or by having two versions of the function.

          Problems occur when the programmer made a mistake and did invoke undefined behaviour.
          – the most common is probably omitting explicit sequence points, so the compiler is free to re-order operations to use fewer registers etc.

          1. Oh, you are using the “proof” word correctly, alright.

            And if you believe the article is using it correctly, well, I got confused. What’re you gonna do? Sue me?

        2. Harry Johnston says:

          As far as I can see, it doesn’t matter which bracket operator it is, because no bracket operator can legally return a null reference.

        3. voo says:

          “In a global sense, when you try to prove something using, e.g., Euclidean geometry principles, it isn’t called a proof when your problem is a non-Euclidean geometry space. ”
          No, the way proof is used here is perfectly valid.

          Mathematicians do this all day long: Sqrt(x^2) = x if x ∈ R and x > 0. If you use that equivalence in the context of a negative or complex number you’ll get the wrong result (just think of all those fun examples of proving 1=0 by dividing by zero). You can do exactly the same in C, except that your input isn’t restricted to positive numbers but to all programs that are valid according to the standard.

          If you don’t follow that you might end up with 1=0 or a removed null check – in both cases it’s your own fault.

          That said, there’s no good reason for having that much undefined behavior in a modern language these days, but the consequences are really perfectly logical.

  8. Giuseppe says:

    The real problem is that programmers (human beings, fragile and fallible) usually think “locally” when they implement functions. “refresh”, from a “local” point of view, “contractually must” check for NULL. What is the purpose of a programming language? To translate into machine code the programmer’s “wish”, or to generate the most optimized code assuming that the programmer is omniscient? Or, let’s suppose that “refresh” comes from a third-party library. The function’s documentation would say “if the pointer is NULL, the function does nothing”, and the “naive” guy who uses it assumes that the contract is “true”… In my opinion, the inheritance of the C language (“the programmer knows what he is doing”) has been formally extended, but substantially disowned: “the compiler knows better than the programmer what the programmer wants to do”.

    1. Harry Johnston says:

      The behaviour you’re describing would violate the standard and a compiler that behaved in that manner would be broken.

      What actually happens is that the compiler removes the test only in the case where it can prove that the function isn’t being called by external code.

      1. Giuseppe says:

        Please consider this: you are editing your program and decide to move the function “refresh” from one source file to another one… This simple and common operation is enough to change the program’s behavior in a very counterintuitive way. Of course, I am playing devil’s advocate here. My point is that we should accept that our brain/culture/mindset is focused on the “literal” sense of single lines of code or short groups of them, and that there are things whose optimization should be avoided. Programming languages are meant to help our limited minds focus on the problem: we use named variables because they help us think about “things” instead of “numbers”, we use OOP because it maps (quite) well onto the way our mind classifies the world… When a programming language withdraws this model, it betrays its mission.

        1. Richard says:

          Not quite, Giuseppe.
          The compiler does this because it allows the programmer to think locally, yet still get maximum performance.

          The programmer always writes “if null do this, otherwise do that.” – thinking locally.

          The compiler however can look at everywhere the function is actually used, and remove the path that cannot ever happen *at each location*.

          So when called from A, it’s turning a reference into a pointer so it cannot ever be null.
          When called from B, it’s passing null so it cannot ever be not-null.
          And from C it could be either so the check is kept.

          If the programmer then uses the function in a new place, or changes an existing one, the compiler again classifies this as being A, B or C and does the appropriate optimisation.

          Make sense?

          1. Giuseppe says:

            Yes, of course it makes sense, from a technical point of view. What I am trying to say is that compilers are becoming “too smart”. I know perfectly well that the code in the example is wrong because it generates a null reference by redefining the “[]” operator the wrong way. But I also observe that the programmer who added the test deleted by the compiler could have done it (naively?) to protect himself against errors exactly like the one that eventually happened. And, yes, this sort of self-protection should disappear anyway in non-debug compilations, in an ideal and perfect world. But I don’t agree with putting the whole blame on the programmer. In my opinion, the compiler in this case was too … optimistic: it trusted the programmer too much by assuming that the one who wrote the code had a perfect “global” view of his code.

          2. Richard says:

            In the example the programmer broke the rules – operator [] returns a reference, and the pointer to a reference cannot be nullptr – QED.

            Most modern compilers warn or error if you overload operators to return invalid types. We don’t know whether this compiler did warn – or if the programmer explicitly disabled the warning.
            I’ve seen that in quite a lot of production code, and usually it ends very badly.

          3. Harry Johnston says:

            I do think there are cases where over-aggressive optimization can cause a program to fail in a way that makes it unnecessarily difficult to track down the original (perhaps very subtle) bug. I don’t think this is one of them.

            In fact, depending on how you interpret the intent of the code shown in Raymond’s example, the optimization may have helped the programmer by turning a bug that was being silently ignored (probably resulting in more subtle misbehaviour that might be extremely hard to track down) into a crash with a fairly obvious cause.

            Even if you assume that the null pointers were intentional (this is my guess) then the crash would still have helped the programmer identify a bug (and learn an important language rule) that they might otherwise have remained entirely unaware of. Annoying, no doubt, but not all bad.

  9. Some people seem a little confused. It’s provable because operator[] returns a reference to an int which according to the specification must be valid. Passing a NULL as a this pointer generates the same sort of undefined behaviour. The compiler assumes it’s not a null pointer, and it is free to optimize checks away without violating the specifications. And because the null check is in a separate inlined function there is no reason for the compiler to emit a warning for the superfluous null check.

    Most programmers probably aren’t aware of all the details of the C++ specifications, so when they try something like this and it works, they assume it’s OK. Obviously this can lead to “random” crashes if it’s rarely called code, it works in debug builds but not release builds, and there are no proper code testing procedures.

    1. Peter Doubleday says:

      Plunging once again into an area where I am not an expert (one of so many): yes, I believe you’ve expressed this particular issue correctly.

      Two observations, however: one technical, and one blog-related. And both issues are, as it were, related.

      The technical issue is that programmers (including me) naively believe that The Compiler (Or The Compiler/Linker, or I suppose in many cases you could even include The Runtime) has a world view that encompasses, well, the entire world. At various stages in the pipeline this is not necessarily so. In particular, it is not true when what, for want of a better term, I will call a “scope boundary” is crossed.

      Now, in C/C++, you cross a scope boundary immediately you use an external library (or even one of your own). Neither the compiler nor the linker has any way of knowing that the thing on the other side can (or cannot) be proven not to exhibit undefined behavior. Consider, for example, a case where operator [] is defined in a separate compilation unit. All you get is the header. All the header tells you is that you can “reliably” get a reference to an int. See below for how this is related to Raymond’s point.

      Taking this one step further: people (including me) assume that the type-checker part of the compiler — which can manifest itself as an arbitrarily long set of rules, arranged in an arbitrary and non-predictable order, just to make things interesting — has a universal world view. Which it doesn’t.

      And here is where I go beyond my pay level and start waving my hands around. Basically, unless you place a bound on how far you are going to go, your type-checking system (in a language that does not provably forbid undefined behavior) is subject to the Halting Problem. And, even short of that, it’s a practical impossibility to design such a type-checking system that guarantees, say, O(n log n) behavior. Therefore, in practise, your type-checking system will — as here — accept a part of the code, inlined to one degree, as here, and apply the rules it knows about. One convenient (“provable”) rule is, as here, that operator[] for this class will never return a null int-reference. It is provable because the language specification guarantees it.

      Now, it’s obvious to humans (with a bit of thought) that in this case it can. But it isn’t obvious in the domain of the type-checker. In fact, it is, by design (see Halting Problem), invisible.

      And now back to Raymond’s blog. Let’s not concretize the problem at hand to the particular example that Raymond has provided. This is a pedagogical example only (although it might well occur with some compilers and some programs in some form of real life).

      The point is, as I said above — in a language like C or C++, once you take “scope boundaries” into account, and that scope boundary can even cross “module” boundaries, then there is no obvious way that the compiler can warn you about this sort of issue without plastering the stupid warning all over the place and creating more false positives than the human mind can comfortably comprehend.

      1. Peter Doubleday says:

        Which probably isn’t helpful enough. OK, let’s resolve the issue to the following question:

        “Why did the compiler choose to erase the null check?”

        Because you told it to do that. You asked for optimisation. This is an optimisation. It follows the rules of the language standard.

        What you asked for, you got.

      2. Harry Johnston says:

        Note that optimizations may also be performed at link time, in which case functions in different modules are still in the same scope. I think it would also be legal for a C++ compiler to be designed so that it looks at all the source modules (including libraries) at once, rather than having separate compile-and-then-link steps. I’ve never heard of any, probably because of the performance implications, but it is theoretically possible.

        This doesn’t really affect what you’re saying, mind you, since in most cases it would still be impossible for the compiler to reliably tell whether or not the program is following the rules.

        (What the compiler could perhaps have usefully done in this particular instance is to automatically insert an assertion into debug builds that will fire if a null pointer is ever returned as a reference.)

  10. LePiaf says:

    Has anyone been able to reproduce this behavior?
    I tried: clang/gcc with -O3, and microsoft 15.5 with /O2

Comments are closed.

