What are these spurious nop instructions doing in my C# code?

Date:August 17, 2007 / year-entry #303
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20070817-00/?p=25533
Comments:    31

Prerequisites: Basic understanding of assembly language.

When you debug through some managed code at the assembly level, you may find that there are an awful lot of nop instructions scattered throughout your method. What are they doing there; isn't the JIT smart enough to remove them? Isn't this going to slow down execution of my program?

It is my understanding that¹ these nop instructions are inserted by the JIT because you're running the program under the debugger. They are emitted specifically so that the debugger can set breakpoints in locations that you normally wouldn't be able to. (For example, they might represent a line of code that got optimized out or merged with another line of code.)

Don't worry. If there's no debugger, the JIT won't generate the dummy nops.
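
If you are curious which situation a given run is in, here is a minimal sketch (not the CLR team's method, just an illustration). It prints whether a debugger is attached and whether the assembly's DebuggableAttribute asks the JIT to disable optimizations. The names JitDebugProbe and IsOptimizerDisabled are made up for this example, and exactly which of these conditions makes the JIT emit the nops is an assumption that this article does not confirm.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper for illustration; not part of any library.
static class JitDebugProbe
{
    // True when the attribute asks the JIT to disable optimizations.
    // A missing attribute (null) is treated as "optimizations allowed".
    public static bool IsOptimizerDisabled(DebuggableAttribute attr)
    {
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    static void Main()
    {
        // Look up the DebuggableAttribute the compiler put on this assembly, if any.
        Assembly self = typeof(JitDebugProbe).Assembly;
        DebuggableAttribute attr = self.GetCustomAttribute<DebuggableAttribute>();

        Console.WriteLine("Debugger attached:      " + Debugger.IsAttached);
        Console.WriteLine("JIT optimizer disabled: " + IsOptimizerDisabled(attr));
    }
}
```

A Debug build would typically report the optimizer as disabled and a Release build typically would not, but the program only reports these flags; it does not show whether any nops were actually generated.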

Nitpicker's Corner

¹As with all statements of alleged fact, this statement is an interpretation of events based on observation and thought and does not establish a statement of the official position of the CLR JIT compiler team or Microsoft Corporation, and that interpretation may ultimately prove incorrect.


Comments (31)
  1. mccoyn says:

    My guess is it has very little to no impact on execution time.  It’s been a while since I studied them, but I think most processors have an out-of-order execution scheduler that intercepts all instructions and decides when and where they will be executed.  It also throws away any nop it finds and even adds some if it feels the rest of the processor can’t handle a particular order of instructions.  The impact would then only be on size and loading time.

  2. Neil says:

    As it’s not always possible to tell if you’ve turned certain optimizations on, a few extra NOPs would come in handy when generating void (or ignored result) calls with debugging information, so that the stack frame still points to the line of the call.

  3. BryanK says:

    mccoyn — mostly true, but not entirely.  There will be an impact on more than just the size of the code and loading time: the L1 code cache will have to store the NOPs, since (AFAIK anyway) L1 caches the "real" code bytes, not the microcode.  Since L1 is storing the NOPs, it’ll have to push out some other useful code.

    And when memory is so much slower than the processor, and getting worse, the size of your code (meaning the amount that you can get done without missing in the cache) is getting to be more important than the speed of your code.

    Of course these NOPs only happen when you have a debugger attached, so it doesn’t matter anyway.  But in general, anything that affects size is also going to affect speed, via the cache.  Out-of-order and speculative execution or not; the cost of an actual executed NOP is tiny compared to waiting for main memory because you missed in both the L1 and L2 caches.

    (An exclusive cache has a slight edge in this case, since there’s a slightly higher chance that the code is still in the L2 cache (because the contents of the L1 cache aren’t also taking up space in L2).  But even waiting for the L2 cache will take more than one clock cycle, I think, and a NOP only takes one clock to execute.)

  4. Jerk says:

    In spite of your disclaimer, I’m still going to assume that this is Microsoft’s official position.  In addition, I will be filing a bug report with the CLR JIT compiler team and referencing this blog in it.  Good day, sir.

  5. alex.r. says:

    > since (AFAIK anyway) L1 caches the "real" code bytes, not the microcode

    Although this is generally the case, it is not strictly true. On the P4 for instance, the L1 cache (the ‘trace cache’) stored decoded instructions.

    But it does not really matter either way I guess, as these NOPs are certainly not even close to being the main performance bottleneck when running your application under a debugger.

  6. For the x64 and ia64 2.0 CLR JIT, it also inserts NOPs to align loops (some processors execute a backwards branch ‘faster’ if it is on a 16-byte boundary), and to accommodate certain unwind semantics (so you’ll see them between calls and EH boundaries, like the end of a try body).  Again this is not guaranteed behavior, or the official MS position, etc., it’s just an FYI for the curious.

    –Grant

  7. DanielMoth says:

    FYI, I believe the canonical example for nop operation usage is placing a breakpoint on a curly brace.

  8. It must be CLR week over at The Old New Thing because it’s been non-stop posts about C# lately. Raymond’s

  9. Karellen says:

    OK, why would you bother to allow someone to put a breakpoint on a curly brace? As far as I can tell, that would have the same effect as having a breakpoint on a blank line. You can’t put breakpoints on blank lines, can you?

    Assuming you can’t put breakpoints on blank lines, can anyone figure out what advantage putting a breakpoint on a curly brace gives you?

    Hmmmm…..given:

    if (x) {
       foo();
    }

    would you expect a breakpoint on the "}" to fire if x was true, if x was not true, or both?

  10. Eric Lippert says:

    > OK, why would you bother to allow someone to put a breakpoint on a curly brace?

    I frequently want to set a breakpoint AFTER something happens, not before. How do you do that if the "something happens" is the LAST thing that happens?

    void M() {
     this.x = Blah();
    }

    How are you going to inspect the value of this.x unless you can put a breakpoint AFTER the call to Blah()?

    That’s why we let you put a breakpoint on the curly.

  11. Eric Lippert says:

    And that of course answers your question. The breakpoint would be hit iff x is true, because the curly "runs" after the call to foo().

  12. Matt Davis says:

    "Don’t worry. If there’s no debugger, the JIT won’t generate the dummy nops."

    Sure, and the fridge light really goes out when you shut the door… ;)

    But seriously, Raymond, thanks for continuing to blog in the face of a-holes, and flinging a little poo at them while you’re at it.

  13. RichB says:

    nop is also emitted when you override a virtual method and provide no implementation.

    eg:

    .method public hidebysig virtual instance void
           Test(class ThunderMain.Tree.Node opNode,
                class ThunderMain.Tree.Node opTestNode,
                class ThunderMain.Tree.Preferences opPrefs) il managed
    {
     // Code size       2 (0x2)
     .maxstack  8
     IL_0000:  nop
     IL_0001:  ret
    } // end of method FormatAlgorithm$NullAlgorithm::Test

  14. RichB says:

    In reference to my previous comment – I was talking about the nop at the IL level, not the x86 level I suspect you were talking about…

  15. MSDN Archive says:

    There are some other places where these NOPs are valuable.

    Suppose you have a method call with an assignment:

       x = F();

    If you step IN to F(), and then step OUT, the current statement marker will be on this statement.  That’s because this statement still has work to do – assign the result to x.

    If you don’t assign the result:

      F();

    then upon stepping OUT the current statement marker will be on the next statement.  That can be confusing.  By adding the NOPs we get consistent behavior between the two.

    (Note that this is a generalization – specific cases may vary).

  16. St. Thomas... says:

    > Sure, and the fridge light really goes out when you shut the door… ;)

    You can hide a videocam inside the fridge.

    Or, you can attach a debugger after the JIT has generated code for a method :)

  17. For our Eiffel implementation we had to do the same but we learned it by trial and error. Indeed the pdb format does not like when you set several breakpoints at the same location. So now we generate a nop for each of those breakpoints and it works.

  18. user_1 says:

    sometimes people forget how advanced out-of-order execution is in today’s chips. for example, this code runs at 1 clock (!!) per loop iteration on core2.

    LOOP:
    mov eax, [esi+ecx]
    mov [ecx+edi*1], ebx
    add ecx, 4
    js LOOP

    source: microarchitecture.pdf by agner fog (google for it)

  19. Hum.... says:

    My understanding of the way that the debugger works could easily be squiffy, I just don’t know in any official capacity how it works…. Anyhoo.

    Surely any instruction that you execute relates directly to some discretely identifiable description, or line, or part thereof, that one specified to a compiler.

    If you’re going to relate an instruction or more likely a group of instructions to a described line of code, then there must be a *mapping* of some sort that links instructions and described lines of code. I assume that the debugger operates on the basis of hardware traps.

    I assume that the purpose of .NET is to allow platform independence. That is to say that the code one compiles is translated into a native machine code appropriate to the target platform, at run time. For the sake of this text I’ll refer to what the compiler produces as CLR, but I’m not sure about that. The runtime process of translation between CLR and platform machine code I’ll refer to as JIT, but again I’m not certain what it should be called.

    Is this article suggesting that the mapping is generated from the CLR code, rather than the platform machine code?

    The only thing that makes sense to me is that the optimisation process is unable to move traps, because if it did, the mapping would no longer be coherent. Presumably the JIT produces code that is smaller than expected at the time the CLR was generated, and consequently the nops have to be filled to make up the space, and maintain the mapping coherence.

    The thing that I wonder is what happens when compiled code is retargeted to a different instruction set, which is what I thought .NET was about. What happens, say, when the target platform requires more instructions to complete the line than was specified in the original CLR code?

    Surely the JIT must be a strict lookthrough for the debugger, for good design? It appears that the debugger is looking around the JIT. Does this mean that you have to rebuild the CLR for each platform that you wish to debug on?

    Wouldn’t it be better for the debugger to sit completely behind the JIT, and have the JIT set real traps, and generate mappings into the CLR? The debugger could then operate on its own mapping into the CLR, and pseudo traps that it could receive from the JIT.

    It really must be a bit of a boiler. If I am right, is this just a question of time, money and resources, or is it intended to work that way?

  20. Neil says:

    If you have one of Raymond’s favourites, an object with a nontrivial destructor going out of scope, then the destructor often appears to execute on the close curly of that scope, thus enabling you to breakpoint on it.

  21. alex.r. says:

    St. Thomas.

    Sure, you could put a camera in your fridge… but then you’d only be sure that the light is off *when there is a camera*.

  22. Eric Lippert says:

    Obviously the solution is to shut yourself into the fridge and observe it directly.

  23. Drak says:

    Schroedinger etc.

    Maybe there are only NOPs because you observe the code, and there are none when you are not looking for them ;)

    (Eric, you being in the fridge would change the experiment, thus invalidating the results ;)

  24. - says:

    Uhh… IIRC:

    K7s (Athlons) and derivatives will kill NOPs at the decoding stage at a max of 3/cycle.

    Intel microarchitectures before Core2 process NOPs and they must use one of the Integer/Logic/Float execution ports (P1 or P2), so they "pollute the internal buffers" and have a max throughput of 2/cycle.

    Core2 has 3 I/L/F ports so I’d expect it to eat 3 NOPs per cycle.

  25. CGomez says:

    @Hum:

    "I assume that the purpose of .NET is to allow platform independence."

    That’s your assumption.  I’m pretty sure it’s never been stated as an assumption or goal of the platform.  Rather, it is a side effect (the .NET platform can serve as a programming interface that, if implemented on other targets, can provide platform independence).

    I would say some of the stated goals include a common type system and easier cross-language compatibility.  Also included are a managed runtime intended to improve performance in memory management (and maybe as a side effect limit memory based bugs like buffer overruns).  It could be theorized that eventually the JIT, when it knows more about the hardware than the coder, could produce more specialized and targeted (and performant) code than the native compiler.

  26. Hum.... says:

    Interesting response.

    By nature, I’m a bit holistic in my outlook. It’s probably why I’m not eminent or successful – you need to be *really* good if you want to be eminent, successful *and* holistic. Digression over.

    I’m sure though there are more than a few that would agree with my outlook. Certainly my first google hit agreed anyhow.

    There are some potentially good things about .NET, but my holistic outlook is telling me that the number one improvement for memory management would be hardware based, specifically content addressable memory for, say, free list lookups on the heap. I’d wager that alone, such a hardware feature would have a massive effect on overall performance.

    Clearly that’s not a Microsoft problem. On the other hand, they’ll (you’ll) already be aware of such things. How hard do you keep hitting a problem that can’t be dramatically improved with the only tools you have? Maybe it’s a needle to crack a coconut? Maybe there is another way?

    I wasn’t expecting an answer to my original post, but I’m glad you responded. At least I’m not alone in the universe.

  27. Dean Harding says:

    > Shut the fridge for a while, then open it and check whether the light bulb is cold.

    What if it’s fluorescent light?

  28. Ryan Bemrose says:

    > Sure, you could put a camera in your fridge… but then you’d only be sure that the light is off *when there is a camera*.

    Any Microsoft interviewee knows the answer to this one.  Shut the fridge for a while, then open it and check whether the light bulb is cold.

  29. mccoyn says:

    Fluorescent lights still get warm, just not as much.  Even LED bulbs warm up, but you might need to get a thermometer to detect it.

    Do I get the job?

  30. alex.r. says:

    > Do I get the job?

    Sure you do — you can come repair my fridge’s light any time you’d like ;)



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
