Why the compiler can’t autoconvert foreach to for

Date:April 21, 2004 / year-entry #152
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20040421-00/?p=39703
Comments:    11
Summary:People have discovered that the "natural" C# loop construct ArrayList list = ...; foreach (Object o in list) { ... do something with o ... } is fractionally slower than the corresponding manual loop: ArrayList list = ...; for (int i = 0; i < list.Length; i++) { Object o = list[i]; ... do something...

People have discovered that the "natural" C# loop construct

ArrayList list = ...;
foreach (Object o in list) {
  ... do something with o ...
}

is fractionally slower than the corresponding manual loop:

ArrayList list = ...;
for (int i = 0; i < list.Length; i++) {
    Object o = list[i];
  ... do something with o ...
}

The first thing that needs to be said here is that

The performance difference is almost certainly insignificant.

Don't go running around changing all your foreach loops into corresponding for loops thinking that your program will magically run faster. It almost certainly won't, because loop overhead is rarely where a non-benchmark program spends most of its time.

My topic for today is not how to make your code faster by abandoning your foreach loops. My topic is to answer the question, "Why doesn't the compiler autoconvert the foreach into the corresponding for, so I don't lose readability but get to take advantage of the performance benefit."

The reason is that the two loops are in fact not identical.

The semantics for enumeration is that you aren't allowed to change the object being enumerated while an enumeration is in progress. If you do, then the enumerator will throw an InvalidOperationException the next time you talk to it. On the other hand, the for loop doesn't care if you change the collection while you're enumerating it. If you insert items into the collection inside the for loop, the loop will keep on going and depending on where the insertion happened, you might double-enumerate an item.

If the compiler changed the foreach to a for, then a program that used to throw an exception would now run without a hiccup. Whether you consider this an "improvement" is a matter of opinion. (Depending on the circumstances, it may be better for the program to crash than to produce incorrect results.)

Now, the compiler might be able to prove that you don't change the collection inside the loop, but that is often hard to prove. For example, does this loop change the collection?

ArrayList list = target.GetTheList();
foreach (Object o in list) {
  o.GetHashCode();
}

Well, it doesn't look like it. But who knows, maybe target looks like this:

class Sneaky {
  ArrayList list_;

  public Sneaky(ArrayList list) { list_ = list; }
  public override int GetHashCode()
  {
    list_.Add(this);
    return base.GetHashCode();
  }
}

class SneakyContainer {
  public ArrayList GetTheList()
  {
    ArrayList list = new ArrayList();
    list.Add(new Sneaky(list));
    return list;
  }
}

class Program {
  static public void Main()
  {
    SneakyContainer target = new SneakyContainer();
    ArrayList list = target.GetTheList();
    foreach (object o in list) {
      o.GetHashCode();
    }
  }
}

Ah, little did you know that o.GetHashCode() modifies the ArrayList. And yet it looked so harmless!

If the SneakyContainer class came from another assembly, then the compiler must assume the worst, because it's possible that somebody will make that assembly sneaky after you compiled your assembly.

If that's not a messed-up enough reason, here's another: The ArrayList class is not sealed. Therefore, somebody can override its IEnumerable.GetEnumerator and return a nonstandard enumerator. For example, here's a class that always returns an empty enumerator:

class ApparentlyEmptyArrayList : ArrayList {
  static int[] empty = new int[] { };
  public override IEnumerator GetEnumerator()
    { return empty.GetEnumerator(); }
}

"Who would be so crazy as to override the enumerator?"

Well, this one is rather bizarro, but more generally one might override the enumerator in order to add a filter or to change the order of enumeration.

So you can't even trust that your ArrayList really is an ArrayList. It might be an ApparentlyEmptyArrayList!

Now if the compiler wanted to do this rewrite optimization, not only would it have to prove that the object being enumerated is not modified inside the enumeration, it also has to prove that the object really is an ArrayList and not a derived class that may have overridden the GetEnumerator method.

Given the late-binding nature of cross-assembly classes, the number of cases where the compiler can prove these requirements is very restricted indeed, to the point where the number of places where the optimization can safely be performed without changing semantics becomes so vanishingly small as to be not worth the effort.

(By some astonishing universal synchronicity, this topic got picked up by several people all at once:

Sort of the same way a movie subject gets covered all at once. My favorite is the year that there were two volcano disaster movies, Volcano and Dante's Peak.)


Comments (11)
  1. My favorite movie subject all at once period was Antz and A Bug’s Life. The Deep Impact/Armageddon period was also interesting.

  2. Eric Lippert says:

    On a related topic, the way JScript/JScript .NET handle this is a little weird. I wrote a couple of articles on that a while back:

    http://blogs.msdn.com/ericlippert/archive/2003/09/22/53063.aspx

    http://blogs.msdn.com/ericlippert/archive/2003/10/01/53134.aspx

    Remember 1988, when all those terrible mind-body-switch movies came out at the same time? That was weird.

  3. To prove your point, I had written an enumerator called PageRangeEnumerator. You give it a string like "1-4,7,18+" and it would only enumerate over those pages.

  4. Clinton Pierce says:

    Larry Wall talks about optimizing for the Programmer as well as for Runtime speed. I’ll give up quite a few CPU cycles to maintain foreach(…) over a corresponding for(…) loop anytime.

  5. David Kafrissen says:

    You have described why the compiler cannot optimize one form into another, but you did not describe why one form is less efficient than the other.

  6. Raymond Chen says:

    My topic is "Why the compiler can’t autoconvert foreach to for", not "Which one is faster". That’s a topic for somebody else (try the links I provided).

  7. JFo says:

    It turns out that sometimes a foreach can be faster than a for loop – some people have been converting their loops from this:

    foreach (Control c in this.Controls) {

    // …

    }

    to this:

    for (int i = 0; i < this.Controls.Count; i++) {

    // …

    }

    Since you’re now accessing the "Controls" property on every iteration instead of using the enumerator it significantly slows down the loop.

  8. indranil banerjee says:

    but this just begs the question: Why were they designed to be semantically different?

    Sure, "depending on the circumstances, it may be better for the program to crash than to produce incorrect results". And sometimes the other way around. The language designer can never know. So why the inconsistency?

    I would choose the more permissive of the two choices, apply it consistently to both keywords and leave error handling to the programmer.

    BTW, JFo a really smart foreach would generate this :-

    int end = this.Controls.Count

    for (int i = 0; i < end; ++i) {

    // …

    }

    thats what happens in STL

  9. Raymond Chen says:

    They are different because "foreach" is a generalized enumerator whereas "for" works only for integer-indexed objects. One is generalized and one is specialized.

  10. Raymond Chen says:

    Commenting on this entry has been closed.

Comments are closed.


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

WHY DID I DUPLICATE THIS CONTENT HERE? Let me first say this site has never had anything to sell and has never shown ads of any kind. I have nothing monetarily to gain by duplicating content here. Because I had made my own local copy of this content throughout the years, for ease of using tools like grep, I decided to put it online after I discovered some of the original content previously and publicly available, had disappeared approximately early to mid 2019. At the same time, I present the content in an easily accessible theme-agnostic way.

The information provided by Raymond's blog is, for all practical purposes, more authoritative on Windows Development than Microsoft's own MSDN documentation and should be considered supplemental reading to that documentation. The wealth of missing details provided by this blog that Microsoft could not or did not document about Windows over the years is vital enough, many would agree an online "backup" of these details is a necessary endeavor. Specifics include:

<-- Back to Old New Thing Archive Index