Date: | April 21, 2004 / year-entry #152 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20040421-00/?p=39703 |
Comments: | 11 |
Summary: | People have discovered that the "natural" C# loop construct ArrayList list = ...; foreach (Object o in list) { ... do something with o ... } is fractionally slower than the corresponding manual loop: ArrayList list = ...; for (int i = 0; i < list.Length; i++) { Object o = list[i]; ... do something... |
People have discovered that the "natural" C# loop construct

ArrayList list = ...;
foreach (Object o in list) {
  ... do something with o ...
}

is fractionally slower than the corresponding manual loop:

ArrayList list = ...;
for (int i = 0; i < list.Count; i++) {
  Object o = list[i];
  ... do something with o ...
}

The first thing that needs to be said here is that
Don't go running around changing all your foreach loops into corresponding for loops thinking that your program will magically run faster. It almost certainly won't, because loop overhead is rarely where a non-benchmark program spends most of its time.

My topic for today is not how to make your code faster by abandoning your foreach loops. My topic is to answer the question, "Why doesn't the compiler autoconvert the foreach into the corresponding for, so I don't lose readability but get to take advantage of the performance benefit?"

The reason is that the two loops are in fact not identical. The semantics of enumeration are that you aren't allowed to change the object being enumerated while an enumeration is in progress. If you do, then the enumerator will throw an InvalidOperationException the next time you talk to it. On the other hand, the for loop doesn't care if you change the collection while you're enumerating it. If you insert items into the collection inside the for loop, the loop will keep on going and, depending on where the insertion happened, you might double-enumerate an item.

If the compiler changed the foreach to a for, then a program that used to throw an exception would now run without a hiccup. Whether you consider this an "improvement" is a matter of opinion. (Depending on the circumstances, it may be better for the program to crash than to produce incorrect results.)

Now, the compiler might be able to prove that you don't change the collection inside the loop, but that is often hard to prove. For example, does this loop change the collection?

ArrayList list = target.GetTheList();
foreach (Object o in list) {
  o.GetHashCode();
}

Well, it doesn't look like it. But who knows, maybe target looks like this:

class Sneaky {
  ArrayList list_;
  public Sneaky(ArrayList list) { list_ = list; }

  public override int GetHashCode()
  {
    list_.Add(this);
    return base.GetHashCode();
  }
}

class SneakyContainer {
  public ArrayList GetTheList()
  {
    ArrayList list = new ArrayList();
    list.Add(new Sneaky(list));
    return list;
  }
}

class Program {
  static public void Main()
  {
    SneakyContainer target = new SneakyContainer();
    ArrayList list = target.GetTheList();
    foreach (object o in list) {
      o.GetHashCode();
    }
  }
}

Ah, little did you know that o.GetHashCode() modifies the ArrayList. And yet it looked so harmless! If the SneakyContainer class came from another assembly, then the compiler must assume the worst, because it's possible that somebody will make that assembly sneaky after you compiled your assembly.

If that's not a messed-up enough reason, here's another: The ArrayList class is not sealed. Therefore, somebody can override its IEnumerable.GetEnumerator and return a nonstandard enumerator. For example, here's a class that always returns an empty enumerator:

class ApparentlyEmptyArrayList : ArrayList {
  static int[] empty = new int[] { };
  public override IEnumerator GetEnumerator()
  {
    return empty.GetEnumerator();
  }
}

"Who would be so crazy as to override the enumerator?" Well, this one is rather bizarro, but more generally one might override the enumerator in order to add a filter or to change the order of enumeration. So you can't even trust that your ArrayList really is an ArrayList. It might be an ApparentlyEmptyArrayList!
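To see that divergence in action, here is a small usage sketch of my own (not from the original post; the class name ApparentlyEmptyDemo and the sample strings are invented) that runs both loop styles over the ApparentlyEmptyArrayList above:

using System;
using System.Collections;

class ApparentlyEmptyDemo {
  static public void Main()
  {
    ArrayList list = new ApparentlyEmptyArrayList();
    list.Add("a"); list.Add("b"); list.Add("c");

    // foreach asks the object for its enumerator, so the override
    // wins and this loop prints nothing.
    foreach (object o in list) {
      Console.WriteLine("foreach sees {0}", o);
    }

    // The indexed loop bypasses GetEnumerator entirely and prints
    // all three items, so rewriting the foreach as a for would
    // change the program's behavior.
    for (int i = 0; i < list.Count; i++) {
      Console.WriteLine("for sees {0}", list[i]);
    }
  }
}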
Now if the compiler wanted to do this rewrite optimization, not only would it have to prove that the object being enumerated is not modified inside the enumeration, it would also have to prove that the object really is an ArrayList and not a derived class that may have overridden the GetEnumerator method. Given the late-binding nature of cross-assembly classes, the number of cases where the compiler can prove these requirements is very restricted indeed; the places where the optimization could safely be performed without changing semantics become so vanishingly few as to be not worth the effort.

(By some astonishing universal synchronicity, this topic got picked up by several people all at once, sort of the same way a movie subject gets covered all at once. My favorite is the year that there were two volcano disaster movies, Volcano and Dante's Peak.)
Comments (11)
My favorite movie-subject-all-at-once period was Antz and A Bug’s Life. The Deep Impact/Armageddon period was also interesting.
On a related topic, the way JScript/JScript .NET handle this is a little weird. I wrote a couple of articles on that a while back:
http://blogs.msdn.com/ericlippert/archive/2003/09/22/53063.aspx
http://blogs.msdn.com/ericlippert/archive/2003/10/01/53134.aspx
Remember 1988, when all those terrible mind-body-switch movies came out at the same time? That was weird.
To prove your point, I once wrote an enumerator called PageRangeEnumerator. You gave it a string like "1-4,7,18+" and it would enumerate over only those pages.
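A minimal sketch of the idea (my reconstruction, not the commenter's actual code; the class name PageRangeEnumerable is invented, and I assume a total page count is supplied so that "18+" has a defined end):

using System;
using System.Collections;

// "1-4,7,18+" enumerates pages 1, 2, 3, 4, 7, and 18 through pageCount.
class PageRangeEnumerable : IEnumerable {
  string spec;
  int pageCount;   // upper bound for open-ended ranges like "18+"

  public PageRangeEnumerable(string spec, int pageCount)
  {
    this.spec = spec;
    this.pageCount = pageCount;
  }

  public IEnumerator GetEnumerator()
  {
    foreach (string part in spec.Split(',')) {
      if (part.EndsWith("+")) {                    // "18+"
        for (int p = int.Parse(part.TrimEnd('+')); p <= pageCount; p++)
          yield return p;
      } else if (part.IndexOf('-') >= 0) {         // "1-4"
        string[] ends = part.Split('-');
        for (int p = int.Parse(ends[0]); p <= int.Parse(ends[1]); p++)
          yield return p;
      } else {                                     // "7"
        yield return int.Parse(part);
      }
    }
  }
}

With that, foreach (int page in new PageRangeEnumerable("1-4,7,18+", 20)) would visit pages 1 through 4, 7, and 18 through 20.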
Larry Wall talks about optimizing for the Programmer as well as for Runtime speed. I’ll give up quite a few CPU cycles to maintain foreach(…) over a corresponding for(…) loop anytime.
You have described why the compiler cannot optimize one form into another, but you did not describe why one form is less efficient than the other.
My topic is "Why the compiler can’t autoconvert foreach to for", not "Which one is faster". That’s a topic for somebody else (try the links I provided).
It turns out that sometimes a foreach can be faster than a for loop – some people have been converting their loops from this:
foreach (Control c in this.Controls) {
// …
}
to this:
for (int i = 0; i < this.Controls.Count; i++) {
// …
}
Since you’re now accessing the "Controls" property on every iteration instead of using the enumerator it significantly slows down the loop.
but this just begs the question: Why were they designed to be semantically different?
Sure, "depending on the circumstances, it may be better for the program to crash than to produce incorrect results". And sometimes the other way around. The language designer can never know. So why the inconsistency?
I would choose the more permissive of the two choices, apply it consistently to both keywords and leave error handling to the programmer.
BTW, JFo, a really smart foreach would generate this:
int end = this.Controls.Count;
for (int i = 0; i < end; ++i) {
// …
}
That's what happens in the STL.
They are different because "foreach" is a generalized enumerator whereas "for" works only for integer-indexed objects. One is generalized and one is specialized.
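A quick illustration of that distinction (my own sketch, not from the thread; the queue contents are made up): a Queue is enumerable but has no integer indexer, so foreach works where a counted for loop cannot.

using System;
using System.Collections;

class GeneralizationDemo {
  static public void Main()
  {
    Queue q = new Queue();
    q.Enqueue("first"); q.Enqueue("second"); q.Enqueue("third");

    // foreach only needs IEnumerable, which Queue provides.
    foreach (object o in q) {
      Console.WriteLine(o);
    }

    // There is no q[i], so the "equivalent" indexed loop doesn't exist:
    // for (int i = 0; i < q.Count; i++) { ... q[i] ... }   // won't compile
  }
}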
Commenting on this entry has been closed.