BlockLeftTop, PRELOAD BlockLeftBottom, PRELOAD BlockLeftStretch, PRELOAD BlockTop, PRELOAD BlockBottom, PRELOAD BlockStretch, PRELOAD BlockRightTop, PRELOAD BlockRightBottom, PRELOAD BlockRightStretch, PRELOAD
Latest Game

For vs Foreach Performance

by Benjamin Nitschke 22. April 2009 03:31
Today I had a discussion with a few other programmers (Hi ViperDK and Timo ^^) about foreach and for loop performance. Everyone had his own ideas and experiences, but we talked mostly about XNA on the Xbox 360 (using the compact framework, that does not like foreach, just use for all the time in tight render loops for your XNA games, foreach will create too much IEnumerator garbage that will not be collected) or other special cases (huge arrays or lists).

Anyway, just guessing around is obviously not as good as actually writing a performance test application and see whats going on (see end of this article for download link). Again, we need a disclaimer: You will probably never write loops with 1 billion iterations, these numbers are similar for less iterations, but you should always try to write clean code and ONLY optimize when it is required after finding performance issues! Lets go directly to the results for 1 billion for or foreach iterations in release mode (will loop through a 10k array or list for 100k times, see below for code):



As you can see there are some major differences between using foreach with an array or using a generic list (boxing/unboxing is not the issue here, you would not be able to archive this performance with an ArrayList). Another thing you can notice right away is that using simple arrays is always faster than using dynamic lists, even converting them from a list to an array via .ToArray() and then executing the foreach is much faster.

It also makes a HUGE difference executing this in debug mode (I execute my code almost 99% in debug mode), everything is 4 times slower for this special performance test. Except for Foreach+List, which is in comparison not that bad in debug mode. What is really bad is For+List, which almost takes 16 seconds to execute 1 billion times (again using my Core 2 Duo 6600 CPU with 4 GB Ram).



Thanks to some helper methods the code for this performance test is pretty short and easy (check end of this article for the source code download link):
 Stopwatch timer = new Stopwatch();
 // Run through all tests
 for (int test = 0; test < 8; test++)
 {
     // Find out which test we wanna make
     bool rememberLength = (test / 4) % 2 == 1;
     bool useForeach = (test / 2) % 2 == 1;
     bool useList = test % 2 == 1;

     // Start timer for test 
     timer.Start();

     // Execute
     int result = RunForOrForeachLoops(rememberLength, useForeach, useList);

     // Stop timer
     timer.Stop();

     // Skip this test if we got no result!
     if (result == 0)
         continue;

     // Report results!
     resultText = ...
     Console.WriteLine(resultText + ": " + timer.ElapsedMilliseconds + "ms");

     // Reset timer for next run
     timer.Reset();
 } // for
The interesting code obviously happens in RunForOrForeachLoops. It basically executes the 1 billion iterations in several different ways (8 ways, as you can guess by the for main test for loop). All of those produce the same result, half of the tests use int[] intArray, the other half uses List<int> intList. And again half of the tests uses just for loops, the other half uses foreach for the iterations. Lets examine the useForeach=true and rememberLength=false cases (btw: Iterations is 100000 and the intList and intArrays have a size of 10000 elements):

if (useForeach)
{
    if (useList)
    {
        for (int iter = 0; iter < Iterations; iter++)
        {
            result = 0;
            foreach (int val in intList)
                result += val;
        } // for
    } // if
    else
    {
        for (int iter = 0; iter < Iterations; iter++)
        {
            result = 0;
            foreach (int val in intArray)
                result += val;
        } // for
    } // else
} // if
else // useForeach == false
...

To figure out the differences between those few lines of code in RunForOrForeachLoops, that almost look the same, we have go to IL code once again. First lets check out what the fastest code block Foreach+Array is being compiled to in IL:
result = 0;
foreach (int val in intArray)
    result += val;
This becomes in IL:
 L_0007: ldsfld int32[] TestSimpleForeachApp.Program::intArray
 L_000c: stloc.2 
 L_000d: ldc.i4.0 
 L_000e: stloc.3 
 L_000f: br.s L_001d
 L_0011: ldloc.2 
 L_0012: ldloc.3 
 L_0013: ldelem.i4 
 L_0014: stloc.1 
 L_0015: ldloc.0 
 L_0016: ldloc.1 
 L_0017: add 
 L_0018: stloc.0 
 L_0019: ldloc.3 
 L_001a: ldc.i4.1 
 L_001b: add 
 L_001c: stloc.3 
 L_001d: ldloc.3 
 L_001e: ldloc.2 
 L_001f: ldlen 
 L_0020: conv.i4 
 L_0021: blt.s L_0011
In L_0021 we jump up to L_0011 as long as we are not done with the loop yet. Inside all the result += val happens, but we do not see any slow code, there are no calls to any methods, there is no unboxing and this all can probably be executed quickly after being complied to assembler by the JIT (just-in-time) compiler.

On the other hand, if we look at pretty much the same code, just with intList instead of intArray, the whole story changes:
result = 0;
foreach (int val in intList)
    result += val;
becomes much more complex IL code with lots of function calls, the creation of an Enumerator and there is even a dispose at the end (hello GC):
  L_0007: ldsfld class [mscorlib]System.Collections.Generic.List`1<int32> TestSimpleForeachApp.Program::intList
  L_000c: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0>
[mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator() L_0011: stloc.2 L_0012: br.s L_0020 L_0014: ldloca.s CS$5$0000 L_0016: call instance !0 [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current() L_001b: stloc.1 L_001c: ldloc.0 L_001d: ldloc.1 L_001e: add L_001f: stloc.0 L_0020: ldloca.s CS$5$0000 L_0022: call instance bool [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext() L_0027: brtrue.s L_0014 L_0029: leave.s L_0039 L_002b: ldloca.s CS$5$0000 L_002d: constrained [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> L_0033: callvirt instance void [mscorlib]System.IDisposable::Dispose() L_0038: endfinally
What even looks worst too me is the code that is executed in MoveNext (get_Current just returns this.current, no extra instructions here). Since the IL code is very long and confusing (31 IL instructions), here is the C# code of MoveNext:
public bool MoveNext()
{
  List<T> list = this.list;
  if ((this.version == list._version) && (this.index < list._size))
  {
    this.current = list._items[this.index];
    this.index++;
    return true;
  }
  return this.MoveNextRare();
}

And MoveNextRare could also be called at the end of MoveNext if something changes for the list (not the case in our example). But we have to keep in mind that this is just IL code, not actual code that will be executed on the CPU. The JIT compiler can still optimize things out, which we won't see here. But something must be going on since Foreach+List is definiately slower than using all the other approaches to get through all the values in intList. I recommend Foreach+Array converted from List because the IL for Foreach+Array is really fast and does not use any extra methods or Enumerators, especially if you know that the array won't change much and if you do not have to call .ToArray every time when you do a foreach.

And here is the sample project for all this performance testing fun:

Comments


1/10/2010 2:44:30 PM #

Hmmm interesting stuff

instant personal loans direct | Reply



4/1/2010 11:12:19 AM #

To be prepared is half the victory.

alta white | Reply



4/18/2010 5:42:18 PM #

Where did you learn this stuff?

us | Reply



4/28/2010 11:36:07 AM #

I\'m happy I found this blog, I couldnt discover any info on this subject matter prior to. I also run a site and if you want to ever serious in a little bit of guest writing for me if possible feel free to let me know, i\'m always look for people to check out my site. Please stop by and leave a comment sometime!

Rapidshare | Reply


Add comment




biuquote
  • Comment
  • Preview
Loading



Disclaimer: The opinions expressed in this blog are own personal opinions and do not represent the companies view.
© 2000-2010 exDream GmbH & MobileBits GmbH. All rights reserved. Legal/Impressum

Recent Games

Fireburst

ArenaWars Reloaded

Jobs @ exDream

Current Poll

Do you know what the Delta Engine is?



Show Results Poll Archive

Calendar

<<  July 2010  >>
MoTuWeThFrSaSu
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678

Blogs

Download OPML file OPML