Parallel Extensions June CTP
Either my timing is great or it's lousy - you decide. Yesterday I posted about parallelising Conway's Life - and today the new CTP for the Parallel Extensions library comes out! The bad news is that it meant I had to run all the tests again... the good news is that it means we can see whether or not the team's work over the last 6 months has paid off.
Breaking change
The only breaking change I've seen is that AsParallel() no longer takes a ParallelQueryOptions parameter. Instead, if you want the results to preserve the ordering of the source, you call AsOrdered() on the value returned from AsParallel(). It was an easy change to make, and it worked first time. There may well be plenty of other, more significant breaking changes to the API, of course; I'm only using a tiny fraction of it.
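For anyone making the same change, here's a minimal sketch of the new shape. It's written against the later PLINQ API (System.Linq.ParallelEnumerable, as found in .NET 4 and later), so the CTP's assembly and namespace details may differ. The December form appears only in a comment, and I'm assuming it looked like AsParallel(ParallelQueryOptions.PreserveOrdering). The SquaresInOrder method is a hypothetical illustration, not code from the earlier posts.

```csharp
using System;
using System.Linq;

public static class OrderingExample
{
    // Hypothetical example: square 0..count-1 in parallel, keeping source order.
    public static int[] SquaresInOrder(int count)
    {
        // December CTP (assumed form; no longer compiles):
        //   Enumerable.Range(0, count)
        //             .AsParallel(ParallelQueryOptions.PreserveOrdering)
        //             ...
        //
        // June CTP onwards: ordering is requested on the query AsParallel() returns.
        return Enumerable.Range(0, count)
                         .AsParallel()
                         .AsOrdered()
                         .Select(x => x * x)
                         .ToArray();
    }

    public static void Main()
    {
        // Prints 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
        Console.WriteLine(string.Join(", ", SquaresInOrder(10)));
    }
}
```

Without the AsOrdered() call, PLINQ makes no guarantee about the order in which ToArray() sees the squares.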
Benchmark comparisons
One of the nice things about having the previous blog entries is that I can easily compare how things were then with how they are now. Here are the test results for the parts of those posts which used Parallel Extensions. For the Game of Life, I haven't included the rendered results for the fast version, as they're still (unsurprisingly) bound by rendering.
Mandelbrot set
(Original post)
Results are in milliseconds taken to plot the whole image, so less is better. (x;y) values mean "Width=x, MaxIterations=y."
| Description | December CTP | June CTP |
| ParallelForLoop (1200;200) | 376 | 380 |
| ParallelForLoop (3000;200) | 2361 | 2394 |
| ParallelForLoop (1200;800) | 1292 | 1297 |
| ParallelLinqRowByRowInPlace (1200;200) | 378 | 393 |
| ParallelLinqRowByRowInPlace (3000;200) | 2347 | 2440 |
| ParallelLinqRowByRowInPlace (1200;800) | 1295 | 1939 |
| ParallelLinqRowByRowWithCopy (1200;200) | 382 | 411 |
| ParallelLinqRowByRowWithCopy (3000;200) | 2376 | 2484 |
| ParallelLinqRowByRowWithCopy (1200;800) | 1288 | 1401 |
| ParallelLinqWithGenerator (1200;200) | 4782 | 4868 |
| ParallelLinqWithGenerator (3000;200) | 29752 | 31366 |
| ParallelLinqWithGenerator (1200;800) | 16626 | 16855 |
| ParallelLinqWithSequenceOfPoints (1200;200) | 549 | 533 |
| ParallelLinqWithSequenceOfPoints (3000;200) | 3413 | 3290 |
| ParallelLinqWithSequenceOfPoints (1200;800) | 1462 | 1460 |
| UnorderedParalleLinqInPlace (1200;200) | 422 | 440 |
| UnorderedParalleLinqInPlace (3000;200) | 2586 | 2775 |
| UnorderedParalleLinqInPlace (1200;800) | 1317 | 1475 |
| UnorderedParallelLinqInPlaceWithDelegate (1200;200) | 509 | 514 |
| UnorderedParallelLinqInPlaceWithDelegate (3000;200) | 3093 | 3134 |
| UnorderedParallelLinqInPlaceWithDelegate (1200;800) | 1392 | 1571 |
| UnorderedParallelLinqInPlaceWithGenerator (1200;200) | 5046 | 5511 |
| UnorderedParallelLinqInPlaceWithGenerator (3000;200) | 31657 | 30258 |
| UnorderedParallelLinqInPlaceWithGenerator (1200;800) | 17026 | 19517 |
| UnorderedParallelLinqSimple (1200;200) | 556 | 595 |
| UnorderedParallelLinqSimple (3000;200) | 3449 | 3700 |
| UnorderedParallelLinqSimple (1200;800) | 1448 | 1506 |
| UnorderedParalelLinqWithStruct (1200;200) | 511 | 534 |
| UnorderedParalelLinqWithStruct (3000;200) | 3227 | 3154 |
| UnorderedParalelLinqWithStruct (1200;800) | 1427 | 1445 |
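To put a firmer number on "slightly worse", here's a small sketch that copies the 30 rows of the table above and compares them. It counts how many rows got slower and takes the geometric mean of the June/December time ratios, which weights every row equally regardless of its absolute time. It's purely arithmetic on the numbers in the table, so it inherits every caveat about the benchmark itself; the CtpComparison class is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class CtpComparison
{
    // Times in ms from the Mandelbrot table, in row order (December CTP).
    public static readonly int[] December =
    {
        376, 2361, 1292, 378, 2347, 1295, 382, 2376, 1288,
        4782, 29752, 16626, 549, 3413, 1462, 422, 2586, 1317,
        509, 3093, 1392, 5046, 31657, 17026, 556, 3449, 1448,
        511, 3227, 1427
    };

    // The same rows, June CTP.
    public static readonly int[] June =
    {
        380, 2394, 1297, 393, 2440, 1939, 411, 2484, 1401,
        4868, 31366, 16855, 533, 3290, 1460, 440, 2775, 1475,
        514, 3134, 1571, 5511, 30258, 19517, 595, 3700, 1506,
        534, 3154, 1445
    };

    // Number of rows where the June CTP took longer (for times, lower is better).
    public static int SlowerRows() =>
        December.Zip(June, (d, j) => j > d ? 1 : 0).Sum();

    // Geometric mean of the June/December ratios; above 1 means slower overall.
    public static double GeometricMeanRatio() =>
        Math.Exp(December.Zip(June, (d, j) => Math.Log((double) j / d)).Average());

    public static void Main()
    {
        Console.WriteLine($"Rows slower in June: {SlowerRows()} of {December.Length}");
        Console.WriteLine($"Geometric mean June/December ratio: {GeometricMeanRatio():F3}");
    }
}
```

Run against the table, it reports 25 of the 30 rows as slower, with a geometric-mean ratio of about 1.05, i.e. roughly 5% slower overall.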
A mixed bag: most rows got a little slower and a handful got a little faster, while ParallelLinqRowByRowInPlace (1200;800) went from 1295ms to 1939ms. Overall it looks to me like the June CTP is slightly worse than the older one. Of course, that's assuming that everything else on my computer is the same as it was a couple of weeks ago, and so on. I'm not going to claim it's a perfect benchmark by any means. Anyway, can it do any better with the Game of Life?
Game of Life
(Original post)
Results are in frames per second, so more is better.
| Description | December CTP | June CTP |
| ParallelFastInteriorBoard (rendered) | 23 | 22 |
| ParallelFastInteriorBoard (unrendered) | 29 | 28 |
| ParallelFastInteriorBoard | 508 | 592 |
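Turning the fast board's frame rates into time per frame puts that jump in perspective (again, just arithmetic on the two numbers in the table):

$$
\frac{592}{508} \approx 1.165, \qquad
\frac{1000\,\text{ms}}{508} \approx 1.97\,\text{ms per frame}, \qquad
\frac{1000\,\text{ms}}{592} \approx 1.69\,\text{ms per frame}
$$

So that's about 16.5% more frames per second, or roughly 0.28ms saved on every frame.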
Yes, ParallelFastInteriorBoard really did get that much of a speed bump, apparently. I have no idea why... which leads me to the slightly disturbing conclusion of this post:
Conclusion
The numbers above don't mean much. At least, they're not particularly useful to me, because I don't understand them. Why would a few tests become "slightly significantly worse" while one particular test gets markedly better? Do I have serious problems with my test rig? (I'm sure it could be a lot "cleaner", but I didn't think it would make very much difference.)
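If I did want a cleaner rig, a first step might look something like this sketch: warm up before timing, start each run from a similar GC state, and report the median of several runs rather than a single figure. The Bench class and MedianMilliseconds method are hypothetical names of my own, and this certainly won't remove noise from other processes, power management and the like.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    // Hypothetical helper: runs `action` `warmups` times untimed (letting JIT
    // compilation, caches and the thread pool settle), then times `runs` further
    // executions and returns the median in milliseconds. The median is less
    // sensitive to one-off hiccups than a single run or the mean.
    public static double MedianMilliseconds(Action action, int warmups, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        for (int i = 0; i < warmups; i++)
        {
            action();
        }

        var timings = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            // Start each timed run from a similar GC state.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(timings);
        return runs % 2 == 1
            ? timings[runs / 2]
            : (timings[runs / 2 - 1] + timings[runs / 2]) / 2.0;
    }

    public static void Main()
    {
        // Example workload: a parallel sum over a million integers.
        double ms = MedianMilliseconds(
            () => Enumerable.Range(0, 1_000_000).AsParallel().Sum(x => (long) x),
            warmups: 3, runs: 9);
        Console.WriteLine($"Median: {ms:F2} ms");
    }
}
```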
I've always been aware that off-the-cuff benchmarking like this is of slightly dubious value, at least in terms of the details. The only really useful facts here are the big jumps due to changes in algorithm and the like. The rest is noise which may interest Joe and the team (and hey, if it proves to be useful, that's great) but which is beyond my ability to reason about. Modern machines are complicated beasts, with complicated operating systems and complicated platforms above them.
So ignore the numbers. Maybe the new CTP is faster for your app, maybe it's not. What's important is that it's out there, and that if you're interested in parallelisation but haven't played with it yet, this is a good time to pick it up.