<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://msmvps.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Jon Skeet: Coding Blog : Benchmarking, Parallelisation</title><link>http://msmvps.com/blogs/jon_skeet/archive/tags/Benchmarking/Parallelisation/default.aspx</link><description>Tags: Benchmarking, Parallelisation</description><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP2 (Build: 40407.4157)</generator><item><title>Revisiting randomness</title><link>http://msmvps.com/blogs/jon_skeet/archive/2009/11/04/revisiting-randomness.aspx</link><pubDate>Wed, 04 Nov 2009 07:40:15 GMT</pubDate><guid isPermaLink="false">d67277c4-116b-43f1-b688-e9ef184ea916:1737577</guid><dc:creator>skeet</dc:creator><slash:comments>18</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/rsscomments.aspx?PostID=1737577</wfw:commentRss><wfw:comment xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/commentapi.aspx?PostID=1737577</wfw:comment><comments>http://msmvps.com/blogs/jon_skeet/archive/2009/11/04/revisiting-randomness.aspx#comments</comments><description>&lt;p&gt;Almost every Stack Overflow question which includes the words &amp;quot;random&amp;quot; and &amp;quot;repeated&amp;quot; has the same basic answer. It&amp;#39;s one of the most common &amp;quot;gotchas&amp;quot; in .NET, Java, and no doubt other platforms: creating a new random number generator without specifying a seed will depend on the current instant of time. 
The current time as measured by the computer doesn&amp;#39;t change very often compared with how often you can create and use a random number generator – so code which repeatedly creates a new instance of Random and uses it once will end up showing a lot of repetition.&lt;/p&gt;  &lt;p&gt;One common solution is to use a static field to store a single instance of Random and reuse it. That&amp;#39;s okay in Java (where Random is thread-safe) but it&amp;#39;s not so good in .NET – if you use the same instance from multiple threads in .NET, you can corrupt the internal data structures.&lt;/p&gt;  &lt;p&gt;A long time ago, I created a &lt;a href="http://pobox.com/~skeet/csharp/miscutil/usage/staticrandom.html"&gt;StaticRandom class&lt;/a&gt; in &lt;a href="http://pobox.com/~skeet/csharp/miscutil"&gt;MiscUtil&lt;/a&gt; – essentially, it was just a bunch of static methods (to mirror the instance methods found in Random) wrapping a single instance of Random and locking appropriately. This allows you to just call StaticRandom.Next(1, 7) to roll a die, for example. However, it has a couple of problems:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;It doesn&amp;#39;t scale well in a multi-threaded environment. When I originally wrote it, I benchmarked an alternative approach using [ThreadStatic] and at the time, locking won (at least on my computer, which may well have only had a single core). &lt;/li&gt;    &lt;li&gt;It doesn&amp;#39;t provide any way of getting at an instance of Random, other than by using new Random(StaticRandom.Next()). &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The latter point is mostly a problem because it encourages a style of coding where you just use StaticRandom.Next(…) any time you want a random number. 
This is undoubtedly convenient in some situations, but it goes against the idea of &lt;a href="http://stackoverflow.com/questions/1667516/doesnt-passing-in-parameters-that-should-be-known-implicitly-violate-encapsulati/1667590#1667590"&gt;treating a source of randomness as a service or dependency&lt;/a&gt;. It makes it harder to get repeatability and to see what needs that dependency.&lt;/p&gt;  &lt;p&gt;I could have just added a method generating a new instance into the existing class, but I decided to play with a different approach – going back to per-thread instances, but this time using the &lt;a href="http://msdn.microsoft.com/en-us/library/dd642243(VS.100).aspx"&gt;ThreadLocal&amp;lt;T&amp;gt; class&lt;/a&gt; introduced in .NET 4.0. Here&amp;#39;s the resulting code:&lt;/p&gt;  &lt;div class="code"&gt;&lt;span class="Namespace"&gt;using&lt;/span&gt; System;     &lt;br /&gt;&lt;span class="Namespace"&gt;using&lt;/span&gt; System.Threading;     &lt;br /&gt;    &lt;br /&gt;&lt;span class="Namespace"&gt;namespace&lt;/span&gt; RandomDemo     &lt;br /&gt;{     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// Convenience class for dealing with randomness.&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ReferenceType"&gt;class&lt;/span&gt; ThreadLocalRandom     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span 
class="XmlComment"&gt;/// Random number generator used to generate seeds,&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// which are then used to create new random number&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// generators on a per-thread basis.&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;private&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;readonly&lt;/span&gt; Random globalRandom = &lt;span class="Keyword"&gt;new&lt;/span&gt; Random();     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;private&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;readonly&lt;/span&gt;&amp;#160;&lt;span class="ReferenceType"&gt;object&lt;/span&gt; globalLock = &lt;span class="Keyword"&gt;new&lt;/span&gt;&amp;#160;&lt;span class="ReferenceType"&gt;object&lt;/span&gt;();     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// Random number generator &lt;/span&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span 
class="Modifier"&gt;private&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;readonly&lt;/span&gt; ThreadLocal&amp;lt;Random&amp;gt; threadRandom = &lt;span class="Keyword"&gt;new&lt;/span&gt; ThreadLocal&amp;lt;Random&amp;gt;(NewRandom);     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// Creates a new instance of Random. The seed is derived &lt;/span&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// from a global (static) instance of Random, rather &lt;/span&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// than time.&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt; Random NewRandom()     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Statement"&gt;lock&lt;/span&gt; (globalLock)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span 
class="Statement"&gt;return&lt;/span&gt;&amp;#160;&lt;span class="Keyword"&gt;new&lt;/span&gt; Random(globalRandom.Next());     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// Returns an instance of Random which can be used freely&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// within the current thread.&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt; Random Instance { get { &lt;span class="Statement"&gt;return&lt;/span&gt; threadRandom.Value; } }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;See &amp;lt;see cref=&amp;quot;Random.Next()&amp;quot; /&amp;gt;&amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;int&lt;/span&gt; Next()     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Statement"&gt;return&lt;/span&gt; Instance.Next();     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;See &amp;lt;see cref=&amp;quot;Random.Next(int)&amp;quot; /&amp;gt;&amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;int&lt;/span&gt; Next(&lt;span class="ValueType"&gt;int&lt;/span&gt; maxValue)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Statement"&gt;return&lt;/span&gt; Instance.Next(maxValue);     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;See &amp;lt;see cref=&amp;quot;Random.Next(int, int)&amp;quot; /&amp;gt;&amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;int&lt;/span&gt; Next(&lt;span class="ValueType"&gt;int&lt;/span&gt; minValue, &lt;span class="ValueType"&gt;int&lt;/span&gt; maxValue)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Statement"&gt;return&lt;/span&gt; Instance.Next(minValue, maxValue);     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;See &amp;lt;see cref=&amp;quot;Random.NextDouble()&amp;quot; /&amp;gt;&amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;double&lt;/span&gt; NextDouble()     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Statement"&gt;return&lt;/span&gt; Instance.NextDouble();     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="XmlComment"&gt;/// &amp;lt;summary&amp;gt;See &amp;lt;see cref=&amp;quot;Random.NextBytes(byte[])&amp;quot; /&amp;gt;&amp;lt;/summary&amp;gt;&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;static&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;void&lt;/span&gt; NextBytes(&lt;span class="ValueType"&gt;byte&lt;/span&gt;[] buffer)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; {     &lt;br 
/&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Instance.NextBytes(buffer);     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;} &lt;/div&gt;  &lt;p&gt;The user can still call the static Next(…) methods if they want, but they can also get at the thread-local instance of Random by calling ThreadLocalRandom.Instance – or easily create a new instance with ThreadLocalRandom.NewRandom(). (The fact that NewRandom uses the global instance rather than the thread-local one is an implementation detail really; it happens to be convenient from the point of view of the ThreadLocal&amp;lt;T&amp;gt; constructor. It wouldn&amp;#39;t be terribly hard to change this.)&lt;/p&gt;  &lt;p&gt;Now it&amp;#39;s easy to write a method which needs randomness (e.g. to shuffle a deck of cards) and give it a Random parameter, then call it using the thread-local instance:&lt;/p&gt;  &lt;div class="code"&gt;&lt;span class="Modifier"&gt;public&lt;/span&gt;&amp;#160;&lt;span class="ValueType"&gt;void&lt;/span&gt; Shuffle(Random rng)     &lt;br /&gt;{     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; ...     &lt;br /&gt;}     &lt;br /&gt;...     &lt;br /&gt;deck.Shuffle(ThreadLocalRandom.Instance); &lt;/div&gt;  &lt;p&gt;The Shuffle method is then easier to test and debug, and expresses its dependency explicitly.&lt;/p&gt;  &lt;h3&gt;Performance&lt;/h3&gt;  &lt;p&gt;I tested the &amp;quot;old&amp;quot; and &amp;quot;new&amp;quot; implementations in a very simple way – for varying numbers of threads, I called Next() a fixed number of times (from each thread) and timed how long it took for all the threads to finish. 
I&amp;#39;ve also tried a .NET-3.5-compatible version using ThreadStatic instead of ThreadLocal&amp;lt;T&amp;gt;, and running the original code and the ThreadStatic version on .NET 3.5 as well.&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td align="right"&gt;Threads&lt;/td&gt;        &lt;td align="right"&gt;StaticRandom (4.0b2)&lt;/td&gt;        &lt;td align="right"&gt;ThreadLocalRandom (4.0b2)&lt;/td&gt;        &lt;td align="right"&gt;ThreadStaticRandom (4.0b2)&lt;/td&gt;        &lt;td align="right"&gt;StaticRandom(3.5)&lt;/td&gt;        &lt;td align="right"&gt;ThreadStaticRandom (3.5)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td align="right"&gt;1&lt;/td&gt;        &lt;td align="right"&gt;11582&lt;/td&gt;        &lt;td align="right"&gt;6016&lt;/td&gt;        &lt;td align="right"&gt;10150&lt;/td&gt;        &lt;td align="right"&gt;10373&lt;/td&gt;        &lt;td align="right"&gt;16049&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td align="right"&gt;2&lt;/td&gt;        &lt;td align="right"&gt;24667&lt;/td&gt;        &lt;td align="right"&gt;7214&lt;/td&gt;        &lt;td align="right"&gt;9043&lt;/td&gt;        &lt;td align="right"&gt;25062&lt;/td&gt;        &lt;td align="right"&gt;17257&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td align="right"&gt;3&lt;/td&gt;        &lt;td align="right"&gt;38095&lt;/td&gt;        &lt;td align="right"&gt;10295&lt;/td&gt;        &lt;td align="right"&gt;14771&lt;/td&gt;        &lt;td align="right"&gt;36827&lt;/td&gt;        &lt;td align="right"&gt;25625&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td align="right"&gt;4&lt;/td&gt;        &lt;td align="right"&gt;49402&lt;/td&gt;        &lt;td align="right"&gt;13435&lt;/td&gt;        &lt;td align="right"&gt;19116&lt;/td&gt;        &lt;td align="right"&gt;47882&lt;/td&gt;        &lt;td align="right"&gt;34415&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;A few things to 
take away from this:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The numbers were slightly erratic; somehow it was quicker to do twice the work with ThreadStaticRandom on .NET 4.0b2! This isn&amp;#39;t the perfect benchmarking machine; we&amp;#39;re interested in trends rather than absolute figures. &lt;/li&gt;    &lt;li&gt;Locking hasn&amp;#39;t changed much in performance between framework versions &lt;/li&gt;    &lt;li&gt;ThreadStatic has improved &lt;em&gt;massively&lt;/em&gt; between .NET 3.5 and 4.0 &lt;/li&gt;    &lt;li&gt;Even on 3.5, ThreadStatic wins over a global lock as soon as there&amp;#39;s contention &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The only downside to the ThreadLocal&amp;lt;T&amp;gt; version is that it requires .NET 4.0 - but the ThreadStatic version works pretty well too.&lt;/p&gt;  &lt;p&gt;It&amp;#39;s worth bearing in mind that of course this is testing the highly unusual situation where there&amp;#39;s a &lt;em&gt;lot&lt;/em&gt; of contention in the global lock version. The performance difference in the single-threaded version where the lock is always uncontended is still present, but very small.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;After reading the comments and thinking further, I would indeed get rid of the static methods elsewhere. Also, for the purposes of dependency injection, I agree that it&amp;#39;s a good idea to have a factory interface &lt;em&gt;where that&amp;#39;s not overkill&lt;/em&gt;. The factory implementation could use either the ThreadLocal or ThreadStatic implementations, or effectively use the global lock version (by having its own instance of Random and a lock). In many cases I&amp;#39;d regard that as overkill, however.&lt;/p&gt;  &lt;p&gt;One other interesting option would be to create a thread-safe instance of Random to start with, which delegated to thread-local &amp;quot;normal&amp;quot; implementations. 
That would be very useful from a DI standpoint.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://msmvps.com/aggbug.aspx?PostID=1737577" width="1" height="1"&gt;</description><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/C_2300_/default.aspx">C#</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Parallelisation/default.aspx">Parallelisation</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Stack+Overflow/default.aspx">Stack Overflow</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Benchmarking/default.aspx">Benchmarking</category></item><item><title>Buffering vs streaming benchmark: first results</title><link>http://msmvps.com/blogs/jon_skeet/archive/2009/03/24/buffering-vs-streaming-benchmark-first-results.aspx</link><pubDate>Tue, 24 Mar 2009 01:34:00 GMT</pubDate><guid isPermaLink="false">d67277c4-116b-43f1-b688-e9ef184ea916:1681085</guid><dc:creator>skeet</dc:creator><slash:comments>11</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/rsscomments.aspx?PostID=1681085</wfw:commentRss><wfw:comment xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/commentapi.aspx?PostID=1681085</wfw:comment><comments>http://msmvps.com/blogs/jon_skeet/archive/2009/03/24/buffering-vs-streaming-benchmark-first-results.aspx#comments</comments><description>&lt;p&gt;My poor laptop&amp;#39;s had a busy weekend. It&amp;#39;s run 72 tests, rebooting between each test. Most of these tests have kept both the CPU and disk busy for a lot of the time. I expect to update this blog post with more numbers - and possibly more strategies - as time goes on, but I wanted to get the numbers out as quickly as possible. Graphs should be coming when I&amp;#39;ve got a decent network to experiment with Google graphing.&lt;/p&gt;
&lt;h3&gt;Source code and setup&lt;/h3&gt;
&lt;p&gt;While I don&amp;#39;t really expect anyone else to sacrifice their computer for the best part of 24 hours, it doesn&amp;#39;t take &lt;em&gt;very&lt;/em&gt; much effort to get the tests running. &lt;a href="http://pobox.com/~skeet/csharp/blogfiles/BufferingAndStreaming.zip"&gt;This zip file&lt;/a&gt; contains the source code for three programs, a batch file to generate the test data, and a command file.&lt;/p&gt;
&lt;p&gt;The CreateFiles executable creates input files based on its command line arguments. (This is run multiple times by the batch file.) EncryptFiles &amp;quot;encrypts&amp;quot; (I use the term loosely) all the files in a given directory, writing the results to another directory. The class design of EncryptFiles is a little bit funny, but it&amp;#39;s really tailored for these tests. It should be easy to write other strategies in the same way too.&lt;/p&gt;
&lt;p&gt;ExecuteAndReboot is used to automate the testing. Basically it reads a command file (from a hard-coded directory; simplicity trumps flexibility here). If the file isn&amp;#39;t empty, it waits five minutes (so that after a reboot the computer has time to settle down after loading all its services) and runs the command specified in the first line of the file, appending the output to a results file. When the command has finished executing, ExecuteAndReboot rewrites the command file, removing the first line, and schedules a reboot. The idea is to drop ExecuteAndReboot into a Startup folder, set Windows to log in automatically, reboot once and then just let it go. If you want to use the computer, just wait until it&amp;#39;s in the &amp;quot;waiting five minutes&amp;quot; bit at the start of the cycle and kill ExecuteAndReboot. Next time you reboot it will start testing again.&lt;/p&gt;
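&lt;p&gt;The command-file bookkeeping described above can be sketched roughly as follows. This is a minimal sketch, not the code from the zip file: the CommandQueue class and its method names are hypothetical, and running the command, appending to the results file and scheduling the reboot are deliberately left out.&lt;/p&gt;
&lt;pre&gt;
```csharp
using System;
using System.IO;

public static class CommandQueue
{
    // Returns the first line of the command file, or null when the file is empty.
    public static string PeekNextCommand(string commandFile)
    {
        string[] lines = File.ReadAllLines(commandFile);
        return lines.Length == 0 ? null : lines[0];
    }

    // Rewrites the command file without its first line.
    // Intended to be called once the command has finished executing.
    public static void RemoveFirstCommand(string commandFile)
    {
        string[] lines = File.ReadAllLines(commandFile);
        if (lines.Length == 0)
        {
            return;
        }
        string[] remaining = new string[lines.Length - 1];
        Array.Copy(lines, 1, remaining, 0, remaining.Length);
        File.WriteAllLines(commandFile, remaining);
    }

    public static void Main()
    {
        // Hypothetical command lines, written to a temporary file for the demo.
        string file = Path.GetTempFileName();
        File.WriteAllLines(file, new[] { "first command", "second command" });

        Console.WriteLine(PeekNextCommand(file));
        RemoveFirstCommand(file);
        Console.WriteLine(PeekNextCommand(file));
        File.Delete(file);
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Removing the line only after the command has finished means that a run which is interrupted part-way is picked up again on the next boot.&lt;/p&gt;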
&lt;p&gt;That&amp;#39;s the general principle - if anyone &lt;em&gt;does&lt;/em&gt; want to run the tests on their own machine and needs a bit of help setting it up, just let me know.&lt;/p&gt;
&lt;h3&gt;Test environment and workload&lt;/h3&gt;
&lt;p&gt;I expect the details of the execution environment to affect the outcome significantly. For the sake of the record then, my laptop specs are:&lt;/p&gt;
&lt;p&gt;Dell Inspiron 1720 (unlikely to be relevant, but...)&lt;br /&gt;Intel Core 2 Duo T7250 2.0GHz&lt;br /&gt;Vista Home Premium 32 bit &lt;br /&gt;3 GB main memory, 32K L1 cache, 2048K L2 cache&lt;br /&gt;2 disks, both 120GB (Fujitsu MHY2120BH, 5400RPM, 8MB cache)&lt;/p&gt;
&lt;p&gt;Even though there are two drives in the system, the tests only ever read from and wrote to a single disk.&lt;/p&gt;
&lt;p&gt;I created four sets of input data; I&amp;#39;ve presented the results for each one separately below, and included the description of the files along with the results. Basically in each case there&amp;#39;s a set of files where each file is the same size, consisting of a fixed number of fixed-length lines.&lt;/p&gt;
&lt;p&gt;The &amp;quot;encryption&amp;quot; that is performed in each case (on the encoded version of the line) is just to XOR each byte with a value. This is repeated a specified number of times, and the XOR value on each pass is the current index, i.e. it first XORs with 0, then 1, then 2 etc. I&amp;#39;ve used work levels of 0 (the loop never runs), 100 and 1000. These were picked somewhat arbitrarily, but they give interesting results. At 0 the process is clearly IO-bound. At 1000 it&amp;#39;s clearly CPU-bound. At 100 it&amp;#39;s not pegging the CPU, but there&amp;#39;s significant activity. I may try 200 as well at some point.&lt;/p&gt;
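&lt;p&gt;As a rough illustration of the work loop described above, here is a minimal sketch. The XorDemo class and Encrypt method are hypothetical names, and truncating the pass index to a single byte (for passes above 255) is an assumption rather than something taken from the real EncryptFiles code.&lt;/p&gt;
&lt;pre&gt;
```csharp
using System;
using System.Linq;
using System.Text;

public static class XorDemo
{
    // Hypothetical helper: one pass per unit of work level, XORing every
    // byte with the pass index (truncated to a byte, which is an assumption).
    public static void Encrypt(byte[] data, int workLevel)
    {
        foreach (int pass in Enumerable.Range(0, workLevel))
        {
            byte mask = unchecked((byte) pass);
            for (int i = 0; i != data.Length; i++)
            {
                data[i] ^= mask;
            }
        }
    }

    public static void Main()
    {
        // Encode a line, then run three passes (masks 0, 1 and 2).
        byte[] line = Encoding.UTF8.GetBytes("hello");
        Encrypt(line, 3);
        Console.WriteLine(BitConverter.ToString(line));
    }
}
```
&lt;/pre&gt;
&lt;p&gt;One side effect of this reading: the masks for a work level of 100 (or 1000) XOR together to zero, so the bytes end up exactly as they started. The loop is only there to burn a controllable amount of CPU per line.&lt;/p&gt;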
&lt;p&gt;For each scenario I tried using 1, 2 and 3 threads (and 4 for the big tests; I&amp;#39;ll go back and repeat earlier ones with 4 threads soon). The two strategies used are &amp;quot;streaming&amp;quot; (read a line, encrypt it, write it, repeat) and &amp;quot;buffering&amp;quot; (read a whole file, encrypt it all, write it all out). After a thread has started, it needs no shared data at all - it&amp;#39;s given the complete list of files to encrypt at the very start.&lt;/p&gt;
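&lt;p&gt;To make the difference between the two strategies concrete, here is a minimal single-threaded sketch. It is not the EncryptFiles code: the Strategies class, its method names and the Transform placeholder (which just upper-cases the line instead of doing the XOR work) are all hypothetical.&lt;/p&gt;
&lt;pre&gt;
```csharp
using System;
using System.IO;

public static class Strategies
{
    // Hypothetical stand-in for the per-line "encryption" step.
    public static string Transform(string line)
    {
        return line.ToUpperInvariant();
    }

    // Streaming: read a line, transform it, write it, repeat.
    // Only one line is held in memory at a time.
    public static void RunStreaming(string inputPath, string outputPath)
    {
        using (TextReader reader = File.OpenText(inputPath))
        using (TextWriter writer = File.CreateText(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(Transform(line));
            }
        }
    }

    // Buffering: read the whole file, transform it all, write it all out.
    public static void RunBuffering(string inputPath, string outputPath)
    {
        string[] lines = File.ReadAllLines(inputPath);
        string[] results = Array.ConvertAll(lines, Transform);
        File.WriteAllLines(outputPath, results);
    }

    public static void Main()
    {
        string input = Path.GetTempFileName();
        string streamed = Path.GetTempFileName();
        string buffered = Path.GetTempFileName();
        File.WriteAllLines(input, new[] { "first line", "second line" });

        RunStreaming(input, streamed);
        RunBuffering(input, buffered);

        // Both strategies should produce identical output files.
        Console.WriteLine(File.ReadAllText(streamed) == File.ReadAllText(buffered));

        File.Delete(input);
        File.Delete(streamed);
        File.Delete(buffered);
    }
}
```
&lt;/pre&gt;
&lt;p&gt;The two methods produce the same output; what differs is how much of the file is in memory at once and how the reads and writes are interleaved with the CPU work.&lt;/p&gt;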
&lt;h3&gt;Results&lt;/h3&gt;
&lt;p&gt;For every table, the number of threads is shown by the leftmost column, and the work level is shown by the topmost row. The cells are of the form &amp;quot;x/y&amp;quot; where &amp;quot;x&amp;quot; is the time taken for the streaming strategy, and &amp;quot;y&amp;quot; is the time taken for the buffering strategy. All times are in seconds. For each column, I&amp;#39;ve highlighted the optimal test.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Small tests&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are 10000 test files. Each is 100 lines long, with 100 characters per line.&lt;/p&gt;
&lt;table&gt;

&lt;tr align="right"&gt;
&lt;td&gt;Threads/Work&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47&lt;/strong&gt;/90&lt;/td&gt;
&lt;td&gt;94/&lt;strong&gt;79&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;163/158&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;95/112&lt;/td&gt;
&lt;td&gt;113/157&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;129&lt;/strong&gt;/146&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;179/128&lt;/td&gt;
&lt;td&gt;130/144&lt;/td&gt;
&lt;td&gt;152/160&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;144/139&lt;/td&gt;
&lt;td&gt;163/193&lt;/td&gt;
&lt;td&gt;156/204&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Medium tests (short lines)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are 100 test files. Each is 500,000 lines long, with 20 characters per line.&lt;/p&gt;
&lt;table&gt;

&lt;tr align="right"&gt;
&lt;td&gt;Threads/Work&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;138&lt;/strong&gt;/157&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;219&lt;/strong&gt;/267&lt;/td&gt;
&lt;td&gt;1125/1205&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;196/238&lt;/td&gt;
&lt;td&gt;259/263&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;604&lt;/strong&gt;/691&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;292/250&lt;/td&gt;
&lt;td&gt;312/236&lt;/td&gt;
&lt;td&gt;624/676&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;302/291&lt;/td&gt;
&lt;td&gt;321/281&lt;/td&gt;
&lt;td&gt;602/666&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Medium tests (long lines)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are 100 test files. Each is 100 lines long, with 100,000 characters per line.&lt;/p&gt;
&lt;table&gt;

&lt;tr align="right"&gt;
&lt;td&gt;Threads/Work&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;203&lt;/strong&gt;/213&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;249&lt;/strong&gt;/286&lt;/td&gt;
&lt;td&gt;1027/1107&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;283/264&lt;/td&gt;
&lt;td&gt;311/296&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;602&lt;/strong&gt;/696&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;319/284&lt;/td&gt;
&lt;td&gt;314/278&lt;/td&gt;
&lt;td&gt;608/618&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;360/300&lt;/td&gt;
&lt;td&gt;336/297&lt;/td&gt;
&lt;td&gt;607/598&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Big tests&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are 20 files. Each file is 100,000 lines long, with 1000 characters per line.&lt;/p&gt;
&lt;table&gt;

&lt;tr align="right"&gt;
&lt;td&gt;Threads/Work&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;190&lt;/strong&gt;/246&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;349&lt;/strong&gt;/441&lt;/td&gt;
&lt;td&gt;2047/2165&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;365/375&lt;/td&gt;
&lt;td&gt;392/473&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1097&lt;/strong&gt;/1387&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;456/379&lt;/td&gt;
&lt;td&gt;484/399&lt;/td&gt;
&lt;td&gt;1161/1296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr align="right"&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;517/442&lt;/td&gt;
&lt;td&gt;521/431&lt;/td&gt;
&lt;td&gt;1113/1214&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
&lt;h3&gt;Conclusions&lt;/h3&gt;
&lt;p&gt;I&amp;#39;m loath to draw &lt;i&gt;too&lt;/i&gt; many conclusions so far. After all, we&amp;#39;ve only got data for a single test environment. It would be interesting to see what would happen on a quad core machine. However, I think a few things can be said with reasonable confidence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sometimes the streaming strategy wins, sometimes the buffering strategy wins. The difference can be quite significant either way. For a given input and workload, the streaming solution almost always wins &lt;em&gt;on my machine&lt;/em&gt;.  &lt;/li&gt;
&lt;li&gt;In an ideal scenario, the bottlenecked resource should be busy for as much of the time as possible. For example, in a high-CPU task it makes little sense to have idle cores when you&amp;#39;ve already loaded data. Likewise in a disk-bound task we should always be fetching (or writing) data even if all our cores are momentarily busy. Personally I think this is the job of the operating system&amp;#39;s pre-fetch system. We could do it ourselves, but it&amp;#39;s nicer if we don&amp;#39;t have to.  &lt;/li&gt;
&lt;li&gt;If we&amp;#39;re disk-bound, it makes sense to read (or write) one file at a time, to avoid seeking. This is obvious from the results - for work levels of 0 and 100, a single-thread solution is consistently best (and streaming usually works better than buffering). Based on this assumption, I suspect a cleaner solution would be to have a single thread reading largish chunks (but not the whole file) and feeding the other threads - or to use async IO. However, this all gets very complicated. One nice aspect of both of the current solutions is that they&amp;#39;re really simple to implement safely. Reading line-by-line in an async manner is a pain. I&amp;#39;d quite like to write an async &amp;quot;execute for every line&amp;quot; helper at some time, but not just yet.  &lt;/li&gt;
&lt;li&gt;If we&amp;#39;re CPU-bound, the buffering solution&amp;#39;s sweet spot is 3 threads (remember we&amp;#39;re using 2 cores) and the streaming solution works better with just 2 threads - introducing more threads will make both the IO and context switching less efficient.  &lt;/li&gt;
&lt;li&gt;The streaming strategy &lt;i&gt;always&lt;/i&gt; uses less application memory than the buffering strategy. When running the buffering strategy against the &amp;quot;big&amp;quot; data set with 4 threads, the memory varied between 800MB and 1.8GB (as measured by the &amp;quot;Total bytes in all heaps&amp;quot; performance counter), vs around 165KB with the streaming strategy. (Obviously the process itself took more than 165KB, but that was the memory used in &amp;quot;normal&amp;quot; managed heap memory.) Using more memory to improve performance is a valid technique, but I&amp;#39;d say it&amp;#39;s only worth it when the improvement is very significant. Additionally, the streaming technique instantly scales to huge data files. While they&amp;#39;re not in the current requirements, it doesn&amp;#39;t hurt to get that ability for free.&lt;/li&gt;
&lt;/ul&gt;
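&lt;p&gt;To make the memory difference concrete, here is a minimal C# sketch of the two strategies. It is not the benchmark code: the class and method names are hypothetical, and the transform delegate (upper-casing here) is only a stand-in for the real encryption step.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.IO;

static class StrategySketch
{
    // Buffering: the whole file is held in memory before anything is written.
    static void ProcessBuffered(string input, string output, Func&amp;lt;string, string&amp;gt; transform)
    {
        var lines = new List&amp;lt;string&amp;gt;(File.ReadAllLines(input));
        for (int i = 0; i &amp;lt; lines.Count; i++)
        {
            lines[i] = transform(lines[i]);
        }
        File.WriteAllLines(output, lines.ToArray());
    }

    // Streaming: only the current line (plus reader/writer buffers) is live at any time.
    static void ProcessStreaming(string input, string output, Func&amp;lt;string, string&amp;gt; transform)
    {
        using (TextReader reader = File.OpenText(input))
        using (TextWriter writer = File.CreateText(output))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(transform(line));
            }
        }
    }

    // Hypothetical usage: args[0] is the input file, args[1] the output file prefix.
    static void Main(string[] args)
    {
        // Placeholder for the encryption step, which is assumed rather than shown.
        Func&amp;lt;string, string&amp;gt; transform = s =&amp;gt; s.ToUpperInvariant();
        ProcessBuffered(args[0], args[1] + &amp;quot;.buffered&amp;quot;, transform);
        ProcessStreaming(args[0], args[1] + &amp;quot;.streaming&amp;quot;, transform);
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The buffered version keeps every line of the file reachable until the write completes, whereas the streaming version only keeps the current line plus the reader and writer buffers. That is consistent with the heap figures quoted above.&lt;/p&gt;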
&lt;h3&gt;Next steps...&lt;/h3&gt;
&lt;p&gt;I&amp;#39;ve included quite a few &amp;quot;I&amp;#39;m going to try to [...]&amp;quot; points in this post. Just for reference, my plans are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turn off my virus checker! If this makes a significant difference, I&amp;#39;ll rerun all the current tests &lt;/li&gt;
&lt;li&gt;Try to visualise the results with Google graphs - although I think it may get cluttered &lt;/li&gt;
&lt;li&gt;Rerun the big/buffering/1000 test - the results look odd &lt;/li&gt;
&lt;li&gt;Rerun tests with a work level of 200 and keep an eye on CPU usage - I&amp;#39;d like to spot where we tip between CPU and IO being the bottleneck  &lt;/li&gt;
&lt;li&gt;Try to optimize the Stream/StreamReader being used - we should be able to skip the FileStream buffer completely, and specifying FileOptions.SequentialScan should give more hints to Windows about how we&amp;#39;re intending to read the data. I&amp;#39;ll try a few different buffer sizes, probably 4K, 8K and 64K, just to see what happens.&lt;/li&gt;
&lt;/ul&gt;
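&lt;p&gt;As a rough illustration of the last point, the sketch below opens a file with FileOptions.SequentialScan and an explicit buffer size, then counts lines for each of the three sizes mentioned. The class and helper names are hypothetical, ASCII input is assumed, and the sketch makes no attempt to bypass the FileStream buffer; it only shows where the hint and the buffer sizes would be supplied.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;using System;
using System.IO;
using System.Text;

static class SequentialReaderSketch
{
    // Hypothetical helper: opens a file for forward-only reading with a caller-chosen buffer size.
    static StreamReader OpenSequential(string path, int bufferSize)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                    FileShare.Read, bufferSize,
                                    FileOptions.SequentialScan);
        // The same size is used for the StreamReader buffer purely for simplicity.
        return new StreamReader(stream, Encoding.ASCII, false, bufferSize);
    }

    // Hypothetical usage: args[0] is the path of the file to read.
    static void Main(string[] args)
    {
        foreach (int size in new[] { 4 * 1024, 8 * 1024, 64 * 1024 })
        {
            int lines = 0;
            using (StreamReader reader = OpenSequential(args[0], size))
            {
                while (reader.ReadLine() != null)
                {
                    lines++;
                }
            }
            Console.WriteLine(&amp;quot;{0} byte buffer: {1} lines&amp;quot;, size, lines);
        }
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any timing comparison between buffer sizes would still need a cold file cache for each run, for the same reasons as the main benchmark.&lt;/p&gt;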
&lt;p&gt;This is a really interesting little example problem, because it sounds so simple (and it &lt;em&gt;is&lt;/em&gt; simple in terms of the implementation) but it has a lot of environmental variables. Of course it would be nicer if we didn&amp;#39;t have to reboot the machine between test runs in order to get a fair result, but never mind...&lt;/p&gt;</description><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Parallelisation/default.aspx">Parallelisation</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Benchmarking/default.aspx">Benchmarking</category></item><item><title>Benchmarking IO: buffering vs streaming</title><link>http://msmvps.com/blogs/jon_skeet/archive/2009/03/20/benchmarking-io-buffering-vs-streaming.aspx</link><pubDate>Fri, 20 Mar 2009 11:13:31 GMT</pubDate><guid isPermaLink="false">d67277c4-116b-43f1-b688-e9ef184ea916:1679878</guid><dc:creator>skeet</dc:creator><slash:comments>24</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/rsscomments.aspx?PostID=1679878</wfw:commentRss><wfw:comment xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://msmvps.com/blogs/jon_skeet/commentapi.aspx?PostID=1679878</wfw:comment><comments>http://msmvps.com/blogs/jon_skeet/archive/2009/03/20/benchmarking-io-buffering-vs-streaming.aspx#comments</comments><description>&lt;p&gt;I mentioned in my recent book review that I was concerned about a recommendation to load all of the data from an input file before processing all of it. This seems to me to be a bad idea in an age where Windows prefetch will anticipate what data you need next, etc - allowing you to process efficiently in a streaming fashion.&lt;/p&gt;  &lt;p&gt;However, without any benchmarks I&amp;#39;m just guessing. 
I&amp;#39;d like to set up a benchmark to test this - it&amp;#39;s an interesting problem which I suspect has lots of nuances. This isn&amp;#39;t about trying to prove the book wrong - it&amp;#39;s about investigating a problem which &lt;em&gt;sounds&lt;/em&gt; relatively simple, but could well not be. I wouldn&amp;#39;t be at all surprised to see that in some cases the streaming solution is faster, and in other cases the buffered solution is faster.&lt;/p&gt;  &lt;h3&gt;The Task&lt;/h3&gt;  &lt;p&gt;The situation presented is like this:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;We have a bunch of input files, either locally or on the network (I&amp;#39;m probably just going to test locally for now)&lt;/li&gt;    &lt;li&gt;Each file is less than 100MB&lt;/li&gt;    &lt;li&gt;We need to encrypt each line of text in each input file, writing it to a corresponding output file&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The method suggested in the book is for each thread to:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Load a file into a List&amp;lt;string&amp;gt;&lt;/li&gt;    &lt;li&gt;Encrypt every line (replacing it in the list)&lt;/li&gt;    &lt;li&gt;Save to a new file&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;My alternative option is:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Open a TextReader and a TextWriter for the input/output&lt;/li&gt;    &lt;li&gt;Repeatedly read a line, encrypt, write the encrypted line until we&amp;#39;ve exhausted the input file&lt;/li&gt;    &lt;li&gt;Close both the reader and the writer&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;These are the two implementations I want to test. I strongly suspect that the &lt;em&gt;optimal &lt;/em&gt;solution would involve async IO, but doing an async version of ReadLine is a real pain for various reasons. I&amp;#39;m going to keep it simple - using plain threading, no TPL etc.&lt;/p&gt;  &lt;p&gt;I haven&amp;#39;t written any code yet. 
This is where you come in - not to write the code for me, but to make sure I test in a useful way.&lt;/p&gt;  &lt;h3&gt;Environmental variations&lt;/h3&gt;  &lt;p&gt;My plan of attack is to first write a small program to generate the input files. These will just be random text files, and the program will have a few command line parameters:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Directory to put files under (one per test variation, basically)&lt;/li&gt;    &lt;li&gt;Number of files to create&lt;/li&gt;    &lt;li&gt;Number of lines per file&lt;/li&gt;    &lt;li&gt;Number of characters per line&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I&amp;#39;ll probably test a high and a low number for each of the last three parameters, possibly omitting a few variations for practical reasons.&lt;/p&gt;  &lt;p&gt;In an ideal world I&amp;#39;d test on several different computers, locally and networked, but that just isn&amp;#39;t practical. In particular I&amp;#39;d be interested to see how much difference an SSD (low seek time) makes to this test. I&amp;#39;ll be using my normal laptop, which is a dual core Core Duo with two normal laptop disks. I may well try using different drives for reading and writing to see how much difference that makes.&lt;/p&gt;  &lt;h3&gt;Benchmarking&lt;/h3&gt;  &lt;p&gt;The benchmark program will also have a few command line parameters:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Directory to read files from&lt;/li&gt;    &lt;li&gt;Directory to write files to&lt;/li&gt;    &lt;li&gt;Number of threads to use (in some cases I &lt;em&gt;suspect&lt;/em&gt; that more threads than cores will be useful, to avoid cores idling while data is read for a blocking thread)&lt;/li&gt;    &lt;li&gt;Strategy to use (buffered or streaming)&lt;/li&gt;    &lt;li&gt;Encryption work level&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The first four parameters here are pretty self-explanatory, but the encryption work level isn&amp;#39;t. 
Basically I want to be able to vary the difficulty of the task, which will vary whether it ends up being CPU-bound or IO-bound (I expect). So, for a particular line I will:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Convert to binary (using Encoding.ASCII - I&amp;#39;ll generate just ASCII files)&lt;/li&gt;    &lt;li&gt;Encrypt the binary data&lt;/li&gt;    &lt;li&gt;Encrypt the encrypted binary data&lt;/li&gt;    &lt;li&gt;Encrypt the encrypted encrypted [...] etc until we&amp;#39;ve hit the number given by the encryption work level&lt;/li&gt;    &lt;li&gt;Base64 encode the result - this will be the output line&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;So with an encryption work level of 1 I&amp;#39;ll just encrypt once. With a work level of 2 I&amp;#39;ll encrypt twice, etc. This is purely for the sake of giving the computer something to do. I&amp;#39;ll use AES unless anyone has a better suggestion. (Another option would be to just use an XOR or something else incredibly simple.) The key/IV will be fixed for all tests, just in case that has a bearing on anything.&lt;/p&gt;  &lt;p&gt;The benchmarking program is going to be as simple as I can possibly make it:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Start a stopwatch&lt;/li&gt;    &lt;li&gt;Read the names of all the files in the directory&lt;/li&gt;    &lt;li&gt;Create a list of files for each thread to encrypt&lt;/li&gt;    &lt;li&gt;Create and start the threads&lt;/li&gt;    &lt;li&gt;Use Thread.Join on all the threads&lt;/li&gt;    &lt;li&gt;Stop the stopwatch and report the time taken&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;No rendezvous required at all, which certainly simplifies things. By creating the work list before the thread, I don&amp;#39;t need to worry about memory model issues. 
It should all just be fine.&lt;/p&gt;  &lt;p&gt;In the absence of a better way of emptying all the file read caches (at the Windows and disk levels) I plan to reboot my computer between test runs (which makes it pretty expensive in terms of time spent - hence omitting some variations). I wasn&amp;#39;t planning on shutting services etc down: I really hope that Vista won&amp;#39;t do anything silly like trying to index the disk while I&amp;#39;ve got a heavy load going. Obviously I won&amp;#39;t run any other applications at the same time.&lt;/p&gt;  &lt;p&gt;If anyone has any suggested changes, I&amp;#39;d be very glad to hear them. Have I missed anything? Should I run a test where the file sizes vary? Is there a better way of flushing all caches than rebooting?&lt;/p&gt;  &lt;p&gt;I don&amp;#39;t know exactly when I&amp;#39;m going to find time to do all of this, but I&amp;#39;ll get there eventually :)&lt;/p&gt;</description><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/C_2300_/default.aspx">C#</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Parallelisation/default.aspx">Parallelisation</category><category domain="http://msmvps.com/blogs/jon_skeet/archive/tags/Benchmarking/default.aspx">Benchmarking</category></item></channel></rss>