<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://msmvps.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results for 'app:weblogs' matching tags 'Noda Time' and 'Benchmarking'</title><link>http://msmvps.com/search/SearchResults.aspx?q=app:weblogs&amp;tag=Noda+Time,Benchmarking&amp;orTags=0&amp;o=DateDescending</link><description>Search results for 'app:weblogs' matching tags 'Noda Time' and 'Benchmarking'</description><dc:language>en-US</dc:language><generator>CommunityServer 2008.5 SP2 (Build: 40407.4157)</generator><item><title>Optimization and generics, part 2: lambda expressions and reference types</title><link>http://msmvps.com/blogs/jon_skeet/archive/2011/08/23/optimization-and-generics-part-2-lambda-expressions-and-reference-types.aspx</link><pubDate>Tue, 23 Aug 2011 05:00:00 GMT</pubDate><guid isPermaLink="false">d67277c4-116b-43f1-b688-e9ef184ea916:1798036</guid><dc:creator>skeet</dc:creator><description>&lt;p&gt;&lt;strong&gt;As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn&amp;#39;t care. Relatively few people write production code which is worth micro-optimizing. Please don&amp;#39;t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;It took me a surprisingly long time to find the problem described in the &lt;a href="http://msmvps.com/blogs/jon_skeet/archive/2011/08/22/optimization-and-generics-part-1-the-new-constraint.aspx"&gt;previous blog post&lt;/a&gt;, and almost no time at all to fix it. I understood why it was happening. This next problem took a while to identify at all, but even when I&amp;#39;d found a workaround I had no idea why it worked. 
Furthermore, I couldn&amp;#39;t reproduce it in a test case... because I was looking for the wrong set of triggers. I&amp;#39;ve now found at least &lt;em&gt;some&lt;/em&gt; of the problem though.&lt;/p&gt;  &lt;p&gt;This time the situation in Noda Time is harder to describe, although it concerns the same area of code. In various places I need to create new delegates containing parsing steps and add them to the list of steps required for a full parse. I can always use lambda expressions, but in many cases I&amp;#39;ve got the same logic repeatedly... so I decided to pull it out into a method. Bang - suddenly the code runs far slower. (In reality, I&amp;#39;d performed this refactoring first, and &amp;quot;unrefactored&amp;quot; it to speed things up.)&lt;/p&gt;  &lt;p&gt;I &lt;em&gt;think&lt;/em&gt; the problem comes down to method group conversions with generic methods and a type argument which is a reference type. The CLR isn&amp;#39;t very good at them, and the C# compiler uses them more than it needs to.&lt;/p&gt;  &lt;h3&gt;Show me the benchmark!&lt;/h3&gt;  &lt;p&gt;The &lt;a href="http://pobox.com/~skeet/csharp/blogfiles/GenericsAndLambdas.cs"&gt;complete benchmark code&lt;/a&gt; is available of course, but fundamentally I&amp;#39;m doing the same thing in each test case: creating a delegate of type Action which does nothing, and then checking that the delegate reference is non-null (just to avoid the JIT optimizing it away). In each case this is done in a generic method with a single type parameter. I call each method in two ways: once with int as the type argument, and once with string as the type argument. 
Here are the different cases involved:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Use a lambda expression: &lt;font face="Courier New"&gt;&lt;strong&gt;Action foo = () =&amp;gt; {};&lt;/strong&gt;&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;Fake what I &lt;em&gt;expected&lt;/em&gt; the compiler to do: keep a separate generic cache class with a static variable for the delegate; populate the cache once if necessary, and thereafter use the cache field &lt;/li&gt;    &lt;li&gt;Fake what the compiler is &lt;em&gt;actually&lt;/em&gt; doing with the lambda expression: write a separate generic method and perform a method group conversion to it &lt;/li&gt;    &lt;li&gt;Do what the compiler &lt;em&gt;could&lt;/em&gt; do: write a separate non-generic method and perform a method group conversion to it &lt;/li&gt;    &lt;li&gt;Use a method group conversion to a static (non-generic) method on a generic type &lt;/li&gt;    &lt;li&gt;Use a method group conversion to an instance (non-generic) method on a generic type, via a generic cache class with a single field referring to an instance of the generic class &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;(Yes, the last one is a bit convoluted - but the line in the method itself is simple: &lt;font face="Courier New"&gt;&lt;strong&gt;Action foo = ClassHolder&amp;lt;T&amp;gt;.SampleInstance.NoOpInstance;&lt;/strong&gt;&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Remember, we&amp;#39;re doing &lt;em&gt;each&lt;/em&gt; of these in a generic method, and calling that generic method using a type argument of either int or string. 
(I&amp;#39;ve run a few tests, and the exact type isn&amp;#39;t important - all that matters is that int is a value type, and string is a reference type.)&lt;/p&gt;  &lt;p&gt;Importantly, we&amp;#39;re &lt;em&gt;not&lt;/em&gt; capturing any variables, and the type parameter is &lt;em&gt;not&lt;/em&gt; involved in either the delegate type or any part of the implementation body.&lt;/p&gt;  &lt;h3&gt;Benchmark results&lt;/h3&gt;  &lt;p&gt;Again, times are in milliseconds - but this time I didn&amp;#39;t want to run it for 100 million iterations, as the &amp;quot;slow&amp;quot; versions would have taken far too long. I&amp;#39;ve run this on the x64 JIT as well and seen the same effect, but I haven&amp;#39;t included the figures here.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Times in milliseconds for 10 million iterations&lt;/strong&gt;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td&gt;&lt;strong&gt;Test&lt;/strong&gt;&lt;/td&gt;        &lt;td align="right"&gt;&lt;strong&gt;TestCase&amp;lt;int&amp;gt;&lt;/strong&gt;&lt;/td&gt;        &lt;td align="right"&gt;&lt;strong&gt;TestCase&amp;lt;string&amp;gt;&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Lambda expression&lt;/td&gt;        &lt;td align="right"&gt;180&lt;/td&gt;        &lt;td align="right"&gt;29684&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Generic cache class&lt;/td&gt;        &lt;td align="right"&gt;90&lt;/td&gt;        &lt;td align="right"&gt;288&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Generic method group conversion&lt;/td&gt;        &lt;td align="right"&gt;184&lt;/td&gt;        &lt;td align="right"&gt;30017&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Non-generic method group conversion&lt;/td&gt;        &lt;td align="right"&gt;178&lt;/td&gt;        &lt;td align="right"&gt;189&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Static method on generic type&lt;/td&gt;   
     &lt;td align="right"&gt;180&lt;/td&gt;        &lt;td align="right"&gt;29276&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;Instance method on generic type&lt;/td&gt;        &lt;td align="right"&gt;202&lt;/td&gt;        &lt;td align="right"&gt;299&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Yes, it&amp;#39;s about &lt;em&gt;150 times&lt;/em&gt; slower to create a delegate from a generic method with a reference type as the type argument than with a value type... and yet this is the first I&amp;#39;ve heard of this. (I wouldn&amp;#39;t be surprised if there were a post from the CLR team about it somewhere, but I don&amp;#39;t think it&amp;#39;s common knowledge by any means.)&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;One of the tricky things is that it&amp;#39;s hard to know exactly what the C# compiler is going to do with any given lambda expression. In fact, the method which was causing me grief earlier on &lt;em&gt;isn&amp;#39;t&lt;/em&gt; generic, but it&amp;#39;s in a generic type and captures some variables which use the type parameters - so perhaps that&amp;#39;s causing a generic method group conversion somewhere along the way.&lt;/p&gt;  &lt;p&gt;Noda Time is a relatively extreme case, but if you&amp;#39;re using delegates in any performance-critical spots, you should really be aware of this issue. I&amp;#39;m going to ping Microsoft (first informally, and then via a Connect report if that would be deemed useful) to see if there&amp;#39;s an awareness of this internally as a potential &amp;quot;gotcha&amp;quot;, and whether there&amp;#39;s anything that can be done. Normal trade-offs of work required vs benefit apply, of course. It&amp;#39;s possible that this really is an edge case... but with lambdas flying everywhere these days, I&amp;#39;m not sure that it is.&lt;/p&gt;  &lt;p&gt;Maybe tomorrow I&amp;#39;ll actually be able to finish getting Noda Time moved onto the new system... 
all of this performance work has been a fun if surprising distraction from the main job of shipping working code...&lt;/p&gt;</description></item><item><title>Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)</title><link>http://msmvps.com/blogs/jon_skeet/archive/2011/08/22/optimization-and-generics-part-1-the-new-constraint.aspx</link><pubDate>Mon, 22 Aug 2011 05:00:00 GMT</pubDate><guid isPermaLink="false">d67277c4-116b-43f1-b688-e9ef184ea916:1798032</guid><dc:creator>skeet</dc:creator><description>&lt;p&gt;&lt;strong&gt;As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn&amp;#39;t care. Relatively few people write production code which is worth micro-optimizing. Please don&amp;#39;t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I&amp;#39;ve been doing quite a bit of work on Noda Time recently - and have started getting my head round all the work that James Keesey has put into the parsing/formatting. I&amp;#39;ve been reworking it so that we can do everything without throwing any exceptions, and also to work on the idea of parsing a pattern once and building a sequence of actions for both formatting and parsing from the pattern. To format or parse a value, we then just need to apply the actions in turn. &lt;a href="http://www.comparethemeerkat.com/"&gt;Simples&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Given that this is all in the name of performance (and &lt;a href="http://noda-time.blogspot.com/2010/02/performance-matters.html"&gt;I consider Noda Time to be worth optimizing pretty hard&lt;/a&gt;) I was pretty cross when I ran a complete revamp through the little benchmarking tool we use, and found that my rework had made everything &lt;em&gt;much&lt;/em&gt; slower. 
Even parsing a value &lt;em&gt;after parsing the pattern&lt;/em&gt; was slower than parsing both the value and the pattern together. Something was clearly very wrong.&lt;/p&gt;  &lt;p&gt;In fact, it turns out that at least two things were very wrong. The first (the subject of this post) was easy to fix and actually made the code a little more flexible. The second (the subject of the next post, which may be tomorrow) is going to be harder to work around.&lt;/p&gt;  &lt;h3&gt;The new() constraint&lt;/h3&gt;  &lt;p&gt;In my SteppedPattern type, I have a generic type parameter - TBucket. It&amp;#39;s already constrained in terms of another type parameter, but that&amp;#39;s irrelevant as far as I&amp;#39;m aware. (After today though, I&amp;#39;m taking very little for granted...) The important thing is that before I try to parse a &lt;em&gt;value&lt;/em&gt;, I want to create a new bucket. The idea is that bits of information end up in the bucket as they&amp;#39;re being parsed, and at the very end we put everything together. So each parse operation requires a new bucket. How can we create one in a nice generic way?&lt;/p&gt;  &lt;p&gt;Well, we can just call its public parameterless constructor. 
I don&amp;#39;t mind the types involved having such a constructor, so all we need to do is add the new() constraint, and then we can call new TBucket():&lt;/p&gt;  &lt;div class="code"&gt;&lt;span class="InlineComment"&gt;// Somewhat simplified...&lt;/span&gt;     &lt;br /&gt;&lt;span class="Modifier"&gt;internal&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;sealed&lt;/span&gt;&amp;#160;&lt;span class="ReferenceType"&gt;class&lt;/span&gt; SteppedPattern&amp;lt;TResult, TBucket&amp;gt; : IParsePattern&amp;lt;TResult&amp;gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Linq"&gt;where&lt;/span&gt; TBucket : &lt;span class="Keyword"&gt;new&lt;/span&gt;()     &lt;br /&gt;{     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt; ParseResult&amp;lt;TResult&amp;gt; Parse(&lt;span class="ReferenceType"&gt;string&lt;/span&gt; value)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; TBucket bucket = &lt;span class="Keyword"&gt;new&lt;/span&gt; TBucket();     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="InlineComment"&gt;// Rest of parsing goes here&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;} &lt;/div&gt;  &lt;p&gt;Great! Nice and simple. Unfortunately, it turned out that that one line of code was taking 75% of the time to parse a value. Just creating an empty bucket - pretty much the simplest bit of parsing. I was amazed when I discovered that.&lt;/p&gt;  &lt;h3&gt;Fixing it with a provider&lt;/h3&gt;  &lt;p&gt;The fix is reasonably easy. 
We just need to tell the type how to create an instance, and we can do that with a delegate:&lt;/p&gt;  &lt;div class="code"&gt;&lt;span class="InlineComment"&gt;// Somewhat simplified...&lt;/span&gt;     &lt;br /&gt;&lt;span class="Modifier"&gt;internal&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;sealed&lt;/span&gt;&amp;#160;&lt;span class="ReferenceType"&gt;class&lt;/span&gt; SteppedPattern&amp;lt;TResult, TBucket&amp;gt; : IParsePattern&amp;lt;TResult&amp;gt;     &lt;br /&gt;{     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;private&lt;/span&gt;&amp;#160;&lt;span class="Modifier"&gt;readonly&lt;/span&gt; Func&amp;lt;TBucket&amp;gt; bucketProvider;     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;internal&lt;/span&gt; SteppedPattern(Func&amp;lt;TBucket&amp;gt; bucketProvider)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Keyword"&gt;this&lt;/span&gt;.bucketProvider = bucketProvider;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &lt;span class="Modifier"&gt;public&lt;/span&gt; ParseResult&amp;lt;TResult&amp;gt; Parse(&lt;span class="ReferenceType"&gt;string&lt;/span&gt; value)     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; {     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; TBucket bucket = bucketProvider();     &lt;br /&gt;    &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &lt;span class="InlineComment"&gt;// Rest of parsing goes here&lt;/span&gt;     &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; }     &lt;br /&gt;} &lt;/div&gt;  &lt;p&gt;Now I can just call new SteppedPattern(() =&amp;gt; new OffsetBucket()) or whatever. This also means I can keep the constructor internal, not that I care much. 
I could even reuse old parse buckets if that wouldn&amp;#39;t be a semantic problem - in other cases it could be useful. Hooray for lambda expressions - until we get to the next post, anyway.&lt;/p&gt;  &lt;h3&gt;Show me the figures!&lt;/h3&gt;  &lt;p&gt;You don&amp;#39;t want to have to run Noda Time&amp;#39;s benchmarks to see the results for yourself, so I wrote a small benchmark to time &lt;em&gt;just&lt;/em&gt; the construction of a generic type. As a measure of how insignificant this would be for most apps, these figures are in milliseconds, performing 100 &lt;em&gt;million&lt;/em&gt; iterations of the action in question. Unless you&amp;#39;re going to do this in performance-critical code, you just shouldn&amp;#39;t care.&lt;/p&gt;  &lt;p&gt;Anyway, the benchmark has four custom types: two classes, and two structs - a small and a large version of each. The small version has a single int field; the large version has eight long fields. For each type, I benchmarked both approaches to initialization.&lt;/p&gt;  &lt;p&gt;The results on two machines (32-bit and 64-bit) are below, for both the v2 CLR and v4. 
(The 64-bit machine is much faster in general - you should only compare results within one machine, as it were.)&lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;CLR v4: 32-bit results (ms per 100 million iterations)&lt;/strong&gt;&lt;/h3&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Test type&lt;/td&gt;        &lt;td valign="top" width="133"&gt;new() constraint&lt;/td&gt;        &lt;td valign="top" width="133"&gt;Provider delegate&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;689&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1225&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;11188&lt;/td&gt;        &lt;td valign="top" width="133"&gt;7273&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;16307&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1690&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;17471&lt;/td&gt;        &lt;td valign="top" width="133"&gt;3017&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&lt;strong&gt;CLR v4: 64-bit results (ms per 100 million iterations)&lt;/strong&gt;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="401"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Test type&lt;/td&gt;        &lt;td valign="top" width="133"&gt;new() constraint&lt;/td&gt;        &lt;td valign="top" width="133"&gt;Provider delegate&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;473&lt;/td&gt;        
&lt;td valign="top" width="133"&gt;868&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;2670&lt;/td&gt;        &lt;td valign="top" width="133"&gt;2396&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;8366&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1189&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;8805&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1529&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;h3&gt;&lt;strong&gt;CLR v2: 32-bit results (ms per 100 million iterations)&lt;/strong&gt;&lt;/h3&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Test type&lt;/td&gt;        &lt;td valign="top" width="133"&gt;new() constraint&lt;/td&gt;        &lt;td valign="top" width="133"&gt;Provider delegate&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;703&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1246&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;11411&lt;/td&gt;        &lt;td valign="top" width="133"&gt;7392&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;143967&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1791&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;143107&lt;/td&gt;        &lt;td valign="top" width="133"&gt;2581&lt;/td&gt;     
&lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&lt;strong&gt;CLR v2: 64-bit results (ms per 100 million iterations)&lt;/strong&gt;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="401"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Test type&lt;/td&gt;        &lt;td valign="top" width="133"&gt;new() constraint&lt;/td&gt;        &lt;td valign="top" width="133"&gt;Provider delegate&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;510&lt;/td&gt;        &lt;td valign="top" width="133"&gt;686&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large struct&lt;/td&gt;        &lt;td valign="top" width="133"&gt;2334&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1731&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Small class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;81801&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1539&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;Large class&lt;/td&gt;        &lt;td valign="top" width="133"&gt;83293&lt;/td&gt;        &lt;td valign="top" width="133"&gt;1896&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;(An earlier version of this post had a mistake - my original tests used structs for everything, despite the names.)&lt;/p&gt;  &lt;p&gt;Others have reported slightly different results, including the new() constraint being better for both large &lt;em&gt;and&lt;/em&gt; small structs.&lt;/p&gt;  &lt;p&gt;Just in case you hadn&amp;#39;t spotted them, look at the results for classes. Those are the real results - it took over 2 minutes to run the test using the new() constraint on my 32-bit laptop, compared with under two seconds for the provider. Yikes. 
This was actually the situation I was in for Noda Time, which is built on .NET 2.0 - it&amp;#39;s not surprising that so much of my benchmark&amp;#39;s time was spent constructing classes, given results like this.&lt;/p&gt;  &lt;p&gt;Of course you can &lt;a href="http://pobox.com/~skeet/csharp/blogfiles/NewConstraint.cs"&gt;download the benchmark program&lt;/a&gt; for yourself and see how it performs on your machine. It&amp;#39;s a pretty cheap-and-cheerful benchmark, but when the differences are this big, minor sources of inaccuracy don&amp;#39;t bother me too much. The simplest way to run under CLR v2 is to compile with the .NET 3.5 C# compiler to start with.&lt;/p&gt;  &lt;h3&gt;What&amp;#39;s going on under the hood?&lt;/h3&gt;  &lt;p&gt;As far as I&amp;#39;m aware, there&amp;#39;s no IL to give support for the new() constraint. Instead, the compiler emits a call to &lt;a href="http://msdn.microsoft.com/en-us/library/0hcyx2kd.aspx"&gt;Activator.CreateInstance&amp;lt;T&amp;gt;&lt;/a&gt;. Apparently, that&amp;#39;s slower than calling a delegate - presumably due to trying to find an accessible constructor with reflection, and invoking it. I suspect it could be optimized relatively easily - e.g. by caching the results per type it&amp;#39;s called with, in terms of delegates. I&amp;#39;m slightly surprised this &lt;em&gt;hasn&amp;#39;t&lt;/em&gt; (apparently) been optimized, given how easy it is to cache values by generic type. No doubt there&amp;#39;s a good reason lurking there somewhere, even if it&amp;#39;s only the memory taken up by the cache.&lt;/p&gt;  &lt;p&gt;Either way, it&amp;#39;s easy to work around in general.&lt;/p&gt;  &lt;h3&gt;Conclusion&lt;/h3&gt;  &lt;p&gt;I wouldn&amp;#39;t have found this gotcha if I didn&amp;#39;t have before and after tests (or in this case, side-by-side tests of the old way and the new way of parsing). 
The real lesson of this post shouldn&amp;#39;t be about the new() constraint - it should be how important it is to test performance (assuming you care), and how easy it is to assume certain operations are cheap.&lt;/p&gt;  &lt;p&gt;Next post: something much weirder.&lt;/p&gt;</description></item></channel></rss>