Tech-Ed developers Barcelona: Friday
Today is the last day of Tech-Ed. Hopefully I’ll arrive in Brussels around 23:00.
I am really starting to miss my wife and daughter, so I am glad I’ll see them again tonight.
Yesterday evening was uneventful. I spent it in my hotel room and started reading the WPF book I bought, to pass the time.
After a cup of the Happy Juice, I make my way to the final C++ talk of this conference.
DEV407: Visual C++ new optimizations
This session is hosted by Ayman Shoukry of the VC++ compiler team.
There are a good 50 – 60 people here. That is fewer than at the other C++ talks, but I suspect this has something to do with the fact that it is 9:00 on a Friday morning, right after the community dinner on Thursday evening. Another reason might be that this session has nothing to do with the language itself or with the .NET framework.
Still, not a bad turnout at all.
C++ optimization is a very interesting topic. This kind of high-level material is exactly why I go to a conference. The coverage was really in-depth: the available compiler optimizations were discussed in great detail.
All in all this was a very good presentation with a good Q&A.
Single file optimization
For single file optimization, the VC2005 compiler produces code that performs 20 – 30 % better than VC6.0. This alone is reason enough to upgrade if you have computationally intensive programs.
Whole program optimization
Whole program optimization basically means that the linker can inspect all the object files to see which functions get called where. It then passes the object files back to the compiler to optimize them in ways that are impossible when each file is compiled on its own. The two major improvements are cross-module inlining and custom calling conventions.
Profile guided optimization
If you use PGO, the linker instruments your code with probes. The two major probe types are counting probes (how many times a certain code path is taken) and value probes (which save a histogram of values, e.g. for function parameters).
Then you run common usage scenarios on the instrumented app. The captured data tells the linker how the app is actually used, and therefore how it should be optimized. Based on this data, it then creates a new, optimized binary.
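To make the two probe types a bit more concrete, here is a hand-written C++ sketch of the kind of data they collect during a training run. This is purely illustrative: the real probes are inserted by the toolchain and their data format is internal to it. `ProbeData`, `scaled` and `training_run` are hypothetical names made up for this example.

```cpp
#include <map>

// Hypothetical, hand-written stand-ins for PGO probes. The real instrumentation
// is generated by the toolchain; this only models what each probe type records.
struct ProbeData {
    long rare_path_count = 0;                // counting probe: rare branch
    long common_path_count = 0;              // counting probe: common branch
    std::map<int, long> divisor_histogram;   // value probe: parameter histogram
};

// "Instrumented" function: every call records the value of its parameter
// and which branch it took.
int scaled(int value, int divisor, ProbeData& probes) {
    ++probes.divisor_histogram[divisor];     // value probe on a parameter
    if (divisor == 0) {
        ++probes.rare_path_count;            // counting probe, rare path
        return 0;
    }
    ++probes.common_path_count;              // counting probe, common path
    return value / divisor;
}

// A "training run": exercise a typical usage scenario and collect the data
// an optimizer could use, e.g. to lay out the common path first or to
// specialize the code for divisor == 4.
ProbeData training_run(int calls) {
    ProbeData probes;
    for (int i = 0; i < calls; ++i) {
        scaled(i, i % 100 == 0 ? 0 : 4, probes);
    }
    return probes;
}
```

After `training_run(1000)`, the counters show 10 calls on the rare path and 990 on the common one, and the histogram shows that the divisor is almost always 4. That is exactly the kind of information PGO feeds back into the build.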
By the way, an instrumented app (with all the probes) is slow. Very, very slow. Think about a snail on valium, paralyzed from the neck down, crawling uphill. But that doesn’t matter, because the instrumented app is used only for analysis.
PGO can yield performance gains of 20% in some cases, because the common code path gets optimized really well. This will of course decrease performance on the uncommon paths, but since those are rarely taken, it shouldn’t matter that much.
PGO can use the following mechanisms for optimizing performance:
- Inlining: the linker examines the call graph, and can decide to inline functions at each call site separately.
- Switch expansion: the linker can pull frequently executed cases out of a switch statement.
- Code separation: this moves blocks of code so that the common code path always falls through, and places the uncommon code paths after it.
- Virtual call speculation: this introduces a type check whenever a virtual method is called through a base class pointer or reference. If the object is of the same type in 90% of the cases, it is quicker to check the type and call the method directly than to go through the virtual call.
- Partial inlining: this inlines only those parts of a function that are in the hot path, while leaving the cold part in a separate function.
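Two of these mechanisms are easy to picture as source code. The sketch below shows, by hand, roughly what the optimized code for virtual call speculation and switch expansion amounts to, assuming the profile showed that most calls receive a `Square` and that opcode 2 dominates. The types and functions are hypothetical, and `typeid` is only a portable stand-in for whatever type check the compiler actually emits.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Circle : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

// Plain virtual call: always dispatched through the vtable.
double area_virtual(const Shape& s) { return s.area(); }

// Virtual call speculation, written by hand: a type check guards a direct
// (and therefore inlinable) call for the common type; every other type
// falls back to the ordinary virtual call.
double area_speculated(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        // Qualified call: no virtual dispatch, so it can be inlined.
        return static_cast<const Square&>(s).Square::area();
    }
    return s.area();  // cold path: regular virtual call
}

// Switch expansion, written by hand: the hot case is tested first,
// before the remaining cases go through the switch.
int dispatch(int opcode, int x) {
    if (opcode == 2) return x * 2;  // hot case pulled out of the switch
    switch (opcode) {
        case 0: return x;
        case 1: return x + 1;
        case 3: return x - 1;
        default: return 0;
    }
}
```

The results are identical to the unoptimized versions; only the cost of reaching them changes. When the profile turns out to be wrong, the speculated code still works, it just pays for an extra check.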
One thing worth remembering about PGO is this:
There are some situations in which you need to be able to reproduce the exact same binary image from your source code.
For example, in the space industry, if ESA wants you to rebuild version X.Y.Z of an application, it had better be the exact same binary image. You are simply not allowed to build a version that might have slightly different timing behavior, because that would mandate a complete re-run of all acceptance tests.
This means that if you have such a requirement, you have to put the *.pgc files under source control as well, so that the linker can always reuse a known set of instrumentation data to create an exact replica of what you shipped earlier.
Floating point optimization
/Op will be deprecated. The documentation states that it enables greater floating point consistency. The problem is that nobody really understands what it does, not even the compiler folks themselves. And the old floating point model was outdated anyway.
There are four new floating point options that are designed to give you better control over floating point behavior.
- /fp:precise: this is the default, and it gives you a compromise between speed and precision.
- /fp:fast: this will yield the fastest code, but you are not guaranteed the last digit of precision. Whether this is acceptable or not depends on the application.
- /fp:except: this makes floating point code raise exceptions at the exact point where they occur, whenever an exception should be raised.
- /fp:strict: this implies /fp:except, and will yield the best precision possible, but with a performance penalty compared to /fp:fast or /fp:precise.
Now, if – at this moment – you are thinking ‘OMG M$FT is shipping bad code’ then this is the point where I tell you to calm down and sing kumbaya.
In case you didn’t know, there is a fundamental problem with floating point math. Not all numbers are representable in the IEEE floating point model.
One problem is that 1e20 + 1 is still 1e20 in IEEE double precision, because the mathematical result is simply not representable in that format. Hence the result is rounded to the closest number that is representable, which is 1e20 again.
Another problem is that, while A + B + C is equal to C + B + A on a mathematical level, this is not true at code level. It all depends on the precision and the order of magnitude of A, B, C and the intermediate results.
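Both problems are easy to reproduce in a few lines of C++ using IEEE double precision. The function names are made up for this example; the values for A, B and C are the interesting part. As far as I understand it, this kind of reordering is precisely what /fp:fast permits the compiler to do, which is why it can change your results.

```cpp
// 1e20 + 1 rounds back to 1e20: the spacing between adjacent doubles
// near 1e20 is far larger than 1.
bool absorbs_one(double big) { return big + 1.0 == big; }

// The same three terms, summed in two different orders.
double sum_left_to_right(double a, double b, double c) { return (a + b) + c; }
double sum_right_to_left(double a, double b, double c) { return (c + b) + a; }
```

With A = 1e20, B = -1e20 and C = 1, `sum_left_to_right` first cancels the two large terms and returns 1, while `sum_right_to_left` loses the 1 in the intermediate result (1 + -1e20 is just -1e20) and returns 0.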
This issue exists for ALL compilers, not only C++ ones: you get it in C#, Java or whatever language you are programming in.
Floating point math is extremely hard to understand, and the value of a result depends on the exact sequence of instructions. Given an equation, it is very hard to determine whether the result is correct or not, because ‘correct’ has a fuzzy boundary in the IEEE floating point model. You can only say that code is correct for a given definition of ‘correct’.
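One common way to pin down ‘correct’ is to compare results within a relative tolerance instead of testing for exact equality. The sketch below is just one such definition; `nearly_equal`, `add_tenths` and the default tolerance of 1e-12 are my own assumptions, and the right tolerance depends entirely on the application.

```cpp
#include <algorithm>
#include <cmath>

// One possible definition of "correct": x and y agree to within a
// relative tolerance. The default tolerance is an arbitrary assumption.
bool nearly_equal(double x, double y, double rel_tol = 1e-12) {
    return std::fabs(x - y) <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}

// Adds 0.1 n times. Because 0.1 has no exact binary representation,
// the rounding error of every addition accumulates in the sum.
double add_tenths(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += 0.1;
    }
    return sum;
}
```

Adding 0.1 ten times does not give exactly 1.0 in double precision, so `add_tenths(10) == 1.0` is false, yet the result is ‘correct’ under the tolerance-based definition.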
Private talk about Interop
Craig Kitterman (MSFT) invited me to a private talk about the interoperability community and what I could contribute to it.
There is no interesting session in this slot anyway, so I decide to accept.
This is MVP stuff, so I won’t say more about it.
DEV004: building a distributed application with .NET 3.0
This session is hosted by Christian Weyer.
It is supposed to be a demo of building a distributed application, which sounds interesting enough to spend my time on.
It is a level 400 session, so I hope I can keep up.
Turns out I can’t.
The speaker is very good, but the application is already finished, and he is demonstrating the architecture and talking about things like service contracts etc. A good talk, but wasted on me.
Goodbye
The last session slot does not have anything that interests me. This makes sense, because a lot of people are leaving already, and the organizers would not want to waste A-level sessions on this slot.
I think I’ll catch an early ride to the airport and do some reading. The battery of my laptop is past its peak, so I can only use it for about an hour without AC power anyway.
I’ll post my afterthoughts on Tech-Ed sometime next week.