Five Levels of Code Generation
NOTE 31 Jan, 2014: I discussed a sixth level in this post http://bit.ly/1ih3vL5.
NOTE 8 Feb 2014: I discussed why there are four, not five, bullet points in this post http://bit.ly/1cf4Pcu
I want to clarify the five levels of code generation because there’s recently been some confusion on this point with the RyuJIT release, and because I want to reference it in another post I’m writing.
Code generation can refer to…
- Altering code for the purpose of machines
The path from human readable source code to machine language
- DSL (Domain Specific Language)
Changing human instructions of intent, generally into human readable source code
- File Code Generation
Creating human readable source code files from small metadata, or sometimes, altering those files
- Application code generation or architecture expression
Creating entire systems from a significant body of metadata
Did I leave anything out?
The increasing size and abstraction of each level means we work with it fundamentally differently.
We want to know as little as possible about the path from human readable to machine readable. Just make it better and don’t bother me. The step we think about here is the compiler, because we see it. The compiler creates MSIL, an intermediate language. There’s another step of gong from IL to native code, and there’s an amazing team at Microsoft that does that – it happens to be called the code gen team inside of the BCL/CLR team. That’s not what I mean when I say code generation.
The phrase Domain Specific Language means many things to many people. I’m simplifying it to a way to abstract sets of instructions. This happens close to the point of application development – closer than a language like C#. As such, there is a rather high probability of bugs in the expression of the DSL – the next step on the path to native code. Thus most DSLs express themselves in human readable languages.
File code generation is what you almost certainly mean when you say “code generation”. Give me some stuff and make a useful file from it. This is where tools like T4, Razor, and the Visual Studio Custom Tools feature are aimed. And that’s where my upcoming tool is aimed.
Architecture expression may be in its infancy, but I have no doubt it is what we will all be doing in ten years. There’s been an enormous logjam at the 3rd Generation Language phase (3GL) for some very understandable reasons. It’s an area where you can point to many failures. The problem is not in the expression – the problem is in the metadata. It’s not architecture expression unless you can switch architectures – replace what you’re currently doing with something else entirely. That requires a level of metadata understanding we don’t have. And architectures that better isolate the code that won’t fit into the metadata format, which we have and don’t use.
RyuJIT is totally and completely at the first level. It’s a better way to create native code on a 64 bit computer that means compiling your app to 64 bit should no longer condemn it to run slower than its 32 bit friends. That’s a big deal, particularly as we’re shoved into 64 bit because of side cases like security cryptography performance.
RyuJit is either the sixth or seventh different way your IL becomes native code. I bet you didn’t know that. It’s a huge credit the team to have integrated improved approaches and you don’t even need to know about them. (Although, if you have startup problems in non-ASP.NET applications, explore background-JIT and MPGO, as well as NGen for simple cases).
The confusion when RyuJIT was released was whether it replaced the Roslyn compilers. The simple answer is “no.” Shout it from the rooftops. Roslyn is not dead. But that’s a story for another day.