The Sixth Level of Code Generation
I wrote here about the five levels I see in code generation/meta-programming (pick your favorite overarching word for this fantastically complex space).
I missed one level in my earlier post. There are actually (at least) six levels. I missed the sixth because I was thinking vertically about the application - about the process of getting from an idea about a program all the way to a running program. But as a result I missed a really important level, because it is orthogonal.
Side note: I find it fascinating how our language affects our cognition. I think the primary reason I missed this orthogonal set is my use of the word “level” which implied a breakdown in the single dimension of creating the application.
Not only can we generate our application, we can generate the orthogonal supporting tools. This includes design-time deployment (NuGet, etc.), runtime deployment, editor support (IntelliSense, classification, coloration, refactorings, etc.), unit tests and even support for code generation itself – although the last might feel a tad too much like a Möbius strip.
Unit tests are perhaps the most interesting. Code coverage is a good indicator of what you are not testing, absolutely. But code coverage does not indicate what you are testing and it certainly does not indicate that you are testing well. Ratios of test code to real code, measured in lines of code, are another indicator, but still a pretty poor one, and they still fail to use the basic understanding of boundary conditions we’ve had for what, 50 years? And none of that leverages the information contained in unit tests to write better library code.
Here’s a fully unit tested library method (100% coverage) where I use TDD (I prefer TDD for libraries, and chaos for spiky stuff which I later painfully clean up and unit test):
public static string SubstringAfter(this string input, string delimiter)
{
    var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
    if (pos < 0) return "";
    return input.Substring(pos + 1);
}
There are two bugs in this code.
Imagine for a minute that I had not used today’s TDD, but had instead interacted – with say… a dialog box (for simplicity). And for fun imagine it also allowed easy entry of XML comments; this is a library after all.
Now, imagine that the dialog asked about the parameters. Since they are strings – what happens if they are null or empty, is whitespace legal, is there an expected RegEx pattern, and are there any maximum lengths – a few quick checkboxes. The dialog would have then requested some sample input and output values. Maybe it would even give a reminder to consider failure cases (a delimiter that isn’t found in the sample). The dialog then evaluates your sample input and complains about all the boundary conditions you overlooked that weren’t already covered in your constraints. In the case above, it would have complained that the delimiter is not limited to a length of one and that I didn’t initially test a longer one.
Once the dialog has gathered the data you’re willing to divulge, it looks for all the tests it thinks you should have, and generates them if they don’t exist. Yep, this means you need to be very precise in naming and structure, but you wanted to do that anyway, right?
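As a rough sketch of that last step, the constraints gathered for each string parameter could be turned into a list of test names that ought to exist, and then compared against the tests that actually do. Everything here is hypothetical: the StringParameterSpec type, the ExpectedTestNames helper and the Method_Parameter_Condition_Outcome naming convention are illustrations, not part of the original spike.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of what the imagined dialog collects for one string parameter.
public sealed class StringParameterSpec
{
    public string Name { get; set; }
    public bool AllowNull { get; set; }
    public bool AllowEmpty { get; set; }
    public int? MaxLength { get; set; }
}

public static class ExpectedTestNames
{
    // Returns the test names a generator might expect to exist for one method.
    // The naming convention is an assumption made for this sketch.
    public static List<string> For(string methodName, IEnumerable<StringParameterSpec> parameters)
    {
        var names = new List<string>();
        foreach (var p in parameters)
        {
            var nullOutcome = p.AllowNull ? "IsHandled" : "Throws";
            var emptyOutcome = p.AllowEmpty ? "IsHandled" : "Throws";
            names.Add($"{methodName}_{p.Name}_Null_{nullOutcome}");
            names.Add($"{methodName}_{p.Name}_Empty_{emptyOutcome}");
            if (p.MaxLength.HasValue)
            {
                names.Add($"{methodName}_{p.Name}_AtMaxLength_IsHandled");
                names.Add($"{methodName}_{p.Name}_OverMaxLength_Throws");
            }
            else
            {
                // No length limit was declared, so a value longer than one character is worth probing.
                names.Add($"{methodName}_{p.Name}_LongerThanOne_IsHandled");
            }
        }
        return names;
    }

    // Names that are expected but not present in the existing test suite.
    public static List<string> Missing(IEnumerable<string> expected, ISet<string> existing)
    {
        var missing = new List<string>();
        foreach (var name in expected)
        {
            if (!existing.Contains(name)) missing.Add(name);
        }
        return missing;
    }
}
```

With a delimiter declared as non-null, possibly empty and with no maximum length, the expected list includes a LongerThanOne test, which is exactly the case the hand-written tests skipped.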
Not only is this very feasible (I did a spike with my son and a couple of conference talks about eight years ago), but there are also very interesting extensions in creating random sample data – at the least to avoid unexpected exceptions in edge cases. Yes, it’s similar to Pex, and blending the two ideas would be awesome, but the difference is your direct up-front guidance on expectations about input and output.
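The random sample data idea can be sketched in a few lines. This is a minimal illustration, not the original spike and not how Pex works: the RandomSampleProbe class, the tiny alphabet and the length limits are all assumptions chosen to keep the example short. It calls a two-string method with seeded random inputs and records which inputs made it throw.

```csharp
using System;
using System.Collections.Generic;

public static class RandomSampleProbe
{
    // Calls the method under test with random string pairs and collects a
    // description of every pair that made it throw. The seed keeps runs repeatable.
    public static List<string> FindThrowingInputs(
        Func<string, string, string> methodUnderTest, int samples, int seed)
    {
        var random = new Random(seed);
        const string alphabet = "ab:/ "; // arbitrary small alphabet so delimiters are often found
        var failures = new List<string>();
        for (var i = 0; i < samples; i++)
        {
            var input = RandomString(random, alphabet, random.Next(0, 8));
            var delimiter = RandomString(random, alphabet, random.Next(0, 3));
            try
            {
                methodUnderTest(input, delimiter);
            }
            catch (Exception ex)
            {
                failures.Add($"input=\"{input}\" delimiter=\"{delimiter}\": {ex.GetType().Name}");
            }
        }
        return failures;
    }

    // Builds a string of the requested length from the given alphabet.
    private static string RandomString(Random random, string alphabet, int length)
    {
        var chars = new char[length];
        for (var i = 0; i < length; i++)
        {
            chars[i] = alphabet[random.Next(alphabet.Length)];
        }
        return new string(chars);
    }
}
```

A call such as RandomSampleProbe.FindThrowingInputs((s, d) => s.SubstringAfter(d), 1000, 42) would probe the method above. One pair such a probe can turn up is an empty input with an empty delimiter, where IndexOf returns 0 and Substring(1) then throws on the empty string.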
The code I initially wrote for that simple library function is bad. It’s bad code. Bad coder, no cookies.
The first issue is just a simple, stupid bug that the dialog could have told me about in evaluating missing input/output pairs. The code returns the wrong answer if the length of the delimiter is greater than one, and I’d never restricted the length to one. While my unit tests had full code coverage, I didn’t test a delimiter longer than one character and thus had a bug.
The second issue is common, insidious, and easily caught by generated unit tests. What happens if the input string or delimiter is null? Not only can this be caught by unit tests, but it’s a straightforward refactoring to insert the code you want into the actual library method – assertion, exception, or automatic return (I want null returned for null). And just in case you’re not convinced yet, there’s also a fantastic opportunity for documentation – all that stuff in our imagined dialog belongs in your documentation. Eventually I believe the line between your library code, unit tests and documentation should be blurry and dynamic – so don’t get too stuck on that dialog concept (I hate it).
To straighten out one possible misconception in the vision I’m drawing for you, I am passionately opposed to telling programmers the order in which they should do their tasks. If this dialog is only available before you start writing your method – forget it. Whether you do TDD or spike the library method, whether you make the decisions (filling in the imagined dialog) up front or are retrofitting concepts to legacy code, the same process works.
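To make the first bug concrete: with the input "key::value" and the delimiter "::", the original method finds the delimiter at position 3 and returns everything from position 4, which is ":value" rather than "value". A minimal sketch of one possible fix follows; the class and method names are hypothetical, chosen so the sketch can sit next to the original.

```csharp
using System;

public static class StringExtensionsSketch
{
    // Hypothetical corrected variant: skips the whole delimiter rather than one character.
    public static string SubstringAfterFixed(this string input, string delimiter)
    {
        var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
        if (pos < 0) return "";
        return input.Substring(pos + delimiter.Length);
    }
}
```

Null handling is deliberately left out of this sketch, so a null input or delimiter still throws.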
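As one illustration of that refactoring, here is a sketch of a null-aware variant. It returns null for a null input, as described above. Rejecting a null delimiter with an ArgumentNullException is an assumption of this sketch, since the text does not say which of the three options applies to the delimiter. The class and method names are hypothetical, and the sketch also skips the full delimiter length.

```csharp
using System;

public static class NullAwareStringExtensions
{
    // Hypothetical null-aware variant. Assumptions: a null input yields null,
    // and a null delimiter is rejected with an explicit exception.
    public static string SubstringAfterOrNull(this string input, string delimiter)
    {
        if (delimiter == null) throw new ArgumentNullException(nameof(delimiter));
        if (input == null) return null;
        var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
        if (pos < 0) return "";
        return input.Substring(pos + delimiter.Length);
    }
}
```

Because extension methods are ordinary static calls, ((string)null).SubstringAfterOrNull("=") reaches the null check instead of failing at the call site.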
And that’s where Roslyn comes in. As I said, we abandoned the research on this eight years ago as increasing the surface area of what it takes to write an app and requiring too much work in a specific order (and other reasons). Roslyn changes the story because we can understand the declaration, XML comments, the library method, the unit test name and attributes, and the actual code in the method and unit test without doing our own parsing. This allows the evaluation to be done at any time.
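To give a feel for what "without doing our own parsing" means, here is a small sketch that asks Roslyn for the method declarations in a piece of source text and reports each method's name and parameters. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; the DeclarationReader class is a hypothetical name, and this covers only the syntax side, not the semantic model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class DeclarationReader
{
    // Returns "MethodName(type name, ...)" for each method declared in the source text.
    public static List<string> DescribeMethods(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var root = tree.GetRoot();
        return root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            .Select(m => m.Identifier.Text + "(" +
                string.Join(", ", m.ParameterList.Parameters
                    .Select(p => p.Type + " " + p.Identifier.Text)) + ")")
            .ToList();
    }
}
```

The same walk over the syntax tree could pick up XML comments and test attributes, which is the information the imagined dialog needs, available at any point in the workflow.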
That’s just one of the reasons I’m excited about Roslyn. My brain is not big enough to imagine all the ways that we’re going to change the meaning of programming in the next five years. Roslyn is a small, and for what it’s worth highly imperfect, cog in that process. But it’s the cog we’ve been waiting for.