While disposition stuff is on the brain, there's another disposition topic I've been wanting to write about for a while now: handling disposition of objects of nebulous ownership.

The issue of nebulous ownership creeps in whenever you use any sort of complex creational pattern. If you instantiate an object by invoking its constructor directly, it's pretty clear that you are its owner (unless, of course, you're deliberately handing it off to other code). If that object happens to be disposable, then you can (and should) safely dispose it when you're done working with it.

But what if you obtained an object instance via a method call? How can you tell if you're the owner? If you're lucky, that method has a well-documented contract that specifies whether you're getting your own instance or not. Otherwise, you're pretty much going to have to guess or go code spelunking (and hope that the behaviour doesn't change in later versions). Luckily, we can often make some pretty decent educated guesses about who owns an object returned from a method call. For example, if the method is called CreateFoo, we can be pretty sure we're getting our own Foo instance. On the other hand, if we're calling the getter of a static Instance property, chances are that we're getting a shared singleton.

There are, however, cases where one is explicitly not supposed to care whether one is using a shared instance or not. Use of the service locator pattern is probably the most obvious example, and it's the one I'll focus on for the rest of this post. When you retrieve an instance from a service locator, you can't tell if it's a shared instance. Even more importantly, you're really not supposed to care. You're getting an object that's castable to the type you requested, and you should use it as that type, and that's it. Pretty simple, right? And it is, unless the object happens to be disposable.

Unfortunately, when you retrieve a disposable instance from a service locator, you have no way to tell if you're getting a shared instance or not. This means that you can't safely dispose it. I've seen a few different approaches to trying to deal with this:

  1. Don't ever dispose anything received from a service locator.
  2. Use a separate lookup table keyed on service registration data (usually type and an optional registration name) to indicate whether instances of any given retrieved service should be disposed.
  3. Offload instance disposition to the container.
  4. Build disposition-prevention logic into registered service instances so that they will not be disposed unless the service container is being disposed.

#1 has the obvious disadvantage that some things that ought to be disposed won't be. This will usually lead to at least poorer performance, but it could also cause some more obvious functional bugs if resources locks are not released in a timely fashion.

#2 is kludgy. It means that you can't simply call Dispose() on a serviced object. Instead, you need to either perform the disposability lookup first or invoke a helper method that takes care of both the lookup and the Dispose() invocation. Either way, you can't use using statements, which is massively annoying. You're also very likely to run into some nasty little bugs if you switch from using a locally newed instance to a serviced instance some time in the future. For example, if you make the following change, how likely are you to notice that you also need to change the disposition code?

Original code:

using (IFoo foo = new Foo())
{
	foo.DoSomething();
}

New code:

using (IFoo foo = ServiceLocator.GetInstance<IFoo>())
{
	foo.DoSomething();
}

#3 is the approach used by Autofac and, as far as I can tell, only by Autofac in the .NET space. When I first saw it, I was pretty darn excited – for about 20 seconds. And then that inconvenient little voice in my head started raising objections:

  1. Disposition is meant to allow "expensive" state to be cleaned up as soon as possible. By deferring disposition of container-sourced objects until the end of a predefined lifetime scope, we will be delaying disposition of individual objects in at least some, if not most, cases.
  2. Unless the tracking mechanism uses weak references (and I do have to admit that I haven't examined the Autofac implementation in any depth, so I don't know whether this is the case for it or not), it is very likely preventing non-shared disposable instances from being garbage collected until the end of their declared lifetime scope. This means that we could end up taking an unexpected memory hit unless we use fairly granular lifetime scopes.
  3. We shouldn't dispose any container-sourced objects, which means some potentially difficult-to-diagnose problems if we move from a newed instance to a container-sourced instance (as in the previous example).
  4. Using this mechanism ties us into using the Autofac container. We can't swap out a different container without making some pretty major code modifications.

So that leaves approach #4... From the point of view of the instance retriever, it's very simple: you can safely dispose of any instance you retrieve from the container, just as if it had been newed. That's really quite elegant, and it's what I want to be able to do in my own code. However, getting to the point where you can do this is anything but simple. The only reference I've been able to find to a similar approach leaks implementation details up to the instance users, which is inconvenient in practical terms and, well, more than a little icky from a design point-of-view. So what can be done to avoid this?

Well, whatever we do, we're going to need a wrapper type that actually controls the disposition. It has three key features:

  1. It wraps an instance of an interface, with that instance being provided to the wrapper constructor.
  2. It implements that focal interface, directly forwarding all interface member invocations to the wrapped instance, except for IDisposable.Dispose().
  3. For IDisposable.Dispose(), it checks whether disposition is allowed before invoking the Dispose() method of the wrapped instance.

On the surface, that looks pretty simple but, as is so often the case, the devil is lurking in the details... Let's start with the question of how the wrapper class knows whether disposition of the wrapped instance should be allowed. Regardless of how we implement this, we'll need to modify our service container to toggle some "disposition allowed" flag. However, I'd prefer to avoid forcing the container type to interact directly with our disposition-prevention mechanism. Instead, let's make the container implement a more general-purpose interface:

public interface ITrackingDisposable : IDisposable
{
	DispositionState DispositionState { get; }
}

where DispositionState is an enum with the following definition:

public enum DispositionState
{
	Undisposed = 0,
	Disposing = 1,
	Disposed = 2
}

(There are a few potentially interesting enhancements to both the interface and the enum, but they're not relevant to the disposition-prevention mechanism, so we'll leave them out of the current discussion.)

In order to implement ITrackingDisposable, we'll need to make a couple of changes to the service container. Adding the DispositionState property is pretty trivial:

public DispositionState DispositionState { get; private set; }

Changing the Dispose implementation is pretty trivial too, as long as you're not trying to make any distinctions between successful and unsuccessful disposition:

public void Dispose()
{
	this.DispositionState = DispositionState.Disposing;
	try
	{
		this.Dispose(true);
		GC.SuppressFinalize(this);
	}
	finally
	{
		this.DispositionState = DispositionState.Disposed;
	}
}

Once our service container implements the ITrackingDisposable interface, we can pass a reference to the container to our wrapper class, and let it decide whether the wrapped instance should be disposed based on the current DispositionState of the container.

That just leaves us with one other picky detail: how do we end up with a wrapper type that forwards the focal interface implementation to the wrapped instance? First, we'll create an abstract base class for our wrappers:

public abstract class DispositionPreventer<TComponent> : IDisposable
	where TComponent : IDisposable
{
	private readonly TComponent _component;
	private ITrackingDisposable _owner;

	protected DispositionPreventer(TComponent component, ITrackingDisposable owner)
	{
		Contract.Requires(component != null);
		Contract.Requires(owner != null);

		this._component = component;
		this._owner = owner;
	}

	protected TComponent Component
	{
		get
		{
			return this._component;
		}
	}

	public void Dispose()
	{
		switch (this._owner.DispositionState)
		{
			case DispositionState.Disposing:
			case DispositionState.Disposed:
				this._component.Dispose();
				break;
		}
	}
}

Now, we'll need to create a concrete subclass for each interface for which we wish to register an instance with the service container. For example, if we have an interface IFoo defined as follows:

public interface IFoo : IDisposable
{
	event EventHandler Fooing;

	bool IsFooed { get; }

	void DoSomething();
}

we'll need to create a disposition-preventer implementation like the following:

public sealed class FooDispositionPreventer : DispositionPreventer<IFoo>, IFoo
{
	public FooDispositionPreventer(IFoo component, ITrackingDisposable owner)
		: base(component, owner)
	{
	}

	public event EventHandler Fooing
	{
		add
		{
			this.Component.Fooing += value;
		}

		remove
		{
			this.Component.Fooing -= value;
		}
	}

	public bool IsFooed
	{
		get
		{
			return this.Component.IsFooed;
		}
	}

	public void DoSomething()
	{
		this.Component.DoSomething();
	}
}

That's not so bad for a little interface like IFoo, but it can turn into hours of drudgery for larger interfaces. Unfortunately, given the lack of support for easy design by composition in .NET, you're either going to have to write this code manually or build your own code generation tool to create the wrapper classes. (I recently created a T4 template for this at the day job and, while authoring the template itself was a somewhat annoying exercise, at least I won't have to write (or, even worse, code review) the wrapper classes themselves.)

Once you've got the wrapper classes set up, all that's left is to actually use them. This is done simply by wrapping our focal instance immediately before registering it with the service container. e.g.:

container.RegisterInstance<IFoo>(new FooDispositionPreventer(new Foo()));

To help detect a failure to wrap a disposable instance before registration, I use a custom FxCop rule that looks for RegisterInstance calls where the registered instance is disposable but does not inherit from DispositionPreventer<>.

And that's pretty much it. The only really hard part of this is implementing the individual disposition wrapper classes. However, if you're willing to invest a bit in that part of the puzzle, you'll end up with a container-independent disposition-prevention mechanism that won't intrude in any way into your container-consuming code. And, in case you missed it, the above mechanism can be made to work with pretty much any nebulous ownership scenario where interfaces are used for implementation decoupling since all you need to do is have the real "owner" pass around disposition-preventing wrappers instead of "naked" instances. If interface-based decoupling is not being used, a subclassing approach is also possible as long as the focal class is unsealed.

But it would still be nice if implementing the wrapper classes wasn't quite so much work...

Posted by calinoiu | 6 comment(s)

I've had publication of a little FxCop rule-building toolkit on my todo list for at least the past three years. I finally got the beastie out the door yesterday evening...

The Bordecal tools for FxCop include two main pieces:

  1. A rule development framework with two main pieces:
    • A testing framework for custom rules, and
    • Rule base classes that allow configuration via code rather than an embedded XML file.
  2. A set of custom rules.

The testing framework is probably the single most important part. For those of you who are already familiar with Roy Osherove's FxCopUnit, it is important to understand that this is a rather different approach to rule testing. It is focused on targeting entire sample target assemblies with the goal of detecting both false negatives and false positives. It's also very picky about matching expected to actual results, largely because I believe that the violation details exposed to the rule user are almost as important as detection of the violation itself. That said, I could probably be convinced to make this behaviour configurable in some future version if enough folks vote for the feature.

The rule base classes that form the other part of the development toolkit are motivated primarily by my personal annoyance with the embedded XML configuration approach. The fact that so many beginner rule developers seem to have problems with generating an appropriate embedded XML resource was my main reason for including it in the published toolkit. However, there's no need to use these base classes in order to use the testing framework since the testing framework uses fxcopcmd.exe and will be able to use any rule that FxCop itself can find.

The custom rules are meant primarily as an example of how to use the development toolkit, as is the rule test project that can be found in the source code. However, I've chosen rules that I tend to write and re-write at each day job, so I'm guessing that at least some folks are likely to find them interesting in their own right.

If you wish to report a bug or request a feature for either the development framework or the custom rule set, please use the project issue tracker. Similarly, if you need help using the toolkit, please use the project discussion list so that other users with similar problems may see the answers to your questions even if they do not read this blog.

And now there's one more disposition post to finish up before I can finally give Bordecal.ImportsSorter some much-needed attention...

Posted by calinoiu | 2 comment(s)
Filed under:

For some odd reason, I've been authoring quite a few disposable types lately. Under normal circumstances, this would hardly be worth mentioning, but the recommended disposition pattern has been grating a raw nerve quite some time now. At best, it feels like a slightly nasty code smell to me. Due the volume of recent implementations, I'm starting to feel like I'm hanging out next to a sewer. I suspect that I'm starting to slip toward ignoring the guideline in favour of what I would consider a better practice, and I'm hoping that a little bit of venting might postpone that, so here it goes...

Just in case you're not already familiar with it, here's an outline of the implementation recommended in the .NET Design Guidelines:

public void  Dispose()
{
	this.Dispose(true);
	GC.SuppressFinalize(this);
}

protected virtual void Dispose(bool disposing)	// or private in a sealed class.
{
	if (disposing)
	{	
		// Clean up managed state here.
	}

	// Clean up unmanaged state here.
}

Seems a little odd, given that we don't ever enter the Dispose(bool) method with a false parameter value, doesn't it? So why bother? Well, once upon a time, we used finalizers to help clean up unmanaged state. A finalizable type ought to implement IDisposable, and the "Clean up unmanaged state here" bit in the above example should be run from both the finalizer and the Dispose() method. Ideally, we'd like to be able to write something like the following in a finalizable and disposable class:

~TheClass()	// In case you're not familiar with the syntax, this is the finalizer.
{
	// Clean up unmanaged state here.

public void  Dispose()
{
	// Clean up managed state here.

	this.Finalize();
	GC.SuppressFinalize(this);
}

Unfortunately, our code can't invoke the finalizer directly, so we can't call the finalizer from the Dispose() method in order to clean up the unmanaged state. To work around this, we use the Dispose(bool) method shown in the first example above, and we invoke it with a false value from the finalizer:

~TheClass()
{
	this.Dispose(false);
} 

So far, so good. However (very, very big "however"), starting in .NET 2.0, we're not supposed to write finalizers anymore (or at least not for typical unmanaged resource clean-up). This means that we'll never call Dispose(bool) with a false argument in the vast majority of our new disposable type implementations. In other words, there's no practical benefit in implementing the recommended disposition pattern anymore, unless you happen to be working on one of the very rare cases where a finalizer is still required. For the vast majority of cases, where a finalizer is not required, the following would be functionally equivalent:

public virtual void Dispose()	// or non-virtual in a sealed class
{
	// Clean up managed resources here.

	GC.SuppressFinalize(this);
}

Now, let's assume for the moment that you've never heard of the recommended disposition pattern. What might your naïve implementation of IDisposable.Dispose() look like? Probably an awful lot like that last example, except you would probably omit the GC.SuppressFinalize call (which wouldn't actually matter if your type isn't finalizable, which it shouldn't be anyway) In other words, if the recommended disposition pattern didn't involve a bunch of unnecessary code, naïve implementers would automatically do the right thing. This is known as the "pit of success", and it is generally considered a good thing.

But let's come back to you, the very well informed disposition author. You're not naïve. You know how to implement the disposition pattern. Heck, you even have a lovely little snippet. Why should it bother you that the recommended pattern is more verbose than strictly necessary? It's unfortunate, but every little bit of code we write is an opportunity for error. Every bit of extra complexity (including little things like a senseless redirection or one extra nested if block) increases the probability of introducing a bug and decreases the readability and maintainability of the code. It's bad enough when things like this appear in one-off code. How much worse when we're asked to systematically add unnecessary code as part of a recommended implementation pattern?

Would you accept the following in a code review, if the DoSomething(bool) method were never invoked with a false parameter value:

public void DoSomething()
{
	this.DoSomething(true);
}

private void DoSomething(bool allowDoing)
{
	if (allowDoing)
	{
		...
	}
}

If not, why accept a guideline that recommends the same thing? Well, obviously because it's a guideline, and you're a well-mannered developer who cares about exposing APIs that are consistent with what consumers have come to expect from their interactions with core .NET Framework code. But if the fact that it's a guideline is the only reason you're still following the recommended disposition pattern (or if you've already given up on following it), you might want to consider letting the folks responsible for the guideline know that you're not too fond of it...

Posted by calinoiu | 7 comment(s)

For the past week or so, I've been performing last-minute documentation tasks at the day job in preparation for the impending maternity leave. If you've been reading this blog because of my FxCop posts, then you might be interested in my latest post on the FinRad blog, which covers the statistics collection tool that we developed internally to help us track our FxCop backlog cleanup efforts. A copy of the tool is also available on CodePlex if you would like to try it out for yourself.

Posted by calinoiu | with no comments
Filed under:

So... Umm... It's been a while. A little over six months, to be specific. That would be three months of unrelenting nausea and surprisingly heavy fatigue, followed by three months of desperately trying to catch up on all the "real world" stuff that didn't get done during those first three months. But now it's winter, and I'm looking forward to three months of not skiing, so there should be plenty of time for 'putering (at least once the pesky December holiday season is over), and care and feeding of this blog thing will slowly resume...

For today, I figured I'd address some of the correspondence I've shamefully ignored since June. Comments have been moderated and, where appropriate, answered. For those of you who sent me e-mails, here are some replies:

  1. You're quite correct: a single-line textbox is actually attribute-encoded. Why I didn't pick up that error in the table while I was working on the double-encoding example is a bit of a mystery...

  2. I do indeed have a solution for automatic HTML-encoding that avoids the double-encoding problem. I actually started working on a framework and series of blogs posts about a week after writing the HTML-encoding post, but I got busy with other stuff and never ended up publishing anything on the topic. Now that I know that at least some folks are interested, I'll try to find some time to update the framework and explain how it works and how to use it.

  3. A version of Bordecal.ImportsSorter that works with Visual Studio 2008 will be coming soon-ish. It ranks right below work, holiday preparations, and future offspring concerns on my to-do list. (Hmm... That probably doesn't sound like I'm likely to find much time to work on it any time soon. However, please rest assured that I want this new version at least as much as you do, and I'm highly motivated to work on it whenever I have the time and energy.)

My apologies to those of you who have been waiting far too long for those answers. I'll try to do better over the next little while (or at least until I drop out of sight again in about three months)...


P.S.: In case you haven't figured it out yet and are worried about my health, please don't be. I do have an abdominal parasite, but it's mammalian and quite benign. ;) For those of you who are curious, it's a girl, and she's due on February 29th.

Posted by calinoiu | 2 comment(s)
Filed under:

We've got a rather large VB code base at FinRad, at least some of which we would like to migrate to C# in the nearish future. Because of this, we have created several custom rules that are intended to detect issues in VB code that would cause problems when migrating the code to C#. However, while attempting to assign categories for those custom rules, we quickly realized that each of them had merits besides portability of the code base, and that we wanted our assemblies to abide by these rules regardless of whether or not they might eventually be ported to C#.

I hadn't been planning on writing a post about these rules for some time to come, but then I saw the announcement last week that Microsoft is using VB to build the Silverlight javascript compiler and VBx, which seems to have boosted the topic's prority on my internal TODO list enough that I can't quite seem to ignore it...


Rule #1: Interfaces should not contain nested types (Design)

By definition, interfaces should not contain any implementation, and a nested type would certainly qualify as implementation. Unfortunately, the VB compiler generates no warnings or errors when any type whatsoever is nested within an interface. For example, the following interface will compile just fine (and FxCop won't flag it as an "empty" interface either):

Public Interface ISomething

	Enum Animal
		Cat
		Dog
		Fish
	End Enum

	Class SomeClass
		Public Sub DoSomething()
			Console.WriteLine("something")
		End Sub
	End Class

End Interface

To make matters worse, the VB compiler will also generate a type within an interface under certain conditions. Consider, for example, the following interface:

Public Interface ISomethingElse

	Event SomethingDone(ByVal sender As Object, ByVal e As EventArgs)

End Interface

When an event is declared using the above syntax, the VB compiler will generate a delegate for the event signature within the same parent type. The resulting compiled assembly actually ends up containing the following interface definition, which includes a nested type:

Public Interface ISomethingElse
    
    Delegate Sub SomethingDoneEventHandler(ByVal sender As Object, ByVal e As EventArgs)
    
    Event SomethingDone As SomethingDoneEventHandler

End Interface

Rule #2: Properties should not have arguments (Design)

In order to be easily invokable from C#, a property should never have more than one argument, and only indexers (i.e.: properties named "Item" in VB) should have even one argument. If either of these conventions are ignored, C# code is going to have to use syntax like get_SomeProperty(a, b, c) to use your properties, which is considerably less readable than some of the alternatives.  Do you really want to be forcing your API's callers to be writing ugly code?


Rule #3: Member names should not match type names (Naming)

C# will generate compiler error CS0542 if you try to create a member with the same name as its parent type, but VB won't complain about this at all. The use of a member with the same name as the type to signify a constructor in C# presumably explains this difference in compiler behaviour. However, there's an API useability problem here as well since this significance of a member (usually a property) with the same name as its parent class is rarely immediately obvious. For example, imagine a class named "Label" with a property named "Label". What exactly does "Label.Label" represent? In such a case, either the class or the member probably ought to be renamed to convey more information about its role.


Rule #4: Review call scope forcing (Usage)

VB allows for forcing the scope of invocation of a virtual member to the current class via the MyClass keyword. Both VB and C# allow for forcing the scope of invocation of a virtual member to the base class (via MyBase and base, respectively). Since call scope forcing isn't really a VB-specific capability, this isn't really a VB-specific rule. However, eliminating MyClass use was what provoked its development at FinRad, so I'm including it here anyway.

In case you're wondering why call scope forcing should be reviewed, consider why a member would be declared as virtual in the first place: it's meant to be overrideable. If a subclass has overridden a virtual method, why should a base class be deliberately avoiding invoking that override? Obviously, there are cases where this behaviour is desirable, which is why this is a "review" rule rather than a "do not" rule, but there are also plenty of cases where scope forcing is likely to be used inappropriately, particularly when developers aren't aware of the consequences.

If you want to implement this rule, there are a few things to watch out for:

  • Within the implementation of an overriding, overloading, or hiding member, it is quite common to invoke the base class implementation explicitly. In order to avoid undesirable noise in the rule analysis results, these cases should not be flagged as problems by the rule.
  • Because of VB interface member implementation renaming, the name of an overload will not necessarily be the same as the name of the member it is overloading.
  • The C# compiler will force a non-virtual call when invoking a non-virtual base class member even if your code uses this rather than base to scope the call. The VB compiler will retain the virtual call.
  • FxCop 1.35 and Visual Studio 2005 Static Analysis both have problems detecting whether a property is overriding and/or hiding, although the exact behaviour differs between the two engines, as well as between .NET 1.1. and 2.0 assemblies for FxCop 1.35. For more details on this issue (which appears to have been resolved in Orcas), see this bug report.
Posted by calinoiu | with no comments
Filed under:

If you're considering tackling an FxCop backlog, you're going to need a few tools. Obviously, the most important of these is FxCop itself, but that still leaves you with a potential choice to make between stand-alone FxCop and Visual Studio Static Analysis. If your target code base is built against .NET 1.1 or if you're not willing/able to shell out for Team Edition for Software Developers or Team Suite, the decision is pretty trivial. However, what if all your developers already have Visual Studio 2005 editions that include static analysis? You've paid for it, so your first instinct will probably be to use it. Unfortunately, it's probably not the best tool for this particular job.

While VStudio Static Analysis is great for dealing with a small volume of problems that should be addressed immediately, it kind of sucks the big wazoo with respect to handling the much larger volume of violations that one needs to deal with while tackling a backlog. This is partly a UI issue, and it shouldn't come as much as a surprise given that the backlog scenario probably wasn't exactly a major priority during the development of the VStudio integration piece. However, there are also some issues that go beyond the limited capabilities of the Visual Studio Error List window, and it's important to be aware of these as you make your tooling decisions.

In order to help understand why the stand-alone tool will likely be a better choice, let's take a closer look at some of the tasks in the backlog winnowing process...


Fix this

You've decided to activate a given rule. Your code base has 400 existing violations of the rule, and you want your team to start fixing those right away. However, you certainly don't want to have to specifically assign each and every fix to a given developer. How do you create a "pool" of violations from which developers can draw?

If you're using Static Analysis and you also happen to be using Team Foundation Server, you might want create work items for the violations, then let developers assign the work items to themselves before starting to work on the fixes. However, from the VStudio Error List, you're going to able to generate work items in only one of two ways:

  • Select all the violations, then generate a single work item, or
  • Generate a work item for each violation individually.

Generating the individual work items would be ridiculously time consuming and tedious, so a single work item would usually be the only practical choice. However, in order to ensure that the more than one developer isn't working on any given violation at the same time, there's still going to have to be some manual work in terms of copying and editing work items before a developer will even start fixing a violation. Unfortunately, this procedural overhead doesn't really have much added value besides preventing overlapping work, and the additional effort may end up accounting for a very large proportion of the time spent on any given fix. Wouldn't you rather use that time for fixing other violations instead? (Of course, you could create a tool that automatically creates individual work items for you based on an FxCop output report, but that's again time that might be better spent doing other things.)

Another problem with adding backlog work items to TFS from VStudio Static Analysis results is that they're going to end up in the same TFS project as all your "real" work items for the same code base. Assuming your Static Analysis backlog is non-trivial, these backlog work items will start to swamp out your other feature and bug fix work items pretty quickly. Sure, you can isolate them in a category of their own, but sooner or later someone is going to include those beasties in a report where they shouldn't belong, and your stats are going to be skewed halway to heck.

OK, so if those are problems with using Static Analysis for violation reservation, what can the cheapskate's standalone edition do better? For starters, we can mark all the violations for fixing in one step without requiring later splitting. After activating a new rule and executing an analysis, simply select all the newly active violations of the rule and exclude them with a "TODO" note. If you're feeling fancy, you can even add a priority level (e.g.: "TODO 1" or "TODO 2"). After excluding the pre-existing violations, simply check the FxCop project into source control, and all the developers on your team can instantly access the items.

Under the "TODO" exclusion scheme, it's equally easy for any given developer to grab himself a chunk of new backlog work. He simply selects a bunch of violations from those that still have "TODO" notes, adds a new note (in one step) to claim all of them, then checks the modified FxCop project back into source control. What's this new note that gets added for the claiming? At FinRad, we simply add our initials to the note. For example, I claim violations by adding the note "TODO - NC" to an unclaimed TODO exclusion. If you've got a bigger team, you might need to identify developers by somewhat longer strings than their initials, but the underlying technique should still work.

Using stand-alone FxCop and TODO exclusions, I can both create and reserve violation "work items" in seconds, so I can spend more time on actually fixing violations rather than on managing the workload itself.


Don't break that

Identifying the current workload is only part of managing the violation backlog. Another important part is ensuring that the backlog violations that have not yet been fixed don't break your builds (assuming, of course, that you're using automated builds and that you've set FxCop rule violations to break those builds). If you're using VStudio Static Analysis, you've got two basic options for doing this:

  • Set the rule to not break the build until after all the backlog violations are fixed or
  • Use SuppressMessageAttribute to exclude the pre-existing violations.

If you don't allow violations of the newly activated rule to break the build, you won't start detecting new violations of the rule until after you've cleaned up the entire backlog, and that's not pretty. On the other hand, if you add a SuppressMessageAttribute to any given pre-existing violation, that violation will no longer show up in the Error List, so it's going to start being difficult for your team members to find and fix the problem.

Another problem with the SuppressMessageAttribute exclusions approach is that VStudio Static Analysis doesn't let you know when you don't need a suppression anymore.1 This means that if you fix the original violation but forget to remove the attribute, you could later create another violation of the same rule in the same code element, and you'll never notice it because the old placeholder exclusion will hide the new problem. That's a nasty enough problem even when you have a small number of "real" exclusions, but it's asking for big trouble when you consider the very large number of suppressions that you would need to create for your backlog violations.

This is another problem for which a much cleaner, simpler solution is possible when using the FxCop stand-alone UI. In fact, we've already covered the approach above. Since we create a "TODO" exclusion for every violation in the immediate backlog, we can keep the rule activated and build-breaking while still readily identifying the backlog violations. The stand-alone UI also allows us to detect exclusions that are no longer needed, so we can remove the unecessary exclusions in batches even if individual developers forget to remove them when fixing a violation.

On a bit of a side note, if you happen to be running a backlog cleanup of a .NET 2.0 application, you will probably want to store only your TODO exclusions in the FxCop project file. If you happen to resolve a violation by creating a "real" exclusion, that exclusion should probably be created with SuppressMessageAttribute rather than in the FxCop project file. After all, one of these days, you'll actually finish tackling the backlog, and you may want to switch over to using VStudio Static Analysis for finding and addressing the piddly few rule violations you'll be generating in your day-to-day work.


Sorting, filtering, and other fancy-pants advanced techniques

So let's say you're a developer who is ready to give an hour or so to fixing some of the backlog problems. How to you select which ones to work on? If you simply grab a random assortment, you'll probably end up working on violations of a variety of different rules that are widely separated with respect to location in the source code. You'll very likely end up wasting time on context switching that could have been put to much better use actually fixing violations.

As a more concrete example, let's say you decide that you have time to fix about 30 violations of the Performance.RemoveUnusedLocals rule. Trivial as these might seem, chances are that you'll probably find that somewhere between 10% and 20% of the violations actually require some decisions and maybe even a bit of code spelunking. This is why you won't try tackling 3 or 4 violations per minute, but it also has another consequence. If you're going to be encountering "weird" violations, wouldn't it be nice if they're mostly of the same style? One way of trying to increase the probably of this would be to draw the violations from a smallish area of the code base, where you'll at least stand a reasonable chance of encountering mostly code authored by a single developer.

While selecting violations using the above pattern is certainly possible from the VStudio Error List, it's not terribly convenient, and it's certainly not something that most of your developers are likely to enjoy doing several times a week. Developer patience with the selection process is likely to fray even faster if you're using TFS to store the backlog work items. To make things worse, you'll again be wasting time that could be better spent on actually fixing violations.

On the other hand, making the same issue selection in the stand-alone FxCop UI is trivial. First, I select the Performance.RemoveUnusedLocals rule in the rule treeview to hide violations of any other rule. Then I sort the "Excluded In Project" list by the item column. Since I can see availability of items in the "Last Note" column, all I have to do is select a bunch of violations that are clustered close together by item path in order to increase the chances of similar types of weirdness in the problems that I'm going to address.2


But I don't like the extra step involved in running FxCop...

Let's step back a moment and look at the usual reason for preferring VStudio Static Analysis: it runs seemlessly as part of the build process. Even ignoring the backlog problem, there's another nasty gotcha here: duration of a static analysis run. It takes over 3 minutes to analyze the assemblies in the product on which I work day-to-day. Given this, I'm certainly not going to run an analysis as part of my usual debug builds, whether it's via the integrated Static Analysis tool or by launching FxCop via a post-build event. If I really want the analysis to be associated with a build, I'm going to have to create an alternate build configuration at the very least.

This means that, even if I'm targeting the integrated tool, I'm still going to be launching analyses via a conscious, at least partially manual step. With one little extra step (clicking the "Analyze" button within the FxCop UI), I can run the analysis in the richer, otherwise more convenient stand-alone UI instead. To make this rapid launch possible, I simply add my FxCop project file to my solution as a solution item, then configure Visual Studio to open FxCop files using FxCop.exe. Once this initial setup is done, all I need to do to run my FxCop analysis is open the FxCop project from within Visual Studio then click the "Analyze" button once the FxCop UI has opened (to watch a 700K demo video, click here). That's amply convenient for me, although I guess your mileage may vary (but only if you're a complete nut <gdr>).


But that Static Analysis thingy cost me 5000$ (per seat) !!!

If you've bought one of the Team System editions that includes static analysis, I very much hope that you're taking full advantage of it for your new development work. I also hope that static analysis isn't the only "extra" feature that you're using, or perhaps you did spend rather a lot for one feature. ;) Either way, the fact that you happen to have the integrated static analysis tool available to you shouldn't be an important factor in making a decision regarding which FxCop tool to use for your backlog cleanup project.

On the other hand, the integrated tool ships with some rules that are not available in the free version of FxCop, and you might be swayed toward using that tool in order to take advantage of the additional rules, particularly given that FxCop 1.35 and Visual Studio 2005 Static Analysis cannot consume each other's rules. However, what you may not know is that FxCop 1.35 also shipped with some rules that aren't present in the integrated static analysis rule set3. The following table lists the rule set differences:

Category FxCop 1.35 Visual Studio 2005
Design ExceptionsShouldBePublic  
Maintainability   AvoidExcessiveComplexity
AvoidExcessiveInheritance
VariableNamesShouldNotMatchFieldNames
Naming CompoundWordsShouldBeCasedCorrectly
FlagsEnumsShouldHavePluralNames
IdentifiersShouldBeSpelledCorrectly
OnlyFlagsEnumsShouldHavePluralNames
ResourceStringCompoundWordsShouldBeCasedCorrectly
ResourceStringsShouldBeSpelledCorrectly
 
Performance RemoveEmptyFinalizers  
Reliability   DisposeObjectsBeforeLosingScope
DoNotLockOnObjectsWithWeakIdentity
DoNotTreatFibersAsThreads
RemoveCallsToGCKeepAlive
UseSafeHandleToEncapsulateNativeResources
Security CatchNonClsCompliantExceptionsInGeneralHandlers
SecurityTransparentAssembliesShouldNotContainSecurityCriticalCode
SecurityTransparentCodeShouldNotAssert
SecurityTransparentCodeShouldNotReferenceNonpublicSecurityCriticalCode
 
Usage AttributeStringLiteralsShouldParseCorrectly
LiteralsShouldBeSpelledCorrectly
TestForNaNCorrectly
 

If you're seeking to maximize your rule coverage, jumping on static analysis because it supposedly contains more rules isn't exactly looking like an optimal approach... If you want to cover the combined rule set offered by the two tools, I'd recommend using FxCop 1.35 for your backlog cleanup until you've exhausted all of its rules. Once that's done, you would potentially be faced with a decision of switching over to the integrated tool for the remaining rules or coding custom rules for FxCop 1.35 that check for the same problems as the additional rules shipped with Visual Studio.

Then again, is that really a decision you're likely to end up facing? Assuming you start a backlog cleanup now, when do you think you're likely to be almost done? Unless your backlog is trivial, chances are excellent that both Orcas and FxCop vNext will have shipped by the time you're even half way done. Personally, I'm hoping that Orcas static analysis and FxCop vNext will be able to consume each other's rules, thereby making it possible to use the richer UI when addressing a backlog of the Visual Studio rules, as well as easing some of the pain around custom rule authoring. There's no technical reason why they shouldn't be able to consume the same rules, but that doesn't mean that a difference couldn't be introduced explicitly in order to prevent using the expensive rules from the free tool. I'm keeping my fingers crossed that this won't happen, although I won't start counting any chickens until Orcas and FxCop vNext have both RTMed...




1If you wish VStudio Static Analysis and/or FxCopCmd.exe would let you know when you've got unecessary suppressions, vote here.

2If FxCop wouldn't "lose" the rule selection when switching between the rules and targets treeviews, it would be possible to make an even more convenient selection by "climbing" the target treeview from the target associated with any given violations. Unfortunately, the rule selection is lost when switching to the other treeview tab in FxCop 1.35. I'm keeping my fingers crossed that this won't be the case in the vNext UI, but there seems little point in creating a bug report until I've actually tried the beastie.  I'm also hoping that the vNext UI will allowing filtering by column values within the violation lists, which would also make working with large backlog lists more convenient, but I won't be holding my breath for that change.

3In case you're not already aware of this, the Visual Studio rule set is supposed to be a superset of the stand-alone FxCop rule set. FxCop 1.35 only contains rules that aren't present in Visual Studio 2005 because of time constraints during the preparation of Visual Studio 2005. (Sorry, but the best reference I seem to be able to find regarding this at the moment is a post on the FxCop forum.) The extra FxCop 1.35 rules are present in Orcas, with the exception of Usage.LiteralsShouldBeSpelledCorrectly, which is a control flow rule.

Posted by calinoiu | with no comments
Filed under:

Off on a bit of a tangent this morning...

I have a bit of a love/hate relationship with the Visual Studio start page. I do find it useful enough when I first open VStudio. However, once I start working on anything, the mere presence of the start page tab in the IDE main window is really, really irritating to me for some reason. Of course, given that even one little using directive out of place also drives me to distraction, this probably isn't too surprising.

At any rate, the other day, I decided that it was high time to do a little something about that start page annoyance. At some point in the medium-ish future, I might turn Bordecal.ImportSorter into Bordecal.ToolsForObsessiveCompulsiveNitPickers and throw in a start page closer. In the meantime, here's a VStudio 2005 macro to close the start page as soon as any document is loaded in the IDE.

For those of you who are interested in using the macro, here are a couple of picky details:

  1. The start page will be closed when any document is opened, regardless of how it was opened. If you open a solution that has open documents, that's good enough to trigger closing of the start page.
  2. The start page will only be closed once. If you reload the start page manually, the macro won't close it on you.

The above are completely intentional features--that's exactly how I personally prefer the start page to be closed. If you would prefer some alternate behaviour, please feel free to ask. I'm quite willing to consider adding customization of the behaviour when I eventually get around to adding this stuff to an add-in...

Posted by calinoiu | 2 comment(s)
Filed under:

I skipped ahead a while back with my post on the exceptions theme, and it's time to get back on track with stuff that would usually precede that rather involved topic during an FxCop backlog cleanup project.

The good news with the disposition and finalization topic is that its scope is quite a bit narrower than the exceptions topic. The bad news is that you'll probably see almost as many preconceptions regarding how it works, particularly if you have quite a few developers who are accustomed to deterministic finalization. As with the exceptions topic, it's probably a pretty good idea to start the topic from scratch, making the assumption that most of the developers on your team will get at least something from coverage of even the fundamentals of the subject matter.


Resources

This time around, Chris Brumme's post on finalization is absolutely required reading, at least for the topic trainer. If you don't understand at least 90% of that article, you probably don't know enough about finalization to be teaching the topic.

You'll also need some resources that are more on the "how to" side of things. The MSDN articles Implementing Finalize and Dispose to Clean Up Unmanaged Resources and Implementing a Dispose Method to provide a reasonable starting point around guidelines for finalization and disposition. By the time you complete your training on finalization and disposition, all the developers on your team should be able to read and understand these articles.


Some gotchas

By and large, the documentation available is pretty clear, and the recommended practices are relatively unambiguous. However, we did run into a few picky detail issues while tackling the disposition and finalization theme at FinRad:

  1. In a disposable class that contains a collection of disposable items, how does one deal with exceptions that might be thrown during the disposition of any individual item?
    The issue here is that one wants to continue disposing of any remaining items even if an exception is thrown while disposing of any given item. However, one still wants to throw an exception out of the parent class Dispose method. The question here is exactly what should one end up throwing? One answer is to store any exceptions into some custom exception that contains an exception collection, then throw the resulting beastie at the end of the process. Unfortunately, there are several problems with this approach, not least of which is that the original Dispose method caller isn't likely to be want to iterate over the exception collection trying to decide what to do about each and every one of the member exceptions.

    What we ended up deciding to do instead was to treat disposition of each individual collection item as if it we were disposing a field on the parent class. In other words, we wanted to expose the same exception scenario that would be seen if we were able to code the parent Dispose method like this:
    try
    {
    	this._someDisposableField.Dispose();
    }
    finally
    {
    	try
    	{
    		this._someOtherDisposableField.Dispose();
    	}
    	finally
    	{
    		try
    		{
    			this._collectionOfDisposables[0].Dispose();
    		}
    		finally
    		{
    			try
    			{
    				this._collectionOfDisposables[1].Dispose();
    			}
    			finally
    			{
    				...
    			}
    		}
    	}
    }
    If one were able to do this, the only exception the caller of the parent class Dispose method would see is the last exception thrown in any of the individual try blocks. This is quite easy to simulate even while disposing in a loop. e.g.:
    try
    {
    	this._someDisposableField.Dispose();
    }
    finally
    {
    	try
    	{
    		this._someOtherDisposableField.Dispose();
    	}
    	finally
    	{
    		try
    		{
    			Exception lastException = null;
    			foreach (IDisposable item in this._collectionOfDisposables)
    			{
    				try
    				{
    					item.Dispose();
    				}
    				catch (Exception ex)
    				{
    					lastException = ex;
    				}
    			}
    
    			if (lastException != null) throw lastException;
    		}
    		finally
    		{
    			GC.SuppressFinalize(this);
    		}
    	}
    }
    Is this ideal? Obviously not, but I happen to believe that it presents less problems than any of the alternatives. If you think you have a better approach, I'd love to hear about it.
  2. What exactly is an unmanaged resource?
    The basic rules for implementing disposition and finalization are the following:
    • If a class has a disposable field, it should disposed from the Dispose method of the parent class.
    • If a class has a field that represents an unmanaged resource, this should be released from both the Dispose method and the finalizer of the parent class.
    The first rule is pretty unambiguous, but what exactly qualifies as an "unmanaged resource" that should be released both on disposition and finalization of the parent class? Most of the articles on disposition and finalization make some airy-fairy mention of things like database connections and file handles, but does that mean a SqlConnection should be disposed in the parent class finalizer? The simple answer is "no". The barely more complicated answer is pretty much the only things you should be cleaning up during finalization are IntPtr fields. If you've gone beyond IntPtr to start using SafeHandle in .NET 2.0, you don't even need worry about those since the unmanaged handle is already wrapped in a disposable, finalizable managed object.
  3. There's no framework design guideline with respect to interface inheritance from IDisposable, but there probably ought to be.
    Interfaces that are likely to be implemented by types that will need to be disposable should themselves be disposable. This is because you don't want to end up having to write code that looks like this:
    IShouldBeDisposableButIsNot thingy = Factory.CreateThingammy();
    try
    {
    	//...
    }
    finally
    {
    	IDisposable disposableThingy = thingy as IDisposable;
    	if (disposableThingy != null) disposableThingy.Dispose();
    }
    Instead, you want to be able to simply write code like this:
    using (IShouldBeDisposableAndIs thingy = Factory.CreateThingammy())
    {
    	//...
    }

The built-in rules

Here are the built-in rules that you'll want to start activating during the disposition and finalization theme:

Category Rule In FxCop 1.35? In VStudio 2005?
Design ImplementIDisposableCorrectly x x
TypesThatOwnDisposableFieldsShouldBeDisposable  x x
TypesThatOwnNativeResourcesShouldBeDisposable x
Performance  DisposeMethodsShouldCallSuppressFinalize x x
RemoveEmptyFinalizers x  
Reliability  DisposeObjectsBeforeLosingScope   x
 Usage DisposableFieldsShouldBeDisposed x x
DisposableTypesShouldDeclareFinalizer x x
DisposeMethodsShouldCallBaseClassDispose x x
DoNotDisposeObjectsMultipleTimes x
FinalizersShouldBeProtected  x x
FinalizersShouldCallBaseClassFinalizer x x

Some custom rule ideas

We haven't created any custom rules yet around finalization and disposition at FinRad, but there are at least a couple that I would like to implement eventually:

  • IDisposable.Dispose should be called from within a finally block (and the only other code in the finally block should be a cast to IDisposable)
  • An interface with at least one disposable implementation should itself be disposable

That last one is bound to be interesting to try to author, but I've got loads of time to figure out how to implement it since there are quite a few more important rules to tackle first...

Posted by calinoiu | 2 comment(s)
Filed under:

With profound apologies to VB lovers, there are a few features of the VB compiler that occasionally make me want to start drinking at a very early hour of the day. Perhaps the most troublesome of these is its failure to start screaming bloody murder upon detection of namespaces that differ only by case, even while compiling an assembly that is marked as CLS-compliant. VB 2005 is at least kind enough to conserve the casing from the source code. However, VB 2003 seems to more or less randomly pick one case version and apply it throughout an assembly, and that leads to all sorts of fun and games.

If you've developed mixed VB and C# solutions, chances are pretty good that this isn't exactly news to you since C# tends to get mighty unhappy the resulting changes in namespace casing. However, you might be blissfully unaware that FxCop is also case-sensitive, and that has its own special set of consequences...

At FinRad, we're still supporting applications that run against .NET 1.1 (and will be for at least a couple more years), and a very large part of that code base is written in VB 2003. Given that this is .NET 1.1 code, we're using the stand-alone version of FxCop for static analysis, and exclusions are stored in the FxCop project file, not as SuppressMessageAttribute annotations in the source code.

Given that this also happens to be the same code base covered by the bulk of our backlog cleanup efforts, the FxCop project files contain an unusually large number of exclusions. (I'll write about the reasons for this soon. For now, just trust me: lots more violation exclusions than you might normally expect.) Every now and then, we would end up with a situation where a sizeable chunk of our exclusions would suspiciously disappear, and all the violations they covered would auto-magically become active again. Needless to say, this was terribly annoying, but we chalked it up to gremlins, manually re-created the exclusions, and forged bravely ahead.

A few weeks ago, we hit a really weird problem where an automated build running on an integration server was detecting active FxCop rule violations that couldn't be duplicated on any developer machines. Several of us tried clean check outs, rebuilding, etc., but without any success at reproducing the problems seen in the automated build. In retrospect, it seems painfully obvious, but the namespace casing problem (which also tended to raise its ugly head in a Visual Studio vs. automated build disjunction with respect to C# callers) didn't occur to me until I was stuck in traffic that evening. It was the first thing I checked the next morning and, sure enough, the exclusions were registered under the "Something" namespace, but the classes compiled in the automated build now fell under the "SomeThing" namespace.

Some lessons learned:

  • FxCop is case-sensitive.
  • The gremlins actually lived in the VB compiler. (Luckily, they've been evicted, but the nasty smells are lingering.)
  • Giving an "aha!" smack to one's forehead while stuck in traffic will earn some very concerned looks from drivers of neighbouring vehicles.

Posted by calinoiu | 2 comment(s)
Filed under:

Since I started monitoring traffic on this blog a little more closely about a week ago, I had the unexpected surprise that the posts on HTML encoding and server vs. client cultures were getting a lot more hits than I expected. I had been planning on starting a series of "how to" posts on those topics this weekend, but that was before David Kean from the FxCop team was kind enough to direct a bunch of folks my way with a post about my recent FxCop posts. Since it would seem that I've now got quite a few new subscribers who are interested in FxCop, I figured I might as well continue on that theme for a while...


Jumping the gun

I've got exception stuff on the brain this morning, so that's what I'm going to write about. However, if you have decided to tackle an FxCop backlog, exceptions are probably not the first major theme you should cover. Then again, even assuming that my recent posts have inspired you to immediately start a backlog winnowing project, you should still be at least a month away from covering any major theme. I'm going to try to write about the much easier disposition and finalization topic within the next couple of weeks. Heck, it might even happen tomorrow morning if I don't get coerced into some sort of housework. In the meantime, just keep in mind that the sequence in which I end up covering the topics here isn't necessarily the sequence in which you'll want to cover them in your backlog process.


Starting from scratch

Unless your team already has very specific guidelines in place around the generation and handling of exceptions (along with training and code reviews to reinforce those guidelines), it's pretty safe to assume that there is a very wide variety of exception treatment in your code base. In addition, the developers on your team likely have similarly disparate understandings of when and how exceptions ought to be generated and handled. Instead of jumping right into covering the existing FxCop rules around exceptions, it would probably be a much better idea to start with some training around the basics of exceptions. Chances are excellent that individual developers will be carrying quite a bit of baggage based on their experiences with exception treatment in other languages or platforms, and you will most likely need to devote quite a bit of time to dispelling inappropriate preconceptions.

Here are a few resources that you may wish to consult during your exception training:

If you're concerned about globalization and localization issues, you might also want to have a look at my post on client vs. server cultures. The disparity between application user and administrator languages can present challenges when emitting localized exception messages for any application, and it's something you may wish to consider as you develop your internal exception generation guidelines, even if you don't happen to be writing web applications.

The built-in rules

As part of the exceptions theme, you'll want to start activating the FxCop rules that touch on exception generation and handling. The following table lists the rules available in FxCop 1.35 and Visual Studio 2005 static analysis:

Category Rule
Design DoNotCatchGeneralExceptionTypes
ExceptionsShouldBePublic1
Security CatchNonClsCompliantExceptionsInGeneralHandlers1
WrapVulnerableFinallyClausesInOuterTry
Usage DoNotRaiseExceptionsInFilterBlocks
DoNotRaiseReservedExceptionTypes
InstantiateArgumentExceptionsCorrectly
RethrowToPreserveStackDetails

I suppose that it could be argued that Usage.DoNotIgnoreMethodResults might also be worth including while activating exception rules. If your team is resistant to the idea of throwing exceptions instead of using failure return codes, then a bit of time cleaning up violations of this rule might be just the thing to convince them. Otherwise, it's probably safe to activate this rule whenever it otherwise seems reasonable during your backlog process. If you're following the guidelines, you'll probably want to fix most violations of this rule by modifying the invoked method to throw exceptions rather than return a status code, which is a breaking change for public methods, and scheduling breaking changes can be a wee bit tricky.


Custom rules

If you've read the design guidelines concerning exceptions, it should be pretty obvious that the above eight rules don't exactly cover all the potential problems for which you might want to screen. Here are a few custom rules for exceptions that we're either already using at FinRad or will be introducing once we get a little further along in the backlog cleanup:

  • Do not throw System.NotImplementedException
    Pretty obvious, although you'll occasionally need to create temporary exclusions, particularly for when someone checks in a class skeleton to be fleshed out by somebody else. Also, as a bit of side note, when you introduce this guideline, you might want to stress that the exception signifies that the code is not implemented yet, not that it's not implemented here. For "not implemented here", you should be using System.NotSupportedException instead.
  • Methods that throw System.NotSupportedException should not include other instructions
    Simple as it might seem, this rule is a bit tricky to author since the VB compiler has the nasty habit of adding extra instructions (e.g.: return statements) to methods. Also, one needs to keep in mind that a custom message might be provided for the exception, and that message might be retrieved via a method call.
  • Throw System.ComponentModel.InvalidEnumArgumentException for an invalid enum argument value
    Not really relevant to the rule itself, but it's really annoying that this particular exception lives in the System.ComponentModel namespace rather than System.
  • An argument exception should be thrown directly from the method whose argument is being validated
    I had hoped that the parameter name verification on Usage.InstantiateArgumentExceptionsCorrectly would cover this issue but, after looking at the rule logic via Reflector, I've decided that a custom rule is going to be need for this one. I'm planning on activating it at the same time as Design.ValidateArgumentsOfPublicMethods (which will definitely be happening before we move to Orcas <gdr>).
  • Use recommended exception logging methods
    In order to make logged exceptions more noticeable, we decided to create a new helper method that is capable of writing exception details to both a central repository and whatever target location was previously used. The decentralized logging problem was the consequence of an application architecture that you may not share, so this might not be particularly relevant for your team. However, if you do have similarly decentralized logging, you may want to consider a similar approach. If you do decide to add a special exception logging method, then you'll probably also want to add a custom rule to verify that it's being used.
  • Review use of System.Diagnostics.Debug methods
    Some folks have a tendency to use Debug.Assert or Debug.Fail where they really ought to be throwing exceptions.
  • Review catch without throw
    Last but definitely not least, this is our "do not eat exceptions" rule. Unfortunately, fixing an undesirable violation of this rule can be very time-consuming, but discovering places where an application is hiding exceptions has tremendous impact on reliability, and the short-term cost is well worth the long-term benefit.

Personally, I'd love to see at least the more generally useful rules from the above list make it into the rule set shipped by Microsoft. I seem to remember seeing something from the code analysis team that promised some new rules around exceptions for Orcas, but I can't find it anymore, and I haven't seen any new exception rules either. It's a pity, because I was really hoping to get rid of my custom rules, particularly the less than ideal verification for other instructions in methods that throw System.NotSupportedException...




1This rule is not available in Visual Studio 2005, but it is shipped with both FxCop 1.35 and Orcas beta 1.

Posted by calinoiu | 3 comment(s)
Filed under:

If you've decided to try to tackle an FxCop violation backlog, one of the first issues you're going to face is deciding which rules to activate when. Here are some general guidelines...


Starting out

When you first begin the backlog clean-up process, you're going to need to introduce the FxCop tool to your team (assuming, of course, that you're not already using FxCop for new projects). In order to focus on mastering the tool and the cleanup process before diving into "difficult" rules, you'll want to pick a few rules to activate that meet the following criteria:

  1. The rule itself should be easy to understand with minimal background explanation.
  2. Violations of the rule should be easy to fix.
  3. Your code should contain very few violations of the rule.

My favourite introductory rule is DoNotConcatenateStringsInsideLoops. Unfortunately, it's slated to disappear in Orcas, so you might want to get a start on that backlog cleanup soon if you want to use it. ;) For finding other candidate "early" rules, the best approach would generally be to count your violations per rule, then look for any rules with a violation count of between approximately 5 and 20 that also meet the "easy to understand" and "easy to fix" criteria.


If it ain't broke, it don't need fixin'...

Another thing you'll want to do fairly early in the process is activate rules for which you have no existing violations. Even though this lock down might seem very important with respect to preventing additional backlog problems, it should probably wait until you're a few weeks into the cleanup process, and the team is starting to get reasonably comfortable with the whole thing. Don't worry too much about this delay causing new rule breaking -- if you've gone this long without breaking those rules, chances are pretty slim that you'll break them in the next three weeks or so. ;)

Simple as it may seem, there are still a few things to consider as you select "no violation" rules to activate:

  1. Some of these rules will have no violations because your team has been actively avoiding some problem. Take this opportunity to highlight that they've been doing the right thing and confirm what that "right thing" is since newer team members may not be aware of it.
  2. Some of the rules will have no violations because your project doesn't touch on the area covered by the rule. For example, if you don't interact with native APIs, there are several rules around p/invoke calls that you won't have broken. Given that your team is unlikely to start breaking these "irrelevant" rules, you can safely activate them without too much explanation. However, you should at least mention that you are activating the rule, give a quick description of its significance, and explain why it's not likely to be violated in the target code base. This approach will ensure that the team is aware of the rule should it eventually become more relevant due to changes in the project scope.
  3. Some rules (but not very many) will have no violations almost by chance alone and will require more in-depth explanation than you may wish to allot this early in your process. You might want to consider delaying activating such rules until you have a more appropriate opportunity to provide more in-depth training on the rule.

Mix and match

As you start activating rules, you'll soon realize that some rule violations are a lot quicker and easier to fix than others. For example, fixing a violation of RethrowToPreserveStackDetails usually takes seconds, while there are a few rules for which a fix may take hours. It's important that your immediate fix backlog contain a mix of problems, both with respect to difficulty and fix duration. The quick and easy fixes will provide small tasks that any developer will be able to perform, and they're nice little "fillers" for those five minute blocks when you're waiting on something like a pre-commit test run for another task. They'll also help keep the momentum going when you've got a backlog of "big" problems to address. If you only have big problems in the backlog at any given time, the overall fix rate will be slow, and that's just plain depressing for everyone.

Some fixes will require considerable knowledge of a particular area of your product, and one approach to handling these would be have developers fix such problems in their own areas of responsibility. However, you may also want to view such fixes as an opportunity for a bit of cross-training. Either way, you should make sure that your immediate backlog doesn't only contain violations against a limited subset of your product. Any given developer on your team should be able to draw problems from the backlog pool at any given time.


Slow down, you crazy child

Now and then, your immediate fix backlog is going to grow just a bit too big. This usually happens because you've activated a rule with a large number of violations and/or for which violations are expensive to fix. In order to prevent the backlog from growing even more, you're going to have to stop activating new rules for a while. There are basically two things you can do to avoid losing momentum because of this hiatus. The first is pretty trivial, and simply involves having developers fix violations during the time that would normally be allotted to introducing new rules. If you're concerned that the fix rate has dropped off because developers have fallen out of the habit of working on the backlog, then having everyone work on fixing problems at the same time like this can be one way of "rebooting" the process. However, if developers are already spending the time they ought to be on the backlog, a fix session like this isn't necessarily a very effective use of anyone's time.

The other "delaying tactic" is to treat the increased between-rule activation delay as an opportunity to provide more in-depth training, particularly for rules where a lot of background information may be required. It can help quite a bit to divide such rules into themes, with background training on any given theme spanning several weeks. A few obvious themes are:

  • finalization and disposition,
  • exceptions, and
  • code access security.

Custom rules

As you tackle your backlog, you'll probably start to identify some less than desirable usage patterns that aren't covered by existing rules. While it can be quite tempting to immediately activate a new custom rule based on such a discovery, don't forget that you have plenty of built-in rules that aren't yet active either. Your custom rule should go into the same backlog "pot" as the built-in rules, and it should be activated when most appropriate to your backlog process, which may or may not be when you first discover the problem that the rule covers.


How far, how fast?

Obviously, the rate at which you will be able to activate rules will depend quite a bit on the size of the backlog and rate at which your team will be able to fix violations. However, you should also expect the activation rate to drop off over time as you start to activate rules that are difficult to fix and rules with large numbers of violations. Given that you may also start adding custom rules, the proportion of rules activated may drop even though you're not inactivating any rules. Nothing exactly surprising in any of that, but it does mean that you shouldn't use the rate of rule activation as an important indicator of your backlog clearing progress.


Posted by calinoiu | 3 comment(s)
Filed under:

Surprise! (not the good kind)

If you use FxCop or Visual Studio Static Analysis and haven't yet started playing with Orcas, you may be in for a bit of an unpleasant surprise. While the code analysis team is doing all sorts of interesting things for Orcas, one somewhat less desirable change you probably haven't heard about yet is removal of the control flow engine and, consequently, the following rules which depend upon it:

Category Rule
Design ValidateArgumentsOfPublicMethods
Globalization DoNotPassLiteralsAsLocalizedParameters
Performance AvoidUnnecessaryStringCreation
DoNotCallPropertiesThatCloneValuesInLoops
DoNotConcatenateStringsInsideLoops
Reliability DisposeObjectsBeforeLosingScope
Security ReviewSqlQueriesForSecurityVulnerabilities
Usage AttributeStringLiteralsShouldParseCorrectly
DisposeMethodsShouldCallBaseClassDispose
DoNotDisposeObjectsMultipleTimes
LiteralsShouldBeSpelledCorrectly
ProvideCorrectArgumentsToFormattingMethods

The reasons given for removing the engine are basically that it's slow, buggy, and difficult to maintain. Well, we've all had to deal with that sort of problem before, but most of us haven't have the luxury of removing the nasty, old implementation without providing a replacement. I can certainly understand that the code analysis team might be under pressure to rapidly address performance problems introduced by an engine that slows any use of their product by a factor of 2 or more, but this strikes me as a case of throwing the baby out with the bath water.

The one piece of good news is that there seem to be plans to eventually replace the engine. However, we supposedly won't be seeing the new implementation until at least the post-Orcas release of Visual Studio, which very likely means waiting two years or more. A couple of things to consider:

  1. How big a backlog of violations against the missing control flow rules do you think your team will likely create in two years?
  2. Given that the code analysis team apparently considers it acceptable to remove this feature set without providing an immediate replacement, how confident are you in their ability to keep development of the replacement at a high enough priority level that it will actually ship in the post-Orcas release of Visual Studio?

On again, off again...

Let's assume for the moment that the performance hit caused by the control flow engine really does prevent some folks from using FxCop. Even if it doesn't, performance improvements are probably pretty high on many folks' wish lists for FxCop, and it would seem likely that the code analysis team has to put up with a lot of complaining on that front. In addition, there are known bugs in the control flow engine and rules, and those problems may cause both some minor inconvenience as well as false confidence that rules are being applied where they're not. Are these really sufficient justifications for eliminating the control flow engine entirely in Orcas?

Rather than removing the engine, I would much rather see it become an optional, disabled-by-default component. This scenario would remove the potential performance obstacle for new users while still allowing existing customers to at least keep up the same level of screening that was previously applied. If there is concern on the part of the code analysis team that users aren't aware of the buggy, partial screening against control flow rules, the feature that allows enabling of the control flow engine could include whatever warnings they deem necessary.

Given that Orcas is already in beta, it seems likely that re-introducing the control flow engine might be too big a change to support. If so, another possibility might be to make it available only in the stand-alone FxCop product. Obviously, this would be a little less convenient for users of Visual Studio Code Analysis. However, it would at least allow such users to make their own performance/convenience/quality trade-off decisions rather than having a potentially unsuitable decision imposed upon them by the complete absence of the engine.

I've already submitted a suggestion regarding re-introduction of the control flow engine as an optional component on the Microsoft Connect site. If you're concerned about the removal of the engine, perhaps it's time to consider voicing your opinion there as well...

Posted by calinoiu | 3 comment(s)
Filed under:

A few months ago, I gave a presentation on using FxCop at the Montreal Visual Studio Users Group. The material was divided into two main topics: (a) the mechanics of using FxCop and (b) integrating FxCop use into a development process. During the first part of the talk, some members of the peanut gallery kept piping up with questions about what one can do to handle the huge number of FxCop rule violations that an existing code base will have when one first runs FxCop against it. Lucky for me, most of the second part of the talk covered exactly that problem, and I managed to finish the evening without having any rotten produce lobbed in my direction. However, their questions did confirm my suspicion that dealing with the backlog problem is a topic of potential interest to folks, so it's one I might as well spend a bit of time writing about...


No backlog?  No problem.

Before tackling the hard part of the backlog problem, let's look at the easy part. If you're starting a new project, the easiest way to handle the backlog issue is to never develop a backlog in the first place. The way to do this is to start running FxCop with all rules activated from the moment you first set up your project(s) and to fix problems (or flag true exclusions) before each and every check-in.

This may seem like a lot of work and, to be honest, it will almost certainly add noticeable overhead at the very beginning, but the additional effort required will probably taper off more quickly than you might expect. Within a week or two, most individual developers should be over the learning curve "hump" with respect to the FxCop tool itself. What might surprise you is that it probably won't take much longer for most developers to become familiar with the rules that touch their own work and to stop introducing new violations of those rules. This will, of course, vary depending on the spectrum of tasks that any given developer performs, but most developers in anything but the very smallest shops are likely working on fairly similar tasks from one week to the next.

Another issue on the "no backlog" side of things is that you might need to occasionally create temporary exclusions (e.g.: when you've added the first class to a namespace, but the others that would eventually prevent Design.AvoidNamespacesWithFewTypes from firing don't yet exist), and allowing these to accumulate could also create a backlog problem. The way to prevent this is to avoid them where possible and clean them up as soon as possible when their creation can't be avoided. I would recommend creating a project work item for removing any given temporary exclusion immediately before adding the exclusion itself. A final check that they have all been removed should be part of a pre-release checklist, and the release in question should be the earliest you can isolate (e.g.: internal release to QA).


And now for the hard stuff...

If, like many folks with a medium-to-large existing code base, you've run FxCop and been overwhelmed at the thousands or tens of thousands of violations that it spits out, what can you possibly do? The most common reaction seems to be essentially giving up on the idea of using FxCop for that product. That's unfortunate, partly because the existing problems will go unaddressed until they manifest as user-noticeable bugs, but also because the product will continue to accumulate new problems as development progresses.

Another path is to disregard the existing backlog and follow the "no backlog" approach described above for new development. Since the FxCop team has produced a tool to help facilitate this approach, I presume that it is a relatively common choice. Unfortunately, while it does avoid the problem of accumulating new violations, it doesn't address the existing backlog particularly well.

So... Can one do something to both prevent new problems and clean up the old problems? At FinRad, we started a process in late September to do just that. One of the things that was immediately obvious was that "hiding" the existing backlog wasn't likely to be terribly helpful if our goal was to eventually clean it all up. Part of the problem is that folks would probably become rapidly inured to the large volume of existing exclusions. Another, potentially more important issue, is that a big backlog is just plain too depressing.

Rather than activating all rules at the beginning and excluding all existing rule violations (even temporarily), we decided to adopt a process whereby rules are activated one-by-one. Once a rule is activated, its existing violations are added to the immediate backlog and nobody is allowed to introduce new violations. In addition, we decided to provide training on the newly activated rules rather than expecting individual developers to do all the rule reason and fix approaches research on their own. The basic outline of this process is:

  1. Every week, a new set of rules to activate is selected.
  2. We have a one hour training session on the new rules. This training includes:
    • any background information that is required to understand the rules,
    • the specific problems that the rules are meant to address,
    • recommended approaches to fixing rule violations, and
    • reasons for which it is acceptable to create exclusions for violations.
    In addition, progress statistics are presented at each training session so that everyone is aware of where we stand with respect to the existing backlog.
  3. Immediately after the training session, the new rules are activated, and existing violations are added to the backlog by creating "TODO" exclusions.
  4. As of the activation of a rule, no new violations are accepted.
  5. Each developer is supposed to spend two hours per week fixing rule violations from the backlog.

In practice, this exact process isn't followed to the letter every week. Besides the obvious impossibility of grabbing three hours of each developer's time every single week, there are also all sorts of picky details around which rules can/should be activated when, and what one should do to keep the momentum going when the size of the backlog jumps because of the activation of a single rule. There's also the question of what sort of hit we're taking with respect to new violations against the rules that haven't yet been activated. However, I think I'll keep all of that as meat for further blog posts for now...

Posted by calinoiu | 11 comment(s)
Filed under:

Since I posted an FxCop rule sample over at bordecal.mvps.org, it's rapidly become the most popular content on the site. Not something I expected but, given that I have trouble coming up with blogging topics and there seems to be some interest in FxCop, I figured I might as well spend a lovely Saturday morning writing about it... (Actually, I'm pretty much just killing time while waiting for a tire change, so please feel free to keep that "get a life" comment to yourself. <g>)

There's all sorts of stuff I could (and will try to) write about FxCop use, but one thing that jumped out at me this week was quite a few searches for "no FxCop rules could be found" landing on my rule sample page, so I guess I'll start with that...

There are several reasons why FxCop might have trouble finding rules in a custom rule assembly:

  1. There are no rule classes.

    If you haven't added any rule classes to your assembly, it shouldn't be too much of a problem to figure out why FxCop can't find them. ;)

    Another problem might be that your rule classes don't implement the necessary rule interfaces. If you've done the usual thing and created your rules as subclasses of Microsoft.FxCop.Sdk.Introspection.BaseIntrospectionRule, then you've got the interface problem covered. However, if you're rolling your own rules from scratch, you're probably going to have to do quite a bit more work... (Which I've never done, so I have no idea what's the minimum required. My first guess would be IRule, but that seems rather useless since there are no problem collections exposed until one gets to IControlFlowRule or IIntrospectionRule.)

  2. You've forgotten to add an embedded rules resource file.

    This is probably the most common problem for beginners. If you're not sure what the rule resource file should contain, grab yourself a copy of Reflector, and take a look at the Microsoft.FxCop.Rules.Design.DesignRules.xml embedded resource in the DesignRules.dll file that ships with FxCop or Visual Studio Static Analysis.

    Also, remember that this file must be set to be an embedded resource. If the file doesn't end up in your compiled assembly, FxCop won't be able to find it.

  3. You've incorrectly identified the embedded rules resource file in which the data for your rule class can be found.

    Once you've created your embedded rules resource file, this may be the next problem you hit. If you're inheriting from BaseIntrospectionRule, you must pass the full resource file name (minus the .xml extension) when invoking the protected BaseIntrospectionRule constructor. Since all the custom rules in a given assembly will typically use the same rules resource file, one would usually create a base class from which they would all inherit. The rule base class might look something like this:

    namespace Bordecal.FxCop.Rules
    {
    	internal abstract class IntrospectionRule : BaseIntrospectionRule
    	{
    		protected IntrospectionRule(string name)
    			: base(name, "Bordecal.FxCop.Rules.Rules", typeof(IntrospectionRule).Assembly)
    		{
    		}
    	}
    }
    

    This avoids requiring all rules to provide the resource file name, but it doesn't help with figuring out what the file name ought to be in the first place.

    The bad news is that inferring the resource file name can be a bit tricky. It will depend on your project's default namespace, as well as on where the rule file resides within your project's directory hierarchy. The good news is that you don't need to infer the name at all. Instead, simply open your rule assembly in Reflector, and copy the full name of your resource file (remember to omit the .xml extension):

Posted by calinoiu | 2 comment(s)
Filed under:

I've been getting a small but steady trickle of requests for new ImportsSorter features and source code availability, so I created a CodePlex project for it a while back at http://www.codeplex.com/ImportsSorter.  Given that it took me almost two months to get around to writing a teensy little announcement about it to post here, there's probably good reason to expect that it might take me a while to actually implement any new features. ;)  However, if you want to request one, http://www.codeplex.com/ImportsSorter/WorkItem/List.aspx is the place to do it.

If you can't be bothered to wait around for me to add your feature, you may wish to grab the source code from http://www.codeplex.com/ImportsSorter/SourceControl/ListDownloadableCommits.aspx.  Just a little warning: it's in need of a fairly major clean-up, and you're pretty much on your own until I find some time to do that...

Posted by calinoiu | 2 comment(s)

I've frequently wondered why it is that folks that very heartily embrace a particular software quality attribute seem to lose all interest in any others. Now, if you've read any of my other postings, you're probably wondering what right a security wonk like me has to be casting aspersions on the hue of anyone else's kettle. Well, the truth is that, while I happen to be very interested in security, that interest doesn't mean that I necessarily value security over other quality factors. It simply means that I want to know how to evaluate and implement security aspects of software appropriately, not that I think that it's always of high importance on any given project.

Unfortunately, I've been seeing quite a bit of tendency recently toward systematic promotion of a given quality factor over all others, and Roy Osherove's recent promotion of "Testable Object Oriented Design" (or "TOOD") struck a bit of a nerve for me. Personally, I'm a great fan of testability, but I'm less than thrilled with seeing it promoted at the expense of other quality factors, and Roy's post is far from the first time that I've seen this occur.

But isn't it a cute little TOOD?

So what's wrong with what Roy proposes anyway? If making everything public, virtual, and injectable/swappable by default increases testability, why not just go ahead and do it? Unfortunately, there are three fairly large problems that seem to escape most folks who are focused on testing:

  1. Increasing the public (including virtual) portions of your API has a negative impact on its maintainability.

    Once something is public, changing it becomes a breaking change with potential impact on existing consumers. If someone uses my private classes and/or methods, and I make a change to the private interface, and their code breaks, that's their problem. However, if I make something public or virtual, I become responsible for ensuring that the interface does not change since anyone might be using any part of it. Maintaining that public behemoth could become a very big and ugly problem for me over the long term, and that's a pretty nasty hit to take for the sake of testability.

  2. Increasing the public portions of your API has a negative impact on its supportability.

    If you publish an API, you've pretty much got to document every public corner of it. Otherwise, customer inquiries (and often complaints) are going to start piling up regarding the lack of documentation around bits and pieces that folks want to use. You'll eat team time either creating documentation or dealing with customer issues around lack of documentation, and I'd rather spend that time tackling the backlog.

  3. You need to do a lot more testing if you want to increase your public and virtual APIs without a deleterious effect on correctness/reliability.

    Roy hits this point on a tangent when he mentions that one might sometimes want to seal a class for security reasons. However, the problem isn't just restricted to potential security risks. For example, when one extracts a private method to decrease the size or complexity of another method, one generally doesn't bother validating the inputs to the new private method. (By the way, this isn't just laziness – quite often adding the validation would have a negative impact on performance, particularly if the extracted method is invoked from within a loop.  And guest what?  Performance just happens to be another of those pesky quality factors that compete for our attention on any given project.) However, if one makes the method public, one becomes responsible both for validating its inputs and testing not only the "happy path" functioning of the method, but also the response of the method to bad inputs and bad state.

    The testing story gets even more complex when one considers potential inheritance. If you don't seal a class, you really ought to be testing that it behaves as expected regardless of whatever bits of oddness a subclass might implement. Some subclass misbehaviour may be intentional (causing the potential security problems to which Roy refers), but most will likely be unintentional issues introduced by subclass developers who try to do things that you never expected. Unfortunately, designing tests to cover such scenarios is quite difficult and time-consuming, and the chances of achieving decent coverage are pretty slim for anyone without a truly devious mind.

    To be honest, I've never seen anyone do extensive testing around subclassing scenarios, and I certainly haven't seen much mention of it in the TDD literature (not that I've been reading quite as much as I probably ought to in that space). My guess would be that there's very little such testing going on out there, which make recommendation of making everything virtual by default for testability reasons strike me as a wee bit hypocritical...

Back to the tadpole

In case you missed it, Roy's TOOD post was provoked by news of the restriction of visibility and removal of functionality upon which his FxCop rule testing approach depends from the next release of FxCop. Roy's opinion is that the FxCop team has "decided to remove the little tiny shred of testability and automation" from their product. Them strike me as fighting words, and I feel that the FxCop team deserves a bit of support on this one.

For starters, restriction of visibility may be an inconvenience, but it's certainly not a complete impediment to testing. With a bit of help from reflection, one can happily invoke those newly private methods to one's heart's content. Sure, it's a bit more work and there's a performance hit to take, but that's a far cry from leaving not a "little tiny shred" of testability.

More importantly, there's actually a very effective means of testing FxCop rules that remains available regardless of what the FxCop team might do to the innards of their assemblies. Instead of attempting to evaluate individual rules against individual targets for each test, why not process an FxCop output report and test its data against expected results? This has a few potential benefits over Roy's proposed approach:

  1. It's resistant to changes within the FxCop assemblies since it only deals with the output. (It's even reasonably resistant to changes in output format since, as long as the format doesn't change to the extent of modifying the contained data, it can be fairly easily transformed to one's preferred processing format.)
  2. Its overall performance should be better since the FxCop engine doesn't need to be restarted for each rule by target combination.
  3. It's more thorough since each rule is tested against all available targets, thereby ensuring that violations aren't created where they're not expected.
  4. Adding new tests is easier since one only needs to specify expected violations, not select and specify rule by target combinations for test runs.

Now, this might not be quite the way that Roy would prefer to test his FxCop rules, but it works quite well, and it's not particularly difficult to implement. Given this, doesn't it seem more than a little harsh to accuse the FxCop team of making it next to impossible to test custom rules? Besides which, if there were one thing you could ask the FxCop team to work on, would it really be enhanced testability, or might it be the fabled SDK?

Posted by calinoiu | 1 comment(s)
Filed under:

The problem

The ability offered by .NET to set a thread-level culture then automatically format and select localizable resources using that culture's settings is wonderful stuff. Unfortunately, it's an approach that plays out quite a bit better in a client-side application than in a server-based application. The reason for this lies in the nature of the work one performs in a server-based application: some formatting and/or rendering is intended for consumption by client applications, but some (e.g.: log entries) is intended for consumption on the server.

Things tend to muddle along just fine as long as both the client and the server use the same or similar cultures. This is helped along by the fact that most servers are unlikely to have additional .NET language packs installed, so any exceptions logged on the server are likely to contain English error messages. However, there are plenty of cases in which it might be necessary to install additional language packs on the server (e.g.: server and/or application administrators don't speak English), so it might be a wee bit naïve for a software vendor to assume that the only .NET language supported on a server is one that will be understood by folks who read text generated for comsumption on the server.

In case this all sounds a bit odd, let's take a look at some concrete examples. Our base scenario is a localizable ASP.NET application that, as is typical, sets the current thread's CurrentCulture and CurrentUICulture properties to match the preferred culture of the client user. Amongst other things, this application happens to create a log entry every time an invoice is created via its UI. e.g.:

	
private void LogOrder(int invoiceId, double price)
{
	string logText = string.Format("Invoice {0:N0} for {1:C} created at {2:G}.",
		invoiceId, price, DateTime.Now);

	// Text written to log here...
}
	

If the application has set the current thread's CurrentCulture property to the client's preferred culture, here is what will be written to the server-side log for various values of the client culture:

Client culture Logged text
en-US Invoice 21,456 for $5,000.00 created at 5/6/2006 9:30:00 AM.
en-CA Invoice 21,456 for $5,000.00 created at 06/05/2006 9:30:00 AM.
en-GB Invoice 21,456 for £5,000.00 created at 06/05/2006 09:30:00.
fr-FR Invoice 21 456 for 5 000,00 € created at 06/05/2006 09:30:00.
fr-CA Invoice 21 456 for 5 000,00 $ created at 2006-05-06 09:30:00.
es-ES Invoice 21.456 for 5.000,00 € created at 06/05/2006 9:30:00.
es-MX Invoice 21,456 for $5,000.00 created at 06/05/2006 09:30:00 a.m..
ja-JP Invoice 21,456 for ¥5,000 created at 2006/05/06 9:30:00.

The above table shows some of the sorts of trouble that one can get into applying client-side culture options to formatting of server-side texts, and things will only get worse when/if digit substitution is fully implemented. The problem is compounded by the fact that the client-specified culture used to apply formatting is no longer known at the time one reads the server-side text. Of course, formatting as in the above example is a mistake, and there are ways to avoid the problem, but more on that after we look at something a little more difficult to work around.

Like many applications built upon it, the .NET framework emits a variety of texts based on localized resources. The examples with which many developers will likely be most familiar are exception messages. In fact, in most applications, a very large majority of exception messages will have been generated within the .NET Framework base class libraries. Let's say that your application happens to be running on a server with the Hungarian language pack, and an exception is logged while a Hungarian client is using the application. Would you have any idea what the exception message "Nem megfelelő a bemeneti karakterlánc formátuma" means? (It happens to be the Hungarian version of "Input string was not in a correct format", but BabelFish won't exactly be helping you figure that one out...)

To add to the fun, some Framework-generated texts (including exception messages) are automatically formatted from resource strings containing placeholders. Since you cannot control the formatting applied in these instances, you may run into interpretation issues even if you don't have any additional language packs installed. For example, if you see an exception message containing the number "2,345", how can you tell if this was a 2345 written under an American culture option or a 2.345 written under a French Canadian setting? Sometimes context may help, but other times you'll just plain have to guess.


The workaround

So what can we do to avoid the problem? The easy answer is to not use the thread culture as a store for the client's preferred culture. This, of course, means that we would then need to perform explicit formatting of any non-string values before passing them to UI elements for rendering. While somewhat burdensome, particularly for folks who like living the RAD life, this isn't exactly the end of the world. There's also a bit of good news: FxCop can help you identify at least those cases of explicit formatting that are defaulting to using the thread culture.

The first bit of rather bad news with this approach is that you can't use ASP.NET's auto-magical client culture detection since it pops the detected culture into the current thread properties. Things might be a bit different if the ASP.NET team had chosen to make the HttpApplication.SetCulture method virtual, but it's not, so we're stuck implementing our own mechanism from scratch if we want to use an alternate store for the client culture.

The second bit of bad news is that there are several ASP.NET controls that are rather aggressive about using the thread culture internally. For example, when using the CompareValidator with ValidationDataType.Double, it's possible to use culture-specific parsing1. However, there's no way to specify which culture one wants to use for parsing. Unless you choose to use culture-invariant parsing, the thread culture will be used. That means that a French Canadian user's 5,123 would theoretically be interpreted as 5123 rather than 5.123 if the thread culture is set to en-US because that's the appropriate setting for server-side use. However, things are even worse in practice than in theory, and our 5,123 will fail validation since the CompareValidator does not permit use of thousands group separators and will reject the comma if the thread culture is en-US.

For a fully functional workaround, you'll need to implement at least four sets of changes:

  1. Use an alternate store for the client culture,
  2. Create a mechanism for populating this new store with the client-preferred culture (rather than setting Culture and UICulture attributes to "auto"),
  3. Where possible, pre-format non-string values that will be rendered by controls, and
  4. Implement custom versions of control that take the rendering culture as a property rather than using the thread culture exclusively.

The solution

The workarounds are a pack of trouble, and far more work than most folks would be willing to put in unless they've already been bitten by client vs. server culture bugs. The real solution here is for the ASP.NET platform to properly accomodate separate tracking of client and server cultures. Obviously, items 1 and 2 from the workarounds list above would be a start. In addition, it would probably be a great idea if controls defaulted to using the client culture but allowed easy overriding to use either the server culture or a custom assigned culture.



1For some bizarre reason, CompareValidator is strictly culture-invariant for some data types (e.g.: ValidationDataType.Integer), which presents its own set of potential problems. However, this isn't directly relevant to the client vs. server culture issue, so I'll drop it for now. However, if you want to complain, connect.microsoft.com is the place.

Posted by calinoiu | 2 comment(s)
Filed under:

There's a new version of the Bordecal.ImportsSorter add-in available for download. This new version allows shortcut keys to be permanently assigned to the configuration and/or sorting menu items via VStudio options. The hashes for the new MSI file are:

MD5: f89f3c1bfa2a40adbb67315de8fef148

SHA1: c54803ba3392ca68ca29fd4dc9b4b359606f46c7

Posted by calinoiu | 4 comment(s)

The problem

Back when ASP.NET was first introduced, I had pretty high hopes that the new controls would offer support for automatic HTML encoding. Unfortunately, there was very little of this, and most of it was more than a bit lukewarm (more on this later). In some ways, things have improved a bit in v. 2.0, but they're considerably worse in others.

Before you read any further, you might want to ask yourself which ASP.NET controls perform HTML encoding for you and under what circumstances this is done. If the answer doesn't leap to mind, you've perhaps got a first inkling that there might be a little problem with API consistency and/or the documentation. Then again, maybe you've never worried about HTML encoding in your web applications, in which case I'd strongly recommend that you read up on HTML injection and cross-site scripting. A good starting point might be CERT Advisory CA-2000-02.

We'll look at which controls perform HTML encoding soon. First, we're going to need to nail down some conceptual stuff because not all encoding is created equal. You may already be aware that HTML, URLs, and client-side script use different encodings. For the sake of simplicity, the remainder of this post will refer mainly to HTML encoding, although the other two forms of encoding do merit consideration as well.

There is more than one flavour of HTML encoding, even within ASP.NET. The first is exposed via the System.Web.HttpUtility.HtmlEncode methods. These encode the characters >, <, &, ", as well as any characters with codes between 160 and 255, inclusive. The other main encoding flavour used by ASP.NET is "attribute" encoding, which is exposed via the System.Web.HttpUtility.HtmlAttributeEncode methods. In ASP.NET 1.1., these encode the & and " characters only. In ASP.NET 2.0, these encode the characters <, &, and ".

Attribute encoding ought to be a superset of full HTML encoding that also encodes the single quote character in case that's what happens to be wrapping the attribute. However, as you may have noticed from the above, the ASP.NET version of attribute encoding is even wimpier than its full encoding brother. To make matters worse, the full HTML encoding implemented by ASP.NET is no great shakes in the first place. Security isn't the only reason for HTML encoding, and failure to encode everything outside the low ASCII range can impact on page readability when client browsers don't apply the correct code page (which happens more often than you might think, whether it's the client's or the server's fault).

Now that we know what kinds of HTML encoding are available in ASP.NET, let's take a look at the encoding support offered by the built-in ASP.NET controls.  The following table covers some of the more commonly used controls and properties.  (There are, of course, many other controls and properties that one might wish to see encoded, but I've tried to keep the list down to things that most folks are likely to use reasonably frequently.)


Control ASP.NET 1.1 ASP.NET 2.0
Literal None None by default.
HTML encoded if Mode property is set to LiteralMode.Encode.
Label None
Button Text is attribute encoded.
LinkButton None
ImageButton Image URL is attribute encoded. Image URL is URL path encoded then attribute encoded.
HyperLink Text is not encoded.
NavigateUrl is attribute encoded.
Text is not encoded.
NavigateUrl is URL path encoded (unless it uses the BLOCKED SCRIPT protocol) then attribute encoded.
TextBox Single-line text box (input type="text") is attribute encoded.
Multi-line text box (textarea) is HTML encoded.
DropDownList and ListBox Option values are attribute encoded.
Option display texts are HTML encoded.
CheckBox and CheckBoxList Value is not used.
Display text is not encoded.
RadioButton and RadioButtonList Value is attribute encoded.
Display text is not encoded.
Table None
DataGrid None for text columns.
Hyperlink columns follow the pattern for HyperLink controls.
Validators (BaseValidator subclasses) and ValidationSummary Validator display text is not encoded.
For client script, the validator error message and validation summary header text are attribute encoded.
When rendering "populated" validators and the validation summary controls from the server, no encoding is applied.
Validator display text is not encoded.
For client script, the validator error message and validation summary header text are javascript encoded (blacklisting approach).
When rendering "populated" validators and the validation summary controls from the server, no encoding is applied.
HiddenField N/A Value is attribute encoded.
GridView and DetailsView N/A Text fields HTML encode if their HtmlEncode property is set to true. (This is the default, which is also used for auto-generated columns.) However, the null display text for text fields is not encoded even if the field's HtmlEncode is set to true.
Hyperlink fields follow the pattern for HyperLink controls.

Assuming you've actually taken the time to read the above, you might have noticed that there are five basic patterns of encoding usage:

  1. No encoding ever applied.
  2. HTML and/or attribute encoding, as appropriate (with a bit of additional URL and/or javascript encoding applied when appropriate), applied all the time.
  3. Attribute encoding applied for attributes, but no encoding applied for other text.
  4. Optional encoding set via a boolean property, defaulting to applying the encoding.
  5. Optional encoding set via an enumerated property, defaulting to not applying the encoding.

If this strikes you as perhaps a wee bit inconsistent, you wouldn't be alone. Wouldn't it be great to see a consistent approach that telegraphs well and acts as a pit of success? If all the controls performed HTML encoding by default but allowed overriding when necessary (preferably via a single approach), the vast majority of developers writing for ASP.NET would end up generating a more secure, more reliable applications with considerably less effort.

Workarounds

While we're all waiting around for the ASP.NET team to eventually provide reasonable built-in support for HTML encoding, what can we do to ensure that our apps are both protected from HTML injection and character mis-rendering? A good starting point would be to fully encode all data (i.e.: anything not 100% known at compile time, and even some stuff that is) that will be pushed to the client browser. Unfortunately, as was already mentioned above, the built-in encoding scheme leaves a little something to be desired. Luckily, the ACE team folks at Microsoft have been working on a couple of tools that take a more robust approach to HTML (and URL and script) encoding. Rather than blacklisting a fixed set of potentially problematic characters for encoding, they whitelist a set of known safe characters (low ASCII a-z, A-Z, 0-9, space, period, comma, dash, and underscore for HTML encoding) and encode everything else. This quite nicely takes care of both security and appearance issues, and you may wish to seriously consider using this approach rather than calling System.Web.HttpUtility.HtmlEncode to perform your HTML encoding.

Regardless of which HTML encoding approach you select, you're quickly going to run into a bit of trouble with double encoding if you simply start assigning pre-encoded text to control properties (e.g.: someTextBox.Text = HttpUtility.HtmlEncode(someString)). When dealing with malicious input, this is pretty much a non-issue. However, not all data that ought to be encoded is malicious, and you usually wouldn't want users seeing stuff like a &gt; b rather than a > b. Unfortunately, if we want to avoid double encoding in the set of controls that perform non-overrideable encoding (including attribute encoding), we need to use custom controls. To make matters worse, it can require rather a lot of work to subclass most of the controls in order to override the encoding behaviour. In quite a few cases, simply starting from scratch would probably make more sense than trying to subclass the built-in controls. Also, even for those controls where double encoding wouldn't be an issue (e.g.: Label, CheckBox), it's probably worth considering using custom controls anyway since the pain of authoring the custom control isn't likely to outweigh the cumulative effort of all the manual encoding calls you might make across all your projects.

Don't like these workarounds? Maybe it's time to start complaining...

Posted by calinoiu | 19 comment(s)
Filed under:
More Posts Next page »