Lessons learned from Protocol Buffers, part 4: static interfaces
Warning: throughout this post, I will use the word static to mean "relating to a type instead of an instance". This isn't strictly accurate usage, but I believe it's what most developers actually think of when they hear the word.
A few members of the interfaces in Protocol Buffers have no logical reason to act on instances of their types. The message interface has members to return the message's type descriptor (the PB equivalent of System.Type), the default instance for the message type, and a builder for the message type. The builder interface copies the first two of these, and also has a method to create a builder for a particular field. None of these touch any instance data.
In most cases this doesn't actually cause any difficulties - we usually have an instance available when we're in the PB library code, and the generated types have static properties for the default instance and the type descriptor anyway. Even so, it feels messy to have interface members which rely only on the type of the implementation and not on any of the actual data of the instance.
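To make the situation concrete, here's a minimal C# sketch of the pattern described above. The interface, Person and PersonBuilder are hypothetical and far simpler than the real Protocol Buffers types, and the member names are assumptions rather than the library's API. The point is only that the interface members never read instance data and merely forward to static members.

```csharp
// Hypothetical, heavily simplified stand-in for a message interface;
// the real Protocol Buffers interfaces have many more members.
public interface IMessage<TMessage, TBuilder>
{
    TMessage DefaultInstanceForType { get; }
    TBuilder CreateBuilderForType();
}

public sealed class PersonBuilder
{
    public string Name { get; set; } = "";
    public Person Build() => new Person(Name);
}

public sealed class Person : IMessage<Person, PersonBuilder>
{
    // Static members: the natural home for type-level information.
    public static Person DefaultInstance { get; } = new Person("");
    public static PersonBuilder CreateBuilder() => new PersonBuilder();

    public string Name { get; }
    public Person(string name) => Name = name;

    // Interface members: they never read Name or any other instance data,
    // they just forward to the static members above.
    public Person DefaultInstanceForType => DefaultInstance;
    public PersonBuilder CreateBuilderForType() => CreateBuilder();
}
```

Any Person instance gives the same answers, which is exactly why these members feel as if they belong to the type rather than to the instance.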
I've wondered before now about the possibility of having static members in interfaces - usually when thinking about plug-in architectures - but there's always been the problem of working out how to specify the type on which to call the members. Variables and other expressions usually refer to values rather than types, and System.Type doesn't help as it provides no compile-time knowledge of the type being referred to.
There's one big exception to this, however: generic type parameters. I don't know why it had never occurred to me before, but they're a great fit for safely calling static methods on types which are unknown at compile time. Furthermore, they could provide a great way of enforcing the presence of constructors with appropriate signatures, and even operators. Before I get too far ahead of myself, let's tie the simple case to a concrete example.
Creating builders from nothing
In my previous post I gave an example of a method which should ideally return a new message given a CodedInputStream and an ExtensionRegistry (both types within Protocol Buffers, the details of which are unimportant to this example). The current code looks like this:
private static TMessage BuildImpl<TMessage2, TBuilder>(Func<TBuilder> builderBuilder,
                                                       CodedInputStream input,
                                                       ExtensionRegistry registry)
    where TBuilder : IBuilder<TMessage2, TBuilder>
    where TMessage2 : TMessage, IMessage<TMessage2, TBuilder>
{
    TBuilder builder = builderBuilder();
    input.ReadMessage(builder, registry);
    return builder.Build();
}
For the purposes of this discussion I'll simplify it a little, making it a generic method in a non-generic type so that TMessage is declared by the method itself rather than by the enclosing type.
private static TMessage BuildImpl<TMessage, TBuilder>(Func<TBuilder> builderBuilder,
                                                      CodedInputStream input,
                                                      ExtensionRegistry registry)
    where TBuilder : IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
{
    TBuilder builder = builderBuilder();
    input.ReadMessage(builder, registry);
    return builder.Build();
}
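For reference, here's a self-contained sketch of how a caller supplies the builder factory under this design. Everything in it is a hypothetical simplification: IMessage, IBuilder, Person and Parsing are invented for illustration, and a plain string stands in for CodedInputStream and ExtensionRegistry. The builder's constructor is deliberately internal, which is why a new() constraint isn't an option.

```csharp
using System;

// Hypothetical, simplified interfaces; a string stands in for the real input.
public interface IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    void MergeFrom(string input);
    TMessage Build();
}

public interface IMessage<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    TBuilder CreateBuilderForType();
}

public sealed class Person : IMessage<Person, Person.Builder>
{
    public string Name { get; }
    private Person(string name) => Name = name;

    public static Builder CreateBuilder() => new Builder();
    public Builder CreateBuilderForType() => CreateBuilder();

    public sealed class Builder : IBuilder<Person, Builder>
    {
        private string name = "";
        internal Builder() { }   // not public, so a new() constraint can't be used
        public void MergeFrom(string input) => name = input;
        public Person Build() => new Person(name);
    }
}

public static class Parsing
{
    // Same shape as the simplified BuildImpl: the caller has to pass a factory.
    public static TMessage BuildImpl<TMessage, TBuilder>(Func<TBuilder> builderBuilder, string input)
        where TBuilder : IBuilder<TMessage, TBuilder>
        where TMessage : IMessage<TMessage, TBuilder>
    {
        TBuilder builder = builderBuilder();
        builder.MergeFrom(input);
        return builder.Build();
    }
}
```

A call looks like Parsing.BuildImpl<Person, Person.Builder>(Person.CreateBuilder, "Alice"). Both type arguments have to be given explicitly: TMessage appears only in the constraints, so the compiler can't infer it from the arguments.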
The first parameter is a function which returns a builder. We can't simply add a new() constraint to TBuilder, as not all generated builders have a public constructor. However, we do know that the TMessage type has a CreateBuilder() method because it implements IMessage<TMessage, TBuilder>. Unfortunately, we don't have an instance of TMessage to call CreateBuilder() on! Really, we'd like to be able to change the code to this:
private static TMessage BuildImpl<TMessage, TBuilder>(CodedInputStream input, ExtensionRegistry registry)
    where TBuilder : IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
{
    TBuilder builder = TMessage.CreateBuilder();
    input.ReadMessage(builder, registry);
    return builder.Build();
}
That's currently impossible, but only because we can't specify static methods in interfaces. Suppose we could write:
public interface IMessage<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    static TBuilder CreateBuilder();
}
Wouldn't that be useful? The value would almost entirely be for generic types or methods whose type parameter is constrained to implement the relevant interface, but that could arguably still be very handy.
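For comparison, C# 11 (shipped with .NET 7) eventually added a feature along these lines: static abstract members in interfaces. The syntax differs from the proposal above, since the member is declared static abstract, but BuildImpl can then be written almost exactly as hoped. The types below are hypothetical simplifications, with a string standing in for CodedInputStream and ExtensionRegistry.

```csharp
// Requires C# 11 / .NET 7 or later (static abstract interface members).
// Hypothetical, simplified interfaces; a string stands in for the real input.
public interface IBuilder<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    void MergeFrom(string input);
    TMessage Build();
}

public interface IMessage<TMessage, TBuilder>
    where TMessage : IMessage<TMessage, TBuilder>
    where TBuilder : IBuilder<TMessage, TBuilder>
{
    // Every implementing type must supply this as a static method.
    static abstract TBuilder CreateBuilder();
}

public sealed class Person : IMessage<Person, Person.Builder>
{
    public string Name { get; }
    private Person(string name) => Name = name;

    // Implicitly implements the static abstract interface member.
    public static Builder CreateBuilder() => new Builder();

    public sealed class Builder : IBuilder<Person, Builder>
    {
        private string name = "";
        internal Builder() { }
        public void MergeFrom(string input) => name = input;
        public Person Build() => new Person(name);
    }
}

public static class Parsing
{
    // No factory parameter: the static method is called via the type parameter.
    public static TMessage BuildImpl<TMessage, TBuilder>(string input)
        where TBuilder : IBuilder<TMessage, TBuilder>
        where TMessage : IMessage<TMessage, TBuilder>
    {
        TBuilder builder = TMessage.CreateBuilder();
        builder.MergeFrom(input);
        return builder.Build();
    }
}
```

As before, the type arguments have to be supplied explicitly, e.g. Parsing.BuildImpl<Person, Person.Builder>("Alice").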
Operators and constructors
At this point, the idea I mentioned earlier of specifying operators and constructors in interfaces is hopefully quite obvious. For instance, we could make all the existing numeric types implement IArithmetic<T>, where T is the implementing type itself (e.g. int : IArithmetic<int>):
public interface IArithmetic<T>
{
    static T operator +(T left, T right);
    static T operator -(T left, T right);
    static T operator /(T left, T right);
    static T operator *(T left, T right);
}

public static T Sum<T>(this IEnumerable<T> source) where T : IArithmetic<T>
{
    T total = default(T);
    foreach (T element in source)
    {
        total += element;
    }
    return total;
}
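C# 11 also allows operators to be static abstract interface members, and .NET 7 builds its "generic math" on this: int, double, decimal and the other numeric types implement interfaces such as IAdditionOperators<TSelf, TOther, TResult> and IAdditiveIdentity<TSelf, TResult> in System.Numerics. A sketch of Sum under that design (ArithmeticExtensions is just an illustrative name) might look like this:

```csharp
// Requires C# 11 / .NET 7 or later.
using System.Collections.Generic;
using System.Numerics;

public static class ArithmeticExtensions
{
    // IAdditionOperators supplies the static operator +;
    // IAdditiveIdentity supplies the static AdditiveIdentity (zero for numbers).
    public static T Sum<T>(this IEnumerable<T> source)
        where T : IAdditionOperators<T, T, T>, IAdditiveIdentity<T, T>
    {
        T total = T.AdditiveIdentity;
        foreach (T element in source)
        {
            total += element;   // resolved via the constraint's static operator +
        }
        return total;
    }
}
```

Two differences from the hypothetical IArithmetic<T> are worth noting. The running total starts at T.AdditiveIdentity rather than default(T), which makes the zero explicit instead of relying on the default value happening to be zero. And C# 11 requires an operator declared in an interface to take at least one parameter whose type is a type parameter constrained to that interface, which is why these interfaces use the self-referential pattern where TSelf : IAdditionOperators<TSelf, TOther, TResult>.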
Plug-ins for a particular program could implement IPlugin:
public interface IPlugin
{
    static new(PluginHost host);
    string Title { get; }
}

public T CreatePlugin<T>() where T : IPlugin
{
    T plugin = new T(this);
    log.Info("Loaded plugin {0}", plugin.Title);
    return plugin;
}
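C# 11 still doesn't let interfaces declare constructors, so the closest equivalent available today replaces "static new(PluginHost host)" with a static abstract factory method. PluginHost, IPlugin<TSelf> and SpellChecker are all hypothetical, and a list of strings stands in for the log; the self-referential TSelf parameter lets Create return the concrete plugin type.

```csharp
// Requires C# 11 / .NET 7 or later. All types here are hypothetical.
using System.Collections.Generic;

public interface IPlugin<TSelf> where TSelf : IPlugin<TSelf>
{
    // Stands in for a constructor taking a PluginHost.
    static abstract TSelf Create(PluginHost host);
    string Title { get; }
}

public sealed class PluginHost
{
    public List<string> Log { get; } = new List<string>();

    public T CreatePlugin<T>() where T : IPlugin<T>
    {
        T plugin = T.Create(this);
        Log.Add($"Loaded plugin {plugin.Title}");
        return plugin;
    }
}

public sealed class SpellChecker : IPlugin<SpellChecker>
{
    public PluginHost Host { get; }
    private SpellChecker(PluginHost host) => Host = host;

    public static SpellChecker Create(PluginHost host) => new SpellChecker(host);
    public string Title => "Spell checker";
}
```

The plugin's constructor can stay private, which a new() constraint would not allow; the price is that each plugin writes a one-line factory method instead of the host calling a constructor directly.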
In fact, I'd imagine it would make sense to define a whole family of IConstructable interfaces, along the same lines as the Func and Action delegate families:
public interface IConstructable
{
    static new();
}

public interface IConstructable<T>
{
    static new(T arg);
}

public interface IConstructable<T1, T2>
{
    static new(T1 arg1, T2 arg2);
}
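Under the same constraint (C# 11 static abstract members, but no interface constructors), the IConstructable family could be approximated with static factory methods. IConstructable, Point and Factory are hypothetical names; the extra TSelf parameter is needed because, unlike a constructor, a factory method has to name the type it returns.

```csharp
// Requires C# 11 / .NET 7 or later. All types here are hypothetical.
public interface IConstructable<TSelf> where TSelf : IConstructable<TSelf>
{
    static abstract TSelf Create();
}

public interface IConstructable<TSelf, T> where TSelf : IConstructable<TSelf, T>
{
    static abstract TSelf Create(T arg);
}

public interface IConstructable<TSelf, T1, T2> where TSelf : IConstructable<TSelf, T1, T2>
{
    static abstract TSelf Create(T1 arg1, T2 arg2);
}

public readonly struct Point : IConstructable<Point>, IConstructable<Point, int, int>
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    // Implicit implementations of the two static abstract members.
    public static Point Create() => new Point(0, 0);
    public static Point Create(int x, int y) => new Point(x, y);
}

public static class Factory
{
    public static T MakeDefault<T>() where T : IConstructable<T> => T.Create();

    public static T Make<T, T1, T2>(T1 arg1, T2 arg2)
        where T : IConstructable<T, T1, T2> => T.Create(arg1, arg2);
}
```

Usage is Factory.Make<Point, int, int>(3, 4); as with the plugin example, the target type can't be inferred and has to be spelled out.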
Inheritance raises its ugly head
There's a fly in the ointment here. Normally, if a base type implements an interface, a type derived from it effectively implements the interface too. With static interface members, that would no longer hold in all cases. You can get away with it for straightforward static methods and properties, in the same way that people often write code such as UTF8Encoding.UTF8 when they really just mean Encoding.UTF8. However, it doesn't work for constructors: they aren't inherited, so you can't guarantee that Banana has a parameterless constructor just because Fruit does.
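Both halves of that argument can be checked in ordinary C#. In the sketch below, Fruit, Banana and InheritanceDemo are hypothetical; UTF8Encoding.UTF8 is real and binds to the static Encoding.UTF8 property, whereas Banana can't satisfy a new() constraint even though its base class can.

```csharp
using System.Text;

public class Fruit
{
    public Fruit() { }
}

public class Banana : Fruit
{
    // Declaring any constructor suppresses the implicit parameterless one,
    // and Fruit's parameterless constructor is not inherited.
    public Banana(int ripeness) => Ripeness = ripeness;
    public int Ripeness { get; }
}

public static class InheritanceDemo
{
    public static T Make<T>() where T : new() => new T();

    // Static members *are* found through a derived type's name:
    // UTF8Encoding.UTF8 compiles and calls the Encoding.UTF8 getter.
    public static bool Utf8IsSameObject() =>
        ReferenceEquals(UTF8Encoding.UTF8, Encoding.UTF8);

    // InheritanceDemo.Make<Banana>() would not compile (error CS0310),
    // because Banana has no public parameterless constructor.
}
```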
This is not only a problem for one concrete class deriving from another, but also for abstract implementations of interfaces. This happens a lot in Protocol Buffers: often the interface is partially implemented by an abstract class, with the final "leaf" classes in the inheritance tree implementing the outstanding members and occasionally overriding earlier implementations for the sake of efficiency. Should we be able to specify an abstract static method in the abstract class, making sure that there's an appropriate implementation by the time we reach a concrete class? As it happens, that would be useful elsewhere in the Protocol Buffers library, but I'll admit it's slightly messy. I suspect there are ways round all of these issues, even if they might sometimes involve restricting the feature to particular, common use cases. However, every niggle would add its own piece of complexity.
There may well be other issues which would prove challenging, as well as other interesting questions, such as what explicit interface implementation would mean (if anything) for static members. Language experts may well be able to reel off great lists of problems; I'd be very interested to hear the ensuing discussions.
Conclusion
I believe that static interface members could prove very useful in generic algorithms, particularly if operators and constructors were allowed as well as the existing member types available in interfaces. There are significant sticking points to be carefully considered, and I wouldn't like to prejudge the outcome of such deliberations in terms of whether the feature would be useful enough to merit the additional language complexity involved. It feels odd to implement an interface member which is effectively only of use when the implementing type is being used as the type argument for a generic type or method, but as developers learn to think more generically that may be less of a restriction than it currently seems.
This post is the last in my current batch about Protocol Buffers. There may well be more, but they're less likely now that the Protocol Buffers port is almost complete. Most of these posts have involved generics, and the current limitations of what C# allows us to express with them. I don't intend to give the impression that I'm dissatisfied with C# - I've just found it interesting to look at what lies beyond the current boundaries of the language. The one limitation from these posts which I would definitely like to see addressed is the lack of covariant return types: the Java implementation of Protocol Buffers is significantly simpler in many ways purely because Java supports them.