More fun with DateTime
(Note that this is deliberately not posted in the Noda Time blog. I reckon it's of wider interest from a design perspective, and I won't be posting any of the equivalent Noda Time code. I'll just say now that we don't have this sort of craziness in Noda Time, and leave it at that...)
A few weeks ago, I was answering a Stack Overflow question when I noticed that an operation on dates and times which should have been losing information apparently wasn't. I investigated further, and discovered some "interesting" aspects of both DateTime and TimeZoneInfo. In an effort to keep this post down to a readable length (at least for most readers; certain WebDriver developers who shall remain nameless have probably given up by now already) I'll save the TimeZoneInfo bits for another post.
Background: daylight saving transitions and ambiguous times
There's one piece of inherent date/time complexity you'll need to understand for this post to make sense: sometimes, a local date/time occurs twice. For the purposes of this post, I'm going to assume you're in the UK time zone. On October 28th 2012, at 2am local time (1am UTC), UK clocks will go back to 1am local time. So 1:20am local time occurs twice - once at 12:20am UTC (in daylight saving time, BST), and once at 1:20am UTC (in standard time, GMT).
If you want to run any of the code in this post and you're not in the UK, please adjust the dates and times used to a similar ambiguity for when your clocks go back. If you happen to be in a time zone which doesn't observe daylight savings, I'm afraid you'll have to adjust your system time zone in order to see the effect for yourself.
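If you're not sure when your own clocks go back, something like the following might help you find a suitable time to substitute. It's only a rough sketch: AmbiguityFinder and FirstAmbiguousTime are names I've made up for illustration, and the 30-minute step is an assumption that happens to be fine for typical transitions. The only framework call it relies on is TimeZoneInfo.IsAmbiguousTime.

```csharp
using System;

public static class AmbiguityFinder
{
    // Hypothetical helper: scans the given year in 30-minute steps and
    // returns the first local time that the zone reports as ambiguous, or
    // null if no such time is found (e.g. the zone doesn't observe DST).
    public static DateTime? FirstAmbiguousTime(TimeZoneInfo zone, int year)
    {
        // Kind is Unspecified, so IsAmbiguousTime treats the value as a
        // local time in the zone we pass in.
        var current = new DateTime(year, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
        var end = current.AddYears(1);
        while (current < end)
        {
            if (zone.IsAmbiguousTime(current))
            {
                return current;
            }
            current = current.AddMinutes(30);
        }
        return null;
    }

    public static void Main()
    {
        var ambiguous = FirstAmbiguousTime(TimeZoneInfo.Local, 2012);
        Console.WriteLine(ambiguous.HasValue
            ? "First ambiguous local time in 2012: " + ambiguous.Value.ToString("s")
            : "No ambiguous local times found in 2012 for this time zone.");
    }
}
```

Any time within the hour following the one reported should be ambiguous in the same way as 1:20am is in the UK.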
DateTime.Kind and conversions
As you may already know, as of .NET 2.0, DateTime has a Kind property, of type DateTimeKind - an enum with the following values:
- Local: The DateTime is considered to be in the system time zone. Not an arbitrary "local time in some time zone", but in the specific current system time zone.
- Utc: The DateTime is considered to be in UTC (corollary: it always unambiguously represents an instant in time)
- Unspecified: This means different things in different contexts, but it's a sort of "don't know" kind; this is closer to "local time in some time zone" which is represented as LocalDateTime in Noda Time.
DateTime provides three methods to convert between the kinds:
- ToUniversalTime: if the original kind is Local or Unspecified, treat it as a local time in the system time zone and convert it to universal time. If the original kind is Utc, this is a no-op.
- ToLocalTime: if the original kind is Utc or Unspecified, treat it as UTC and convert it to local time in the system time zone. If the original kind is Local, this is a no-op.
- SpecifyKind: keep the existing date/time, but just change the kind. (So 7am stays as 7am, but it changes the meaning of that 7am effectively.)
(Prior to .NET 2.0, ToUniversalTime and ToLocalTime were already present, but always assumed the original value needed conversion - so if you called x.ToLocalTime().ToLocalTime().ToLocalTime() the result would probably end up with the appropriate offset from UTC being applied three times!)
Of course, none of these methods change the existing value - DateTime is immutable, and a value type - instead, they return a new value.
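Here's a small sketch of the parts of that which don't depend on your system time zone: the two no-op cases, and SpecifyKind relabelling a value without changing its date/time. The class name KindDemo is just for illustration.

```csharp
using System;

public static class KindDemo
{
    public static void Main()
    {
        var utc = new DateTime(2012, 6, 1, 7, 0, 0, DateTimeKind.Utc);

        // ToUniversalTime on a Utc value is a no-op: same ticks, same kind.
        var stillUtc = utc.ToUniversalTime();
        Console.WriteLine(stillUtc.Ticks == utc.Ticks); // True
        Console.WriteLine(stillUtc.Kind);               // Utc

        // SpecifyKind keeps the date/time (7am stays as 7am) and only
        // changes the kind.
        var relabelled = DateTime.SpecifyKind(utc, DateTimeKind.Local);
        Console.WriteLine(relabelled.Hour);             // 7
        Console.WriteLine(relabelled.Kind);             // Local

        // The original value is untouched: we got a new value back.
        Console.WriteLine(utc.Kind);                    // Utc

        // ToLocalTime on a Local value is also a no-op.
        var stillLocal = relabelled.ToLocalTime();
        Console.WriteLine(stillLocal.Ticks == relabelled.Ticks); // True
    }
}
```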
DateTime's Deep Dark Secret
(The code in this section is presented in several chunks, but it forms a single complete piece of code - later chunks refer to variables in earlier chunks. Put it all together in a Main method to run it.)
Armed with the information in the previous sections, we should be able to make DateTime lose data. If we start with 12:20am UTC and 1:20am UTC on October 28th as DateTimes with a kind of Utc, when we convert them to local time (on a system in the UK time zone) we should get 1:20am in both cases due to the daylight saving transition. Indeed, that works:
// Two different instants, an hour apart, either side of the UK "fall back" transition
var original1 = new DateTime(2012, 10, 28, 0, 20, 0, DateTimeKind.Utc);
var original2 = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);
// Convert both to local time (results shown assume a UK system time zone)
var local1 = original1.ToLocalTime();
var local2 = original2.ToLocalTime();
var expected = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
Console.WriteLine(local1 == expected); // True
Console.WriteLine(local2 == expected); // True
Console.WriteLine(local1 == local2); // True
If we've started with two different values, applied the same operation to both, and ended up with equal values, then we must have lost information, right? That doesn't mean that operation is "bad" any more than "dividing by 2" is bad. You ought to be aware of that information loss, that's all.
So, we ought to be able to demonstrate that information loss further by converting back from local time to universal time. Here we have the opposite problem: from our local time of 1:20am, we have two valid universal times we could convert to - either 12:20am UTC or 1:20am UTC. Both answers would be correct - they are universal times at which the local time would be 1:20am. So which one will get picked? Well... here's the surprising bit:
// Convert the two (apparently equal) local values back to UTC
var roundTrip1 = local1.ToUniversalTime();
var roundTrip2 = local2.ToUniversalTime();
Console.WriteLine(roundTrip1 == original1); // True
Console.WriteLine(roundTrip2 == original2); // True
Console.WriteLine(roundTrip1 == roundTrip2); // False
Somehow, each of the local values knows which universal value it came from. The information has been recovered, so the reverse conversion round-trips each value back to its original one. How is that possible?
It turns out that DateTime actually has four potential kinds: Local, Utc, Unspecified, and "local but treat it as the earlier option when resolving ambiguity". A DateTime is really just a 64-bit number of ticks, but the range of DateTime is only January 1st 0001 to December 31st 9999. That range can be represented in 62 bits, leaving 2 bits "spare" to represent the kind. 2 bits gives 4 possible values... the three documented ones and the shadowy extra one.
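If you want to check the arithmetic behind the "62 bits" claim, here's a quick sketch. BitsNeeded is a throwaway helper written for this purpose, not anything in the framework.

```csharp
using System;

public static class TickBits
{
    // Number of bits needed to represent a non-negative value.
    public static int BitsNeeded(long value)
    {
        int bits = 0;
        while (value > 0)
        {
            bits++;
            value >>= 1;
        }
        return bits;
    }

    public static void Main()
    {
        // The largest tick count any DateTime can hold.
        long maxTicks = DateTime.MaxValue.Ticks;
        Console.WriteLine(maxTicks);                  // 3155378975999999999
        Console.WriteLine(BitsNeeded(maxTicks));      // 62
        Console.WriteLine(64 - BitsNeeded(maxTicks)); // 2
    }
}
```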
Through experimentation, I've discovered that the kind is preserved if you perform arithmetic on the value, too... so if you go to another "fall back" DST transition such as October 30th 2011, the ambiguity resolution works the same way as before:
// Move both values to 1:20am on October 30th 2011, another ambiguous UK local time
var local3 = local1.AddYears(-1).AddDays(2);
var local4 = local2.AddYears(-1).AddDays(2);
Console.WriteLine(local3.ToUniversalTime().Hour); // 0
Console.WriteLine(local4.ToUniversalTime().Hour); // 1
If you use DateTime.SpecifyKind with DateTimeKind.Local, however, it goes back to the "normal" kind, even though it looks like it should be a no-op:
// Looks like a no-op (local1.Kind is already Local), but it drops the hidden "extra" kind
var local5 = DateTime.SpecifyKind(local1, local1.Kind);
Console.WriteLine(local5.ToUniversalTime().Hour); // 1
Is this correct behaviour? Or should it be a no-op, just like calling ToLocalTime on a "local" DateTime is? (Yes, I've checked - that doesn't lose the information.) It's hard to say, really, as this whole business appears to be undocumented... at least, I haven't seen anything in MSDN about it. (Please add a link in the comments if you find something. The behaviour actually goes against what's documented, as far as I can tell.)
I haven't looked into whether various forms of serialization preserve values like this faithfully, by the way - but you'd have to work hard to reproduce it in non-framework code. You can't explicitly construct a DateTime with the "extra" kind; the only ways I know of to create such a value are via a conversion to local time or through arithmetic on a value which already has the kind. (Admittedly if you're serializing a DateTime with a Kind of Local, you're already on potentially shaky ground, given that you could be deserializing it on a machine with a different system time zone.)
Unkind comparisons
I've misled you a little, I have to admit. In the code above, when I compared the "expected" value with the results of the first conversions, I deliberately specified DateTimeKind.Local in the constructor call. After all, that's the kind we do expect. Well, yes - but I then printed the result of comparing this value with local1 and local2... and those comparisons would have been the same regardless of the kind I'd specified in the constructor.
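To see that for yourself without needing any particular system time zone, here's a sketch which builds the same date/time with each of the three public kinds. The class name is just for illustration.

```csharp
using System;

public static class KindBlindEquality
{
    public static void Main()
    {
        // The same date/time fields, with each of the three public kinds.
        var asLocal = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Local);
        var asUtc = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Utc);
        var asUnspecified = new DateTime(2012, 10, 28, 1, 20, 0, DateTimeKind.Unspecified);

        // The kinds really are different...
        Console.WriteLine(asLocal.Kind == asUtc.Kind);      // False

        // ... but equality only looks at the ticks.
        Console.WriteLine(asLocal == asUtc);                // True
        Console.WriteLine(asLocal.Equals(asUnspecified));   // True
        Console.WriteLine(asLocal.GetHashCode() == asUtc.GetHashCode()); // True
    }
}
```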
All comparisons between DateTimes ignore the Kind property. It's not just restricted to equality. So for example, consider this comparison:
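// In the UK in June (BST, UTC+1), 8:30am local is 7:30am UTC - so dt2 is the earlier instant
var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc);
var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local);
Console.WriteLine(dt1 < dt2); // True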
When viewed in terms of "what instants in time do these both represent?" the answer here is wrong - when you convert both values into the same time zone (in either direction), dt1 occurs after dt2. But a simple look at the properties tells a different story. In practice, I suspect that most comparisons between DateTime values of different kinds involve code which is at best sloppy and is quite possibly broken in a meaningful way.
Of course, if you bring Kind=Unspecified into the picture, it becomes impossible to compare meaningfully in a kind-sensitive way. Is 12am UTC before or after 1am Unspecified? It depends what time zone you later use.
To be clear, it is a hard-to-resolve issue, and one that we don't do terribly well at in Noda Time at the moment for ZonedDateTime. (And even with just LocalDateTime you've got issues between calendars.) This is a situation where providing separate Comparer<T> implementations works nicely - so you can explicitly say what kind of comparison you want.
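As a rough idea of what the separate-comparers approach could look like for DateTime itself, here's a sketch. FieldsComparer and InstantComparer are names invented for this example (they're not from the framework or from Noda Time), and rejecting Unspecified values in the instant-based comparer is just one possible policy.

```csharp
using System;
using System.Collections.Generic;

// Compares purely by date/time fields, ignoring Kind. This is what
// DateTime's built-in comparisons do, but the name makes it explicit.
public sealed class FieldsComparer : Comparer<DateTime>
{
    public override int Compare(DateTime x, DateTime y)
    {
        return x.Ticks.CompareTo(y.Ticks);
    }
}

// Compares values as instants in time. Unspecified values are rejected,
// as there's no way of knowing which instant they represent.
public sealed class InstantComparer : Comparer<DateTime>
{
    public override int Compare(DateTime x, DateTime y)
    {
        return ToInstant(x).Ticks.CompareTo(ToInstant(y).Ticks);
    }

    private static DateTime ToInstant(DateTime value)
    {
        if (value.Kind == DateTimeKind.Unspecified)
        {
            throw new ArgumentException("Cannot treat an Unspecified DateTime as an instant.");
        }
        // No-op for Utc; uses the system time zone for Local.
        return value.ToUniversalTime();
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var dt1 = new DateTime(2012, 6, 1, 8, 0, 0, DateTimeKind.Utc);
        var dt2 = new DateTime(2012, 6, 1, 8, 30, 0, DateTimeKind.Local);

        // Field-based: 8:00 is before 8:30, whatever the kinds are.
        Console.WriteLine(new FieldsComparer().Compare(dt1, dt2) < 0);

        // Instant-based: the result depends on the system time zone.
        // On a UK system, dt2 is 7:30am UTC, so dt1 comes later.
        Console.WriteLine(new InstantComparer().Compare(dt1, dt2) > 0);
    }
}
```

The point is that the caller has to pick a comparer, which forces the "what do I actually mean by earlier?" decision to be made in the code rather than by accident.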
Conclusions
There's more fun to be had with a similar situation when we look at TimeZoneInfo, but for now, a few lessons:
- Giving a type different "modes" which make it mean fairly significantly different things is likely to cause headaches
- Keeping one of those modes secret (and preventing users from even constructing a value in that mode directly) leads to even more fun and games
- If two instances of your type are considered "equal" but behave differently, you should at least consider whether there's something smelly going on
- There's always more fun to be had with DateTime...