Reimplementing LINQ to Objects: Part 20 - ToList

This morning I started writing the tests for GroupBy, prior to implementing it. That turned out to be a pain - really I wanted an easy way of getting at each element of the result (i.e. each result group). If only I had the ability to convert an arbitrary sequence into a query... I needed ToList. So, we enter a fairly brief diversion.

What is it?

ToList has a single, simple overload:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)

Fairly obviously, ToList converts the source sequence into a list. Some points to note:

  • The signature specifies List<T>, not just IList<T>. Of course it could return a subclass of List<T>, but there seems little point.
  • It uses immediate execution - nothing is deferred here
  • The parameter (source) musn't be null
  • It's optimized for the case when source implements ICollection<T>
  • It always creates a new, independent list.

The last two points are worth a bit more discussion. Firstly, the optimization for ICollection<T> isn't documented, but it makes a lot of sense:

  • List<T> stores its data in an array internally
  • ICollection<T> exposes a Count property so the List<T> can create an array of exactly the right size to start with
  • ICollection<T> exposes a CopyTo method so that the List<T> can copy all the elements into the newly created array in bulk

ToList always creates a new list for consistency. If it just returned the source parameter directly if it was already a List<T>, that would mean that changes to the source after calling ToList would sometimes be visible in the returned list and sometimes not... making it harder to reason about any code which used ToList.

What are we going to test?

I have tests for the bullet points listed above, and one extra test just to prove that it can work with lazily evaluated sequences as well as simple collections like arrays. (The test uses a range followed by a projection.)

In order to test the optimization for ICollection<T>, I've implemented another collection like NonEnumerableList, but this time just NonEnumerableCollection. Again, this just delegates all ICollection<T> operations to a backing List<T>, but throws an exception if you try to call GetEnumerator(). The test then just calls ToList on a NonEnumerableCollection: as no exception is thrown, that proves that either the operation is optimized as we'd expect, or the exception is being swallowed. I think it's reasonable to assume that exceptions aren't swallowed in LINQ to Objects :)

Let's implement it!

This will probably be the simplest implementation in the whole of Edulinq:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    return new List<TSource>(source);
}

You may well be wondering what happened to the optimization... well, it's in the List<T> constructor. We just get it for free. Unfortunately that's not documented either... so we end up with an implementation which implements one undocumented optimization if List<T> implements another undocumented optimization :)

We can't actually do much better than that - we can't use ICollection<T>.CopyTo ourselves, as we don't have access to the underlying array of List<T>. We could perform some optimization by calling the List<T> constructor which specifies a capacity, and then call AddRange. That would at least prevent the list from having to resize itself, but it would still need to iterate over the whole collection instead of using the (potentially very fast) CopyTo method.

Conclusion

You may be wondering why we even need ToList, if we could just create a list by calling the constructor directly. The difference is that in order to call a constructor, you need to specify the element type as the type argument. When we use ToList, we can take advantage of type inference. In many cases this is just extremely convenient, but for anonymous types it's actually required. How can you end up with a strongly typed list of an anonymous type? It's easy with ToList, like this:

var query = Enumerable.Range(0, 10)
                      .Select(x => new { Value = x, Doubled = x * 2 });
        
var list = query.ToList();

Try doing that without an extension method or something similar. It's worth noting at this point that although there are similar methods for arrays and Dictionary, there's no equivalent for HashSet. It's incredibly easy to write, of course, and an obvious extension to LINQ to Objects - but it's not in the standard library. Maybe for .NET 5...

So, now that we've got ToList sorted, I can get back to GroupBy and its eight overloads - easy to implement, but hard to test simply and hard to describe clearly. Lucky me.

Published Sat, Jan 1 2011 13:17 by skeet
Filed under: , ,

Comments

# re: Reimplementing LINQ to Objects: Part 20 - ToList

It's not that a big deal that a change to the source affects the result -- after all most of LINQ methods have this property too, strictly speaking.

The big deal is that a change to the result affects the source. That is a big no-no here.

Saturday, January 01, 2011 9:41 AM by Mihailik

# re: Reimplementing LINQ to Objects: Part 20 - ToList

@Mihailik: While I certainly agree that changing the result shouldn't change the source either, I disagree with your claim that source change => result change isn't a big deal. While it's true that some other LINQ operators behave like this, that's *only* the ones which support deferred execution. Everything which uses *immediate* execution returns a result which is then independent of the original.

Saturday, January 01, 2011 10:03 AM by skeet