Reimplementing LINQ to Objects: Part 3 - "Select" (and a rename...)
It's been a long time since I wrote part 1 and part 2 of this blog series, but hopefully things will move a bit more quickly now.
The main step forward is that the project now has a source repository on Google Code instead of just being a zip file on each blog post. I had to give the project a title at that point, and I've chosen Edulinq, hopefully for obvious reasons. I've changed the namespaces etc. in the code, and the blog tag for the series is now Edulinq too. Anyway, enough of the preamble... let's get on with reimplementing LINQ, this time with the Select operator.
What is it?
Like Where, Select has two overloads:
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, int, TResult> selector)
Again, they both operate the same way - but the second overload allows the index into the sequence to be used as part of the projection.
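As a quick illustration of the difference between the two overloads, here's a minimal sketch. It uses the framework's own System.Linq rather than Edulinq (an assumption made purely so the snippet is self-contained), and the "words" array is made up for the example.

```csharp
using System;
using System.Linq;

class SelectOverloadsDemo
{
    static void Main()
    {
        // Hypothetical sample data, not taken from the Edulinq tests.
        string[] words = { "zero", "one", "two" };

        // First overload: the projection only sees the element.
        var lengths = words.Select(word => word.Length);
        Console.WriteLine(string.Join(", ", lengths)); // 4, 3, 3

        // Second overload: the projection also sees the zero-based index.
        var labelled = words.Select((word, index) => index + ": " + word);
        Console.WriteLine(string.Join(", ", labelled)); // 0: zero, 1: one, 2: two
    }
}
```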
Simple stuff first: the method projects one sequence to another. The "selector" delegate is applied to each input element in turn, to yield an output element. The behaviour notes are exactly the same as for Where (to the extent that I cut and pasted these from the previous blog post, and just tweaked them):
- The input sequence is not modified in any way.
- The method uses deferred execution - until you start trying to fetch items from the output sequence, it won't start fetching items from the input sequence.
- Despite deferred execution, it will validate that the parameters aren't null immediately.
- It streams its results: it only ever needs to look at one result at a time.
- It will iterate over the input sequence exactly once each time you iterate over the output sequence.
- The "selector" function is called exactly once per yielded value.
- Disposing of an iterator over the output sequence will dispose of the corresponding iterator over the input sequence.
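A few of the behaviours listed above can be observed directly. The following is only a sketch: it runs against the framework's System.Linq implementation rather than Edulinq (an assumption, to keep it self-contained), and the "calls" counter is a hypothetical device for watching when the selector runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SelectBehaviourDemo
{
    static void Main()
    {
        int calls = 0; // hypothetical counter, incremented by the selector
        int[] source = { 10, 20, 30 };

        // Deferred execution: building the query doesn't run the selector.
        IEnumerable<int> query = source.Select(x => { calls++; return x * 2; });
        Console.WriteLine(calls); // 0

        // Streaming: fetching one result only calls the selector once.
        Console.WriteLine(query.First()); // 20
        Console.WriteLine(calls);         // 1

        // The input sequence is not modified.
        Console.WriteLine(string.Join(", ", source)); // 10, 20, 30

        // Argument validation is immediate, despite deferred execution:
        // the exception is thrown by the Select call itself.
        try
        {
            source.Select((Func<int, int>) null);
        }
        catch (ArgumentNullException e)
        {
            Console.WriteLine(e.ParamName); // selector
        }
    }
}
```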
What are we going to test?
The tests are very much like those for Where - except that in cases where we tested the filtering aspect of Where, we're now testing the projection aspect of Select.
There are a few tests of some interest. Firstly, the method is generic with two type parameters instead of one: TSource and TResult. They're fairly self-explanatory, but it means it's worth having a test for the case where the type arguments are different - such as converting an int to a string:
[Test]
public void SimpleProjectionToDifferentType()
{
    int[] source = { 1, 5, 2 };
    var result = source.Select(x => x.ToString());
    result.AssertSequenceEqual("1", "5", "2");
}
Secondly, I have a test that shows what sort of bizarre situations you can get into if you include side effects in your query. We could have done this with Where as well of course, but it's clearer with Select:
[Test]
public void SideEffectsInProjection()
{
    int[] source = new int[3];
    int count = 0;
    var query = source.Select(x => count++);
    query.AssertSequenceEqual(0, 1, 2);
    query.AssertSequenceEqual(3, 4, 5);
    count = 10;
    query.AssertSequenceEqual(10, 11, 12);
}
Notice how we're only calling Select once, but the results of iterating over the query change each time - because the "count" variable has been captured, and is modified within the projection every time the selector runs. Please don't do things like this.
Thirdly, we can now write query expressions which include both "select" and "where" clauses:
[Test]
public void WhereAndSelect()
{
    int[] source = { 1, 3, 4, 2, 8, 1 };
    var result = from x in source
                 where x < 4
                 select x * 2;
    result.AssertSequenceEqual(2, 6, 4, 2);
}
There's nothing mind-blowing about any of this, of course - hopefully if you've used LINQ to Objects at all, this should all feel very comfortable and familiar.
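For anyone less used to query expressions, it may help to see what the compiler turns the WhereAndSelect query into: plain calls to whichever Where and Select extension methods are in scope. The sketch below assumes System.Linq is the implementation in scope (with Edulinq's namespace imported instead, the same query would bind to our methods).

```csharp
using System;
using System.Linq;

class QueryTranslationDemo
{
    static void Main()
    {
        int[] source = { 1, 3, 4, 2, 8, 1 };

        var queryExpression = from x in source
                              where x < 4
                              select x * 2;

        // The query expression above is translated into these method calls.
        var methodCalls = source.Where(x => x < 4)
                                .Select(x => x * 2);

        Console.WriteLine(string.Join(", ", queryExpression));         // 2, 6, 4, 2
        Console.WriteLine(queryExpression.SequenceEqual(methodCalls)); // True
    }
}
```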
Let's implement it!
Surprise surprise, we go about implementing Select in much the same way as Where. Again, I simply copied the implementation file and tweaked it a little - the two methods really are that similar. In particular:
- We're using iterator blocks to make it easy to return sequences
- The semantics of iterator blocks mean that we have to separate the argument validation from the real work. (Since I wrote the previous post, I've learned that VB11 will have anonymous iterators, which will avoid this problem. Sigh. It just feels wrong to envy VB users, but I'll learn to live with it.)
- We're using foreach within the iterator blocks to make sure that we dispose of the input sequence iterator appropriately - so long as our output sequence iterator is disposed or we run out of input elements, of course.
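To see why the validation has to be separated out, here's a sketch of what goes wrong otherwise. NaiveSelect is a hypothetical name for a deliberately "wrong" version that validates inside the iterator block; it isn't part of Edulinq.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveSelectDemo
{
    // Hypothetical "wrong" version: the whole body is an iterator block,
    // so none of it (including the validation) runs until the first
    // MoveNext() call on the returned sequence's iterator.
    public static IEnumerable<TResult> NaiveSelect<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (selector == null)
        {
            throw new ArgumentNullException("selector");
        }
        foreach (TSource item in source)
        {
            yield return selector(item);
        }
    }

    public static void Main()
    {
        // No exception here, even though the source is null...
        IEnumerable<string> query = NaiveSelect<int, string>(null, x => x.ToString());
        Console.WriteLine("Query created");

        try
        {
            // ... it only surfaces when we start iterating.
            foreach (string item in query)
            {
                Console.WriteLine(item);
            }
        }
        catch (ArgumentNullException e)
        {
            Console.WriteLine("Thrown on iteration: " + e.ParamName); // Thrown on iteration: source
        }
    }
}
```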
I'll skip straight to the code, as it's all so similar to Where. It's also not worth showing you the version with an index - because it's such a trivial difference.
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    if (selector == null)
    {
        throw new ArgumentNullException("selector");
    }
    return SelectImpl(source, selector);
}
private static IEnumerable<TResult> SelectImpl<TSource, TResult>(
    IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
{
    foreach (TSource item in source)
    {
        yield return selector(item);
    }
}
Simple, isn't it? Again, the real "work" method is even shorter than the argument validation.
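For completeness, here's roughly what the indexed overload might look like. Treat it as a sketch, not necessarily the exact Edulinq code: the IndexedSelectSketch class name and the Main method are made up for the example, it doesn't guard against the index overflowing, and it assumes System.Linq isn't imported (explicitly or via implicit usings), so that the Select call binds to the method shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class. System.Linq is deliberately not imported,
// so the Select call in Main binds to the method below.
public static class IndexedSelectSketch
{
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, int, TResult> selector)
    {
        // Eager validation, exactly as in the non-indexed overload.
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (selector == null)
        {
            throw new ArgumentNullException("selector");
        }
        return SelectImpl(source, selector);
    }

    private static IEnumerable<TResult> SelectImpl<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, int, TResult> selector)
    {
        int index = 0;
        foreach (TSource item in source)
        {
            // Pass the current index, then increment it for the next item.
            yield return selector(item, index++);
        }
    }

    public static void Main()
    {
        string[] source = { "a", "b", "c" };
        foreach (string item in source.Select((x, i) => i + x))
        {
            Console.WriteLine(item); // prints 0a, 1b, 2c on separate lines
        }
    }
}
```

The only differences from the non-indexed version are the extra int parameter on the delegate and the local "index" variable, which starts again from zero each time the output sequence is iterated.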
Conclusion
While I don't generally like boring my readers (which may come as a surprise to some of you) this was a pretty humdrum post, I'll admit. I've emphasized "just like Where" several times to the point of tedium very deliberately though - because it makes it abundantly clear that there aren't really as many tricky bits to understand as you might expect.
Something slightly different next time (which I hope will be in the next few days). I'm not quite sure what yet, but there's an awful lot of methods still to choose from...