Reimplementing LINQ to Objects: Part 35 - Zip

Zip will be a familiar operator to any readers who use Python. It was introduced in .NET 4 - it's not entirely clear why it wasn't part of the first release of LINQ, to be honest. Perhaps no-one thought of it as a useful operator until it was too late in the release cycle, or perhaps implementing it in the other providers (e.g. LINQ to SQL) took too long. Eric Lippert blogged about it in 2009, and I find it interesting to note that aside from braces, layout and names we've got exactly the same code. (I read the post at the time of course, but implemented it tonight without looking back at what Eric had done.) It's not exactly surprising, given how trivial the implementation is. Anyway, enough chit-chat...

What is it?

Zip has a single signature, which isn't terribly complicated:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)

Just from the signature, the name, and experience from the rest of this blog series it should be easy enough to guess what Zip does:

  • It uses deferred execution, not reading from either sequence until the result sequence is read
  • All three parameters must be non-null; this is validated eagerly
  • Both sequences are iterated over "at the same time": it calls GetEnumerator() on each sequence, then moves each iterator forward, then reads from it, and repeats.
  • The result selector is applied to each pair of items obtained in this way, and the result yielded
  • It stops when either sequence terminates
  • As a natural consequence of how the sequences are read, we don't need to perform any buffering: we only care about one element from each sequence at a time.

There are really only two things that I could see might have been designed differently:

  • It could have just returned IEnumerable<Tuple<TFirst, TSecond>> but that would have been less efficient in many cases (in terms of the GC) and inconsistent with the rest of LINQ
  • It could have provided different options for what to do with sequences of differents lengths. For example:
    • Throw an exception
    • Use the default value of the shorter sequence type against the remaining items of the longer sequence
    • Use a specified default value of the shorter sequence in the same way

I don't have any problem with the design that's been chosen here though.

What are we going to test?

There are no really interesting test cases here. We test argument validation, deferred execution, and the obvious "normal" cases. I do have tests where "first" is longer than "second" and vice versa.

The one test case which is noteworthy isn't really present for the sake of testing at all - it's to demonstrate a technique which can occasionally be handy. Sometimes we really want to perform a projection on adjacent pairs of elements. Unfortunately there's no LINQ operator to do this naturally (although it's easy to write one) but Zip can provide a workaround, so long as we don't mind evaluating the sequence twice. (That could be a problem in some cases, but is fine in others.)

Obviously if you just zip a sequence with itself directly you get each element paired with the same one. We effectively need to "shift" or "delay" one sequence somehow. We can do this using Skip, as shown in this test:

[Test]
public void AdjacentElements()
{
    string[] elements = { "a", "b", "c", "d", "e" };
    var query = elements.Zip(elements.Skip(1), (x, y) => x + y);
    query.AssertSequenceEqual("ab", "bc", "cd", "de");
}

It always takes me a little while to work out whether I want to make first skip or second - but if we want the second element as the first element of second (try getting that right ten times in a row - it makes sense, honest!) means that we want to call Skip on the sequence used as the argument for second. Obviously it would work the other way round too - we'd just get the pairs presented with the values switched, so the results of the query above would be "ba", "cb" etc.

Let's implement it!

Guess what? It's yet another operator with a split implementation between the argument validation and the "real work". I'll skip argument validation, and get into the tricky stuff. Are you ready? Sure you don't want another coffee?

private static IEnumerable<TResult> ZipImpl<TFirst, TSecond, TResult>(
    IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
{
    using (IEnumerator<TFirst> iterator1 = first.GetEnumerator())
    using (IEnumerator<TSecond> iterator2 = second.GetEnumerator())
    {
        while (iterator1.MoveNext() && iterator2.MoveNext())
        {
            yield return resultSelector(iterator1.Current, iterator2.Current);
        }
    }
}

Okay, so possibly "tricky stuff" was a bit of an overstatement. Just about the only things to note are:

  • I've "stacked" the using statements instead of putting the inner one in braces and indenting it. For using statements with different variable types, this is one way to keep things readable, although it can be a pain when tools try to reformat the code. (Also, I don't usually omit optional braces like this. It does make me feel a bit dirty.)
  • I've used the "symmetric" approach again instead of a using statement with a foreach loop inside it. That wouldn't be hard to do, but it wouldn't be as simple.

That's just about it. The code does exactly what it looks like, which doesn't make for a very interesting blog post, but does make for good readability.

Conclusion

Two operators to go, one of which I might not even tackle fully (AsQueryable - it is part of Queryable rather than Enumerable, after all).

AsEnumerable should be pretty easy...

Published Fri, Jan 14 2011 21:44 by skeet
Filed under: , ,

Comments

# re: Reimplementing LINQ to Objects: Part 35 - Zip

That's another method I implemented in one of our libs which will have to go when we finally move to .NET 4.0. (the other is String.Join(String, IEnumerable<String>) implemented as an extension method). It's reassuring to see I got the same code as you and Eric.

I don't think the using stacking is too dirty, although I understand that you do need to be aware that it only works by virtue of the fact the second using block is regarded as a single statement. It's a shame there's no syntax for declaring multiple variables of differing types in a single using block. It's either using stacking or code creeping across your screen.

Another way might be a type like CompositeDisposable<T1, T2>, although that's probably overkill.

Friday, January 14, 2011 4:01 PM by Alex Humphrey

# re: Reimplementing LINQ to Objects: Part 35 - Zip

C# doesn't allow you to do "using x=E1, y=E2"? VB does.

Zip is the main operator I wish worked in query expressions. In particular, the naming is a lot worse with lambdas than it would be with some nice behind-the-scenes anonymous types.

IEnumerable<T> Step<T>(this IEnumerable<T> sequence, int step)

{

   return from e in sequence

          zip i in naturals

          where i % step == 0

          select e

}

Friday, January 14, 2011 4:29 PM by Strilanc

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Strilanc: C# allows multiple variables to be declared in one using statement, but only if they're of the same type.

Friday, January 14, 2011 5:18 PM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

From memory, Visual Studio knows what's going on and will *unindent* to that formatting for consecutive usings.

Also, I'm not sure how I feel about the different length behavior. This way does make stuff easier, but the BCL has tended towards ensuring you know what's going on.

Friday, January 14, 2011 11:46 PM by Simon Buchan

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Simon: I'm not quite sure what you mean about the "different length behaviour" - could you clarify?

Saturday, January 15, 2011 1:43 AM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@skeet: Sorry, I was a little unclear, the documented behavior of Zip when the sequences have different lengths. Not referencing your implementation.

Saturday, January 15, 2011 2:21 AM by Simon Buchan

# re: Reimplementing LINQ to Objects: Part 35 - Zip

I was going to write a comment but Simon already wrote it.

VS treats "stacked" usings very nicely. And the different length behaviour is surprising and would cause loss of information, which isn't a good combination. What's wrong with getting default(T) for the rest?

Saturday, January 15, 2011 1:33 PM by configurator

# re: Reimplementing LINQ to Objects: Part 35 - Zip

While they didn't release Zip in the original release of LINQ. They included it as a "Custom Operator" in the samples. However it had a different name (Combine).

msdn.microsoft.com/.../aa336749

Monday, January 17, 2011 9:05 AM by James Miles

# re: Reimplementing LINQ to Objects: Part 35 - Zip

Zip done in LinqBridge compatible terms for VB as

Sub Main

dim elements() as string={"a","b","c","d","e"}

dim query = zipbyindex(elements,elements.Skip(1), Function (x,y) x+y)

'query.AssertSequenceEqual("ab", "bc", "cd", "de")

query.Dump()

End Sub

' Define other methods and classes here

Public Function ZipByIndex(Of TFirst, TSecond, TResult)(ByVal first As IEnumerable(Of TFirst), _

ByVal second As IEnumerable(Of TSecond), _

ByVal resultSelector As Func(Of TFirst, TSecond, TResult) _

) As IEnumerable(Of TResult)

Dim q = From f In first.Select(Function(item1, index) New With {index, item1}) _

Join s In second.Select(Function(item2, index) New With {index, item2}) _

On f.index Equals s.index _

Select resultSelector(f.item1, s.item2)

Return q

End Function

Where there other tests to pass or run? or any obvious issues you see with this?

Tuesday, February 01, 2011 4:39 PM by Brandon Dimperio

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Brandon: Using a join means the second sequence is completely consumed. Zip itself can be used with two infinite sequences - the result will be infinite too, but with no significant memory usage.

Tuesday, February 01, 2011 4:50 PM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

alright, will note/comment that the work-around has scalability issues, anything else obvious or troubling (besides my saying Were instead of where above)?

Tuesday, February 01, 2011 7:17 PM by Brandon Dimperio

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Brandon: Changing the model of how it's read fundamentally changes the operator IMO. Given that Zip is relatively easy to implement "normally", why not just do that?

Tuesday, February 01, 2011 7:28 PM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

Well I was trying to get this hack to work in 2.0 with linq bridge to make it pretty and do more than one zip in the same query a little less ugly.

solutionizing.net/.../hacking-linq-expressions-select-with-index

and had no luck since IQueryable has to be one of the extension methods for GetIndex to work.

Tuesday, February 01, 2011 8:31 PM by Brandon Dimperio

# re: Reimplementing LINQ to Objects: Part 35 - Zip

Is this a typo: "I do have tests where first is longer than short"?

Should it be "first is longer than second"?

Tuesday, March 15, 2011 11:58 AM by Dennis Palmer

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Dennis: Yup, thanks - fixing now.

Tuesday, March 15, 2011 1:09 PM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

Thank you for this article! I could easily expand on it to create a zip for three sequences. :) Just wanted to point out that you forgot the 'this' of the first parameter.

Monday, April 04, 2011 5:06 PM by Steven Jeuris

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Steven: I didn't - the code I've shown is the "impl" method which would be *called* by the actual extension method, after performing argument validation.

I deliberately left out the extension method itself as it's tedious and exactly what you'd expect.

Tuesday, April 05, 2011 12:27 AM by skeet

# re: Reimplementing LINQ to Objects: Part 35 - Zip

@Jon: Oh, I missed that. :) I did argument checking using code contracts, so I placed everything directly in the extension method. Thanks for pointing it out.

Tuesday, April 05, 2011 4:54 AM by Steven Jeuris