Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

Part 26b left us with a working implementation of the ordering operators, with two caveats:

  • The sort algorithm used was awful
  • We were performing the key selection on every comparison, instead of once to start with

Today's post is just going to fix the first bullet - although I'm pretty sure that fixing the second will require changing the sorting code again completely.

Choosing a sort algorithm

There are lots of sort algorithms available. In our case, we need the eventual algorithm to:

  • Work on arbitrary pair-based comparisons
  • Be stable
  • Go like the clappers :)
  • (Ideally) allow the first results to be yielded without performing all the sorting work, and without affecting the performance in cases where we do need all the results.

The final bullet is an interesting one to me: it's far from unheard of to want to get the "top 3" results from an ordered query. In LINQ to Objects we can't easily tell the Take operator about the OrderBy operator so that it could pass on the information, but we can potentially yield the first results before we've sorted everything. (In fact, we could add an extra interface specifically to enable this scenario, but it's not part of normal LINQ to Objects, and could introduce horrible performance effects with innocent-looking query changes.)

If we decide to implement sorting in terms of a naturally stable algorithm, that limits the choices significantly. I was rather interested in timsort, and may one day set about implementing it - but it looked far too complicated to introduce just for the sake of Edulinq.

The best bet seemed to be merge sort, which is reasonably easy to implement and has reasonable efficiency too. It requires extra memory and a fair amount of copying, but we can probably cope with that.

We don't have to use a stable sort, of course. We could easily regard our "key" as the user-specified key plus the original index, and use that index as a final tie-breaker when comparing elements. That gives a stable result while allowing us to use any sorting algorithm we want. This may well be the approach I take eventually - especially as quicksort would allow us to start yielding results early in a fairly simple fashion. For the moment though, I'll stick with merge sort.
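To illustrate the index-as-tie-breaker idea, here's one possible sketch - the type and names here are my own invention, not part of Edulinq. Each element is paired with its original index, and the user's comparer is wrapped so that elements which compare equal are ordered by index:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: make any sort algorithm stable by tie-breaking
// on the element's original position in the input.
internal sealed class StableComparer<T> : IComparer<KeyValuePair<int, T>>
{
    private readonly IComparer<T> baseComparer;

    internal StableComparer(IComparer<T> baseComparer)
    {
        this.baseComparer = baseComparer;
    }

    public int Compare(KeyValuePair<int, T> x, KeyValuePair<int, T> y)
    {
        int result = baseComparer.Compare(x.Value, y.Value);
        // Equal elements fall back to original-index order, so even an
        // unstable algorithm produces stable output with this comparer.
        return result != 0 ? result : x.Key.CompareTo(y.Key);
    }
}
```

Note that the index comparison can never return 0 for two distinct elements, which also means the wrapped comparer imposes a total order - handy for algorithms like quicksort that behave badly with many equal keys.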

Preparing for merge sort

Just looking at the algorithm for merge sort, it's obvious that there will be a good deal of shuffling data around. As we want to make the implementation as fast as possible, that means it makes sense to use arrays to store the data. We don't need dynamic space allocation (after we've read all the data in, anyway) or any of the other features associated with higher-level collections. I'm aware that arrays are considered (somewhat) harmful, but purely for the internals of an algorithm which does so much data access, I believe they're the most appropriate solution.

We don't even need our arrays to be the right size - assuming we need to read in all the data before we start processing it (which will be true for this implementation of merge sort, but not for some other algorithms I may consider in the future) it's fine to use an oversized array as temporary storage - it's never going to be seen by the users, after all.

We've already got code which reads all the data into a possibly-oversized array though - in the optimized ToArray code. So my first step was to extract that functionality into a new internal extension method. This has to return a buffer containing all the data and give us an indication of the size. In .NET 4 I could use Tuple to return both pieces of data, but we can also just use an out parameter - I've gone for the latter approach at the moment. Here's the ToBuffer extension method:

internal static TSource[] ToBuffer<TSource>(this IEnumerable<TSource> source, out int count)
{
    // Optimize for ICollection<T>
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
    {
        count = collection.Count;
        TSource[] tmp = new TSource[count];
        collection.CopyTo(tmp, 0);
        return tmp;
    }

    // We'll have to loop through, creating and copying arrays as we go
    TSource[] ret = new TSource[16];
    int tmpCount = 0;
    foreach (TSource item in source)
    {
        // Need to expand...
        if (tmpCount == ret.Length)
        {
            Array.Resize(ref ret, ret.Length * 2);
        }
        ret[tmpCount++] = item;
    }
    count = tmpCount;
    return ret;
}

Note that I've used a local variable to keep track of the count in the loop near the end, only copying it into the output variable just before returning. This is due to a possibly-unfounded performance concern: we don't know where the variable will actually "live" in storage - and I'd rather not cause some arbitrary page of heap memory to be required all the way through the loop. This is a gross case of micro-optimization without evidence, and I'm tempted to remove it... but I thought I'd at least share my thinking.

This is only an internal API, so I'm trusting callers not to pass me a null "source" reference. It's possible that it would be a useful operator to expose at some point, but not just now. (If it were public, I would definitely use a local variable in the loop - otherwise callers could get weird effects by passing in a variable which could be changed elsewhere - such as due to side-effects within the loop. That's a totally avoidable problem, simply by using a local variable. For an internal API, I just need to make sure that I don't do anything so silly.)
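To make the contract concrete, here's a usage sketch (from inside the Edulinq assembly, since the method is internal). The key point is that the returned array's length can exceed the logical count when the source isn't an ICollection<T>:

```csharp
using System.Collections.Generic;
using System.Linq;

int count;

// ICollection<T> path: the buffer is exactly the right size.
int[] fromList = new List<int> { 1, 2, 3 }.ToBuffer(out count);
// count == 3 and fromList.Length == 3

// Iterator path: the buffer may be oversized. Ten even numbers fit in
// the initial 16-element array, so no resize ever happens.
int[] fromQuery = Enumerable.Range(0, 20).Where(x => x % 2 == 0).ToBuffer(out count);
// count == 10, but fromQuery.Length == 16
```

This is exactly why callers must always use the out parameter rather than the array's Length to know how much data is valid.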

Now ToArray needs to be changed to call ToBuffer, which is straightforward:

public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    int count;
    TSource[] ret = source.ToBuffer(out count);
    // Now create another copy if we have to, in order to get an array of the
    // right size
    if (count != ret.Length)
    {
        Array.Resize(ref ret, count);
    }
    return ret;
}

Then we can prepare our OrderedEnumerable.GetEnumerator method for merging:

public IEnumerator<TElement> GetEnumerator()
{
    // First copy the elements into an array: don't bother with a list, as we
    // want to use arrays for all the swapping around.
    int count;
    TElement[] data = source.ToBuffer(out count);
    TElement[] tmp = new TElement[count];
            
    MergeSort(data, tmp, 0, count - 1);
    for (int i = 0; i < count; i++)
    {
        yield return data[i];
    }
}

The "tmp" array is for use when merging - while there is an in-place merge sort, it's more complex than the version where the "merge" step merges two sorted lists into a combined sorted list in temporary storage, then copies it back into the original list.

The arguments of 0 and count - 1 indicate that we want to sort the whole list - the parameters to my MergeSort method take the "left" and "right" boundaries of the sublist to sort - both of which are inclusive. Most of the time I'm more used to using exclusive upper bounds, but all the algorithm descriptions I found used inclusive upper bounds - so it made it easier to stick with that than try to "fix" the algorithm to use exclusive upper bounds everywhere. I think it highly unlikely that I'd get it all right without any off-by-one errors :)

Now all we've got to do is write an appropriate MergeSort method, and we're done.

Implementing MergeSort

I won't go through the details of how a merge sort works - read the Wikipedia article for a pretty good description. In brief though, the MergeSort method guarantees that it will leave the specified portion of the input data sorted. It does this by splitting that section in half, and recursively merge sorting each half. It then merges the two halves by walking along two cursors (one from the start of each subsection) finding the smaller element of the two at each point, copying that element into the temporary array and advancing just that cursor. When it's finished, the temporary storage will contain the sorted section, and it's copied back to the "main" array. The recursion has to stop at some point, of course - and in my implementation it stops if the section has fewer than three elements.

Here's the MergeSort method itself first:

// Note: right is *inclusive*
private void MergeSort(TElement[] data, TElement[] tmp, int left, int right)
{
    if (right > left)
    {
        if (right == left + 1)
        {
            TElement leftElement = data[left];
            TElement rightElement = data[right];
            if (currentComparer.Compare(leftElement, rightElement) > 0)
            {
                data[left] = rightElement;
                data[right] = leftElement;
            }
        }
        else
        {
            int mid = left + (right - left) / 2;
            MergeSort(data, tmp, left, mid);
            MergeSort(data, tmp, mid + 1, right);
            Merge(data, tmp, left, mid + 1, right);
        }
    }
}

The test for "right > left" is part of a vanilla merge sort (if the section either has one element or none, we don't need to take any action), but I've optimized the common case of only two elements. All we need to do is swap the elements - and even then we only need to do so if they're currently in the wrong order. There's no point in setting up all the guff of the two cursors - or even incurring the slight overhead of a method call - for that situation.

Other than that one twist, this is a pretty standard merge sort. Now for the Merge method, which is slightly more complicated (although still reasonably straightforward):

private void Merge(TElement[] data, TElement[] tmp, int left, int mid, int right)
{
    int leftCursor = left;
    int rightCursor = mid;
    int tmpCursor = left;
    TElement leftElement = data[leftCursor];
    TElement rightElement = data[rightCursor];
    // By never merging empty lists, we know we'll always have valid starting points
    while (true)
    {
        // When equal, use the left element to achieve stability
        if (currentComparer.Compare(leftElement, rightElement) <= 0)
        {
            tmp[tmpCursor++] = leftElement;
            leftCursor++;
            if (leftCursor < mid)
            {
                leftElement = data[leftCursor];
            }
            else
            {
                // Only the right list is still active. Therefore tmpCursor must equal rightCursor,
                // so there's no point in copying the right list to tmp and back again. Just copy
                // the already-sorted bits back into data.
                Array.Copy(tmp, left, data, left, tmpCursor - left);
                return;
            }
        }
        else
        {
            tmp[tmpCursor++] = rightElement;
            rightCursor++;
            if (rightCursor <= right)
            {
                rightElement = data[rightCursor];
            }
            else
            {
                // Only the left list is still active. Therefore we can copy the remainder of
                // the left list directly to the appropriate place in data, and then copy the
                // appropriate portion of tmp back.
                Array.Copy(data, leftCursor, data, tmpCursor, mid - leftCursor);
                Array.Copy(tmp, left, data, left, tmpCursor - left);
                return;
            }
        }
    }
}

Here, "mid" is the exclusive upper bound of the left subsection, and the inclusive lower bound of the right subsection... whereas "right" is the inclusive upper bound of the right subsection. Again, it's possible that this is worth tidying up at some point to be more consistent, but it's not too bad.

This time there's a little bit more special-casing. We take the approach that whichever sequence runs out first (which we can detect as soon as the "currently advancing" cursor hits its boundary), we can optimize what still has to be copied. If the "left" sequence runs out first, then we know the remainder of the "right" sequence must already be in the correct place - so all we have to do is copy as far as we've written with tmpCursor back from the temporary array to the main array.

If the "right" sequence runs out first, then we can copy the rest of the "left" sequence directly into the right place (at the end of the section) and then again copy just what's needed from the temporary array back to the main array.

This is as fast as I've managed to get it so far (without delving into too many of the more complicated optimizations available) - and I'm reasonably pleased with it. I have no doubt that it could be improved significantly, but I didn't want to spend too much effort on it when I knew I'd be adapting everything for the key projection difficulty anyway.

Testing

I confess I don't know the best way to test sorting algorithms. I have two sets of tests here:

  • A new project (MergeSortTest) where I actually implemented the sort before integrating it into OrderedEnumerable
  • All my existing OrderBy (etc) tests

The new project also acts as a sort of benchmark - although it's pretty unscientific, and the key projection issue means the .NET implementation isn't really comparable with the Edulinq one at the moment. Still, it's a good indication of very roughly how well the implementation is doing. (It varies, interestingly enough... on my main laptop, it's about 80% slower than LINQ to Objects; on my netbook it's only about 5% slower. Odd, eh?)

The new project sorts a range of sizes of input data, against a range of domain sizes (so with a small domain but a large size you're bound to get equal elements - this helps to verify stability). The values which get sorted are actually doubles, but we only sort based on the integer part - so if the input sequence is 1.3, 3.5, 6.3, 3.1 then we should get an output sequence of 1.3, 3.5, 3.1, 6.3 - the 3.5 and 3.1 are in that order due to stability, as they compare equal under the custom comparer. (I'm performing the "integer only" part using a custom comparer, but we could equally have used OrderBy(x => (int) x).)
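The comparer described could look something like this - a reconstruction of the idea, not the actual MergeSortTest code:

```csharp
using System.Collections.Generic;

// Hypothetical reconstruction: doubles compare equal whenever their
// truncated integer parts match, so a stable sort must preserve their
// relative input order.
internal sealed class IntegerPartComparer : IComparer<double>
{
    public int Compare(double x, double y)
    {
        return ((int) x).CompareTo((int) y);
    }
}
```

With this comparer, 3.5 and 3.1 compare equal, so a stable sort has to emit 3.5 before 3.1 whenever 3.5 appeared first in the input - which is exactly what the test checks.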

Conclusion

One problem (temporarily) down, one to go. I'm afraid that the code in part 26d is likely to end up being pretty messy in terms of generics - and even then I'm likely to talk about rather more options than I actually get round to coding.

Still, our simplistic model of OrderedEnumerable has served us well for the time being. Hopefully it's proved more useful educationally this way - I suspect that if I'd dived into the final code right from the start, we'd all end up with a big headache.

Published Thu, Jan 6 2011 19:04 by skeet

Comments

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

Not to nitpick, but your "new internal extension method" is public...

I'm rather surprised you went with mergesort and not quicksort after talking about yielding the first items as soon as possible. I'm not sure why but quicksort is a de-facto standard and used in quite a lot of places (though timsort is likely better, but not as widely used because it is newer) - I'd go with that by default, and adding your extra yielding requirement would seal the deal I think.

Thursday, January 06, 2011 3:12 PM by configurator

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

@configurator: Doh! Will fix the public bit in a minute :)

I went for merge sort because it was naturally stable without any extra effort. With a bit more work, however (sorting the indexes rather than elements) I've now used quicksort for my next implementation. It *currently* still waits until it's finished, but that's mostly because I wanted to get it working first. I suspect I won't get it written up tonight, but the code is working. I'll try to commit it soon.

Thursday, January 06, 2011 3:23 PM by skeet

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

Ahh, sorting! Who *hasn't* written their own terrible sort implementation? Bad programmers, that's who!

Funny, I do the copy local to out param as well, but I do it as a correctness helper, as well as a micro-optimization. I find it easier to reason about the behavior of a single assignment in the face of multi-threading and reentrancy, as well as ensuring that I remember that that name is an out parameter, so I don't "simplify" an assignment at the bottom of a method. I believe the optimization is less about keeping a page in memory (a possible consideration for an out param in, say, a recursive sort method, since it could potentially cause thrashing) and more simplifying the generated code - a local can be enregistered and not need any load/store instructions (more the machine instructions than MSIL).

Also, I'm totally going to have to implement timsort now :/.

Thursday, January 06, 2011 10:37 PM by Simon Buchan

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

From what I understand about the List<T>.Sort method, it uses the Quicksort algorithm.  If you weren't interested in yielding sorted elements as you iterate, then you can simplify the implementation of GetEnumerator() for OrderedEnumerable to:

List<T> list = new List<T>(this.source);
list.Sort(this.comparer);
return list.GetEnumerator();

The other thing that concerns me is that yielding results while sorting may be a bigger performance drag on smaller collections.  That is, if you had a list of 10 elements, then it would probably be better just to sort the list and return the IEnumerator object.  I'm sure this is also dependent on the complexity of the IComparer operations as well, since you can potentially have several objects within objects due to the Decorator pattern that is employed there.

Friday, January 07, 2011 9:21 AM by swiftfoxmark2

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

@swiftfoxmark2: List<T>.Sort is unstable, so your proposed code would fail on that front. It's fixable of course, using an extra "hidden" key of the original index within the collection.

Yielding results while sorting *could* potentially be a performance problem due to locality of reference... on the other hand, it can certainly be a big win. I'll talk about this in the next post (as I've now implemented early yielding).

Friday, January 07, 2011 9:30 AM by skeet

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

@skeet: Yeah, I guess having an unstable sort in a query is a big deal in many cases.  Especially considering that it is key-based sorting rather than element-based.

I view the potential performance problem as only a theory based on my own experiences.  But I'll just wait for your next post.

By the way, the Timsort looks like the best option for this LINQ operator because it minimizes the number of comparisons, which is potentially huge considering the IComparer decorator objects.  Of course, I understand why you didn't implement it here.  Still, it would be nice to see a .NET version of it.

Friday, January 07, 2011 12:27 PM by swiftfoxmark2

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

Testing sorting is trivial with pex. Here is how it could be done:

[PexMethod]
public void TestSort(int[] items)
{
    MergeSort.Sort(items);
    Assert.IsTrue(items.IsSorted());
}

Running this test will try out all relevant items sequences that could potentially fail the assert. I really encourage you to check it out. It is very easy to get started. I love pex.

Friday, January 07, 2011 3:22 PM by tobi

# re: Reimplementing LINQ to Objects: Part 26c - Optimizing OrderedEnumerable

@tobi: While there are certainly plenty of things like Pex that I'd like to try out, I'm currently somewhat busy with Edulinq itself.

If you fancy writing Pex tests (perhaps in a separate project?) and contributing a patch, I'm sure that would be interesting to many people, myself included.

Friday, January 07, 2011 3:33 PM by skeet