I've been thinking a little bit about this ticket. One of this nice things it provides is a the ability to re-sort a set following a join. So we could innerJoin->sort-rollup, which is a key use case. We can also innerJoin->sort->innerJoin which is also a key use case.
I did a quick test to see how many random strings could be sorted per-second. I used the Random class to pick random longs and turned the longs into Strings for the test set.
I was seeing sort times of 1 second for 1.5 million random strings, using Collections.sort().
So with 50 workers that translates to roughly 75 million records per second.
With fork/join merge sort we should be able to scale nearly linearly until we hit the number of processors on the server. This is because of the tight memory locality of sorting, which won't saturate the memory bus. So with 8 threads we can expect to sort close to 12 million records per second on each worker. Now we're talking some big numbers. With 50 workers we'd be sorting 600,000,000 records per-second.
What's nice about the fork/join is it gives us two levels of parallelism. We get the first level a of parallelism by having multiple workers and then we get the second level by threading. I see some very fast operations following joins in the future.