In the sorter, we codegen the comparator function but call it indirectly via a function pointer. We should consider codegening the perf-critical loops so that we can make the comparator function call direct and inlinable. Inlining the comparison will be very beneficial when the comparison is trivial, e.g. order by a numeric column: I expect sorts on simple keys will get noticeably faster.
We should also be able to get rid of FreeLocalAllocations() calls for most comparators, although I'm not sure what the best way to approach that is.
The Partition() loop is the most perf-critical, followed by InsertionSort().
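To illustrate the difference, here is a rough standalone sketch in plain C++. It is not the actual sorter code: Row, LessByKey, LessByKeyFunctor, PartitionIndirect and PartitionDirect are all hypothetical names, and a template parameter stands in for what codegen would do by patching the comparator call into the loop. The partition shown is a simple single-pass one, chosen for brevity rather than to match the real Partition().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row with a single numeric sort key.
struct Row {
  int64_t key;
};

// Stand-in for the codegen'd comparator: true if a sorts before b.
inline bool LessByKey(const Row& a, const Row& b) { return a.key < b.key; }

// Same comparison wrapped in a functor so its type identifies the callee.
struct LessByKeyFunctor {
  bool operator()(const Row& a, const Row& b) const { return a.key < b.key; }
};

using CompareFn = bool (*)(const Row&, const Row&);

// Indirect version: the loop only sees a function pointer, so each
// comparison is typically an indirect call the optimizer cannot inline.
size_t PartitionIndirect(std::vector<Row>& rows, Row pivot, CompareFn less) {
  size_t store = 0;
  for (size_t i = 0; i < rows.size(); ++i) {
    if (less(rows[i], pivot)) {
      std::swap(rows[i], rows[store]);
      ++store;
    }
  }
  return store;  // rows[0, store) sort before pivot, the rest do not
}

// Direct version: the comparator is part of the loop's type, so the call
// target is known at compile time and a trivial comparison can be inlined.
template <typename Less>
size_t PartitionDirect(std::vector<Row>& rows, Row pivot, Less less) {
  size_t store = 0;
  for (size_t i = 0; i < rows.size(); ++i) {
    if (less(rows[i], pivot)) {
      std::swap(rows[i], rows[store]);
      ++store;
    }
  }
  return store;  // same contract as PartitionIndirect
}
```

Both versions would be called the same way, e.g. PartitionIndirect(rows, pivot, &LessByKey) versus PartitionDirect(rows, pivot, LessByKeyFunctor{}), and should produce the same split. Only the second gives the compiler a call it can reliably inline into the loop body.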
We also don't do this yet for the TopN node, see
While evaluating Sort performance I noticed that the codegened compare function is not inlined, which results in a large per-row overhead.
Expected speedup is 10-15%.