Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Invalid
Description
The patch for IMPALA-9853 reuses some code in the output phase, which necessitated building the in-memory heap from the sorter's output. This has inherent overhead that grows with larger limits and/or partition counts.
It would be better to have the sorter do a full sort on the partition and order-by columns and then apply the limit while streaming the results back from the sorter. In combination with IMPALA-10023, this would let us degrade gracefully to something closer to a regular sort, and would probably let us raise ANALYTIC_PUSHDOWN_THRESHOLD.
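A rough illustration of the streaming idea, written in Python rather than Impala's backend code. It assumes the rows arrive already fully sorted by (partition key, order-by key) and that the limit behaves like a per-partition row_number() cutoff; ties as handled by rank() are not modelled. The names `stream_partition_limit`, `partition_key` and `limit` are hypothetical and do not correspond to anything in the Impala code base.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def stream_partition_limit(
    sorted_rows: Iterable[Row],
    partition_key: Callable[[Row], Hashable],
    limit: int,
) -> Iterator[Row]:
    """Yield at most `limit` rows per partition from fully sorted input.

    Assumption: `sorted_rows` is ordered by partition key first, then by
    the order-by columns, so each partition is one contiguous run.
    """
    have_partition = False
    current_key = None
    emitted = 0
    for row in sorted_rows:
        key = partition_key(row)
        if not have_partition or key != current_key:
            # A new partition starts: reset the per-partition counter.
            have_partition = True
            current_key = key
            emitted = 0
        if emitted < limit:
            emitted += 1
            yield row
        # Rows past the limit are skipped; no heap is kept in memory.


# Example input: (partition, order-by value) pairs, fully sorted first.
rows = sorted([("a", 3), ("b", 1), ("a", 1), ("a", 2), ("b", 5)])
top2 = list(stream_partition_limit(rows, lambda r: r[0], 2))
```

The only state carried between rows is the current partition key and a counter, so the memory cost of the output phase would not depend on the limit or the number of partitions, unlike rebuilding a heap from the sorter's output.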