IMPALA-10025 (sub-task of IMPALA-10022: Improvements to analytic rank pushdown to top-N)

Avoid rebuilding in-memory heap during output phase of top-n

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Component: Backend

    Description

      The patch for IMPALA-9853 reuses some code in the output phase, which requires rebuilding the in-memory heap from the sorter's output. This rebuild has inherent overhead that grows with larger limits and/or partition counts.

      It would be better to have the sorter do a full sort on the partition and order-by columns and then apply the limit while streaming the results back from the sorter. Combined with IMPALA-10023, this would let us degrade gracefully to something closer to a regular sort, and would probably let us raise ANALYTIC_PUSHDOWN_THRESHOLD.
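
      A minimal sketch of the streaming idea, in Python rather than Impala's backend C++ and not taken from any Impala patch. It assumes the rows arrive fully sorted on (partition, order-by) columns, so each partition is one contiguous run and a per-run cutoff replaces the heap. The names stream_top_n, partition_key and limit are hypothetical, and the cutoff is a plain row count (ROW_NUMBER-style); keeping tied rows for RANK is not handled here.

```python
from itertools import groupby, islice
from operator import itemgetter


def stream_top_n(sorted_rows, partition_key, limit):
    """Yield at most `limit` rows per partition.

    Assumes `sorted_rows` is already sorted by (partition key, order-by key),
    so rows of one partition are adjacent and no heap needs to be built.
    """
    for _, run in groupby(sorted_rows, key=partition_key):
        # Take the first `limit` rows of this run; groupby skips the rest
        # of the run when it advances to the next partition.
        yield from islice(run, limit)


# Rows are (partition column, order-by column) pairs.
rows = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]

# Stand-in for the sorter's full sort on partition/order-by columns.
sorted_rows = sorted(rows)

top2 = list(stream_top_n(sorted_rows, itemgetter(0), 2))
```

      The output phase then holds only a counter per run instead of a heap, so its cost no longer depends on the limit or the number of partitions; the cost moves into the full sort.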

          People

            Assignee: Tim Armstrong
            Reporter: Tim Armstrong