IMPALA-13333

Curb memory estimation for SORT node

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      Overestimating cardinality can lead to a severe memory overestimate for the SORT node, even in a parallel plan. The TPC-DS Q31 and Q51 plans against a synthetic 3TB-scale workload show this kind of overestimation:

      https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test#L1319-L1323

      https://github.com/apache/impala/blob/ae6a3b9ec058dfea4b4f93d4828761f792f0b55e/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q51.test#L511-L515

      The planner should avoid estimating terabytes or petabytes of memory for a SORT node, given that the SORT node can spill to disk under memory pressure. The planner can also take the value of the SORT_RUN_BYTES_LIMIT or MAX_SORT_RUN_SIZE option into account to arrive at a lower memory estimate.
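
A minimal sketch of the capping idea, written in Python for illustration only. It is not Impala's planner code: the function name, its parameters, and the assumption that the configured limit can be expressed directly as a byte count are all hypothetical. It only shows how a spill-aware cap would bound the naive `cardinality * row size` estimate while still respecting a minimum reservation.

```python
def capped_sort_mem_estimate(cardinality, avg_row_bytes, min_reservation_bytes,
                             run_bytes_limit=0):
    """Return a memory estimate in bytes for a sort that can spill to disk.

    All names here are hypothetical; run_bytes_limit stands in for a byte
    limit derived from a query option (assumption, not Impala's API).
    """
    # Naive estimate: assume every input row is held in memory at once.
    naive = cardinality * avg_row_bytes
    if run_bytes_limit <= 0:
        # No limit configured: keep the naive estimate.
        return max(naive, min_reservation_bytes)
    # Assumption: the sorter spills once an in-memory run reaches the limit,
    # so the estimate need not exceed it, but never drops below the minimum.
    return max(min(naive, run_bytes_limit), min_reservation_bytes)


# An overestimated cardinality of 10**12 rows at 100 bytes per row.
uncapped = capped_sort_mem_estimate(10**12, 100, 12 * 1024**2)
capped = capped_sort_mem_estimate(10**12, 100, 12 * 1024**2,
                                  run_bytes_limit=2 * 1024**3)
```

In this sketch the uncapped estimate is 10**14 bytes (roughly 100 TB), while the capped estimate equals the 2 GiB limit. Whether the real planner should cap at the option value, at a multiple of it, or at some other spill-aware bound is a design question this issue leaves open.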

          People

            Assignee: Unassigned
            Reporter: Riza Suminto (rizaon)