IMPALA-12657

Improve ProcessingCost of ScanNode and NonGroupingAggregator

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 4.3.0
    • Fix Version/s: Impala 4.4.0
    • Component/s: Frontend
    • Labels: None

    Description

      Several benchmark runs measuring Impala scan performance indicate opportunities to improve costing around ScanNode and NonGroupingAggregator.

      profile_1f4d7a679a3e12d5_4223115700000000.txt shows an example of a simple count query.

      Key takeaways:

      1. There is a strong correlation between total materialized bytes (row-size * cardinality) and total tuple materialization time per fragment. Row materialization cost should be based on this total materialized size instead of an equal cost per scan range.
      2. NonGroupingAggregator should have a much lower cost than GroupingAggregator. In the example above, the cost of NonGroupingAggregator dominates the scan fragment even though it only does simple counting, with no hash table operations.
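
      The two takeaways above can be illustrated with a minimal sketch. This is not Impala's planner code: the function names and all coefficient values below are hypothetical placeholders, chosen only to show the shape of the proposed costing (scan cost proportional to row-size * cardinality, and a cheaper per-row cost for non-grouping aggregation).

```python
# Hypothetical coefficients: placeholder values for illustration only,
# not taken from Impala.
COST_PER_MATERIALIZED_BYTE = 1.0
COST_PER_ROW_GROUPING = 1.0       # assumed to include hash table work
COST_PER_ROW_NON_GROUPING = 0.1   # assumed: simple counting, no hash table


def scan_cost_per_range(num_scan_ranges: int, cost_per_range: float) -> float:
    """Old-style model: every scan range costs the same, regardless of row size."""
    return num_scan_ranges * cost_per_range


def scan_materialization_cost(cardinality: int, avg_row_size: float) -> float:
    """Proposed model: cost proportional to total materialized bytes
    (row-size * cardinality)."""
    return cardinality * avg_row_size * COST_PER_MATERIALIZED_BYTE


def aggregator_cost(input_cardinality: int, grouping: bool) -> float:
    """Per-row aggregation cost; the non-grouping case is costed lower."""
    per_row = COST_PER_ROW_GROUPING if grouping else COST_PER_ROW_NON_GROUPING
    return input_cardinality * per_row


if __name__ == "__main__":
    # Two scans over the same number of rows but with different row sizes.
    narrow = scan_materialization_cost(cardinality=1000, avg_row_size=8.0)
    wide = scan_materialization_cost(cardinality=1000, avg_row_size=80.0)
    print("narrow scan cost:", narrow)
    print("wide scan cost:", wide)
    print("grouping agg cost:", aggregator_cost(1000, grouping=True))
    print("non-grouping agg cost:", aggregator_cost(1000, grouping=False))
```

      Under the per-scan-range model, both scans in the usage example would be costed identically if they covered the same number of scan ranges. The size-based model separates them by the volume of data actually materialized.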

            People

              drorke David Rorke
              rizaon Riza Suminto
              Votes: 0
              Watchers: 3