IMPALA / IMPALA-13405

Lower AggregationNode cardinality by analyzing estimate of source Tuple

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 4.4.0
    • Fix Version/s: Impala 4.5.0
    • Component/s: Frontend
    • Labels: None

    Description

      If an aggregation node has multiple grouping expressions that originate from the same tuple, then their combined NDV (number of distinct values) must not exceed the output cardinality of the PlanNode producing that tuple. Take, for example, this PARALLELPLANS excerpt from TPC-DS Q31.

       

      |  11:AGGREGATE [STREAMING]
      |  |  output: sum(ss_ext_sales_price)
      |  |  group by: ca_county, d_qoy, d_year
      |  |  mem-estimate=84.55MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
      |  |  tuple-ids=8 row-size=50B cardinality=1.43M cost=1948896250
      |  |  in pipelines: 06(GETNEXT)
      
      ....
      
      |  |  07:SCAN HDFS [tpcds_partitioned_parquet_snap.date_dim, RANDOM]
      |  |     HDFS partitions=1/1 files=1 size=2.17MB
      |  |     predicates: tpcds_partitioned_parquet_snap.date_dim.d_year = CAST(1998 AS INT), tpcds_partitioned_parquet_snap.date_dim.d_qoy = CAST(2 AS INT)
      |  |     stored statistics:
      |  |       table: rows=73.05K size=2.17MB
      |  |       columns: all
      |  |     extrapolated-rows=disabled max-scan-range-rows=73.05K
      |  |     parquet statistics predicates: tpcds_partitioned_parquet_snap.date_dim.d_year = CAST(1998 AS INT), tpcds_partitioned_parquet_snap.date_dim.d_qoy = CAST(2 AS INT)
      |  |     parquet dictionary predicates: tpcds_partitioned_parquet_snap.date_dim.d_year = CAST(1998 AS INT), tpcds_partitioned_parquet_snap.date_dim.d_qoy = CAST(2 AS INT)
      |  |     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
      |  |     tuple-ids=6 row-size=12B cardinality=186 cost=16728
      |  |     in pipelines: 07(GETNEXT) 

       

      The cardinality estimate of 11:AGGREGATE comes from multiplying the NDVs of its grouping expressions:

      est_cardinality(11:AGG) = NDV(ca_county) * NDV(d_qoy) * NDV(d_year)
                              = 1825 * 4 * 196
                              = 1430800

      However, d_qoy and d_year both belong to the same TupleId 6, produced by 07:SCAN, so their combined NDV cannot exceed the scan's output cardinality of 186. The aggregation's cardinality estimate can therefore be lowered to:

      est_cardinality(11:AGG) = NDV(ca_county) * est_cardinality(07:SCAN)
                              = 1825 * 186
                              = 339450
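
The two calculations above can be reproduced with a minimal sketch. This is not Impala's frontend code: the function name `estimate_agg_cardinality`, the tuple layout of the inputs, and the use of `None` for the tuple id of ca_county (which the plan excerpt does not show) are all assumptions made for illustration. The idea is to group the grouping expressions by source tuple, multiply NDVs within each group, cap each product at the cardinality of the node producing that tuple, and then multiply across groups.

```python
from collections import defaultdict
from math import prod


def estimate_agg_cardinality(grouping_exprs, producer_cardinality):
    """Hypothetical helper, not part of Impala.

    grouping_exprs: iterable of (name, ndv, tuple_id)
    producer_cardinality: dict mapping tuple_id to the estimated output
        cardinality of the plan node that produces that tuple
    """
    # Collect the NDVs of grouping expressions per source tuple.
    ndvs_by_tuple = defaultdict(list)
    for _name, ndv, tuple_id in grouping_exprs:
        ndvs_by_tuple[tuple_id].append(ndv)

    estimate = 1
    for tuple_id, ndvs in ndvs_by_tuple.items():
        combined = prod(ndvs)
        # Cap the combined NDV at the producer's cardinality, when known.
        cap = producer_cardinality.get(tuple_id)
        if cap is not None:
            combined = min(combined, cap)
        estimate *= combined
    return estimate


# NDVs and the scan cardinality are taken from the description above.
# The tuple id of ca_county is not shown in the excerpt, so None is used
# here to mean "no known producer cardinality" (an assumption).
exprs = [("ca_county", 1825, None), ("d_qoy", 4, 6), ("d_year", 196, 6)]

naive = estimate_agg_cardinality(exprs, {})          # no per-tuple cap
capped = estimate_agg_cardinality(exprs, {6: 186})   # cap tuple 6 at 07:SCAN
print(naive, capped)
```

With no producer cardinalities supplied, the sketch falls back to the plain NDV product (1430800). Supplying the 07:SCAN estimate for tuple 6 yields the lower value (339450).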

       

       

          People

            Assignee: rizaon Riza Suminto
            Reporter: rizaon Riza Suminto
