Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-10136

Cardinality estimates for aggregation operations don't consider conjuncts on grouping expressions correctly

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Impala 3.4.0
    • None
    • Frontend
    • ghx-label-5

    Description

      ComputeStats() in the PlanNode calls estimateNumGroups() for the AggregationNode to calculate the cardinality of a grouping expression. Then in a later step applyConjunctsSelectivity() is called to adjust the cardinality based on the available conjuncts. However with aggregation operations certain conjuncts i.e. those from the HAVING clause or conjuncts on the grouping expressions affect the number of groups produced.

      ndv(day) = 11 

      count(alltypesagg) = 10280

      Query: explain select day, count(*) from alltypesagg where day=2 group by 1
      +------------------------------------------------------------+
      | Explain String                                             |
      +------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=4.06MB Threads=4 |
      | Per-Host Resource Estimates: Memory=52MB                   |
      | Codegen disabled by planner                                |
      |                                                            |
      | PLAN-ROOT SINK                                             |
      | |                                                          |
      | 04:EXCHANGE [UNPARTITIONED]                                |
      | |                                                          |
      | 03:AGGREGATE [FINALIZE]                                    |
      | |  output: count:merge(*)                                  |
      | |  group by: `day`                                         |
      | |  row-size=12B cardinality=11                             |
      | |                                                          |
      | 02:EXCHANGE [HASH(`day`)]                                  |
      | |                                                          |
      | 01:AGGREGATE [STREAMING]                                   |
      | |  output: count(*)                                        |
      | |  group by: `day`                                         |
      | |  row-size=12B cardinality=11                             |
      | |                                                          |
      | 00:SCAN HDFS [functional.alltypesagg]                      |
      |    partition predicates: `day` = 2                         |
      |    HDFS partitions=1/11 files=1 size=74.48KB               |
      |    row-size=4B cardinality=1.00K                           |
      +------------------------------------------------------------+
      Fetched 24 row(s) in 0.02s
      

       

      Given the predicate day=1 applies to the grouping expression the cardinality of the aggregation node should b 1 as opposed to 11.

      Attachments

        Issue Links

          Activity

            People

              superdupershant Shant Hovsepian
              superdupershant Shant Hovsepian
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: