IMPALA-976: Planner cardinality estimates from joins can be improved


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 1.3.1, Impala 2.0, Impala 2.3.0
    • Fix Version/s: Impala 2.5.0
    • Component/s: Frontend

    Description

      I think we assume n:1 joins, so the cardinality is not reduced after the join operator.

      Here's an example from TPC-DS q19, but it applies to almost all of the TPC-DS queries.

      Operator          Detail          #Hosts      Avg Time      Max Time         #Rows    Est. #Rows      Peak Mem   Est. Peak Mem
      ------------------------------------------------------------------------------------------------------------------------------
      18:TOP-N                               1       3.708ms       3.708ms           100           100     117.44 KB         -1.00 B
      17:EXCHANGE                            1     236.149us     236.149us          1000           100       9.77 KB         -1.00 B
      10:TOP-N                              10     555.602us      591.50us          1000           100      36.00 KB         4.69 KB
      16:AGGREGATE                          10       4.950ms        5.85ms          3227     264233526     184.84 KB         6.75 GB
      15:EXCHANGE                           10      14s941ms      14s952ms         32270     264233526      30.09 KB               0
      09:AGGREGATE                          10     201.615ms     280.335ms         32270     264233526      13.68 MB        12.99 GB
      08:HASH JOIN      BROADCAST           10     112.415ms     168.229ms       4646655     264233526      12.32 MB        42.00 KB                       
      |--14:EXCHANGE                        10     286.415ms     296.645ms          1350          1350      44.96 KB               0
      |  04:SCAN HDFS   store                1      39.760ms      39.760ms          1350          1350     243.32 KB        32.00 MB
      07:HASH JOIN      BROADCAST           10       3s519ms       3s900ms       4708900     264233526     993.89 MB       453.97 MB
      |--13:EXCHANGE                        10       2s731ms       2s912ms      15000000      15000000      21.86 MB               0
      |  03:SCAN HDFS   customer_address     8     328.503ms     862.653ms      15000000      15000000      26.30 MB        80.00 MB
      06:HASH JOIN      BROADCAST           10      16s891ms      17s963ms       4708900     264233526       1.44 GB       377.66 MB
      |--12:EXCHANGE                        10       3s785ms       4s422ms      30000000      30000000      17.84 MB               0
      |  02:SCAN HDFS   customer             9     292.638ms       1s030ms      30000000      30000000      40.81 MB       176.00 MB
      05:HASH JOIN      BROADCAST           10       2s903ms       3s397ms       4823041     264233526       1.10 MB       292.41 KB                       <-- We assumed all probe rows are returned and now the estimate is off by a factor of 54.
      |--11:EXCHANGE                        10       1s629ms       1s642ms          6478          3429      99.51 KB               0      
      |  01:SCAN HDFS   item                 1     284.503ms     284.503ms          6478          3429       7.76 MB       288.00 MB                       <-- The estimate of the first dim table is close
      00:SCAN HDFS      store_sales         10     134.410ms      227.81ms     264233526     264233526     446.98 MB       352.00 MB                       <-- We have perfect stats on the left most table.
      
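      To illustrate the gap, here is a minimal sketch (not Impala's actual planner code) contrasting the n:1 pass-through assumption described above with the textbook NDV-based join estimate |L JOIN R| ~= |L| * |R| / max(NDV(L.key), NDV(R.key)). The NDV value used in the example is a made-up illustration, not a real TPC-DS statistic.

      ```python
      # Hypothetical sketch of two join-cardinality estimators; not Impala's
      # actual FE code, and the NDV numbers below are invented for illustration.

      def passthrough_estimate(probe_rows: int) -> int:
          # n:1 assumption: every probe row matches exactly one build row,
          # so the join never reduces cardinality.
          return probe_rows

      def ndv_estimate(probe_rows: int, build_rows: int,
                       probe_ndv: int, build_ndv: int) -> int:
          # Classic containment-based estimate: divide the cross product
          # by the larger number of distinct join-key values.
          return probe_rows * build_rows // max(probe_ndv, build_ndv)

      # Probe side taken from the q19 plan above; build side is the item
      # table after its predicates. key_ndv is an assumed, illustrative NDV.
      probe = 264_233_526   # store_sales rows (perfect stats)
      build = 3_429         # estimated item rows surviving predicates
      key_ndv = 18_000      # assumption: distinct ss_item_sk / i_item_sk values

      print(passthrough_estimate(probe))                  # 264233526: join output == probe input
      print(ndv_estimate(probe, build, key_ndv, key_ndv)) # far smaller, closer to the ~4.8M actual
      ```

      The point is not the exact number the second estimator produces, but that any selectivity-aware formula would shrink the estimate below the probe-side cardinality instead of carrying 264M rows through every join.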

          People

            Assignee: alex.behm (Alexander Behm)
            Reporter: nong_impala_60e1 (Nong Li)
            Votes: 0
            Watchers: 10
