IMPALA-10697

NDV for rank() expression is incorrect


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Frontend

    Description

      In the following query, the estimated cardinality of the final aggregate (04:AGGREGATE) is always 1, regardless of the cardinality of its child (999 for 03:SELECT in the plan below). This appears to be because the NDV of an analytic expression such as RANK is always computed as 1, which is incorrect.

      Query: explain select rnk, count(*) from (
      select * from
       (SELECT rank() OVER (ORDER BY ss_net_profit ASC) rnk
          FROM store_sales ss1
          WHERE ss_store_sk = 4) v1
      where rnk < 1000) v2
      group by rnk
      +------------------------------------------------------------------------------------------+
      | Explain String                                                                           |
      +------------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=13.94MB Threads=3                              |
      | Per-Host Resource Estimates: Memory=142MB                                                |
      | Analyzed query: SELECT rnk, count(*) FROM (SELECT * FROM (SELECT rank() OVER             |
      | (ORDER BY ss_net_profit ASC) rnk FROM tpcds.store_sales ss1 WHERE ss_store_sk =          |
      | CAST(4 AS INT)) v1 WHERE rnk < CAST(1000 AS BIGINT)) v2 GROUP BY rnk                     |
      |                                                                                          |
      | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
      | |  Per-Host Resources: mem-estimate=14.01MB mem-reservation=5.94MB thread-reservation=1  |
      | PLAN-ROOT SINK                                                                           |
      | |  output exprs: rnk, count(*)                                                           |
      | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |                                                                                        |
      | 04:AGGREGATE [FINALIZE]                                                                  |
      | |  output: count(*)                                                                      |
      | |  group by: rank()                                                                      |
      | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 |
      | |  tuple-ids=5 row-size=16B cardinality=1                                                |
      | |  in pipelines: 04(GETNEXT), 06(OPEN)                                                   |
      | |                                                                                        |
      | 03:SELECT                                                                                |
      | |  predicates: rank() < CAST(1000 AS BIGINT)                                             |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                               |
      | |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
      | |  in pipelines: 06(GETNEXT)                                                             |
      | |                                                                                        |
      | 02:ANALYTIC                                                                              |
      | |  functions: rank()                                                                     |
      | |  order by: ss_net_profit ASC                                                           |
      | |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                             |
      | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
      | |  in pipelines: 06(GETNEXT)                                                             |
      | |                                                                                        |
      | 06:TOP-N                                                                                 |
      | |  order by: ss_net_profit ASC                                                           |
      | |  limit with ties: 999                                                                  |
      | |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
      | |  tuple-ids=8 row-size=8B cardinality=999                                               |
      | |  in pipelines: 06(GETNEXT), 01(OPEN)                                                   |
      | |                                                                                        |
      | 05:EXCHANGE [UNPARTITIONED]                                                              |
      | |  mem-estimate=37.72KB mem-reservation=0B thread-reservation=0                          |
      | |  tuple-ids=8 row-size=8B cardinality=999                                               |
      | |  in pipelines: 01(GETNEXT)                                                             |
      | |                                                                                        |
      | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                           |
      | Per-Host Resources: mem-estimate=128.01MB mem-reservation=8.00MB thread-reservation=2    |
      | 01:TOP-N                                                                                 |
      | |  order by: ss_net_profit ASC                                                           |
      | |  limit with ties: 999                                                                  |
      | |  source expr: rank() < CAST(1000 AS BIGINT)                                            |
      | |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
      | |  tuple-ids=8 row-size=8B cardinality=999                                               |
      | |  in pipelines: 01(GETNEXT), 00(OPEN)                                                   |
      | |                                                                                        |
      | 00:SCAN HDFS [tpcds.store_sales ss1, RANDOM]                                             |
      |    HDFS partitions=1824/1824 files=1824 size=346.60MB                                    |
      |    predicates: ss_store_sk = CAST(4 AS INT)                                              |
      |    stored statistics:                                                                    |
      |      table: rows=2.88M size=346.60MB                                                     |
      |      partitions: 1824/1824 rows=2.88M                                                    |
      |      columns: all                                                                        |
      |    extrapolated-rows=disabled max-scan-range-rows=130.09K                                |
      |    mem-estimate=128.00MB mem-reservation=8.00MB thread-reservation=1                     |
      |    tuple-ids=0 row-size=8B cardinality=480.07K                                           |
      |    in pipelines: 00(GETNEXT)                                                             |
      +------------------------------------------------------------------------------------------+
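
      To illustrate why an NDV of 1 for rank() collapses the aggregate estimate, here is a minimal sketch. It assumes the group-by cardinality is estimated as the product of the grouping expressions' NDVs, capped by the input cardinality. This is an illustration only, not Impala's actual planner code, and the function names `estimate_agg_cardinality` and `rank_ndv_upper_bound` are hypothetical.

```python
from math import prod


def estimate_agg_cardinality(group_by_ndvs, input_cardinality):
    """Hypothetical estimator: product of grouping-expr NDVs, capped by input rows."""
    if not group_by_ndvs:
        return 1  # no grouping exprs: a single output row
    estimate = prod(group_by_ndvs)
    return max(1, min(estimate, input_cardinality))


def rank_ndv_upper_bound(input_cardinality):
    """rank() over N rows yields at most N distinct values (exactly N with no ties)."""
    return max(1, input_cardinality)


child_cardinality = 999  # 03:SELECT in the plan above

# Behaviour reported in this issue: NDV(rank()) is treated as 1.
reported = estimate_agg_cardinality([1], child_cardinality)

# Assumed alternative: bound NDV(rank()) by the input cardinality instead.
adjusted = estimate_agg_cardinality(
    [rank_ndv_upper_bound(child_cardinality)], child_cardinality
)

print(reported, adjusted)
```

      Under these assumptions the first estimate matches the cardinality=1 shown for 04:AGGREGATE, while bounding the NDV by the child cardinality gives an estimate of 999. That figure is only an upper bound, since ties in ss_net_profit would reduce the number of distinct ranks.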
      

      People

          • Assignee: Unassigned
          • Reporter: Aman Sinha (amansinha)