Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11030

Wrong result due to predicate pushdown into inline view with Analytic function

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 3.4.0
    • Impala 4.1.0
    • Frontend
    • None
    • ghx-label-11

    Description

      Table DDL and population:

      create table t1( c1 int, c2 char(1));
      insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as char(1))),(20, cast('N' as char(1)));
      
      default> select * from t1;
      ------+
      
      c1	c2
      ------+
      
      11	P
      12	N
      13	P
      14	N
      15	N
      16	N
      17	P
      18	N
      19	P
      20	N
      1	P
      2	P
      3	P
      4	N
      5	P
      6	N
      7	P
      8	N
      9	N
      10	N
      ------+
      
      The following query produces a wrong num_row() for num_ranks column.
      
      default> select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P';
      
      -------------------------+
      
      c1	c2	num_ranks	prime_rank
      -------------------------+
      
      1	P	1	1
      2	P	2	2
      3	P	3	3
      5	P	4	4
      7	P	5	5
      11	P	6	6
      13	P	7	7
      17	P	8	8
      19	P	9	9
      -------------------------+
      
      

      The plan indicates that the predicate c2='P' is incorrectly pushed to the scan and affects the order of operations in the SQL statement.

      Query: explain select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P'
      
      +------------------------------------------------------------------------------------------+
      | Explain String                                                                           |
      +------------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2                              |
      | Per-Host Resource Estimates: Memory=30MB                                                 |
      | Codegen disabled by planner                                                              |
      | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY c1             |
      | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) prime_rank           |
      | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'                                      |
      |                                                                                          |
      | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
      | |  Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB thread-reservation=2 |
      | PLAN-ROOT SINK                                                                           |
      | |  output exprs: c1, c2, row_number(), row_number()                                      |
      | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |                                                                                        |
      | 04:ANALYTIC                                                                              |
      | |  functions: row_number()                                                               |
      | |  order by: c1 ASC                                                                      |
      | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                              |
      | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |  tuple-ids=7,3 row-size=21B cardinality=10                                             |
      | |  in pipelines: 03(GETNEXT)                                                             |
      | |                                                                                        |
      | 03:SORT                                                                                  |
      | |  order by: c1 ASC                                                                      |
      | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |  tuple-ids=7 row-size=13B cardinality=10                                               |
      | |  in pipelines: 03(GETNEXT), 01(OPEN)                                                   |
      | |                                                                                        |
      | 02:ANALYTIC                                                                              |
      | |  functions: row_number()                                                               |
      | |  partition by: c2                                                                      |
      | |  order by: c1 ASC                                                                      |
      | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                              |
      | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |  tuple-ids=5,4 row-size=13B cardinality=10                                             |
      | |  in pipelines: 01(GETNEXT)                                                             |
      | |                                                                                        |
      | 01:SORT                                                                                  |
      | |  order by: c2 ASC NULLS LAST, c1 ASC                                                   |
      | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0   |
      | |  tuple-ids=5 row-size=5B cardinality=10                                                |
      | |  in pipelines: 01(GETNEXT), 00(OPEN)                                                   |
      | |                                                                                        |
      | 00:SCAN HDFS [default.t1]                                                                |
      |    HDFS partitions=1/1 files=2 size=91B                                                  |
      |    predicates: CAST(default.t1.c2 AS STRING) = 'P'                                       |
      |    stored statistics:                                                                    |
      |      table: rows=20 size=91B                                                             |
      |      columns: all                                                                        |
      |    extrapolated-rows=disabled max-scan-range-rows=10                                     |
      |    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1                      |
      |    tuple-ids=0 row-size=5B cardinality=10                                                |
      |    in pipelines: 00(GETNEXT)                                                             |
      +------------------------------------------------------------------------------------------+
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            amansinha Aman Sinha
            amansinha Aman Sinha
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment