Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 3.4.0
-
None
-
ghx-label-11
Description
Table DDL and population:
create table t1( c1 int, c2 char(1)); insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as char(1))),(20, cast('N' as char(1)));
default> select * from t1; ------+ c1 c2 ------+ 11 P 12 N 13 P 14 N 15 N 16 N 17 P 18 N 19 P 20 N 1 P 2 P 3 P 4 N 5 P 6 N 7 P 8 N 9 N 10 N ------+ The following query produces a wrong num_row() for num_ranks column. default> select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P'; -------------------------+ c1 c2 num_ranks prime_rank -------------------------+ 1 P 1 1 2 P 2 2 3 P 3 3 5 P 4 4 7 P 5 5 11 P 6 6 13 P 7 7 17 P 8 8 19 P 9 9 -------------------------+
The plan indicates that the predicate c2='P' is incorrectly pushed to the scan and affects the order of operations in the SQL statement.
Query: explain select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P' +------------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2 | | Per-Host Resource Estimates: Memory=30MB | | Codegen disabled by planner | | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY c1 | | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) prime_rank | | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P' | | | | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | | Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB thread-reservation=2 | | PLAN-ROOT SINK | | | output exprs: c1, c2, row_number(), row_number() | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 | | | | | 04:ANALYTIC | | | functions: row_number() | | | order by: c1 ASC | | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=7,3 row-size=21B cardinality=10 | | | in pipelines: 03(GETNEXT) | | | | | 03:SORT | | | order by: c1 ASC | | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=7 row-size=13B cardinality=10 | | | in pipelines: 03(GETNEXT), 01(OPEN) | | | | | 02:ANALYTIC | | | functions: row_number() | | | partition by: c2 | | | order by: c1 ASC | | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=5,4 row-size=13B cardinality=10 | | | in pipelines: 01(GETNEXT) | | | | | 01:SORT | | | order by: c2 ASC NULLS LAST, c1 ASC | | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=5 row-size=5B cardinality=10 | | | in pipelines: 01(GETNEXT), 00(OPEN) | | | | | 00:SCAN HDFS [default.t1] | | HDFS partitions=1/1 files=2 size=91B | | predicates: CAST(default.t1.c2 AS STRING) = 'P' | | stored statistics: | | table: rows=20 size=91B | | columns: all | | extrapolated-rows=disabled max-scan-range-rows=10 | | mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 | | tuple-ids=0 row-size=5B cardinality=10 | | in pipelines: 00(GETNEXT) | +------------------------------------------------------------------------------------------+