Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 2.9.0
Description
A recent commit broke PlannerTest.testPredicatePropagation because of a file size mismatch in the plan output. The tricky part is that IMPALA-2565 already added code to ignore file size mismatches. However, that code needs to be generalized to take differences in the unit into account, e.g., "B" vs. "KB": the actual plan reports size=939B for tpch_parquet.region, while the expected plan has size=1.01KB. See the actual results:
Section PLAN of query:
SELECT count(*) FROM (SELECT * from tpch_parquet.customer c CROSS JOIN tpch_parquet.nation n WHERE n_name = 'BRAZIL' AND n_regionkey = 1 AND c_custkey % 2 = 0) cn LEFT OUTER JOIN tpch_parquet.region r ON n_regionkey = r_regionkey

Actual does not match expected result:
PLAN-ROOT SINK
|
05:AGGREGATE [FINALIZE]
|  output: count(*)
|
04:HASH JOIN [LEFT OUTER JOIN]
|  hash predicates: n.n_regionkey = r_regionkey
|
|--03:SCAN HDFS [tpch_parquet.region r]
|     partitions=1/1 files=1 size=939B
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|     predicates: r.r_regionkey = 1
|
02:NESTED LOOP JOIN [CROSS JOIN]
|
|--01:SCAN HDFS [tpch_parquet.nation n]
|     partitions=1/1 files=1 size=2.25KB
|     predicates: n_regionkey = 1, n_name = 'BRAZIL'
|
00:SCAN HDFS [tpch_parquet.customer c]
   partitions=1/1 files=1 size=12.27MB
   predicates: c_custkey % 2 = 0

Expected:
PLAN-ROOT SINK
|
05:AGGREGATE [FINALIZE]
|  output: count(*)
|
04:HASH JOIN [LEFT OUTER JOIN]
|  hash predicates: n.n_regionkey = r_regionkey
|
|--03:SCAN HDFS [tpch_parquet.region r]
|     partitions=1/1 files=1 size=1.01KB
|     predicates: r.r_regionkey = 1
|
02:NESTED LOOP JOIN [CROSS JOIN]
|
|--01:SCAN HDFS [tpch_parquet.nation n]
|     partitions=1/1 files=1 size=2.38KB
|     predicates: n_regionkey = 1, n_name = 'BRAZIL'
|
00:SCAN HDFS [tpch_parquet.customer c]
   partitions=1/1 files=1 size=12.27MB
   predicates: c_custkey % 2 = 0

Verbose plan:
F00:PLAN FRAGMENT [UNPARTITIONED]
PLAN-ROOT SINK
|
05:AGGREGATE [FINALIZE]
|  output: count(*)
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=4 row-size=8B cardinality=1
|
04:HASH JOIN [LEFT OUTER JOIN]
|  hash predicates: n.n_regionkey = r_regionkey
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=0,1,3N row-size=35B cardinality=15000
|
|--03:SCAN HDFS [tpch_parquet.region r]
|     partitions=1/1 files=1 size=939B
|     predicates: r.r_regionkey = 1
|     table stats: 5 rows total
|     column stats: all
|     parquet statistics predicates: r.r_regionkey = 1
|     parquet dictionary predicates: r.r_regionkey = 1
|     hosts=1 per-host-mem=unavailable
|     tuple-ids=3 row-size=2B cardinality=1
|
02:NESTED LOOP JOIN [CROSS JOIN]
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=0,1 row-size=33B cardinality=15000
|
|--01:SCAN HDFS [tpch_parquet.nation n]
|     partitions=1/1 files=1 size=2.25KB
|     predicates: n_regionkey = 1, n_name = 'BRAZIL'
|     table stats: 25 rows total
|     column stats: all
|     parquet statistics predicates: n_regionkey = 1, n_name = 'BRAZIL'
|     parquet dictionary predicates: n_regionkey = 1, n_name = 'BRAZIL'
|     hosts=1 per-host-mem=unavailable
|     tuple-ids=1 row-size=25B cardinality=1
|
00:SCAN HDFS [tpch_parquet.customer c]
   partitions=1/1 files=1 size=12.27MB
   predicates: c_custkey % 2 = 0
   table stats: 150000 rows total
   column stats: all
   parquet dictionary predicates: c_custkey % 2 = 0
   hosts=1 per-host-mem=unavailable
   tuple-ids=0 row-size=8B cardinality=15000
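One way to generalize the file size check described above is to mask every file size token before comparing plan lines, whatever its unit. The following is a minimal Java sketch under stated assumptions: the class PlanSizeFilter, its methods, the "<SIZE>" placeholder, and the assumed set of units (B, KB, MB, GB, TB) are hypothetical and are not the code added in IMPALA-2565 or in the eventual fix.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not Impala's actual test code: masks HDFS file sizes
 * in planner test output so expected and actual plans compare equal even when
 * the size differs in both value and unit (e.g. "939B" vs. "1.01KB").
 */
final class PlanSizeFilter {
  // "size=" must not be preceded by a word character or '-', so tokens such as
  // "row-size=8B" (a row width, not a file size) are left untouched.
  // The value may have a fractional part, and the unit may be any of an
  // assumed set of byte units. Accepting every unit is the generalization the
  // issue asks for: a pattern tied to a single unit would still flag
  // "939B" vs. "1.01KB".
  private static final Pattern FILE_SIZE =
      Pattern.compile("(?<![\\w-])size=\\d+(?:\\.\\d+)?(?:B|KB|MB|GB|TB)\\b");

  private PlanSizeFilter() {}

  /** Replaces every file size token in a plan line with a fixed placeholder. */
  static String normalize(String planLine) {
    return FILE_SIZE.matcher(planLine).replaceAll("size=<SIZE>");
  }

  /** Returns true if the two plan lines differ at most in their file sizes. */
  static boolean sameIgnoringFileSize(String actual, String expected) {
    return normalize(actual).equals(normalize(expected));
  }

  public static void main(String[] args) {
    // The mismatching line from the report: same file, different unit.
    String actual = "|     partitions=1/1 files=1 size=939B";
    String expected = "|     partitions=1/1 files=1 size=1.01KB";
    System.out.println(sameIgnoringFileSize(actual, expected)); // true

    // Row widths are not masked; this prints the line unchanged.
    System.out.println(normalize("tuple-ids=4 row-size=8B cardinality=1"));
  }
}
```

The negative lookbehind matters for the verbose plan above: it contains tokens such as row-size=8B, which describe row widths rather than file sizes, so a change there should still fail the test.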