Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.5.0, Impala 2.6.0
Description
There seems to be a bug related to join cardinality estimation when views are involved.
Different plans are produced when running the same query against either base tables or identical views. The view plan is worse and the join-cardinality estimates in the view plan are consistent with not being able to find the column stats from the view's base table columns.
Plan against base tables:
select count(a.int_col) from functional.alltypessmall a inner join functional.alltypes b on (a.id = b.id) inner join functional.alltypestiny c on (b.id = c.id) +-----------------------------------------------------------+ | Explain String | +-----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=170.00MB VCores=4 | | | | 10:AGGREGATE [FINALIZE] | | | output: count:merge(a.int_col) | | | | | 09:EXCHANGE [UNPARTITIONED] | | | | | 05:AGGREGATE | | | output: count(a.int_col) | | | | | 04:HASH JOIN [INNER JOIN, PARTITIONED] | | | hash predicates: b.id = a.id | | | runtime filters: RF000 <- a.id | | | | | |--08:EXCHANGE [HASH(a.id)] | | | | | | | 00:SCAN HDFS [functional.alltypessmall a] | | | partitions=4/4 files=4 size=6.32KB | | | | | 07:EXCHANGE [HASH(b.id)] | | | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: b.id = c.id | | | runtime filters: RF001 <- c.id | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 02:SCAN HDFS [functional.alltypestiny c] | | | partitions=4/4 files=4 size=460B | | | runtime filters: RF000 -> c.id | | | | | 01:SCAN HDFS [functional.alltypes b] | | partitions=24/24 files=24 size=478.45KB | | runtime filters: RF000 -> b.id, RF001 -> b.id | +-----------------------------------------------------------+
Plan against views created with CREATE VIEW as SELECT * FROM basetable:
select count(a.int_col) from alltypessmall_view a inner join alltypes_view b on (a.id = b.id) inner join alltypestiny_view c on (b.id = c.id); +--------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=170.00MB VCores=3 | | | | 09:AGGREGATE [FINALIZE] | | | output: count:merge(a.int_col) | | | | | 08:EXCHANGE [UNPARTITIONED] | | | | | 05:AGGREGATE | | | output: count(functional.alltypessmall.int_col) | | | | | 04:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: functional.alltypes.id = functional.alltypestiny.id | | | runtime filters: RF000 <- functional.alltypestiny.id | | | | | |--07:EXCHANGE [BROADCAST] | | | | | | | 02:SCAN HDFS [functional.alltypestiny] | | | partitions=4/4 files=4 size=460B | | | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: functional.alltypes.id = functional.alltypessmall.id | | | runtime filters: RF001 <- functional.alltypessmall.id | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 00:SCAN HDFS [functional.alltypessmall] | | | partitions=4/4 files=4 size=6.32KB | | | runtime filters: RF000 -> functional.alltypessmall.id | | | | | 01:SCAN HDFS [functional.alltypes] | | partitions=24/24 files=24 size=478.45KB | | runtime filters: RF000 -> functional.alltypes.id, RF001 -> functional.alltypes.id | +--------------------------------------------------------------------------------------+
This is most likely regression from Impala 2.5, possibly introduced by IMPALA-976 (not yet confirmed).