Affects Version/s: Impala 3.4.0
Fix Version/s: None
When a query joins views that are each defined over multiple tables, the inferred predicates are not assigned to the scan nodes. This issue seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.
The following is a minimal example that reproduces the problem.
For easy reference, the contents of tables pt1, pt2, pta1, and pta2 and of views myview_1_on_2_tables and myview_2_on_2_tables are also given below.
Contents of table pt1 afterwards:
Contents of table pt2 afterwards:
Contents of table pta1 afterwards:
Contents of table pta2 afterwards:
Contents of view myview_1_on_2_parquet_tables (union of tables pt1 and pt2):
Contents of view myview_2_on_2_parquet_tables (union of tables pta1 and pta2):
After creating the tables and views described above, we consider the following two queries.
Both queries join the two views on the column table_source and keep only the rows satisfying table_source = 'ONE'. Both queries produce the same result set, as follows.
However, according to the query profile, Query 1 results in three scans, on tables pt1, pta1, and pta2. Query 2, which adds the seemingly redundant predicate "b.table_source_a = 'ONE'", involves only two scans, on tables pt1 and pta1.
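That the extra predicate does not change the result can be checked on a toy setup. The sketch below uses Python's sqlite3 module with made-up schemas and rows. Everything in it apart from the table names, the two view names, the columns table_source and table_source_a, and the value 'ONE' is an assumption: the columns c1 and c1_a, the inserted rows, the label 'TWO', and the join condition a.table_source = b.table_source_a are hypothetical. SQLite's planner is unrelated to Impala's, so this only illustrates that the two queries are equivalent, not how either one is planned.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical tables and UNION ALL views in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Assumed single-column schemas and rows, for illustration only.
        CREATE TABLE pt1  (c1 INTEGER);
        CREATE TABLE pt2  (c1 INTEGER);
        CREATE TABLE pta1 (c1 INTEGER);
        CREATE TABLE pta2 (c1 INTEGER);
        INSERT INTO pt1  VALUES (1), (2);
        INSERT INTO pt2  VALUES (3);
        INSERT INTO pta1 VALUES (10);
        INSERT INTO pta2 VALUES (20), (30);

        -- Each view labels its rows with a constant per underlying table.
        CREATE VIEW myview_1_on_2_tables AS
            SELECT 'ONE' AS table_source, c1 FROM pt1
            UNION ALL
            SELECT 'TWO' AS table_source, c1 FROM pt2;

        CREATE VIEW myview_2_on_2_tables AS
            SELECT 'ONE' AS table_source_a, c1 AS c1_a FROM pta1
            UNION ALL
            SELECT 'TWO' AS table_source_a, c1 AS c1_a FROM pta2;
    """)
    return conn


# Assumed shape of Query 1: the filter is stated on one side of the join only.
QUERY_1 = """
    SELECT a.table_source, a.c1, b.table_source_a, b.c1_a
    FROM myview_1_on_2_tables a
    JOIN myview_2_on_2_tables b ON a.table_source = b.table_source_a
    WHERE a.table_source = 'ONE'
"""

# Assumed shape of Query 2: the same query plus the redundant predicate.
QUERY_2 = QUERY_1 + " AND b.table_source_a = 'ONE'"


def run(conn, sql):
    """Return the rows of a query in a deterministic order."""
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(run(conn, QUERY_1) == run(conn, QUERY_2))
```

In this toy setup no row of pta2 can appear in either result, because every pta2 row carries a label other than 'ONE'.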
Hence the plan generated for Query 1 is sub-optimal: table pta2, which cannot contribute any row to the result set, is still scanned.
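The expected behaviour can be modelled in a few lines of Python. This is a simplified sketch of the reasoning, not Impala's planner code. It assumes that each view is a union whose operands tag their rows with a constant label, that the second operand's label is 'TWO', and that the join condition is a.table_source = b.table_source_a; the function names are hypothetical.

```python
def infer_constants(equalities, constants):
    """Propagate column = constant bindings across column = column equalities."""
    inferred = dict(constants)
    changed = True
    while changed:
        changed = False
        for left, right in equalities:
            # An equality works in both directions.
            for src, dst in ((left, right), (right, left)):
                if src in inferred and dst not in inferred:
                    inferred[dst] = inferred[src]
                    changed = True
    return inferred


def tables_to_scan(view_operands, column, bindings):
    """Keep only the union operands whose constant label can satisfy the binding."""
    wanted = bindings.get(column)
    return [table for table, label in view_operands
            if wanted is None or label == wanted]


# Assumed view definitions: (underlying table, constant label) per union operand.
VIEW_A = [("pt1", "ONE"), ("pt2", "TWO")]
VIEW_B = [("pta1", "ONE"), ("pta2", "TWO")]

JOIN_EQ = [("a.table_source", "b.table_source_a")]  # assumed join condition
WHERE_CONST = {"a.table_source": "ONE"}             # explicit filter in Query 1

if __name__ == "__main__":
    inferred = infer_constants(JOIN_EQ, WHERE_CONST)
    # With the inferred predicate applied to both views.
    print(tables_to_scan(VIEW_A, "a.table_source", inferred),
          tables_to_scan(VIEW_B, "b.table_source_a", inferred))
    # Without it, only the explicit filter prunes anything.
    print(tables_to_scan(VIEW_A, "a.table_source", WHERE_CONST),
          tables_to_scan(VIEW_B, "b.table_source_a", WHERE_CONST))
```

With the inferred binding on b.table_source_a, the model keeps only pt1 and pta1, which matches the scans reported for Query 2. With the explicit filter alone it keeps pt1, pta1, and pta2, which matches the scans reported for Query 1.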