Author: Alex Behm <firstname.lastname@example.org>
Date: Fri Nov 4 10:41:25 2016 -0700
IMPALA-3167: Fix assignment of WHERE conjunct through grouping agg + OJ.
Background: We generally allow the assignment of predicates below the
nullable side of a left/right outer join, as the following example shows:
  SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
  WHERE t2.int_col < 10
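The scan of 't2' picks up 't2.int_col < 10' via
Analyzer.getBoundPredicates() and recognizes that the predicate must
also be evaluated by the join later: 't1' rows without a match in 't2'
come out of the join with NULLs for the 't2' columns, and the WHERE
clause must still filter them. The predicate is therefore not marked
as assigned. The join then picks up the unassigned predicate via
Analyzer.getUnassignedConjuncts().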
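To see why the predicate has to be evaluated both in the scan and at the join, here is a small Python model of the example query's semantics on made-up rows. The helpers left_outer_join and where_pred are hypothetical and only model SQL semantics; this is not how Impala executes joins.

```python
# Toy data, made up for illustration: t1 holds ids, t2 holds (id, int_col).
T1 = [1, 2, 3]
T2 = [(1, 5), (2, 50)]


def left_outer_join(left_ids, right_rows):
    """t1 LEFT OUTER JOIN t2 ON t1.id = t2.id; unmatched t1 rows get None (NULL) for t2."""
    out = []
    for lid in left_ids:
        matches = [r for r in right_rows if r[0] == lid]
        if matches:
            out.extend((lid, r) for r in matches)
        else:
            out.append((lid, None))
    return out


def where_pred(t2_row):
    """t2.int_col < 10; a NULL-extended t2 row never satisfies it."""
    return t2_row is not None and t2_row[1] < 10


# Reference semantics: join first, then apply the WHERE clause.
reference = [(lid, r) for lid, r in left_outer_join(T1, T2) if where_pred(r)]

# Evaluating the predicate only in the scan of t2 is not enough:
# t1 rows 2 and 3 survive the join with a NULL t2 side.
scan_only = left_outer_join(T1, [r for r in T2 if where_pred(r)])

# Evaluating it in the scan and again after the join matches the reference.
scan_and_join = [(lid, r) for lid, r in scan_only if where_pred(r)]
```

Filtering early in the scan is still worthwhile because it shrinks the join input; it just cannot replace the evaluation after the join.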
The bug was that our logic for detecting whether a bound predicate must
also be evaluated at a join node was flawed because it only considered
whether the tuples of the source or destination predicate were outer
joined (plus other conditions).
The underlying assumption was that the source and destination predicates
are each bound by a tuple produced by a TableRef, but in the buggy query
the source predicate is bound by an aggregation tuple, so we incorrectly
marked the bound predicate as assigned in Analyzer.getBoundPredicates().
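The difference between the flawed check and the conservative rule adopted by this change can be sketched with a toy model of slots and slot equivalences. Everything here is hypothetical: Slot, old_must_eval_at_join and new_must_eval_at_join are illustrative names, not Impala's Analyzer code, the "(plus other conditions)" of the real check are omitted, and the scenario (an aggregation slot agg.id equivalent to a slot of an outer-joined tuple b) is invented to match the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for a slot: the tuple it belongs to plus a column name."""
    tuple_id: str
    column: str


def old_must_eval_at_join(src_tuple, dst_tuple, outer_joined):
    """Flawed rule: only asks whether the source or destination tuple is outer-joined."""
    return src_tuple in outer_joined or dst_tuple in outer_joined


def new_must_eval_at_join(pred_slots, equiv, outer_joined):
    """Conservative rule: does any referenced slot have an equivalent slot
    (including itself) that belongs to an outer-joined tuple?"""
    return any(
        other.tuple_id in outer_joined
        for slot in pred_slots
        for other in equiv.get(slot, {slot})
    )


# Hypothetical scenario: the source predicate references agg.id (an aggregation
# tuple), the destination is the scan tuple t1, and agg.id is also equivalent
# to b.id, where b is outer-joined.
agg_id, t1_id, b_id = Slot("agg", "id"), Slot("t1", "id"), Slot("b", "id")
eq_class = {agg_id, t1_id, b_id}
equiv = {slot: eq_class for slot in eq_class}
outer_joined = {"b"}

old_result = old_must_eval_at_join("agg", "t1", outer_joined)  # False: marked assigned
new_result = new_must_eval_at_join([agg_id], equiv, outer_joined)  # True: left unassigned
```

The new rule does not care which kind of tuple binds the source predicate, which is what makes it robust to aggregation tuples, at the cost of sometimes leaving a predicate unassigned when it did not strictly need to be.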
The fix is to conservatively not mark bound predicates as assigned if
the slots referenced by the predicate have equivalent slots that
belong to an outer-joined tuple. As a result, a plan node may pick up
the same predicate multiple times, once via
Analyzer.getBoundPredicates() and another time via
Analyzer.getUnassignedConjuncts(). Such duplicates are now removed.
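Removing the duplicates amounts to an order-preserving merge of the two conjunct lists that drops structurally equal predicates. A minimal Python sketch, assuming predicates compare by value; BinaryPredicate and dedup_conjuncts are hypothetical names, and the extra 't1.id > 0' conjunct is made up to show that distinct predicates are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryPredicate:
    """Hypothetical predicate representation: structurally equal values compare equal."""
    op: str
    column: str
    constant: int


def dedup_conjuncts(*sources):
    """Merge conjunct lists, keeping the first occurrence of each equal predicate."""
    seen = set()
    merged = []
    for source in sources:
        for conjunct in source:
            if conjunct not in seen:
                seen.add(conjunct)
                merged.append(conjunct)
    return merged


# A plan node may receive the same predicate from both sources.
from_bound_predicates = [BinaryPredicate("<", "t1.id", 10)]
from_unassigned_conjuncts = [
    BinaryPredicate("<", "t1.id", 10),
    BinaryPredicate(">", "t1.id", 0),
]
conjuncts = dedup_conjuncts(from_bound_predicates, from_unassigned_conjuncts)
```

Keeping the first occurrence preserves the relative order of the predicates, so the evaluation order within the plan node stays deterministic.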
The following example shows how the same predicate can be assigned twice:
  SELECT * FROM (SELECT * FROM t t1) a LEFT OUTER JOIN t b ON a.id = b.id
  WHERE a.id < 10
1. The predicate 'a.id < 10' gets migrated into the inline view as
   't1.id < 10'. 'a.id < 10' is marked as assigned but is still
   registered as a single-tid conjunct in the Analyzer for potential
   propagation.
2. The scan node of 't1' calls Analyzer.getBoundPredicates() and
   generates 't1.id < 10' based on the source predicate 'a.id < 10'.
3. The scan node of 't1' also picks up the migrated conjunct
   't1.id < 10' via Analyzer.getUnassignedConjuncts(), so it now holds
   the same predicate twice.
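The steps above can be traced with a toy Python model in which predicates are plain (operator, column, constant) tuples, and both the inline-view substitution map and the slot equivalences are hard-coded dictionaries. The helpers substitute and bound_predicates are hypothetical and do not mirror Impala's Expr or Analyzer APIs; the point is only that the two paths produce identical predicates.

```python
# Predicates modeled as (operator, column, constant) tuples.
source_pred = ("<", "a.id", 10)  # WHERE a.id < 10


def substitute(pred, smap):
    """Rewrite a predicate's column through a substitution map."""
    op, col, const = pred
    return (op, smap.get(col, col), const)


def bound_predicates(pred, equiv, dest_tuple):
    """Re-bind a predicate to each equivalent column that belongs to dest_tuple."""
    op, col, const = pred
    return [(op, other, const) for other in equiv.get(col, [])
            if other.startswith(dest_tuple + ".")]


# Step 1: migrate into the inline view (view column a.id maps to base column t1.id).
migrated = substitute(source_pred, {"a.id": "t1.id"})

# Step 2: the scan of t1 generates a bound predicate from the source predicate.
bound = bound_predicates(source_pred, {"a.id": ["t1.id"]}, "t1")

# Step 3: the scan also picks up the migrated conjunct, so it sees the same predicate twice.
picked_up = bound + [migrated]

# Removing duplicates (order-preserving) leaves a single copy.
deduped = list(dict.fromkeys(picked_up))
```

Because the migrated conjunct and the bound predicate are structurally equal, value-based deduplication is enough to collapse them into one.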
Reviewed-by: Alex Behm <email@example.com>
Tested-by: Internal Jenkins