Impala may return wrong results for plans that have a partitioned join inside a union.
Conditions for hitting the bug:
- plan has a partitioned join inside a union
- the tables have stats; otherwise the planner would not choose a partitioned join
- for at least one equi-join condition, the left-hand side and right-hand side join keys have different types
Query that returns correct results:
Query that returns wrong results:
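The bug is a missing implicit cast in EXCHANGE 05. The id should be cast to BIGINT to be consistent with the left input of the join. Without the cast, the two join inputs are hash-partitioned on keys of different types, so rows with equal key values can be sent to different nodes and are never joined.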
We already have code that casts partition expressions in exchanges, but it incorrectly assumes that the cast is only needed for hash-partitioned fragments. The UNION makes the fragment RANDOM partitioned: because the union children could be arbitrarily partitioned, there is no guarantee about how the fragment's output is partitioned. As a result, the cast is skipped for the exchange feeding the join.
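The decision described above can be modeled in a few lines of Java. This is only a sketch, not Impala's planner code: the class, enum, and method names are hypothetical, types are represented as plain strings, and the condition used in `fixedKeyType` is an assumption about what a correct check could look like, not the actual fix.

```java
// Hypothetical, simplified model of the cast decision for one exchange.
// None of these names are Impala's real classes or methods.
final class ExchangeCastSketch {
    enum PartitionType { HASH_PARTITIONED, RANDOM, UNPARTITIONED }

    // Buggy rule: the sender's key is cast to the join key type only when the
    // receiving fragment is itself hash partitioned.
    static String buggyKeyType(PartitionType senderOutput, String senderKeyType,
                               PartitionType receivingFragment, String joinKeyType) {
        if (receivingFragment != PartitionType.HASH_PARTITIONED) {
            return senderKeyType; // cast skipped
        }
        return joinKeyType;
    }

    // Assumed correct rule: cast whenever the sender hash-partitions its output,
    // regardless of how the receiving fragment is partitioned.
    static String fixedKeyType(PartitionType senderOutput, String senderKeyType,
                               PartitionType receivingFragment, String joinKeyType) {
        if (senderOutput != PartitionType.HASH_PARTITIONED) {
            return senderKeyType; // nothing is hashed, so no cast is needed
        }
        return joinKeyType;
    }

    public static void main(String[] args) {
        // Join under a UNION: the sender hashes an INT key, the join expects
        // BIGINT, and the UNION makes the receiving fragment RANDOM.
        System.out.println("buggy: " + buggyKeyType(
            PartitionType.HASH_PARTITIONED, "INT", PartitionType.RANDOM, "BIGINT"));
        System.out.println("fixed: " + fixedKeyType(
            PartitionType.HASH_PARTITIONED, "INT", PartitionType.RANDOM, "BIGINT"));
    }
}
```

In this model the buggy rule leaves the key as INT when the join sits under a UNION, while both rules agree when the receiving fragment is hash partitioned, which matches the observation that the same join outside a union returns correct results.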
The buggy code is in PlanFragment#finalizeExchanges():
Workarounds:
- Use the broadcast and straight_join hints to force the join to use a broadcast distribution strategy instead of a partitioned one
- Reformulate the query to avoid the join inside a union
- Write the join result into a separate table and use that table in the original query instead of