Query plans that have an outer join followed by a grouping aggregation or an analytic function with a partition by clause may produce incorrect results. The reason is that Impala incorrectly optimizes away a hash exchange that is believed to be redundant (but actually is required).
Example aggregation query and bad plan:
Example analytic query and bad plan:
The problem is that if grouping/partition exprs reference nullable tuples, then we need a hash exchange to bring the NULLs of outer-join non-matches together.