Impala's equivalence-class computation has a subtle bug that can lead to:
1. omitting predicates
2. adding redundant predicates
3. adding predicates that are non-evaluable at that point in the plan
In most queries, the bug has no effect on the final plan.
However, in case (1) incorrect results may be returned, and in case (3) a crash will occur.
Unfortunately, it is extremely difficult to tell from a query whether it hits this bug. Whether the bug triggers depends on the specific implementation of Java's HashMap, which tends to change slightly across JVM versions, and on the total number of columns in the query (including virtual view columns).
The root cause is a bug in Impala's DisjointSet implementation, which is used to compute equivalence classes.
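This report does not describe the defect inside DisjointSet itself. For orientation only, here is a minimal union-find (disjoint-set) sketch in Java. It assumes that equivalence classes are built by merging the two sides of each equality predicate. The class name `EquivClassSketch`, the integer slot ids, and the methods are hypothetical and are not Impala's API. The report says the bug's behavior depends on HashMap details, so `main` checks a useful invariant for any such structure: the final partition must be the same regardless of the order in which predicates are merged.

```java
import java.util.*;

/**
 * Hypothetical union-find sketch for grouping slot ids that equality
 * predicates declare equal. Not Impala's DisjointSet class.
 */
public class EquivClassSketch {
    // Maps each element to its parent; a root maps to itself.
    private final Map<Integer, Integer> parent = new HashMap<>();

    // Returns the representative of x's set, adding x as a singleton if unseen.
    public int find(int x) {
        parent.putIfAbsent(x, x);
        int root = x;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path directly at the root.
        while (x != root) {
            int next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Merges the sets containing a and b (e.g. for a predicate "a = b").
    public void union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) {
            parent.put(ra, rb);
        }
    }

    // Groups every known element by its representative; each value is one class.
    public Set<Set<Integer>> classes() {
        Map<Integer, Set<Integer>> byRoot = new HashMap<>();
        for (int x : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(x), k -> new TreeSet<>()).add(x);
        }
        return new HashSet<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Hypothetical equality predicates as pairs of slot ids, e.g. t1.a = t2.b.
        int[][] preds = {{1, 2}, {2, 3}, {4, 5}, {6, 6}, {5, 7}};
        Set<Set<Integer>> reference = null;
        for (long seed = 0; seed < 100; seed++) {
            // Process the predicates in a different order on every iteration.
            List<int[]> order = new ArrayList<>(Arrays.asList(preds));
            Collections.shuffle(order, new Random(seed));
            EquivClassSketch ds = new EquivClassSketch();
            for (int[] p : order) {
                ds.union(p[0], p[1]);
            }
            Set<Set<Integer>> result = ds.classes();
            if (reference == null) {
                reference = result;
            } else if (!reference.equals(result)) {
                throw new AssertionError("partition depends on merge order");
            }
        }
        System.out.println(reference);
    }
}
```

With these predicates, every ordering should yield the classes {1, 2, 3}, {4, 5, 7}, and {6}. A correct disjoint set is insensitive to merge order and to HashMap iteration order. An implementation whose output changes when either of these changes shows the kind of fragility described above.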
Even minor query modifications that do not affect the query result may be enough to avoid the bug. For example, replacing a '*' with an explicit list of (fewer) columns may be enough. Likewise, adding column references where they are not needed, e.g., in an EXISTS or NOT EXISTS subquery, may fix the problem.