Reported by icook
The Impala docs entry for the IS DISTINCT FROM operator states:
The <=> operator, used like an equality operator in a join query, is more efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The <=> operator can use a hash join, while the OR expression cannot.
But this expression is not equivalent to A <=> B. See the attached screenshot demonstrating their non-equivalence. An expression that is equivalent to A <=> B is this:
(A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
This expression should replace the existing incorrect expression.
Another expression that is equivalent to A <=> B is:
if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
This one is a bit easier to follow. If you use this one in the docs, just replace the following line with:
The <=> operator can use a hash join, while the if expression cannot.