Description
If "dual" is a one-row table, the query
SELECT count(v) FROM (
SELECT v FROM (
SELECT NULL AS v FROM dual
UNION ALL
SELECT 'foo' AS v FROM dual
) my_union WHERE isnull(v)
) my_subview;
returns 0 and the same query with union'ed sub-queries switched returns 1:
SELECT count(v) FROM (
SELECT v FROM (
SELECT 'foo' AS v FROM dual
UNION ALL
SELECT NULL AS v FROM dual
) my_union WHERE isnull(v)
) my_subview;
Example output (with Catalyst tracing turned on): https://gist.githubusercontent.com/mbautin/c916a2a7ce733d039137/raw
This is caused by the behavior in set operation implementation where only the nullability of the first child is taken into account:
SetOperation in org.apache.spark.sql.catalyst.plans.logical:
https://github.com/apache/spark/blob/82268f07abfa658869df2354ae72f8d6ddd119e8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala#L93
Union in org.apache.spark.sql.execution:
https://github.com/apache/spark/blob/e626ac5f5c27dcc74113070f2fec03682bcd12bd/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala#L178
Intersect in org.apache.spark.sql.execution:
https://github.com/apache/spark/blob/e626ac5f5c27dcc74113070f2fec03682bcd12bd/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala#L320
Except in org.apache.spark.sql.execution:
https://github.com/apache/spark/blob/e626ac5f5c27dcc74113070f2fec03682bcd12bd/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala#L307