This is very likely a regression from
When calling array_intersect(a, b), if the first parameter contains a NULL value and the second one does not, an extraneous NULL appears in the output. As a result, array_intersect(a, b) != array_intersect(b, a), which is incorrect because set intersection should be commutative.
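To make the expected behavior concrete, here is a minimal pure-Python sketch of the semantics I would expect. It is not Spark's implementation. It assumes that NULL (modeled as None) is treated as an ordinary element that must occur in both arrays to appear in the result, and that the result keeps the order of the first array. The function name and the sample values are hypothetical.

```python
from typing import List, Optional


def expected_array_intersect(
    a: List[Optional[int]], b: List[Optional[int]]
) -> List[Optional[int]]:
    """Reference model (assumption, not Spark code): distinct elements of `a`
    that also occur in `b`, with None treated like any other value."""
    in_b = set(b)
    seen = set()
    result = []
    for item in a:
        # Keep an element only if it is in both arrays and not yet emitted.
        if item in in_b and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical inputs: only `b` contains a NULL.
a = [1, 2, 3]
b = [3, None, 5]

forward = expected_array_intersect(a, b)
backward = expected_array_intersect(b, a)

# Under this model the NULL never appears, because it is missing from `a`,
# and both argument orders yield the same set of elements.
assert None not in forward and None not in backward
assert set(forward) == set(backward)
```

Under these assumptions, a NULL shows up in the result only when both inputs contain one, regardless of argument order.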
Example using PySpark:
Note that in the first case, a does not contain a NULL, and the final output is correct. In the second case, b does contain a NULL and is now the first parameter, so the output contains an extraneous NULL.
The same behavior occurs in Scala when writing to Parquet: