Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Description
I found a correctness issue while working on HIVE-4838. The following query from join_nullsafe.q returns different results depending on whether it is executed map-side or reduce-side:
SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
When this query runs map-side, rows that should be joined are not. For example, the reduce-side execution outputs this row:
a.key   a.value   b.key   b.value
148     NULL      148     NULL
This row is correct: both a.key <=> b.key and a.value <=> b.value hold, because the null-safe operator <=> evaluates NULL <=> NULL as true. The current map-side code, however, omits the row. The cause is that the map-side join uses MapJoinDoubleKey, which does not compare NULL values correctly.
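To illustrate what a two-column map-join key needs for <=> semantics, here is a minimal sketch. It is not Hive's MapJoinDoubleKey. The class NullSafePairKey and the method strictEquals are hypothetical names chosen for this example. The sketch assumes the small (MAPJOIN-hinted) side is loaded into a hash table and probed with rows from the other side. A key whose equals/hashCode treat two NULLs as equal finds the (148, NULL) match, while a comparison with plain "=" semantics rejects any NULL and drops the row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch, NOT Hive's MapJoinDoubleKey: a two-column join key
 * whose equals/hashCode follow the null-safe operator <=>, where NULL <=> NULL is true.
 */
public final class NullSafePairKey {
    private final Object key;
    private final Object value;

    public NullSafePairKey(Object key, Object value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NullSafePairKey)) return false;
        NullSafePairKey other = (NullSafePairKey) o;
        // Objects.equals returns true when both arguments are null, matching NULL <=> NULL.
        return Objects.equals(key, other.key) && Objects.equals(value, other.value);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null fields, so keys containing NULL hash consistently.
        return Objects.hash(key, value);
    }

    /** Hypothetical contrast: plain "=" semantics, where any NULL makes the comparison fail. */
    public static boolean strictEquals(Object aKey, Object aValue, Object bKey, Object bValue) {
        if (aKey == null || aValue == null || bKey == null || bValue == null) return false;
        return aKey.equals(bKey) && aValue.equals(bValue);
    }

    public static void main(String[] args) {
        // Hash table built from the small (map-joined) side "a".
        Map<NullSafePairKey, String> smallSide = new HashMap<>();
        smallSide.put(new NullSafePairKey(148, null), "a: 148 NULL");

        // Probe with a row from side "b" carrying the same (148, NULL) key.
        NullSafePairKey probe = new NullSafePairKey(148, null);
        System.out.println(smallSide.containsKey(probe));        // prints true
        System.out.println(strictEquals(148, null, 148, null));  // prints false
    }
}
```

The design point is that equals and hashCode must agree on NULL handling: if either one treats NULL as "never equal", the hash lookup misses rows that the reduce-side <=> comparison would join.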