HIVE-4845 (project: Hive)

Correctness issue with MapJoins using the null safe operator

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      I found a correctness issue while working on HIVE-4838. The following query from join_nullsafe.q returns different results depending on whether the join is executed map-side or reduce-side:

      SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
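
The ON clause relies on the null-safe equality operator `<=>`, which differs from `=` only when an operand is NULL. The following is a minimal Java sketch of the two semantics, not Hive code: the class and method names are hypothetical, and a Java `null` stands in for SQL NULL (and, in the return value of `sqlEquals`, for the "unknown" result).

```java
import java.util.Objects;

// Hypothetical illustration of "=" versus "<=>"; not part of Hive.
final class NullSafeEquality {

    // Ordinary SQL "=": the result is unknown (modeled as null) if either side is NULL.
    static Boolean sqlEquals(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    // Null-safe "<=>": NULL matches NULL, and NULL never matches a non-NULL value.
    static boolean nullSafeEquals(Object left, Object right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        System.out.println("NULL =   NULL : " + sqlEquals(null, null));
        System.out.println("NULL <=> NULL : " + nullSafeEquals(null, null));
        System.out.println("148  <=> NULL : " + nullSafeEquals(148, null));
    }
}
```

Under these semantics, a join on `<=>` is expected to pair rows whose key columns are both NULL, whereas a join on `=` would drop them.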
      

      For that query, the map-side join fails to join rows that should match. For example, the reduce-side join outputs this row:

      a.key   a.value   b.key   b.value
      148     NULL      148     NULL
      

      This row is correct: under the null-safe operator, a.key equals b.key (148 <=> 148) and a.value equals b.value (NULL <=> NULL). The current map-side code, however, omits this row. The reason is that the map-side join uses MapJoinDoubleKey, which does not compare null values properly.
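
To illustrate the kind of key comparison a hash-based join needs for this query, here is a minimal Java sketch. It is not Hive's MapJoinDoubleKey and not the attached patch: `TwoColumnKey`, `NullSafeJoinSketch`, and `join` are hypothetical names, and rows are modeled as plain two-element `Object[]` arrays. The point is only that `equals` and `hashCode` must both treat a null field as an ordinary value so that two keys with NULL in the same position land in the same bucket and compare equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a null-safe hash join; not Hive's implementation.
final class NullSafeJoinSketch {

    // Two-column join key whose equality follows <=> semantics.
    static final class TwoColumnKey {
        private final Object first;
        private final Object second;

        TwoColumnKey(Object first, Object second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof TwoColumnKey)) {
                return false;
            }
            TwoColumnKey that = (TwoColumnKey) other;
            // Objects.equals treats two nulls as equal and null vs. non-null as unequal.
            return Objects.equals(first, that.first) && Objects.equals(second, that.second);
        }

        @Override
        public int hashCode() {
            // Objects.hash accepts null fields, so equal keys always hash alike.
            return Objects.hash(first, second);
        }
    }

    // Builds a hash table from the small side, then probes it with each row of the big side.
    static List<Object[]> join(List<Object[]> small, List<Object[]> big) {
        Map<TwoColumnKey, List<Object[]>> table = new HashMap<>();
        for (Object[] row : small) {
            table.computeIfAbsent(new TwoColumnKey(row[0], row[1]), k -> new ArrayList<>()).add(row);
        }
        List<Object[]> joined = new ArrayList<>();
        for (Object[] row : big) {
            List<Object[]> matches = table.get(new TwoColumnKey(row[0], row[1]));
            if (matches == null) {
                continue;
            }
            for (Object[] match : matches) {
                joined.add(new Object[] {match[0], match[1], row[0], row[1]});
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {148, null}); // the row from the example above
        for (Object[] row : join(rows, rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A key class whose comparison rejects or mishandles null fields would make the probe miss, which is consistent with the missing row described above.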

        Attachments

        1. HIVE-4845.patch
          5 kB
          Brock Noland
        2. HIVE-4845.patch
          4 kB
          Brock Noland
        3. HIVE-4845.patch
          9 kB
          Brock Noland

            People

            • Assignee: Brock Noland (brocknoland)
            • Reporter: Brock Noland (brocknoland)