Hive / HIVE-4845

Correctness issue with MapJoins using the null safe operator

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

    Description

      I found a correctness issue while working on HIVE-4838. The following query from join_nullsafe.q gives different results depending on whether it is executed map-side or reduce-side:

      SELECT /*+ MAPJOIN(a) */ * FROM smb_input1 a JOIN smb_input1 b ON a.key <=> b.key AND a.value <=> b.value ORDER BY a.key, a.value, b.key, b.value;
      

      On the map side, this query drops rows that should be joined. For example, the reduce side outputs this row:

      a.key   a.value   b.key   b.value
      148     NULL      148     NULL
      

      This is correct, since a.key equals b.key and, under the null-safe operator <=>, the NULL a.value matches the NULL b.value. The current map-side code, however, omits this row. The reason is that the map-side join uses MapJoinDoubleKey, which does not compare null values properly.
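
To illustrate the comparison the map side needs, here is a minimal sketch of a two-column hash-join key whose equals() follows <=> semantics. It is not Hive's MapJoinDoubleKey and not the attached patch: the class NullSafeJoinKey and the helpers sqlEquals and nullSafeEquals are hypothetical names made up for this example. The sketch only assumes that the map-side join looks up probe rows in a hash table keyed on the join columns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical two-column join key for illustration; not Hive's MapJoinDoubleKey. */
final class NullSafeJoinKey {
    private final Object first;
    private final Object second;

    NullSafeJoinKey(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    /** Plain '=' as a join condition: a NULL on either side never matches. */
    static boolean sqlEquals(Object a, Object b) {
        return a != null && b != null && a.equals(b);
    }

    /** '<=>' semantics: NULL matches NULL, NULL never matches a non-NULL value. */
    static boolean nullSafeEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NullSafeJoinKey)) {
            return false;
        }
        NullSafeJoinKey other = (NullSafeJoinKey) o;
        // Both columns must match under null-safe comparison.
        return nullSafeEquals(first, other.first)
            && nullSafeEquals(second, other.second);
    }

    @Override
    public int hashCode() {
        // Objects.hash tolerates null components, so (148, NULL) hashes consistently.
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        // Build side: the small table is loaded into a hash table keyed on (key, value).
        Map<NullSafeJoinKey, String> buildSide = new HashMap<>();
        buildSide.put(new NullSafeJoinKey(148, null), "a: 148, NULL");

        // Probe side: a row with the same key and a NULL value should find its match.
        System.out.println(buildSide.containsKey(new NullSafeJoinKey(148, null)));
        // With plain '=' the NULL columns would not match.
        System.out.println(sqlEquals(null, null));
    }
}
```

With a key like this, the probe for (148, NULL) finds the build-side row, which is the row the reduce side returns. Under plain = semantics the two NULL values would not match and the row would be dropped.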

      Attachments

        1. HIVE-4845.patch
          5 kB
          Brock Noland
        2. HIVE-4845.patch
          4 kB
          Brock Noland
        3. HIVE-4845.patch
          9 kB
          Brock Noland

          People

            Assignee: Brock Noland (brocknoland)
            Reporter: Brock Noland (brocknoland)