Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15327

Outerjoin might produce wrong result depending on joinEmitInterval value

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.3.0, 2.2.0
    • 2.2.0
    • Query Processor
    • None

    Description

      If joinEmitInterval is smaller than the group size, outerjoins might produce records with NULL appended values multiple times (once per group).

      HIVE-4689 targeted the same problem. However, the fix does not seem to cover all cases (in particular, it will not apply to left outer joins with filter conditions on the left input). The solution in HIVE-4689 was to disable (override) joinEmitInterval value for those cases. This fix follows the same approach.

      To reproduce the problem:

      set hive.strict.checks.cartesian.product=false;
      set hive.join.emit.interval=1;
      
      CREATE TABLE test1 (key INT, value INT, col_1 STRING);
      INSERT INTO test1 VALUES (99, 0, 'Alice');
      INSERT INTO test1 VALUES (99, 2, 'Mat');
      INSERT INTO test1 VALUES (100, 1, 'Bob');
      INSERT INTO test1 VALUES (101, 2, 'Car');
      
      CREATE TABLE test2 (key INT, value INT, col_2 STRING);
      INSERT INTO test2 VALUES (102, 2, 'Del');
      INSERT INTO test2 VALUES (103, 2, 'Ema');
      INSERT INTO test2 VALUES (104, 3, 'Fli');
      
      
      -- Equi-condition and condition on one input (left outer join)
      SELECT *
      FROM test1 LEFT OUTER JOIN test2
      ON (test1.value=test2.value AND test1.key between 100 and 102)
      LIMIT 10;
      
      -- Condition on one input (left outer join)
      SELECT *
      FROM test1 LEFT OUTER JOIN test2
      ON (test1.key between 100 and 102)
      LIMIT 10;
      

      For the first query, current (incorrect) result is:

       99     0       Alice   NULL    NULL    NULL
       100    1       Bob     NULL    NULL    NULL
       101    2       Car     103     2       Ema
       99     2       Mat     NULL    NULL    NULL
       101    2       Car     102     2       Del
       99     2       Mat     NULL    NULL    NULL
      

      Expected (correct) result is:

       99     0       Alice   NULL    NULL    NULL
       100    1       Bob     NULL    NULL    NULL
       101    2       Car     103     2       Ema
       101    2       Car     102     2       Del
       99     2       Mat     NULL    NULL    NULL
      

      For the second query, current (incorrect) result is:

       101    2       Car     104     3       Fli
       100    1       Bob     104     3       Fli
       99     2       Mat     NULL    NULL    NULL
       99     0       Alice   NULL    NULL    NULL
       101    2       Car     103     2       Ema
       100    1       Bob     103     2       Ema
       99     2       Mat     NULL    NULL    NULL
       99     0       Alice   NULL    NULL    NULL
       101    2       Car     102     2       Del
       100    1       Bob     102     2       Del
      

      Expected (correct) result is:

       101    2       Car     104     3       Fli
       101    2       Car     103     2       Ema
       101    2       Car     102     2       Del
       100    1       Bob     104     3       Fli
       100    1       Bob     103     2       Ema
       100    1       Bob     102     2       Del
       99     2       Mat     NULL    NULL    NULL
       99     0       Alice   NULL    NULL    NULL
      

      Attachments

        1. HIVE-15327.patch
          47 kB
          jcamachorodriguez
        2. HIVE-15327.01.patch
          47 kB
          jcamachorodriguez
        3. HIVE-15327.02.patch
          36 kB
          jcamachorodriguez
        4. HIVE-15327.03.patch
          36 kB
          jcamachorodriguez

        Issue Links

          Activity

            People

              jcamacho Jesús Camacho Rodríguez
              jcamacho Jesús Camacho Rodríguez
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: