Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15524

Wrong result for LEFT outer join in Tez using MapJoinOperator

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.2.0
    • None
    • None
    • None

    Description

      To reproduce, we can revert HIVE-15493 and run in Tez:

      set hive.auto.convert.join=true;
      
      DROP TABLE IF EXISTS test_1; 
      CREATE TABLE test_1 
      ( 
      member BIGINT 
      , age VARCHAR (100) 
      ) 
      STORED AS TEXTFILE 
      ; 
      
      DROP TABLE IF EXISTS test_2; 
      CREATE TABLE test_2 
      ( 
      member BIGINT 
      ) 
      STORED AS TEXTFILE 
      ; 
      
      INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40'); 
      INSERT INTO test_2 VALUES (1), (2), (3); 
      
      SELECT 
      t2.member 
      , t1.age_1 
      , t1.age_2 
      FROM 
      test_2 t2 
      LEFT JOIN ( 
      SELECT 
      member 
      , age as age_1 
      , age as age_2 
      FROM 
      test_1 
      ) t1 
      ON t2.member = t1.member 
      ;
      

      Result is:

      1	20	NULL
      3	40	NULL
      2	30	NULL
      

      Correct result is:

      1	20	20
      3	40	40
      2	30	30
      

      Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not contain tests, it does look legit. In fact, the problem seems to be in the MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT outer or FULL outer). Although I am still trying to understand part of the MapJoinOperator code path, the bug could be in the initialization of the operator. It only happens when we have duplicate values in the right part of the output.

      HIVE-15493 provided a temporarily fix that removed the code in SemanticAnalyzer that reuses duplicated value expressions from RS to create multiple columns in the join output (this is equivalent to reverting HIVE-10582).

      This follow-up issue should take the code back and tackle the problem in the MapJoinOperator.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jcamacho Jesús Camacho Rodríguez
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: