HIVE-28480

Disable SMB on partition hash generator mismatch across join branches in previous RS


Details

    Description

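      The SMB join conversion replaces the last RS (ReduceSink) operator in each joining branch, together with the JOIN operator, with a MERGEJOIN operator. We therefore need to ensure that the RS operators preceding those replaced RS operators, in both branches, partition using the same hash generator.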

      The hash code generator depends on whether ReducerTraits.UNIFORM is set: the RS uses either ReduceSinkOperator#computeMurmurHash() or ReduceSinkOperator#computeHashCode(), and the two produce different hash codes for the same value.

      Skip SMB join in such cases.
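
The rule above can be sketched as a small check. This is only an illustration, not Hive's actual implementation: `ReduceSinkInfo`, `hash_generator` and `can_convert_to_smb` are hypothetical names, and the mapping from UNIFORM to the murmur hash is an assumption read from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReduceSinkInfo:
    """Hypothetical stand-in for the RS operator that feeds one join branch."""
    name: str
    uniform: bool  # True if ReducerTraits.UNIFORM is set on this RS


def hash_generator(rs: ReduceSinkInfo) -> str:
    # Assumption taken from the description: UNIFORM selects the murmur hash,
    # otherwise the bucket-style computeHashCode() is used.
    return "computeMurmurHash" if rs.uniform else "computeHashCode"


def can_convert_to_smb(left: ReduceSinkInfo, right: ReduceSinkInfo) -> bool:
    # SMB is only safe when both upstream RS operators partition rows with
    # the same hash generator; otherwise equal keys may be routed differently.
    return hash_generator(left) == hash_generator(right)


# The two branches of the query in this ticket, as described in the analysis.
distinct_branch = ReduceSinkInfo("RS for COUNT(DISTINCT v)", uniform=False)
plain_branch = ReduceSinkInfo("RS for COUNT(v)", uniform=True)
print(can_convert_to_smb(distinct_branch, plain_branch))  # False: skip SMB
```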

      Steps to reproduce:

      Consider the following query, where the join would get converted to an SMB join. Auto reducer parallelism is enabled, which ensures more than one reducer task.

      CREATE TABLE t_asj_18 (k STRING, v INT);
      INSERT INTO t_asj_18 values ('a', 10), ('a', 10);
      
      set hive.auto.convert.join=false;
      set hive.tez.auto.reducer.parallelism=true;
      
      EXPLAIN SELECT * FROM (
          SELECT k, COUNT(DISTINCT v), SUM(v)
          FROM t_asj_18 GROUP BY k
      ) a LEFT JOIN (
          SELECT k, COUNT(v)
          FROM t_asj_18 GROUP BY k
      ) b ON a.k = b.k;

      The expected result is:

      a   1   20  a   2 

      but on the master branch, the query returns:

      a   1   20  NULL    NULL

      Here, for COUNT(DISTINCT), the RS key is (k, v) while the partition column is still k. In this scenario the reducer trait UNIFORM is not set. The hash code for "a" from the 2nd subquery is generated using murmurHash (270516725), while the hash code from the 1st subquery is generated using bucketHash (1086686554). As a result, rows with the key "a" reach different reducer tasks.
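
To see why the two hash codes send the same key to different reducer tasks, here is a small worked example. It assumes the common scheme of masking the sign bit and taking the hash modulo the number of reducers; the helper name `reducer_for` and the reducer counts are hypothetical and not taken from this ticket.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE, used to clear the sign bit


def reducer_for(hash_code: int, num_reducers: int) -> int:
    """Map a hash code to a reducer index (assumed modulo partitioning)."""
    return (hash_code & INT_MAX) % num_reducers


MURMUR_HASH_A = 270516725   # hash of "a" reported for the 2nd subquery
BUCKET_HASH_A = 1086686554  # hash of "a" reported for the 1st subquery

for num_reducers in (2, 3):
    print(
        num_reducers,
        reducer_for(MURMUR_HASH_A, num_reducers),
        reducer_for(BUCKET_HASH_A, num_reducers),
    )
# prints:
# 2 1 0
# 3 2 1
```

Under this assumption, with either 2 or 3 reducers the two branches place key "a" on different tasks, so the merge join on one task never sees both sides of the match.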

            People

              himanshum Himanshu Mishra
              himanshum Himanshu Mishra