Hive / HIVE-4781

LEFT SEMI JOIN generates wrong results when the number of rows belonging to a single key of the right table exceeds hive.join.emit.interval


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      Suppose we have the query shown below:

      SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
      

      When the number of rows in t2 that share a single join key exceeds hive.join.emit.interval, JoinOperator emits the matching row from t1 more than once, which produces redundant output rows.

      Let's say t1 is

      1
      

      and t2 is

      1
      1
      1
      1
      

      When hive.join.emit.interval=1, the output of the above query is

      1
      1
      1
      1
      

      The correct result should be

      1
      
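The difference between the two outputs above can be illustrated with a small toy model. This is only a sketch, not Hive's actual code: `semi_join_reference` encodes the standard LEFT SEMI JOIN semantics, while `semi_join_with_flush` is a hypothetical stand-in for JoinOperator that assumes buffered right-side rows are flushed, and the left row emitted, every `emit_interval` rows. All function and parameter names are invented for illustration.

```python
def semi_join_reference(left_keys, right_keys):
    """Standard LEFT SEMI JOIN: a left row is output at most once,
    no matter how many right rows share its key."""
    right = set(right_keys)
    return [k for k in left_keys if k in right]


def semi_join_with_flush(left_keys, right_keys, emit_interval):
    """Hypothetical model of the reported behaviour (an assumption,
    not Hive source): every emit_interval matching right rows trigger
    a flush, and each flush emits the left row again."""
    out = []
    for k in left_keys:
        buffered = 0
        for r in right_keys:
            if r != k:
                continue
            buffered += 1
            if buffered == emit_interval:
                out.append(k)  # premature emit on flush
                buffered = 0
        if buffered > 0:
            out.append(k)  # final emit for the rows still buffered
    return out


t1 = [1]
t2 = [1, 1, 1, 1]
print(semi_join_reference(t1, t2))        # [1]
print(semi_join_with_flush(t1, t2, 1))    # [1, 1, 1, 1]
print(semi_join_with_flush(t1, t2, 1000)) # [1]
```

Under this model the duplicates appear only once the rows per key exceed the interval, which matches the condition named in the issue title.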

      The existing unit tests do not expose this problem. A GBY (group-by) operator is inserted before JoinOperator, and the tests run with only one mapper, so the output of the map phase contains only distinct keys and JoinOperator never sees more than one t2 row per key.
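A rough sketch of why the mapper count matters, assuming each mapper deduplicates the keys of its own split independently. The function name and the round-robin split are assumptions made for illustration, not how Hive assigns input splits.

```python
def map_side_distinct(rows, num_mappers):
    """Model a map-side GBY: each mapper outputs only the distinct
    keys of its own split. The round-robin split is an assumption."""
    splits = [rows[i::num_mappers] for i in range(num_mappers)]
    out = []
    for split in splits:
        out.extend(sorted(set(split)))
    return out


t2 = [1, 1, 1, 1]
print(map_side_distinct(t2, 1))  # [1]
print(map_side_distinct(t2, 2))  # [1, 1]
print(map_side_distinct(t2, 4))  # [1, 1, 1, 1]
```

With a single mapper the join receives one row per key, so the emit interval is never exceeded. With several mappers the same key can arrive once per mapper, which is the situation the multi-mapper test run is meant to create.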

      Please apply the patch 'wrong_semi_join.txt' attached below and run

      ant test -Dtestcase=TestMinimrCliDriver -Dqfile="left_semi_join.q" -Dtest.silent=false
      

      to reproduce the problem. The incorrect output is written to

      <hive_root_dir>/build/ql/test/logs/clientpositive
      

        Attachments

        1. HIVE-4781.txt
          7 kB
          Yin Huai
        2. wrong_semi_join.txt
          3 kB
          Yin Huai

              People

              • Assignee: Yin Huai (yhuai)
              • Reporter: Yin Huai (yhuai)
              • Votes: 0
              • Watchers: 3
