Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 0.12.0
- None
- None
Description
Suppose we have the following query:
SELECT key FROM t1 LEFT SEMI JOIN t2 ON (t1.key=t2.key);
When t2 has more rows than hive.join.emit.interval, JoinOperator emits rows from t1 more than once, which produces redundant output.
Let's say t1 is
1
and t2 is
1 1 1 1
When hive.join.emit.interval=1, the output of the above query is
1 1 1 1
The correct result should be
1
This problem is not caught by the unit tests. Because a GBY operator is inserted before JoinOperator and the tests run with only 1 mapper, the output of the map phase contains only distinct keys.
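The behaviour described above can be illustrated with a small toy model in Python. This is not Hive code: the function names (correct_semi_join, interval_semi_join, map_side_group_by) and the flush rule (emit the t1 row every time emit_interval rows of t2 have been buffered) are assumptions chosen only to reproduce the numbers in this report. The round-robin split across mappers is likewise hypothetical.

```python
from collections import Counter


def correct_semi_join(t1_keys, t2_keys):
    """Expected LEFT SEMI JOIN result: each t1 row at most once."""
    t2_set = set(t2_keys)
    return [k for k in t1_keys if k in t2_set]


def interval_semi_join(t1_keys, t2_keys, emit_interval):
    """Toy model of the faulty behaviour (assumed flush rule, not Hive's code).

    For each t1 row, the matching t2 rows are buffered one by one. Whenever
    the buffer reaches emit_interval rows, the t1 row is emitted and the
    buffer is cleared. Leftover rows cause one final emit.
    """
    t2_counts = Counter(t2_keys)
    out = []
    for k in t1_keys:
        buffered = 0
        for _ in range(t2_counts[k]):
            buffered += 1
            if buffered >= emit_interval:
                out.append(k)  # early emit: this is the redundant output
                buffered = 0
        if buffered > 0:
            out.append(k)
    return out


def map_side_group_by(t2_keys, num_mappers):
    """Toy model of the GBY before the join: each mapper deduplicates only
    its own slice of t2 (hypothetical round-robin split)."""
    rows = []
    for m in range(num_mappers):
        rows.extend(sorted(set(t2_keys[m::num_mappers])))
    return rows


if __name__ == "__main__":
    t1, t2 = [1], [1, 1, 1, 1]
    print(correct_semi_join(t1, t2))
    print(interval_semi_join(t1, t2, emit_interval=1))
    print(interval_semi_join(t1, map_side_group_by(t2, 1), emit_interval=1))
    print(interval_semi_join(t1, map_side_group_by(t2, 2), emit_interval=1))
```

Under these assumptions, one mapper hands the join a single t2 row per key, so the duplicate never appears. With two mappers the join sees the key twice and the t1 row is emitted twice.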
To reproduce the problem, apply the attached patch 'wrong_semi_join.txt' and run
ant test -Dtestcase=TestMinimrCliDriver -Dqfile="left_semi_join.q" -Dtest.silent=false
The wrong result can be found in
<hive_root_dir>/build/ql/test/logs/clientpositive
Attachments
Issue Links
- relates to HIVE-4689 For outerjoins, joinEmitInterval might make wrong result (Closed)