Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
1.3.0, 2.2.0
-
None
Description
If joinEmitInterval is smaller than the group size, outerjoins might produce records with NULL appended values multiple times (once per group).
HIVE-4689 targeted the same problem. However, the fix does not seem to cover all cases (in particular, it will not apply to left outer joins with filter conditions on the left input). The solution in HIVE-4689 was to disable (override) joinEmitInterval value for those cases. This fix follows the same approach.
To reproduce the problem:
set hive.strict.checks.cartesian.product=false; set hive.join.emit.interval=1; CREATE TABLE test1 (key INT, value INT, col_1 STRING); INSERT INTO test1 VALUES (99, 0, 'Alice'); INSERT INTO test1 VALUES (99, 2, 'Mat'); INSERT INTO test1 VALUES (100, 1, 'Bob'); INSERT INTO test1 VALUES (101, 2, 'Car'); CREATE TABLE test2 (key INT, value INT, col_2 STRING); INSERT INTO test2 VALUES (102, 2, 'Del'); INSERT INTO test2 VALUES (103, 2, 'Ema'); INSERT INTO test2 VALUES (104, 3, 'Fli'); -- Equi-condition and condition on one input (left outer join) SELECT * FROM test1 LEFT OUTER JOIN test2 ON (test1.value=test2.value AND test1.key between 100 and 102) LIMIT 10; -- Condition on one input (left outer join) SELECT * FROM test1 LEFT OUTER JOIN test2 ON (test1.key between 100 and 102) LIMIT 10;
For the first query, current (incorrect) result is:
99 0 Alice NULL NULL NULL 100 1 Bob NULL NULL NULL 101 2 Car 103 2 Ema 99 2 Mat NULL NULL NULL 101 2 Car 102 2 Del 99 2 Mat NULL NULL NULL
Expected (correct) result is:
99 0 Alice NULL NULL NULL 100 1 Bob NULL NULL NULL 101 2 Car 103 2 Ema 101 2 Car 102 2 Del 99 2 Mat NULL NULL NULL
For the second query, current (incorrect) result is:
101 2 Car 104 3 Fli 100 1 Bob 104 3 Fli 99 2 Mat NULL NULL NULL 99 0 Alice NULL NULL NULL 101 2 Car 103 2 Ema 100 1 Bob 103 2 Ema 99 2 Mat NULL NULL NULL 99 0 Alice NULL NULL NULL 101 2 Car 102 2 Del 100 1 Bob 102 2 Del
Expected (correct) result is:
101 2 Car 104 3 Fli 101 2 Car 103 2 Ema 101 2 Car 102 2 Del 100 1 Bob 104 3 Fli 100 1 Bob 103 2 Ema 100 1 Bob 102 2 Del 99 2 Mat NULL NULL NULL 99 0 Alice NULL NULL NULL
Attachments
Attachments
Issue Links
- blocks
-
HIVE-15251 Provide support for complex expressions in ON clauses for OUTER joins
- Closed
- relates to
-
HIVE-4689 For outerjoins, joinEmitInterval might make wrong result
- Closed