Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
0.12.0, 0.13.0
None
None
Description
For example, consider the following query:
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
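The correlation optimizer determines that a single MapReduce job is enough for this query. However, the group-by keys come from both the left and right tables of the right outer join, so rows that belong to the same group are not guaranteed to be aggregated together.
The query then returns a wrong result such as: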
NULL 4
NULL val_165 1
NULL val_193 1
NULL val_265 1
NULL val_27 1
NULL val_409 1
NULL val_484 1
NULL 1
146 val_146 2
150 val_150 1
213 val_213 2
NULL 1
238 val_238 2
255 val_255 2
273 val_273 3
278 val_278 2
311 val_311 3
NULL 1
401 val_401 5
406 val_406 4
66 val_66 1
98 val_98 2
Rows in which both x.key and y.value are null may not be grouped together, so the same group can appear more than once in the output.
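One way to check whether the optimizer is responsible is to run the same query with it turned off and on, then compare the two outputs. The sketch below assumes the optimizer is controlled by the hive.optimize.correlation property and that the standard src and src1 test tables are loaded. Neither assumption is stated in this issue, so adjust for your environment.

```sql
-- Assumption: hive.optimize.correlation toggles the correlation optimizer.

-- Baseline: optimizer off. Each (x.key, y.value) group should appear once.
set hive.optimize.correlation=false;
select x.key, y.value, count(*)
from src x right outer join src1 y
  on (x.key = y.key and x.value = y.value)
group by x.key, y.value;

-- Optimizer on: on affected versions the same group may appear
-- several times, as in the output shown above.
set hive.optimize.correlation=true;
select x.key, y.value, count(*)
from src x right outer join src1 y
  on (x.key = y.key and x.value = y.value)
group by x.key, y.value;
```

If the two runs differ only in the rows where x.key is NULL, that is consistent with the behaviour described here.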