Description
Test1:
Perform full join on two relation with left relation being blank and right containing records
empty_relation = FILTER a_relation by (join_column=='eliminate everything');
Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column);
Result : Zero records returned.
Test2:
Perform full join on two relation with left relation being blank and right containing records using skewed
Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column) using ‘skewed’;
Result : Zero records returned.
Test3:
Perform full join on two relation with left relation being blank and right containing records using parallel
Test_output = JOIN empty_relation by (join_column) FULL , non_empty_relation by (join_column) PARALLEL 10;
Result : Zero records returned.
Test4:
Perform full join on two relation with left relation being non empty and right not containing records using parallel
Test_output = JOIN , non_empty_relation by (join_column) FULL , empty_relation by (join_column) PARALLEL 10;
Result : valid records returned.
Observation:
1) If the either relation is blank , skewed full outer join does not return anything
2) If the non empty relation is kept on left, everything works except skewed
3) FULL OUTER will only work if the left relation is not empty
4) Skewed will only work if both relation is non empty.
Attachments
Attachments
Issue Links
- is broken by
-
PIG-4377 Skewed outer join produce wrong result if a key is oversampled
- Closed