In Spark mode, the flag is not set to true internally for a merge join. As a result, the input splits are ordered by file size, and the output is out of order. For example:
A = LOAD 'input*' as (a:int, b:int);
B = LOAD 'input*' as (a:int, b:int);
C = JOIN A BY $0, B BY $0 USING 'merge';
In MR mode, the flag was set to true internally for a merge join (see PIG-2773). However, that no longer works: the output is still out of order, because hadoop-client re-sorts the splits afterwards. In Spark mode, we can solve this issue.
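
To illustrate why size-ordered splits break the ordering, here is a minimal Python sketch. It is not Pig or Spark code. The file names, sizes, and keys are hypothetical, and it assumes that each input file is sorted on the join key, that the files are sorted relative to each other by name, and that "order of file size" means largest first.

```python
# Hypothetical splits: (file name, size in bytes, sorted join keys in that file).
# Assumption: the files are globally sorted when read in name order.
splits = [
    ("input1", 300, [1, 2, 3]),
    ("input2", 100, [4, 5]),
    ("input3", 200, [6, 7, 8]),
]


def keys_in_read_order(ordered_splits):
    """Concatenate the join keys in the order the splits would be read."""
    return [key for _, _, keys in ordered_splits for key in keys]


# Assumed behaviour when splits are ordered by size (largest first).
by_size = sorted(splits, key=lambda s: s[1], reverse=True)
# Order that preserves the sortedness the merge join relies on.
by_name = sorted(splits, key=lambda s: s[0])

size_order_keys = keys_in_read_order(by_size)
name_order_keys = keys_in_read_order(by_name)

print("by size:", size_order_keys)
print("by name:", name_order_keys)
```

With these made-up inputs, reading the splits by size interleaves key ranges from different files, while reading them by name keeps the keys ascending.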