Please find below a minimal example of a Pig script that uses splits and replicated joins and whose output differs depending on whether MapReduce or TEZ is the execution engine. The attachment also contains the sample input data.
The expected output, as produced by the MapReduce engine, is:
whereas TEZ produces:
Removing the USING 'replicated' clause and using a regular join yields correct results. I am not sure whether this is a Pig issue or a TEZ issue. However, as this issue can silently lead to data corruption, I rated it critical. So far, searching has not turned up a similar bug or anybody else being aware of it.
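
For readers without the attachment, the workaround can be sketched as follows. This is only an illustration of the two join variants and is not the attached script: the relation names, field names, paths, and split conditions are all hypothetical, and the sketch is not claimed to reproduce the discrepancy.

```pig
-- Hypothetical input and schema, for illustration only
-- (not the attached script or its sample data).
data = LOAD 'input.txt' USING PigStorage(',') AS (id:int, val:chararray);

-- SPLIT routes each tuple to every branch whose condition it satisfies.
SPLIT data INTO small IF id < 10, large IF id >= 10;

-- Variant of the kind described as affected: a fragment-replicate join,
-- where the last relation listed (small) is loaded into memory.
joined_repl = JOIN large BY val, small BY val USING 'replicated';

-- Workaround from the report: drop USING 'replicated' so that Pig
-- performs its default join instead.
joined_default = JOIN large BY val, small BY val;

STORE joined_repl INTO 'out_replicated';
STORE joined_default INTO 'out_default';
```

Assuming the sketch is saved as script.pig (a hypothetical file name), the same script can be run with `pig -x mapreduce script.pig` and `pig -x tez script.pig` to compare the outputs of the two engines.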