Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
When RANDOM() value is used for grouping/distinct/etc, it breaks the mapreduce rule and can lead to redundant or missing records.
Some discussion can be found in
https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
We should make RANDOM less random so that it'll produce the same sequence of random values from the task retries.