Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
To distribute data evenly across reducers, our project's Map output uses System.currentTimeMillis() % numOfRed to decide which Reduce task each record is sent to.
Because the keys output by a Map task are not stable across repeated runs, a Map task that is rerun for any reason may partition its records differently from the earlier attempt, which can cause records to be omitted from or duplicated in the results.
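A minimal sketch of why a rerun can route the same record differently. It is plain Java with no Hadoop dependency. The class name, the method names, and the injected clock are hypothetical and exist only for illustration. The key-based variant is shown for contrast only and is assumed to mirror the usual hash-modulo scheme.

```java
import java.util.function.LongSupplier;

public class PartitionStability {
    // Time-based choice as described above: the result depends on when the
    // record is emitted, not on the record itself. The clock is injected
    // (an assumption of this sketch) so the effect can be reproduced.
    static int timeBasedPartition(LongSupplier clock, int numOfRed) {
        return (int) (clock.getAsLong() % numOfRed);
    }

    // Contrast case: the result depends only on the key, so every attempt
    // of the same Map task picks the same Reduce task for that key.
    static int keyBasedPartition(String key, int numOfRed) {
        return (key.hashCode() & Integer.MAX_VALUE) % numOfRed;
    }

    public static void main(String[] args) {
        int numOfRed = 4;
        // The same record emitted by two attempts, 3 ms apart.
        int firstAttempt = timeBasedPartition(() -> 1_000L, numOfRed);
        int rerun = timeBasedPartition(() -> 1_003L, numOfRed);
        System.out.println("time-based: " + firstAttempt + " vs " + rerun);

        int a = keyBasedPartition("user-42", numOfRed);
        int b = keyBasedPartition("user-42", numOfRed);
        System.out.println("key-based:  " + a + " vs " + b);
    }
}
```

If some Reduce tasks have already fetched the first attempt's output and others fetch the rerun's output, a record that moved between partitions can be read twice or not at all.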
I would like to compare some Map-Reduce Framework counters (MAP_OUTPUT_RECORDS, COMBINE_INPUT_RECORDS, COMBINE_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS) to detect this situation and warn the user that the results may be incorrect.
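One possible shape for such a check, as a sketch only. It takes the job-level counter totals as a plain map rather than reading them from Hadoop. The class name, the method name, and the comparison rule are assumptions: when no combiner ran, every record written by the Map side is expected to arrive at the Reduce side. When a combiner did run, this sketch makes no claim, since a combiner may be applied more than once and a simple equality would not hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CounterConsistencyCheck {
    // Returns human-readable warnings; an empty list means nothing suspicious
    // was found under the assumed rule described above.
    static List<String> check(Map<String, Long> counters) {
        long mapOut = counters.getOrDefault("MAP_OUTPUT_RECORDS", 0L);
        long combineIn = counters.getOrDefault("COMBINE_INPUT_RECORDS", 0L);
        long combineOut = counters.getOrDefault("COMBINE_OUTPUT_RECORDS", 0L);
        long reduceIn = counters.getOrDefault("REDUCE_INPUT_RECORDS", 0L);

        List<String> warnings = new ArrayList<>();
        boolean combinerRan = combineIn > 0 || combineOut > 0;

        // Assumed rule: without a combiner, Map output and Reduce input match.
        if (!combinerRan && mapOut != reduceIn) {
            warnings.add("MAP_OUTPUT_RECORDS=" + mapOut
                    + " but REDUCE_INPUT_RECORDS=" + reduceIn
                    + "; records may have been omitted or duplicated");
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = Map.of(
                "MAP_OUTPUT_RECORDS", 100L,
                "REDUCE_INPUT_RECORDS", 97L);
        check(counters).forEach(System.out::println);
    }
}
```

Returning a list of warnings instead of failing the job keeps the check advisory, which matches the goal of pointing out a possible problem to the user.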