Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Duplicate
-
None
-
None
-
None
-
None
Description
Using Partitioner,
If user passes negative partition number, framework happily accepts it. Data goes to wrong location and (many) reducers get zero data. Suggested resolutions:
1) Prevent the problem from start. partitioner checks the range and throws an exception if that' out of range.
2) Have a more generic check: Compare counters to see if all data gets past Shuffle stage. No leak. Per feedback we got from Owen, this idea get a bit complicated when considering having combiners.
Example: using my_id.hashCode() % numPartitions creates negative numbers and data gets lost in the framework. Reducers get zero rows ( while data is actually in partitions index with negative numbers).