Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-3425

Partitioner Happilly accepts negative int number and data gets lost in Hadoop framework

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • None
    • None
    • None
    • None

    Description

      Using Partitioner,

      If user passes negative partition number, framework happily accepts it. Data goes to wrong location and (many) reducers get zero data. Suggested resolutions:

      1) Prevent the problem from start. partitioner checks the range and throws an exception if that' out of range.

      2) Have a more generic check: Compare counters to see if all data gets past Shuffle stage. No leak. Per feedback we got from Owen, this idea get a bit complicated when considering having combiners.

      Example: using my_id.hashCode() % numPartitions creates negative numbers and data gets lost in the framework. Reducers get zero rows ( while data is actually in partitions index with negative numbers).

      Attachments

        Activity

          People

            Unassigned Unassigned
            amirhyoussefi Amir Youssefi
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: