  1. Pig
  2. PIG-1810

Prioritize Hadoop parameter "mapred.reduce.tasks" above the estimated number of reducers


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.9.0
    • None
    • None
    • None

    Description

      Anup pointed out this problem in PIG-1249:

      Anup added a comment - 18/Jan/11 07:46 PM
      One thing that we didn't take care of is the use of the Hadoop parameter "mapred.reduce.tasks".
      If I specify the Hadoop parameter -Dmapred.reduce.tasks=450 for all the MR jobs, it is overwritten by the result of estimateNumberOfReducers(conf,mro), which in my case is 15.
      I am not specifying default_parallel or any PARALLEL statements.
      Ideally, the number of reducers should be 450.

      I think we should prioritize this parameter above the estimated reducer count.
      The priority list should be:

      1. PARALLEL statement
      2. default_parallel statement
      3. mapred.reduce.tasks Hadoop parameter
      4. estimateNumberOfReducers()
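
The proposed priority order could be sketched roughly as follows. This is an illustration only, not Pig's actual code: the class `ReducerParallelism`, the method `resolve`, and its parameters are hypothetical names. It assumes that a non-positive value means "not set" for PARALLEL and default_parallel, and that the map holds only the properties the user passed explicitly (for example via -D). A real Hadoop configuration also carries a built-in default for mapred.reduce.tasks, so a real fix would need some way to tell an explicit setting apart from that default.

```java
import java.util.Map;

/** Hypothetical helper illustrating the proposed priority order; not part of Pig. */
final class ReducerParallelism {
    static final String REDUCE_TASKS_KEY = "mapred.reduce.tasks";

    /**
     * @param requestedParallel value of the PARALLEL clause, or <= 0 if absent (assumption)
     * @param defaultParallel   value of default_parallel, or <= 0 if absent (assumption)
     * @param userConf          only the properties the user set explicitly (assumption)
     * @param estimated         result of the reducer estimate, e.g. estimateNumberOfReducers
     */
    static int resolve(int requestedParallel, int defaultParallel,
                       Map<String, String> userConf, int estimated) {
        // 1. A PARALLEL statement wins over everything else.
        if (requestedParallel > 0) {
            return requestedParallel;
        }
        // 2. Next comes default_parallel.
        if (defaultParallel > 0) {
            return defaultParallel;
        }
        // 3. Then an explicitly supplied mapred.reduce.tasks.
        String configured = userConf.get(REDUCE_TASKS_KEY);
        if (configured != null) {
            return Integer.parseInt(configured.trim());
        }
        // 4. Only when nothing was specified, fall back to the estimate.
        return estimated;
    }
}
```

For the case in the comment above (no PARALLEL, no default_parallel, -Dmapred.reduce.tasks=450, estimate of 15), `ReducerParallelism.resolve(-1, -1, conf, 15)` returns 450 instead of 15.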

          People

            Assignee: zjffdu Jeff Zhang
            Reporter: zjffdu Jeff Zhang