Hadoop Map/Reduce / MAPREDUCE-5817

Mappers get rescheduled on node transition even after all reducers are completed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.7.3
    • Component/s: applicationmaster
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      We're seeing jobs that keep running long after all reducers have finished. The job keeps rescheduling and running mappers past the point of reducer completion. In one case, the job ran for about 9 more hours after all reducers had completed!

      This happens because whenever a node transition to an unusable state is reported to the app master, it unconditionally reschedules all mappers that already ran on that node.

      Therefore, any node transition has the potential to extend the job's running time. Once this window opens, another node transition can prolong it, and in theory this can continue indefinitely.

      If the pool is unstable (unhealthy nodes, etc.) for some duration, any big job is severely vulnerable to this problem.

      If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: the mapper outputs are no longer needed, so rerunning the mappers would only produce output that nothing consumes.
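
      The proposed guard can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the attached patch and not Hadoop's JobImpl: the class UnusableNodeSketch, its fields, and the use of plain strings for node and task ids are all hypothetical, and treating a job with zero reducers as "all reducers complete" is a choice made here for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the job-side bookkeeping (not Hadoop's JobImpl). */
class UnusableNodeSketch {
    private final int numReduceTasks;
    private int completedReduces = 0;
    // node id -> ids of map tasks that succeeded on that node
    private final Map<String, List<String>> succeededMapsByNode = new HashMap<>();

    UnusableNodeSketch(int numReduceTasks) {
        this.numReduceTasks = numReduceTasks;
    }

    void mapSucceeded(String nodeId, String mapTaskId) {
        succeededMapsByNode.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(mapTaskId);
    }

    void reduceCompleted() {
        completedReduces++;
    }

    // Assumption of this sketch: zero reducers also counts as "all complete".
    boolean allReducersComplete() {
        return completedReduces >= numReduceTasks;
    }

    /** Returns the map tasks to rerun after the given node becomes unusable. */
    List<String> actOnUnusableNode(String nodeId) {
        if (allReducersComplete()) {
            // No reducer will fetch map output again, so rerunning maps is wasted work.
            return new ArrayList<>();
        }
        List<String> lost = succeededMapsByNode.remove(nodeId);
        return lost == null ? new ArrayList<>() : lost;
    }

    public static void main(String[] args) {
        UnusableNodeSketch job = new UnusableNodeSketch(1);
        job.mapSucceeded("node-1", "m_0");
        System.out.println("before reducers finish: " + job.actOnUnusableNode("node-1"));
        job.mapSucceeded("node-2", "m_1");
        job.reduceCompleted();
        System.out.println("after reducers finish: " + job.actOnUnusableNode("node-2"));
    }
}
```

      The point of the check is that the decision to rerun maps depends on whether any reducer can still consume their output, not only on the node's state.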

      1. MAPREDUCE-5817.001.patch (13 kB, Sangjin Lee)
      2. MAPREDUCE-5817.002.patch (12 kB, Sangjin Lee)
      3. mapreduce-5817.patch (13 kB, Sangjin Lee)

        Issue Links

          Activity

          Sangjin Lee created issue
          Sangjin Lee made changes -
          Field Original Value New Value
          Attachment mapreduce-5817.patch [ 12638107 ]
          Sangjin Lee made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
          Target Version/s 2.5.0 [ 12326265 ]
          Assignee Sangjin Lee [ sjlee0 ]
          Gera Shegalov made changes -
          Link This issue is related to YARN-1996 [ YARN-1996 ]
          Karthik Kambatla (Inactive) made changes -
          Target Version/s: 2.5.0 [ 12326265 ] -> 2.6.0 [ 12327180 ]
          Allen Wittenauer made changes -
          Labels BB2015-05-TBR
          Vinod Kumar Vavilapalli made changes -
          Target Version/s: 2.6.0 [ 12327180 ] -> 2.8.0 [ 12329060 ]
          Sangjin Lee made changes -
          Attachment MAPREDUCE-5817.001.patch [ 12749938 ]
          Sangjin Lee made changes -
          Labels BB2015-05-TBR
          Sangjin Lee made changes -
          Attachment MAPREDUCE-5817.002.patch [ 12750568 ]
          Karthik Kambatla made changes -
          Summary: "mappers get rescheduled on node transition even after all reducers are completed" -> "Mappers get rescheduled on node transition even after all reducers are completed"
          Karthik Kambatla made changes -
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Fix Version/s 2.8.0 [ 12329060 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Wangda Tan made changes -
          Fix Version/s 2.7.3 [ 12334007 ]
          Fix Version/s 2.8.0 [ 12329060 ]

            People

            • Assignee: Sangjin Lee
            • Reporter: Sangjin Lee
            • Votes: 0
            • Watchers: 16