  1. Hadoop Map/Reduce
  2. MAPREDUCE-7264

Overall reduction of ApplicationMaster exits caused by unhandled TA_TOO_MANY_FETCH_FAILURE events

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels:
      None

      Description

      During a rolling restart of NodeManagers, some MapReduce jobs exit because of an unhandled TA_TOO_MANY_FETCH_FAILURE event.

      Details:
      If a task attempt is in the SUCCEEDED state when it receives a TA_TOO_MANY_FETCH_FAILURE event, the AM handles it correctly. If the attempt is in SUCCESS_FINISHING_CONTAINER or certain other states, the AM exits with an invalid-event error. Related issues: YARN-1469, MAPREDUCE-7240, MAPREDUCE-7249.
      Reason:
      When a map task sends the done RPC to the AM, the AM transitions the task to the SUCCESS_FINISHING_CONTAINER state and adds it to the mapAttemptCompletionEvents list. When a reducer sends the getMapAttemptCompletionEvents RPC to fetch the completed maps, tasks still in SUCCESS_FINISHING_CONTAINER are returned as well. If the NM is restarted or stopped at this point, many reducers fail to shuffle and report the failures to the AM, which then sends a TA_TOO_MANY_FETCH_FAILURE event to the map task. If the map task's current state cannot handle that event, the AM exits.
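
      The failure mode described above can be illustrated with a toy state machine. This is a minimal sketch, not Hadoop's actual implementation: the class FetchFailureSketch, the events TA_DONE and TA_CONTAINER_COMPLETED, the SUCCEEDED -> FAILED target state, and the use of IllegalStateException are all assumptions made for illustration. It only shows that an event arriving in a state with no registered transition is rejected, which is the condition that makes the AM exit.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only; every name here is hypothetical, not Hadoop's API.
class FetchFailureSketch {
    enum State { RUNNING, SUCCESS_FINISHING_CONTAINER, SUCCEEDED, FAILED }

    // TA_DONE and TA_CONTAINER_COMPLETED are assumed event names.
    enum Event { TA_DONE, TA_CONTAINER_COMPLETED, TA_TOO_MANY_FETCH_FAILURE }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Applies the event, or throws if the current state has no transition for it.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        current = next;
        return current;
    }

    // Builds a machine in which only SUCCEEDED handles the fetch-failure event.
    static FetchFailureSketch build() {
        FetchFailureSketch m = new FetchFailureSketch();
        m.addTransition(State.RUNNING, Event.TA_DONE, State.SUCCESS_FINISHING_CONTAINER);
        m.addTransition(State.SUCCESS_FINISHING_CONTAINER, Event.TA_CONTAINER_COMPLETED, State.SUCCEEDED);
        m.addTransition(State.SUCCEEDED, Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        return m;
    }

    public static void main(String[] args) {
        // Event arrives after the attempt reached SUCCEEDED: handled.
        FetchFailureSketch late = build();
        late.handle(Event.TA_DONE);
        late.handle(Event.TA_CONTAINER_COMPLETED);
        System.out.println(late.handle(Event.TA_TOO_MANY_FETCH_FAILURE));

        // Event arrives while still in SUCCESS_FINISHING_CONTAINER: rejected.
        FetchFailureSketch early = build();
        early.handle(Event.TA_DONE);
        try {
            early.handle(Event.TA_TOO_MANY_FETCH_FAILURE);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this sketch, registering a transition for TA_TOO_MANY_FETCH_FAILURE from SUCCESS_FINISHING_CONTAINER would make the second case succeed as well. What the real transition should do in each state is left to the patch.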

      I found existing issues that address this problem, but they do not cover all situations.

      SUCCESS_FINISHING_CONTAINER and the states reachable from it can receive a TA_TOO_MANY_FETCH_FAILURE event, for example SUCCEEDED, SUCCESS_CONTAINER_CLEANUP, SUCCESS_FINISHING_CONTAINER, FAILED and KILL_CONTAINER_CLEANUP.

      In Hadoop 3.2.1, only the SUCCEEDED, FAILED and KILLED states can handle the TA_TOO_MANY_FETCH_FAILURE event. Other JIRA issues fix SUCCESS_CONTAINER_CLEANUP, SUCCESS_FINISHING_CONTAINER and KILLED, but KILL_CONTAINER_CLEANUP and KILL_TASK_CLEANUP should also handle it.
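
      The coverage gap can be worked out as a set difference. The sketch below is only bookkeeping over the state names given in this report; the class FetchFailureCoverage and its method are hypothetical, and the sets reflect the reporter's statements rather than an inspection of the Hadoop source.

```java
import java.util.EnumSet;
import java.util.Set;

// Bookkeeping sketch; the class and method names are hypothetical.
class FetchFailureCoverage {
    // States mentioned in this report as relevant to the event.
    enum State {
        SUCCEEDED, FAILED, KILLED,
        SUCCESS_FINISHING_CONTAINER, SUCCESS_CONTAINER_CLEANUP,
        KILL_CONTAINER_CLEANUP, KILL_TASK_CLEANUP
    }

    // Returns the states that neither 3.2.1 nor the other issues cover.
    static Set<State> uncovered() {
        // Per the report: states that handle the event in Hadoop 3.2.1.
        EnumSet<State> handledIn321 =
            EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);
        // Per the report: states addressed by other JIRA issues.
        EnumSet<State> fixedElsewhere = EnumSet.of(
            State.SUCCESS_CONTAINER_CLEANUP,
            State.SUCCESS_FINISHING_CONTAINER,
            State.KILLED);

        EnumSet<State> remaining = EnumSet.allOf(State.class);
        remaining.removeAll(handledIn321);
        remaining.removeAll(fixedElsewhere);
        return remaining;
    }

    public static void main(String[] args) {
        // Prints [KILL_CONTAINER_CLEANUP, KILL_TASK_CLEANUP]
        System.out.println(uncovered());
    }
}
```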

              People

              • Assignee:
                Unassigned
                Reporter:
                tuyu tuyu
              • Votes:
                0
                Watchers:
                4