  1. Hadoop Map/Reduce
  2. MAPREDUCE-7264

Reduce ApplicationMaster exits caused by unhandled TA_TOO_MANY_FETCH_FAILURE events

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels: None

    Description

      During a rolling restart of NodeManagers, some MapReduce jobs exit because the ApplicationMaster (AM) receives a TA_TOO_MANY_FETCH_FAILURE event that it cannot handle.

      details:
      If a task attempt is in the SUCCEEDED state when it receives a TA_TOO_MANY_FETCH_FAILURE event, the AM handles the event correctly. If the attempt is in SUCCESS_FINISHING_CONTAINER or certain other states, the AM exits with an invalid-event error. Related issues: YARN-1469, MAPREDUCE-7240, MAPREDUCE-7249.
      reason:
         When a map task sends the done RPC to the AM, the AM transitions the task attempt to the SUCCESS_FINISHING_CONTAINER state and adds it to the mapAttemptCompletionEvents list. When a reducer sends the getMapAttemptCompletionEvents RPC to fetch the completed maps, attempts that are still in SUCCESS_FINISHING_CONTAINER are returned as well. If an NM is restarted or stopped at that point, many reducers fail to shuffle and report the failures to the AM, which then sends a TA_TOO_MANY_FETCH_FAILURE event to the map attempt. If the attempt's current state cannot handle that event, the AM exits.
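
      The failure mode described above can be sketched with a small stand-alone model. This is not Hadoop's TaskAttemptImpl or its state machine classes: the class FetchFailureModel, the ACCEPTED table, and the dispatch method are hypothetical names used only for illustration, and only the state and event names come from this issue.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified model (assumption): a per-state table of accepted events.
// Only the state and event names are taken from the issue text.
class FetchFailureModel {
    enum State { SUCCESS_FINISHING_CONTAINER, SUCCEEDED }
    enum Event { TA_TOO_MANY_FETCH_FAILURE }

    // Events each state accepts. SUCCEEDED accepts the fetch-failure event,
    // SUCCESS_FINISHING_CONTAINER does not (the situation this issue reports).
    static final Map<State, Set<Event>> ACCEPTED = new EnumMap<>(State.class);
    static {
        ACCEPTED.put(State.SUCCEEDED, EnumSet.of(Event.TA_TOO_MANY_FETCH_FAILURE));
        ACCEPTED.put(State.SUCCESS_FINISHING_CONTAINER, EnumSet.noneOf(Event.class));
    }

    // Throws when the current state has no transition for the event,
    // standing in for the invalid-event error that makes the AM exit.
    static void dispatch(State current, Event event) {
        if (!ACCEPTED.get(current).contains(event)) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
    }

    // Convenience wrapper: true if dispatching the event is rejected.
    static boolean isRejected(State current, Event event) {
        try {
            dispatch(current, event);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("SUCCEEDED rejected: "
            + isRejected(State.SUCCEEDED, Event.TA_TOO_MANY_FETCH_FAILURE));
        System.out.println("SUCCESS_FINISHING_CONTAINER rejected: "
            + isRejected(State.SUCCESS_FINISHING_CONTAINER, Event.TA_TOO_MANY_FETCH_FAILURE));
    }
}
```

      In this model the event is accepted in SUCCEEDED and rejected in SUCCESS_FINISHING_CONTAINER, which mirrors the window between the done RPC and container cleanup in which reducers can already see the map output.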

      I found existing issues that address this problem, but they do not cover every case.

      SUCCESS_FINISHING_CONTAINER and the states an attempt can move to from it can all receive a TA_TOO_MANY_FETCH_FAILURE event, for example SUCCEEDED, SUCCESS_CONTAINER_CLEANUP, SUCCESS_FINISHING_CONTAINER, FAILED, and KILL_CONTAINER_CLEANUP.

      In Hadoop 3.2.1, only the SUCCEEDED, FAILED, and KILLED states handle the TA_TOO_MANY_FETCH_FAILURE event. Other JIRA issues add handling for SUCCESS_CONTAINER_CLEANUP, SUCCESS_FINISHING_CONTAINER, and KILLED, but KILL_CONTAINER_CLEANUP and KILL_TASK_CLEANUP should handle the event as well.
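
      The coverage gap can be checked as a simple set difference. The sketch below is illustrative only: the class FetchFailureCoverage and the method uncovered are hypothetical, the enum lists just the states named in this issue (the real task attempt state machine has more), and the two sets restate the claims in the paragraph above rather than anything read from the Hadoop source.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative bookkeeping (assumption): which of the states named in this
// issue still lack a TA_TOO_MANY_FETCH_FAILURE handler.
class FetchFailureCoverage {
    // Only the states mentioned in the issue text.
    enum State {
        SUCCEEDED, FAILED, KILLED,
        SUCCESS_CONTAINER_CLEANUP, SUCCESS_FINISHING_CONTAINER,
        KILL_CONTAINER_CLEANUP, KILL_TASK_CLEANUP
    }

    static Set<State> uncovered() {
        // States that handle the event in 3.2.1, per the issue.
        Set<State> handledIn321 =
            EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);
        // States covered by the other JIRA issues, per the issue.
        Set<State> fixedElsewhere = EnumSet.of(
            State.SUCCESS_CONTAINER_CLEANUP,
            State.SUCCESS_FINISHING_CONTAINER,
            State.KILLED);

        // Everything named in the issue, minus what is already covered.
        Set<State> remaining = EnumSet.allOf(State.class);
        remaining.removeAll(handledIn321);
        remaining.removeAll(fixedElsewhere);
        return remaining;
    }

    public static void main(String[] args) {
        // prints [KILL_CONTAINER_CLEANUP, KILL_TASK_CLEANUP]
        System.out.println(uncovered());
    }
}
```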

            People

              Assignee: Unassigned
              Reporter: tuyu