  1. Hadoop Map/Reduce
  2. MAPREDUCE-6263

Configurable timeout between YARNRunner terminating the application and forcefully killing it.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: client
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      YARNRunner connects to the AM to send the kill-job command, then waits a hardcoded 10 seconds for the job to enter a terminal state. If the job fails to enter a terminal state in that time, YARNRunner tells YARN to kill the application forcefully. A forceful kill usually leaves no job history, because the AM process is terminated before it can write it.

      Ten seconds can be too short for large jobs in a large cluster, as it takes time to connect to all the nodemanagers, process the state machine events, and copy a large jhist file. The timeout should be more lenient or configurable.
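
      The behaviour requested here can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the class name, the property key `example.hard-kill-timeout-ms`, and the callback parameters are all hypothetical stand-ins for the real YARNRunner, configuration, and AM/ResourceManager calls. It only assumes that the timeout is read from configuration with the old 10-second value as the default, and that the forceful kill is used only after the wait expires.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

public class GracefulKillSketch {
    // Hypothetical property key, for illustration only (not taken from the patch).
    public static final String TIMEOUT_KEY = "example.hard-kill-timeout-ms";
    // Default mirrors the previously hardcoded 10-second wait.
    public static final long DEFAULT_TIMEOUT_MS = 10_000L;
    // How often the job state is re-checked while waiting.
    public static final long POLL_INTERVAL_MS = 50L;

    /**
     * Asks the AM to kill the job, waits up to the configured timeout for a
     * terminal state, and falls back to a forceful kill if the wait expires.
     *
     * @return true if the job terminated on its own, false if it was force-killed
     */
    public static boolean killJob(Properties conf,
                                  Runnable sendKillToAm,
                                  BooleanSupplier isTerminal,
                                  Runnable forceKill) throws InterruptedException {
        long timeoutMs = Long.parseLong(
            conf.getProperty(TIMEOUT_KEY, String.valueOf(DEFAULT_TIMEOUT_MS)));

        sendKillToAm.run(); // graceful request: lets the AM write job history

        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (isTerminal.getAsBoolean()) {
                return true;
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }

        // One last check before resorting to the forceful path.
        if (isTerminal.getAsBoolean()) {
            return true;
        }
        forceKill.run(); // forceful kill: job history is usually lost
        return false;
    }
}
```

      With a layout like this, a site running large jobs could raise the timeout in its configuration without a code change, while small clusters keep the 10-second default.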

      Attachments

        1. MAPREDUCE-6263.v2.txt
          5 kB
          Eric Payne
        2. MAPREDUCE-6263.v1.txt
          4 kB
          Eric Payne


          People

            Assignee: epayne Eric Payne
            Reporter: jlowe Jason Darrell Lowe
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:
