Hadoop Map/Reduce / MAPREDUCE-4813

AM timing out during job commit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 2.0.1-alpha
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: applicationmaster
    • Labels: None

      Description

      The AM calls the output committer's commitJob method synchronously during JobImpl state transitions, so the JobImpl write lock is held for the entire time the job is being committed. While the write lock is held, the RM allocator thread cannot heartbeat to the RM. Therefore, if committing the job takes too long (e.g., the job has a very large number of files to commit and/or the namenode is overloaded), the AM appears unresponsive to the RM and the RM kills the AM attempt.
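
      The locking problem described above can be reproduced in isolation. The following is a minimal sketch, not code from Hadoop or from the attached patches: CommitLockSketch, tryHeartbeat and heartbeatDuringCommit are hypothetical names, the slow commitJob call is simulated with a latch, and it assumes the heartbeat thread must take the job's read lock to read job state. It compares a commit that holds the write lock with one that runs without it, which is one possible remedy and not necessarily what the patch does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CommitLockSketch {
    // Hypothetical stand-in for the JobImpl read/write lock.
    private final ReentrantReadWriteLock jobLock = new ReentrantReadWriteLock();

    /** Simulates one heartbeat, assumed to need the read lock to inspect job state. */
    boolean tryHeartbeat(long timeoutMs) throws InterruptedException {
        if (!jobLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // job state unreadable, so no heartbeat goes out
        }
        try {
            return true; // job state read; a real AM would contact the RM here
        } finally {
            jobLock.readLock().unlock();
        }
    }

    /**
     * Starts a simulated slow commit, then attempts one heartbeat while the
     * commit is still in progress. Returns whether that heartbeat succeeded.
     */
    boolean heartbeatDuringCommit(boolean holdWriteLockWhileCommitting) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch commitStarted = new CountDownLatch(1);
        CountDownLatch commitMayFinish = new CountDownLatch(1);
        try {
            Future<?> commit = pool.submit(() -> {
                if (holdWriteLockWhileCommitting) {
                    jobLock.writeLock().lock();
                }
                try {
                    commitStarted.countDown();
                    commitMayFinish.await(); // stands in for a slow commitJob call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    if (holdWriteLockWhileCommitting) {
                        jobLock.writeLock().unlock();
                    }
                }
            });
            commitStarted.await();
            Future<Boolean> heartbeat = pool.submit(() -> tryHeartbeat(200));
            boolean heartbeatOk = heartbeat.get();
            commitMayFinish.countDown(); // let the simulated commit complete
            commit.get();
            return heartbeatOk;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        CommitLockSketch sketch = new CommitLockSketch();
        System.out.println("commit under write lock, heartbeat ok: "
                + sketch.heartbeatDuringCommit(true));
        System.out.println("commit without lock, heartbeat ok: "
                + sketch.heartbeatDuringCommit(false));
    }
}
```

      In this sketch the heartbeat times out while the write lock is held and succeeds when the commit runs without the lock. In a real AM, repeated missed heartbeats would eventually exceed the RM's liveness timeout for the attempt.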

        Attachments

        1. MAPREDUCE-4813-2-branch-0.23.patch
          132 kB
          Jason Darrell Lowe
        2. MAPREDUCE-4813-2.patch
          138 kB
          Jason Darrell Lowe
        3. MAPREDUCE-4813-2.patch
          138 kB
          Jason Darrell Lowe
        4. MAPREDUCE-4813-2.patch
          138 kB
          Jason Darrell Lowe
        5. MAPREDUCE-4813-2.patch
          138 kB
          Jason Darrell Lowe
        6. MAPREDUCE-4813.patch
          13 kB
          Jason Darrell Lowe
        7. MAPREDUCE-4813.patch
          25 kB
          Jason Darrell Lowe
        8. MAPREDUCE-4813.patch
          31 kB
          Jason Darrell Lowe
        9. JobImplStateMachine.pdf
          42 kB
          Jason Darrell Lowe

            People

            • Assignee: Jason Darrell Lowe (jlowe)
            • Reporter: Jason Darrell Lowe (jlowe)
