MAPREDUCE-4099 (Hadoop Map/Reduce)

ApplicationMaster may fail to remove staging directory


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.2
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When the ApplicationMaster shuts down, it is supposed to remove the staging directory unless properties have been set to override this behavior. During shutdown, the AM tells the ResourceManager that it has finished before it cleans up the staging directory. However, as soon as the RM hears that the AM has finished, it kills the AM container. If the AM is too slow, it is killed before the staging directory is removed.

      We're seeing the AM lose this race fairly consistently on our clusters, and the lack of staging directory cleanup quickly leads to filesystem quota issues for some users.
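
      The race described above comes down to ordering: anything the AM does after unregistering can be cut off by the container kill. The sketch below is a deterministic toy model of that, not Hadoop code and not the attached patch. FakeResourceManager, newStagingDir, shutdownUnregisterFirst and shutdownCleanupFirst are hypothetical names; a real AM talks to the RM over YARN RPC and deletes an HDFS path rather than a local temp directory. The model assumes the worst case from the report, where the kill lands immediately after the AM unregisters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StagingCleanupOrder {

    /** Hypothetical stand-in for the RM: it kills the AM container as soon as the AM unregisters. */
    public static final class FakeResourceManager {
        private boolean amContainerKilled = false;

        public void unregisterApplicationMaster() {
            // Worst case from the report: the kill lands before the AM does anything else.
            amContainerKilled = true;
        }

        public boolean isAmContainerKilled() {
            return amContainerKilled;
        }
    }

    /** Hypothetical helper: creates a temp "staging directory" containing one file. */
    public static Path newStagingDir(String prefix) {
        try {
            Path dir = Files.createTempDirectory(prefix);
            Files.createFile(dir.resolve("job.xml"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes a directory tree, children before parents. */
    public static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse lexicographic order puts "dir/job.xml" before "dir".
            List<Path> ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Ordering described in the report: unregister first, then clean up. */
    public static void shutdownUnregisterFirst(FakeResourceManager rm, Path stagingDir) {
        rm.unregisterApplicationMaster();
        if (rm.isAmContainerKilled()) {
            return; // models the killed process: nothing after the kill executes
        }
        deleteRecursively(stagingDir);
    }

    /** Alternative ordering: clean up first, then tell the RM the AM is done. */
    public static void shutdownCleanupFirst(FakeResourceManager rm, Path stagingDir) {
        deleteRecursively(stagingDir);
        rm.unregisterApplicationMaster();
    }

    public static void main(String[] args) {
        Path a = newStagingDir("staging-a");
        shutdownUnregisterFirst(new FakeResourceManager(), a);
        System.out.println("unregister first, staging dir left behind: " + Files.exists(a));

        Path b = newStagingDir("staging-b");
        shutdownCleanupFirst(new FakeResourceManager(), b);
        System.out.println("cleanup first, staging dir left behind: " + Files.exists(b));

        deleteRecursively(a); // remove the directory the first ordering leaked
    }
}
```

      Running main prints "true" for the unregister-first ordering and "false" for the cleanup-first ordering. Swapping the order is only one possible approach: the staging directory holds files the AM needs, so a real implementation would also have to be sure no later AM attempt still depends on them before deleting it.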

      Attachments

        1. MAPREDUCE-4099-addendum.patch
          13 kB
          Jason Darrell Lowe
        2. MAPREDUCE-4099-addendum.patch
          13 kB
          Jason Darrell Lowe
        3. MAPREDUCE-4099.patch
          35 kB
          Jason Darrell Lowe
        4. MAPREDUCE-4099.patch
          35 kB
          Jason Darrell Lowe
        5. MAPREDUCE-4099.patch
          5 kB
          Jason Darrell Lowe


          People

            Assignee: Jason Darrell Lowe (jlowe)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 6

