Hadoop Common / HADOOP-5436

job history directory grows without bound, locks up job tracker on new job submission


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.19.0, 0.20.0, 0.20.1, 0.20.2

    Description

      An unpleasant surprise on upgrading to 0.19: requests to jobtracker.jsp would take a long time, or even time out, whenever new jobs were submitted. Investigation showed that JobInProgress.initTasks() was calling JobHistory.JobInfo.logSubmitted(), which in turn was calling JobHistory.getJobHistoryFileName(), which was pegging the CPU for a couple of minutes. Further investigation showed there were 200,000+ files in the job history folder, and every submission was creating a FileStatus for each of them and then applying a regular expression to just the name. All this just on the off chance that the job tracker had been restarted (see HADOOP-3245). To make matters worse, these files cannot be safely deleted while the job tracker is running, as the disappearance of a history file at the wrong time causes a FileNotFoundException.
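      To make the cost concrete, here is a minimal sketch of the access pattern described above. It is not the Hadoop code: it uses java.nio on a local directory as a stand-in for Hadoop's FileSystem and FileStatus, and the class name, method name, and file-name layout (host_timestamp_jobid_user) are all assumptions made for illustration. The point it shows is that a single lookup reads metadata for every entry in the directory, so the work grows with the total number of history files rather than with the one file being sought.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical illustration only; names and file layout are assumptions.
public class HistoryScanSketch {
    // Number of directory entries examined by the most recent scan.
    static long examined = 0;

    // Looks for a history file for jobId by visiting every entry in the
    // directory: read its attributes, then regex-match its name.
    static Optional<Path> findByScan(Path historyDir, String jobId) throws IOException {
        Pattern wanted = Pattern.compile(".*_" + Pattern.quote(jobId) + "_.*");
        examined = 0;
        Path found = null;
        try (Stream<Path> entries = Files.list(historyDir)) {
            for (Path entry : (Iterable<Path>) entries::iterator) {
                // Metadata is read for every file, although only the name is used.
                Files.readAttributes(entry, BasicFileAttributes.class);
                examined++;
                if (found == null && wanted.matcher(entry.getFileName().toString()).matches()) {
                    found = entry;
                }
            }
        }
        return Optional.ofNullable(found);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("history");
        for (int i = 0; i < 1000; i++) {
            Files.createFile(dir.resolve("host_123_job_" + i + "_user"));
        }
        Optional<Path> hit = findByScan(dir, "job_42");
        System.out.println(hit.isPresent() + " after examining " + examined + " entries");
    }
}
```

      With 200,000+ entries instead of 1,000, the same loop runs on every job submission, which is consistent with the multi-minute CPU spike reported here.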

      So to summarize the issues:

      • having Hadoop default to storing all the history files in a single directory is a Bad Idea
      • doing expensive processing of every history file on every job submission is a Worse Idea
      • doing expensive processing of every history file on every job submission while holding a lock on the JobInProgress object and thereby blocking the jobtracker.jsp from rendering is a Terrible Idea (note: haven't confirmed this, but a cursory glance suggests that's what's going on)
      • not being able to clean up the mess without taking down the job tracker is just Unfortunate
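      One possible way to address the first and last points, sketched under assumptions: bucket history files into per-day subdirectories so that no single directory grows without bound and old buckets can be removed as a unit. This is not what the attached patch does (its contents are not described here), and the class name, method names, and year/month/day layout are hypothetical. java.nio on a local path again stands in for Hadoop's FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;

// Hypothetical illustration only; not the layout used by any Hadoop release.
public class DatedHistoryLayoutSketch {
    // Maps a submission date to <root>/<yyyy>/<MM>/<dd>.
    static Path bucketFor(Path historyRoot, LocalDate submitDate) {
        return historyRoot
                .resolve(String.format("%04d", submitDate.getYear()))
                .resolve(String.format("%02d", submitDate.getMonthValue()))
                .resolve(String.format("%02d", submitDate.getDayOfMonth()));
    }

    // Creates the day's bucket if needed and places the history file in it.
    static Path createHistoryFile(Path historyRoot, LocalDate submitDate, String fileName)
            throws IOException {
        Path bucket = bucketFor(historyRoot, submitDate);
        Files.createDirectories(bucket);
        return Files.createFile(bucket.resolve(fileName));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("history-root");
        Path file = createHistoryFile(root, LocalDate.of(2009, 3, 7), "host_123_job_42_user");
        System.out.println(root.relativize(file));
    }
}
```

      Under this layout, a lookup for a job with a known submission date only has to list one day's bucket, and buckets for past days are never touched by new submissions.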

      Attachments

        1. HADOOP-5436.patch
          2 kB
          Tim Williamson


          People

            Assignee: Unassigned
            Reporter: Tim Williamson (tim2)
            Votes: 0
            Watchers: 6
