MAPREDUCE-1213 (Hadoop Map/Reduce)

TaskTracker restart is very slow because it deletes the distributed cache directory synchronously


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Directories specified in mapred.local.dir that cannot be created now cause the TaskTracker to fail to start.
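The release note describes a startup check on `mapred.local.dir`, which takes a comma-separated list of local directories. A minimal Java sketch of that behavior, assuming the configured value is handed over as a plain string, could look like the following. `LocalDirCheck` and `createAllOrFail` are hypothetical names used for illustration; this is not the TaskTracker's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check illustrating the release note; not Hadoop's implementation. */
final class LocalDirCheck {
    /**
     * Creates every directory in a comma-separated list (the format used by
     * mapred.local.dir) and throws if any one of them cannot be created.
     */
    static List<Path> createAllOrFail(String commaSeparatedDirs) throws IOException {
        List<Path> created = new ArrayList<>();
        for (String entry : commaSeparatedDirs.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas such as "a,,b"
            }
            Path dir = Paths.get(trimmed);
            try {
                // No-op if the directory already exists; fails if a component is a regular file, etc.
                Files.createDirectories(dir);
            } catch (IOException e) {
                // Fail startup instead of continuing with fewer local dirs.
                throw new IOException("Cannot create local dir " + dir + "; refusing to start", e);
            }
            created.add(dir);
        }
        return created;
    }
}
```

Failing fast here surfaces a misconfigured or missing disk at startup, rather than later when a task first tries to write to it.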

      Description

      We are seeing that when we restart a TaskTracker, it tries to recursively delete all the files in the distributed cache. It invokes FileUtil.fullyDelete(), which is very slow. As a result, the TaskTracker cannot join the cluster for an extended period of time (up to 2 hours for us). The problem is acute when the distributed cache contains a few thousand files.
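The delay comes from running the recursive delete on the restart path. One common way to take it off that path, in line with what the issue title suggests, is to rename the cache directory aside (a single metadata operation when the target is on the same filesystem) and delete the renamed tree on a background thread. The Java sketch below illustrates that idea using only `java.nio.file` and `java.util.concurrent`. The class `AsyncDirCleaner`, its method names, and the `.toBeDeleted.` suffix are hypothetical; they are not taken from the attached patches or from Hadoop's API.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.*;

/**
 * Hypothetical helper (not from the MAPREDUCE-1213 patches): clears a directory
 * without blocking the caller by renaming it aside and deleting it asynchronously.
 */
final class AsyncDirCleaner implements AutoCloseable {
    // Daemon thread so pending deletions never keep the JVM alive on shutdown.
    private final ExecutorService deleter = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-dir-cleaner");
        t.setDaemon(true);
        return t;
    });

    /** Renames dir within its parent (same filesystem) and schedules the slow delete. */
    Future<?> moveAndDelete(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return CompletableFuture.<Void>completedFuture(null);
        }
        Path trash = dir.resolveSibling(dir.getFileName() + ".toBeDeleted." + System.nanoTime());
        Files.move(dir, trash, StandardCopyOption.ATOMIC_MOVE);
        // The caller can recreate an empty dir right away; the recursive delete runs later.
        return deleter.submit(() -> {
            deleteRecursively(trash);
            return null;
        });
    }

    /** Depth-first delete: files first, then each directory once it is empty. */
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    @Override
    public void close() {
        deleter.shutdown(); // already-submitted deletions still run
    }
}
```

With this pattern, a restarting TaskTracker could recreate an empty cache directory and join the cluster immediately, while the thousands of stale files are removed in the background. The rename is only cheap if it stays on the same filesystem, which is why the sketch renames into the same parent directory.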

        Attachments

        1. MAPREDUCE-1213.branch-0.20.patch
          16 kB
          Zheng Shao
        2. MAPREDUCE-1213.branch-0.20.2.patch
          14 kB
          Zheng Shao
        3. MAPREDUCE-1213.4.patch
          14 kB
          Zheng Shao
        4. MAPREDUCE-1213.3.patch
          14 kB
          Zheng Shao
        5. MAPREDUCE-1213.2.patch
          14 kB
          Zheng Shao
        6. MAPREDUCE-1213.1.patch
          14 kB
          Zheng Shao


            People

            • Assignee:
              Zheng Shao (zshao)
            • Reporter:
              Dhruba Borthakur (dhruba)

