  1. Hadoop Map/Reduce
  2. MAPREDUCE-2657 TaskTracker should handle disk failures
  3. MAPREDUCE-2413

TaskTracker should handle disk failures at both startup and runtime


Details

    • Reviewed

    Description

      At present, TaskTracker does not handle disk failures properly, either at startup or at runtime.

      (1) Currently, TaskTracker does not come up if any of the mapred-local-dirs is on a bad disk. It should instead ignore that mapred-local-dir, start up, and use only the remaining good mapred-local-dirs.
      (2) If a disk goes bad while TaskTracker is running, TaskTracker currently does nothing special. This results in either
      (a) TaskTracker continuing to try to use the bad disk, which causes many task failures and possibly job failures (because multiple TTs have bad disks), until these TTs are eventually graylisted for all jobs. Recovery requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk; OR
      (b) the health check script identifying the disk as bad and the TT being blacklisted. This also requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk.

      This JIRA is to make TaskTracker more tolerant of disk failures by solving (1) and (2), i.e. TT should start as long as at least one of the mapred-local-dirs is on a good disk, and TT should adjust its in-memory list of mapred-local-dirs to avoid using the bad ones.
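
      The intended behavior can be illustrated with a minimal sketch. It is not taken from the attached patches: the class name LocalDirTracker and its methods are hypothetical, and plain java.io.File checks stand in for whatever disk check the real TaskTracker performs. It assumes that a "good" mapred-local-dir is an existing directory that is readable, writable, and executable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only; names are not from the patch.
final class LocalDirTracker {
    // In-memory list of mapred-local-dirs currently believed to be good.
    private final List<String> goodDirs = new ArrayList<>();

    // Startup, case (1): skip bad dirs instead of refusing to start.
    // Fail only when no configured dir is usable at all.
    LocalDirTracker(List<String> configuredDirs) {
        for (String dir : configuredDirs) {
            if (isUsable(dir)) {
                goodDirs.add(dir);
            }
        }
        if (goodDirs.isEmpty()) {
            throw new IllegalStateException("No usable mapred-local-dirs");
        }
    }

    // Assumed health check: an existing directory with rwx access.
    static boolean isUsable(String dir) {
        File f = new File(dir);
        return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
    }

    // Runtime, case (2): call periodically; drops dirs that have gone bad
    // from the in-memory list and returns the ones that were dropped.
    synchronized List<String> recheck() {
        List<String> failed = new ArrayList<>();
        Iterator<String> it = goodDirs.iterator();
        while (it.hasNext()) {
            String dir = it.next();
            if (!isUsable(dir)) {
                it.remove();
                failed.add(dir);
            }
        }
        return failed;
    }

    // Defensive copy so callers cannot mutate the tracked list.
    synchronized List<String> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }
}
```

      In this sketch a caller would construct the tracker from the configured mapred-local-dirs at startup, then invoke recheck() on a timer and allocate task directories only from getGoodDirs(). What to do when recheck() leaves the list empty is left open here.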

      Attachments

        1. MR-2413.v0.3.patch
          44 kB
          Ravi Gummadi
        2. MR-2413.v0.2.patch
          44 kB
          Jagane Sundar
        3. MR-2413.v0.1.patch
          44 kB
          Ravi Gummadi
        4. MR-2413.v0.patch
          44 kB
          Ravi Gummadi

          People

            ravidotg Ravi Gummadi
            bharathm Bharath Mundlapudi
