Hadoop Common / HADOOP-1252

Disk problems should be handled better by the MR framework


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      The MR framework should recover from disk failures without causing jobs to hang. Note that this issue is about a short-term solution: for example, looking at the code and improving the exception handling to better detect faulty disks and missing files. The long-term approach might be to have an FS layer that takes care of failed disks and makes the failures transparent to the tasks; that will be a separate issue.
      Some of the issues that have been reported are HADOOP-1087 and a comment by Koji on HADOOP-1200 (there may be others). Please add to this issue as many details as possible on disk failures leading to hung jobs (relevant exception traces, steps to reproduce, etc.).
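
As a rough illustration of the short-term idea (detect a faulty disk early and raise an exception instead of letting a task hang), here is a minimal Java sketch. It is not taken from the attached patches: the class LocalDirProbe and its methods are hypothetical names, and the approach (probing each candidate local directory with a small temporary file) is an assumption about one way such a check could look.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): skips local directories that look unusable. */
class LocalDirProbe {

    /** True if dir exists (or can be created) and a probe file can be written and deleted in it. */
    static boolean isUsable(File dir) {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            return false; // missing, not a directory, or cannot be created
        }
        try {
            File probe = File.createTempFile("probe", ".tmp", dir);
            return probe.delete();
        } catch (IOException | SecurityException e) {
            return false; // treat any write failure as a bad disk
        }
    }

    /** Returns the candidates that pass the probe, in their original order. */
    static List<File> usableDirs(List<File> candidates) {
        List<File> good = new ArrayList<>();
        for (File dir : candidates) {
            if (isUsable(dir)) {
                good.add(dir);
            }
        }
        return good;
    }

    /** Picks the first usable directory, or fails fast so the caller can report the error. */
    static File pickDir(List<File> candidates) throws IOException {
        List<File> good = usableDirs(candidates);
        if (good.isEmpty()) {
            throw new IOException("No usable local directory among " + candidates);
        }
        return good.get(0);
    }
}
```

The design point is that pickDir throws when every candidate fails the probe, so the caller sees an explicit error it can report rather than blocking on a disk that will never respond correctly.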

      Attachments

        1. 1252.may7.patch
          41 kB
          Devaraj Das
        2. 1252.new.patch
          33 kB
          Devaraj Das
        3. 1252.patch
          30 kB
          Devaraj Das
        4. 1252.patch
          31 kB
          Devaraj Das

          People

            Assignee: Devaraj Das (ddas)
            Reporter: Devaraj Das (ddas)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:
