Hadoop Common
  1. Hadoop Common
  2. HADOOP-5349

When the size required for a path is -1, LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException when the disk it selects is bad.

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. HADOOP-5349.txt
      2 kB
      Vinod Kumar Vavilapalli

      Issue Links

        Activity

        Hide
        Vinod Kumar Vavilapalli added a comment -

        When the size required is unknown, LocalDirAllocator selects one target disk in a round-robin fashion. Before returning this path, it tries to create the path. If the selected disk turns about to be bad, then LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException. The fix for this is to make getLocalPathWrite to try other disks till it finds one, similar to the case when required size of the path is known.

        Show
        Vinod Kumar Vavilapalli added a comment - When the size required is unknown, LocalDirAllocator selects one target disk in a round-robin fashion. Before returning this path, it tries to create the path. If the selected disk turns about to be bad, then LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException. The fix for this is to make getLocalPathWrite to try other disks till it finds one, similar to the case when required size of the path is known.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Seeing this issue many times on our clusters because of many tasks are getting failed.

        Attaching patch that should fix this issue. The patch doesn't include any tests - as of now there are no tests verifying the algorithm of round-robin disk selection; writing these tests is a bit involving and can be done as part of another JIRA issue.

        Show
        Vinod Kumar Vavilapalli added a comment - Seeing this issue many times on our clusters because of many tasks are getting failed. Attaching patch that should fix this issue. The patch doesn't include any tests - as of now there are no tests verifying the algorithm of round-robin disk selection; writing these tests is a bit involving and can be done as part of another JIRA issue.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Patch up for review. Running it through Hudson.

        Show
        Vinod Kumar Vavilapalli added a comment - Patch up for review. Running it through Hudson.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        ant test-patch results:

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no tests are needed for this patch.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        As said above, tests for this fix will be added as part of another issue. Core and contrib tests also passed.

        Show
        Vinod Kumar Vavilapalli added a comment - ant test-patch results: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. As said above, tests for this fix will be added as part of another issue. Core and contrib tests also passed.
        Hide
        Devaraj Das added a comment -

        I just committed this. Thanks, Vinod!

        Show
        Devaraj Das added a comment - I just committed this. Thanks, Vinod!
        Hide
        Nigel Daley added a comment -

        Vinod, please reference the Jira here that will test this.

        Show
        Nigel Daley added a comment - Vinod, please reference the Jira here that will test this.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-trunk #830 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/830/)
        . Fixes a problem in LocalDirAllocator to check for the return path value that is returned for the case where the file we want to write is of an unknown size. Contributed by Vinod Kumar Vavilapalli.

        Show
        Hudson added a comment - Integrated in Hadoop-trunk #830 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/830/ ) . Fixes a problem in LocalDirAllocator to check for the return path value that is returned for the case where the file we want to write is of an unknown size. Contributed by Vinod Kumar Vavilapalli.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Vinod, please reference the Jira here that will test this.

        HADOOP-5799

        Show
        Vinod Kumar Vavilapalli added a comment - Vinod, please reference the Jira here that will test this. HADOOP-5799

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Vinod Kumar Vavilapalli
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development