
Key: HADOOP-5349
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Vinod K V
Reporter: Vinod K V
Votes: 0
Watchers: 2
Hadoop Common

When the size required for a path is -1, LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException when the disk it selects is bad.

Created: 27/Feb/09 12:39 PM   Updated: 08/Jul/09 04:53 PM
Component/s: None
Affects Version/s: 0.20.0
Fix Version/s: 0.20.1

Time Tracking:
Not Specified

File Attachments:
HADOOP-5349.txt (text file, licensed for inclusion in ASF works), 2 kB, attached by Vinod K V on 2009-05-06 08:57 AM
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Resolution Date: 08/May/09 08:04 AM


Vinod K V added a comment - 27/Feb/09 12:42 PM
When the required size is unknown, LocalDirAllocator selects one target disk in round-robin fashion. Before returning the path, it tries to create it. If the selected disk turns out to be bad, LocalDirAllocator.getLocalPathForWrite fails with a DiskCheckerException. The fix is to make getLocalPathForWrite try the other disks until it finds a good one, similar to the case where the required size of the path is known.
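For illustration, the proposed behaviour (keep trying the remaining disks instead of failing on the first bad one) could be sketched roughly as below. This is not the attached patch or the actual LocalDirAllocator code; the class RoundRobinDirPicker, the methods pickDirForWrite and isUsable, and the use of a plain IOException in place of DiskCheckerException are all hypothetical stand-ins.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for the allocator; NOT the actual Hadoop class. */
public class RoundRobinDirPicker {
    private final List<File> dirs;
    private int next = 0; // index of the directory to try first on the next call

    public RoundRobinDirPicker(List<File> dirs) {
        this.dirs = dirs;
    }

    /**
     * Assumed health check: the directory exists (or can be created)
     * and is writable. The real disk check may differ.
     */
    static boolean isUsable(File dir) {
        return (dir.isDirectory() || dir.mkdirs()) && dir.canWrite();
    }

    /**
     * Starts at the round-robin position and tries each configured
     * directory at most once, skipping bad ones. Fails only when
     * every directory is unusable.
     */
    public synchronized File pickDirForWrite() throws IOException {
        int n = dirs.size();
        for (int attempt = 0; attempt < n; attempt++) {
            File candidate = dirs.get(next);
            next = (next + 1) % n; // advance whether or not the candidate is good
            if (isUsable(candidate)) {
                return candidate;
            }
        }
        throw new IOException("No usable local directory among " + n + " configured");
    }
}
```

With this shape, a single bad disk only costs one extra check per call; the caller sees an exception only when no configured directory can be used.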

Vinod K V added a comment - 06/May/09 08:57 AM
We are seeing this issue frequently on our clusters, and many tasks are failing because of it.

Attaching a patch that should fix this issue. The patch doesn't include any tests: as of now there are no tests verifying the round-robin disk selection algorithm; writing these tests is a bit involved and can be done as part of another JIRA issue.


Vinod K V added a comment - 06/May/09 08:59 AM
Patch up for review. Running it through Hudson.

Vinod K V added a comment - 07/May/09 04:40 PM

ant test-patch results:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

As said above, tests for this fix will be added as part of another issue. Core and contrib tests also passed.


Devaraj Das added a comment - 08/May/09 08:04 AM
I just committed this. Thanks, Vinod!

Nigel Daley added a comment - 08/May/09 04:17 PM
Vinod, please reference the Jira here that will test this.

Hudson added a comment - 08/May/09 07:55 PM
Integrated in Hadoop-trunk #830 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/830/)
Fixes a problem in LocalDirAllocator to check for the return path value that is returned for the case where the file we want to write is of an unknown size. Contributed by Vinod Kumar Vavilapalli.

Vinod K V added a comment - 11/May/09 06:04 AM

"Vinod, please reference the Jira here that will test this."

HADOOP-5799