Issue Details (XML | Word | Printable)

Key: HADOOP-5958
Type: Bug Bug
Status: Patch Available Patch Available
Priority: Major Major
Assignee: Aaron Kimball
Reporter: Devaraj Das
Votes: 0
Watchers: 10
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Use JDK 1.6 File APIs in DF.java wherever possible

Created: 02/Jun/09 03:54 PM   Updated: Today 07:26 AM
Return to search
Component/s: fs
Affects Version/s: None
Fix Version/s: 0.22.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works HADOOP-5958-hdfs.patch 2009-10-29 01:59 AM Aaron Kimball 1 kB
Text File Licensed for inclusion in ASF works HADOOP-5958-mapred.patch 2009-10-29 01:59 AM Aaron Kimball 0.6 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.2.patch 2009-10-29 09:08 PM Aaron Kimball 18 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.3.patch 2009-10-29 10:13 PM Aaron Kimball 18 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.4.patch 2009-11-09 11:15 PM Aaron Kimball 18 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.5.patch 2009-11-28 02:07 AM Aaron Kimball 6 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.6.patch 2009-11-28 06:59 AM Aaron Kimball 6 kB
Text File Licensed for inclusion in ASF works HADOOP-5958.patch 2009-10-29 02:00 AM Aaron Kimball 9 kB


 Description  « Hide
JDK 1.6 has File APIs like File.getFreeSpace() which should be used instead of spawning a command process for getting the various disk/partition related attributes. This would avoid spikes in memory consumption by tasks when things like LocalDirAllocator is used for creating paths on the filesystem.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Aaron Kimball added a comment - 22/Oct/09 07:20 PM
Do we want to use File.getFreeSpace() or File.getUsableSpace()? The API docs (http://java.sun.com/javase/6/docs/api/java/io/File.html) are pretty vague about what "usable" means. A quick test on my machine shows that they do return different values, by about 6%.

Aaron Kimball added a comment - 22/Oct/09 07:28 PM
Another question: o.a.h.fs.DF currently inherits the public interface of o.a.h.util.Shell; this would be irrelevant if usage of the "df" command itself were eliminated. Should we:

a) Remove the inheritance from Shell? (incompatible change)
b) Override all the Shell methods with dummies that return meaningless data (e.g., for getExitCode())
c) ...or?


Doug Cutting added a comment - 22/Oct/09 08:02 PM
> Remove the inheritance from Shell? (incompatible change)

I'd be surprised if this broke any applications. It would indeed be safer to add the dummy methods, deprecated, and file an issue to remove them attached to the next release, but that feels like overkill here. Does any code in Hadoop depend on it extending Shell?

Does java.io.File let you find the mount point? I thought we used that in places, but perhaps not. I guess if it's used then we might still need to use the 'df' command to support that, although perhaps we could avoid it for the other methods.


Aaron Kimball added a comment - 22/Oct/09 09:03 PM
Hm. A casual inspection shows that uses of DF do not seem to depend on superclass behavior. I concur that we're probably safe with just changing its interface outright and flagging the change as incompatible in JIRA.

As for your other question... not so easy.

File provides a listRoots() method which doesn't exactly do the same thing. A quick test shows that it just returns "/" for a file named /home/aaron/foo, even though I have /home on a separate partition than /.

The getMounts() method is used in LocalFileSystem.reportChecksumFailure() to move files to a bad_files directory on the same device as their original storage. Replacing getMounts() naively with listRoots() could result in actual IO transfer, vs. just metadata updates in the local fs.

In Java 7, the java.nio.file.FileSystem class (and friends like Path) will actually make all this trivial. It'll even let you change and inspect file ownership and permissions. But we're not there yet. ::sigh::

So we can't eliminate getMounts() without changing the interface. Fortunately, it seems as though this is only used in one method, so maybe changing LFS.reportChecksumFailure() is the right answer?

Here's an alternative strategy:

  • Make o.a.h.fs.DF an interface
  • The current DF implementation is renamed ShellDF
  • Add another implementation named JavaDF which uses File for what it can.
    • Throws IOException for things like getMounts(), etc.
  • The NativeFileSystem specifically instantiates a ShellDF for the situation where it needs mount-point info (inside reportChecksumFailure)
  • All other uses of DF change to JavaDF

Doug Cutting added a comment - 22/Oct/09 09:45 PM
I'm not a big fan of interfaces, since they make it hard to evolve the API compatibly. I also don't think we should encourage most code to specify the implementation when there's a clearly preferred implementation. So, an abstract base class with a factory method that most folks use?

Aaron Kimball added a comment - 22/Oct/09 09:55 PM
I'm not an interface fan myself. But the existing DF class already uses Shell as an abstract base class. That means we'd need to put most of the current DF's logic into some other class that extends Shell; maybe a private inner class?

+1 for a factory method.


Doug Cutting added a comment - 22/Oct/09 10:02 PM
> That means we'd need to put most of the current DF's logic into some other class that extends Shell; maybe a private inner class?

Works for me. Delegation.


Aaron Kimball added a comment - 29/Oct/09 02:00 AM
Attaching patch which adds JavaDF and PosixDF classes. DF is now an abstract class; it provides a getDF() factory method that returns a JavaDF instance.

Added tests in TestDFVariations that prove that JavaDF and PosixDF return the same values.

Also attached patches to mapred and hdfs that use the new API.


Aaron Kimball added a comment - 29/Oct/09 02:01 AM
Also, you'll need to run this to commit:
svn add src/java/org/apache/hadoop/fs/JavaDF.java
svn add src/java/org/apache/hadoop/fs/PosixDF.java

Hadoop QA added a comment - 29/Oct/09 02:59 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423529/HADOOP-5958.patch
against trunk revision 829289.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The patch appears to cause tar ant target to fail.

-1 findbugs. The patch appears to cause Findbugs to fail.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/111/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/111/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/111/console

This message is automatically generated.


Aaron Kimball added a comment - 29/Oct/09 09:08 PM
Forgot to include PosixDF and JavaDF locally. (oops.) New patch.

Hadoop QA added a comment - 29/Oct/09 09:33 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423623/HADOOP-5958.2.patch
against trunk revision 831070.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/114/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/114/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/114/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/114/console

This message is automatically generated.


Aaron Kimball added a comment - 29/Oct/09 10:13 PM
The failing part of that test is in keeping the pct used counts in sync between Java DF and Posix DF. I just removed this aspect of the test. There are no uses of getPercentUsed() in the actual Hadoop codebase. We only care about getUsed() and getAvailable() which are tested.

In general, the pct. used calculation is pretty fuzzy, so it's hard to make it agree exactly.


Hadoop QA added a comment - 29/Oct/09 10:30 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12423630/HADOOP-5958.3.patch
against trunk revision 831070.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/20/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/20/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/20/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/20/console

This message is automatically generated.


Todd Lipcon added a comment - 07/Nov/09 01:59 AM
  • DF.java:37 - "has most of the functionality, and has better performance" - makes sense as a jira comment, but when that's the only implementation people may be left wondering "Better than what?" Best to specifically compare with PosixDF
  • Now that there are two getDFs, one with a conf and one without, shouldn't one either be marked deprecated or private? I'd say we should leave the one that takes a Configuration and just ignore the configuration variable, unless we're certain we'll never want Configuration here again.
  • We used to have a limit on how often df would be called. That's gone with the new implementation - I dunno if the interval was due to the fork overhead or actually some overhead in the calls themselves. Are the j.io.File implementations fast enough that we don't have to worry about it, or should JavaDF do some caching?
  • For the unit test, you could use http://junit.org/apidocs/org/junit/Assert.html#assertEquals%28java.lang.String,%20double,%20double,%20double%29 though what you're doing now is fine too.

Aaron Kimball added a comment - 09/Nov/09 10:51 PM
DF.java:37 - "has most of the functionality, and has better performance" - makes sense as a jira comment, but when that's the only implementation people may be left wondering "Better than what?" Best to specifically compare with PosixDF

Agreed.

Now that there are two getDFs, one with a conf and one without, shouldn't one either be marked deprecated or private? I'd say we should leave the one that takes a Configuration and just ignore the configuration variable, unless we're certain we'll never want Configuration here again.

getDF() never existed before; I created those as a replacement for the DF() constructors, now that DF itself is abstract. I'm ok with providing the Configuration-handling version only.

We used to have a limit on how often df would be called. That's gone with the new implementation - I dunno if the interval was due to the fork overhead or actually some overhead in the calls themselves. Are the j.io.File implementations fast enough that we don't have to worry about it, or should JavaDF do some caching?

I just ran a quick benchmark of calling File.getFreeSpace() a million times; an individual call to f = new File(); f.getFreeSpace() takes on average 45 microseconds. By comparison, forking the df executable takes 2.83 milliseconds. I don't think we need to worry about caching in JavaDF.


Todd Lipcon added a comment - 09/Nov/09 10:59 PM
Cool, thanks for the benchmark. +1 from me after those other changes, then.

Aaron Kimball added a comment - 09/Nov/09 11:15 PM
new patch; addresses code review items above.

Hadoop QA added a comment - 09/Nov/09 11:37 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424412/HADOOP-5958.4.patch
against trunk revision 833553.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/131/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/131/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/131/console

This message is automatically generated.


Todd Lipcon added a comment - 11/Nov/09 01:58 AM
Reviewed the new patch. Looks good to me, +1

Steve Loughran added a comment - 18/Nov/09 03:54 PM
@Aaron, out of curiosity, what happens when you run that same benchmark against an NFS drive? I am worried that the caching is going to be useful for anyone serving up content from a remote FS, even through the Java APIs

Aaron Kimball added a comment - 19/Nov/09 04:38 AM
Steve,

Excellent question. I'm out of the office for the week, so I don't have access to my networked resources to run that test right now. But now that you've raised the issue, I'm concerned too. Will check on this next week.


Chris Douglas added a comment - 21/Nov/09 03:49 AM

what happens when you run that same benchmark against an NFS drive?

Every class using DF assumes- reasonably- that the resource behaves like a local drive. Configuring e.g. LocalDirAllocator or FSDataset to use a remote FS and then worrying that DF might take milliseconds instead of microseconds is focusing on the noise.

I don't understand the virtue of the current approach, as DF seems like a fixed set of functionality not meriting an abstract class, factory, etc. Is PosixDF needed for anything but getMount and getFilesystem? The latter has no callers and the former has one that creates an instance and discards all but the mount. This suggests two approaches:

  1. If nearly all uses of DF involve delegating to a java.io.File, is there any reason not to simply replace uses of DF with File? Everywhere DF is used, a local FS is assumed. As Steve points out, if this were otherwise, other designs would be preferred.
  2. If DF retained Shell as a subtype and used the java.io.File methods where appropriate, is the cost really prohibitive? Few of these are created and other than a faster implementation of most calls, nothing else changes. The costs incurred for keeping everything intact appear trivial.

Neither of these is an incompatible change, assuming java.io.File is correctly implemented. They're not even mutually exclusive.

A few nits:

  • Incompatibly moving DF::main seems unnecessary
  • The comment on DF_INTERVAL_DEFAULT should be javadoc
  • While the original didn't have it either, DF methods should have javadoc
  • In getPercentUsed, using cap in the calculation of used avoids the second call to getCapacity
  • If the current design is retained (because some architecture has a faulty java.io.File impl?), it should be possible to use PosixDF exclusively using the config passed to DF::getDF (could be named DF::get).
  • The current patch also requires changes to HDFS that must be committed with these. If retained, please open an issue and link
  • getFilesystem and getMounts should probably be deprecated, even removed from DF since one needs to explicitly instantiate PosixDF to make the call. The only reason getMounts is there is because that's the command it's scraped from, anyway.

Aaron Kimball added a comment - 27/Nov/09 11:31 PM
If java.io.File is somehow "faulty," then there are probably much, much larger problems with the given platform. This case is not worth worrying about in detail.

You're correct that we could simply stitch the java.io.File methods directly into the existing DF implementation. But then there's still the subprocess being launched via fork() and not vfork(); the intent of this issue seems to be to eliminate spurious memory overallocation spikes when a df process is exec'd, as well as to eliminate dependencies on platform-specific tools when possible.

So maybe we should do both of the things you suggest:

  1. Replace uses of DF with direct uses of File so that places that don't need to shell out to a subprocess, don't.
  2. Replace the relevant internal methods of DF with uses of File so that the result of using DF always matches the result returned by File.

This should avoid incompatible changes and get us closer to platform independence. (Currently there's no way to get the mount points within Java, so we can't just ditch DF wholesale. Maybe with Java7....)


Aaron Kimball added a comment - 28/Nov/09 02:07 AM
I actually realized that run() isn't called anywhere in Shell itself, so unless DF explicitly calls it, we'll never invoke a subprocess. So I've only had to change the DF class to eliminate subprocesses; run() is only called in getFilesystem() and getMounts() now.

Hadoop QA added a comment - 28/Nov/09 02:47 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12426311/HADOOP-5958.5.patch
against trunk revision 884428.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

-1 javac. The applied patch generated 173 javac compiler warnings (more than the trunk's current 172 warnings).

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/151/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/151/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/151/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/151/console

This message is automatically generated.


Aaron Kimball added a comment - 28/Nov/09 06:59 AM
new patch, fixes javac warning. Despite hudson reporting "-1 core tests", there wasn't a core test failure in the actual test report (bug in Hudson script?)

Hadoop QA added a comment - 28/Nov/09 07:26 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12426316/HADOOP-5958.6.patch
against trunk revision 884428.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/152/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/152/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/152/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/152/console

This message is automatically generated.