Hadoop Common / HADOOP-2567

add FileSystem#getHomeDirectory() method

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: fs
    • Labels: None

      Description

      The FileSystem API would benefit from a getHomeDirectory() method.

      The default implementation would return "/user/$USER/".

      RawLocalFileSystem would return System.getProperty("user.home").

      HADOOP-2514 can use this to implement per-user trash.
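
      A minimal sketch of what such a default could look like (illustrative only; it assumes the user.name and user.home Java system properties stand in for $USER, and may differ from the actual patch):

        // FileSystem (sketch): default home directory under /user/<username>,
        // qualified against this filesystem.
        public Path getHomeDirectory() {
          return new Path("/user/" + System.getProperty("user.name")).makeQualified(this);
        }

        // RawLocalFileSystem (sketch): the OS user's home directory.
        public Path getHomeDirectory() {
          return new Path(System.getProperty("user.home")).makeQualified(this);
        }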

      Attachments

      1. HADOOP-2567-tests.patch
        1 kB
        Doug Cutting
      2. HADOOP-2567-sortvalidate.patch
        0.8 kB
        Doug Cutting
      3. 2567-3.patch
        5 kB
        Chris Douglas
      4. HADOOP-2567-2.patch
        4 kB
        Doug Cutting
      5. HADOOP-2567-1.patch
        4 kB
        Doug Cutting
      6. HADOOP-2567.patch
        3 kB
        Doug Cutting


          Activity

          Hudson added a comment - Integrated in Hadoop-trunk #379 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/379/ )

          Doug Cutting added a comment -

          I have committed the tests.

          Doug Cutting added a comment -

          Yes, I think the tests are all that remain for this. I'll commit them.

          Chris Douglas added a comment -

          +1 for the unit tests; is that all that remains for this issue (since the sortvalidate patch was moved to HADOOP-2646)?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12373442/HADOOP-2567-tests.patch
          against trunk revision r613069.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1635/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1635/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1635/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1635/console

          This message is automatically generated.

          Doug Cutting added a comment -

          HADOOP-2646 has been added to address the SortValidator issue.

          Doug Cutting added a comment -

          Adding tests of new getHomeDirectory() method.

          Doug Cutting added a comment -

          I am unable to reproduce this failure. The single-machine instructions you gave above generate four input files and one output file. I modified the sort command line so that four output files are used, since the code in question involves determining whether a given input to the validator is a sort input or output, but that still validated correctly.

          Perhaps Arun, who originally wrote the validator, could have a look at this?

          Amar Kamat added a comment -

          Yes, I restarted the cluster, and I ran trunk + sort-validation-patch. Yes, the errors were the same. But this patch solves the problem on a single machine.

          Doug Cutting added a comment -

          > Seems that the tests are still failing. Earlier I tried on a single machine and it worked.

          Did you restart the cluster running the patched code? That may be required.

          Did it fail with the same error?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12373214/HADOOP-2567-sortvalidate.patch
          against trunk revision r612500.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1610/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1610/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1610/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1610/console

          This message is automatically generated.

          Amar Kamat added a comment -

          I tried testing this patch on a large cluster. It seems that the tests are still failing. Earlier I tried on a single machine and it worked.

          Amar Kamat added a comment -

          Doug, sort-validation works fine now. +1.

          Doug Cutting added a comment -

          The attached patch fixes the sort validator.

          Amar, can you please confirm that this fixes things for you? Thanks!

          Amar Kamat added a comment -

          The log messages are as follows

          bash$ bin/hadoop jar build/hadoop-*-test.jar testmapredsort -sortInput input-randomwrite-test -sortOutput output-randomwrite-test
          
          SortValidator.RecordStatsChecker: Validate sort from hdfs://localhost:9000/user/user/input-randomwrite-test (4 files), hdfs://localhost:9000/user/user/output-randomwrite-test (1 files) into hdfs://localhost:9000/tmp/sortvalidate/recordstatschecker with 1 reducer.
          Job started: Tue Jan 15 10:27:00 IST 2008
          08/01/15 10:27:01 INFO mapred.FileInputFormat: Total input paths to process : 5
          08/01/15 10:27:01 INFO mapred.JobClient: Running job: job_200801151024_0002
          08/01/15 10:27:02 INFO mapred.JobClient:  map 0% reduce 0%
          08/01/15 10:27:06 INFO mapred.JobClient:  map 20% reduce 0%
          08/01/15 10:27:11 INFO mapred.JobClient: Task Id : task_200801151024_0002_m_000001_0, Status : FAILED
          java.io.IOException: Partitions do not match for record# 0 ! - '2' v/s '0'
                  at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker$Map.map(SortValidator.java:267)
                  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
                  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
                  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2049)
          
          task_200801151024_0002_m_000001_0: rawClass: class org.apache.hadoop.io.BytesWritable
          task_200801151024_0002_m_000001_0: Returning class org.apache.hadoop.mapred.SortValidator$RecordStatsChecker$RawBytesWritable
          task_200801151024_0002_m_000001_0: rawClass: class org.apache.hadoop.io.BytesWritable
          task_200801151024_0002_m_000001_0: Returning class org.apache.hadoop.mapred.SortValidator$RecordStatsChecker$RawBytesWritable
          08/01/15 10:27:12 INFO mapred.JobClient:  map 40% reduce 0%
          
          Amar Kamat added a comment -

          +1 for sort benchmark in unit tests.
          To reproduce the issue on a single machine: with a distributed setup, generate 24MB of total data with 5MB per map using random writer, run sort with 40 maps and the default number of reducers, then run validation, i.e.:

          bash$ bin/hadoop jar build/hadoop-0.16.0-dev-examples.jar randomwriter \
              -Dtest.randomwrite.total_bytes=24000000 -Dtest.randomwrite.bytes_per_map=5000000 \
              -Dtest.min_key=100 -Dtest.max_key=100 -Dtest.min_value=0 -Dtest.max_value=0 \
              input-randomwrite-test
          bash$ bin/hadoop jar build/hadoop-0.16.0-dev-examples.jar sort -m 40 \
              input-randomwrite-test output-randomwrite-test
          bash$ bin/hadoop jar build/hadoop-0.16.0-dev-test.jar testmapredsort \
              -sortInput input-randomwrite-test -sortOutput output-randomwrite-test
          

          On a cluster of machines: just run a sort benchmark, i.e. randomwriter (default params), sort (default params), then the sort validator.

          Doug Cutting added a comment -

          > Currently the trunk does not pass the sort validation tests.

          Can you please attach details, like a log or stack trace? Or at least instructions on how to reproduce this. Thanks!

          Also, it might be good to run a scaled-down version of the sort benchmark & validation during unit testing, so that we exercise those codepaths and find things like this sooner.

          Amar Kamat added a comment -

          HADOOP-2567 breaks sort validation. Currently the trunk does not pass the sort validation tests. But with HADOOP-2567's patch removed, the trunk works fine. So reopening HADOOP-2567 until the issue gets solved.

          Hudson added a comment - Integrated in Hadoop-Nightly #363 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/363/ )

          Doug Cutting added a comment -

          I just committed this.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12372855/2567-3.patch
          against trunk revision r610921.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1539/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1539/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1539/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1539/console

          This message is automatically generated.

          Chris Douglas added a comment -

          OK, then.

          +1 on Doug's latest patch

          Doug Cutting added a comment -

          > Would it make sense to use UserGroupInformation to determine the home dir?

          Yes, someday. Long-term, usernames should be filesystem-specific. But we don't yet have an API to get the username for a particular filesystem. Once that's added, it should be returned as a UserGroupInformation and used to determine the home directory, but until then, I think this is not worth adding.

          Note that this patch does not change how the home directory in HDFS is computed, it only adds a method to expose the home directory already implicit in HDFS. Changing how we compute it should perhaps be the subject of another issue.

          Chris Douglas added a comment - - edited

          Would it make sense to use UserGroupInformation to determine the home dir? Something like:

          // In DistributedFileSystem:
          public void initialize(URI uri, Configuration conf) throws IOException {
            ...
            try {
              // Resolve the working dir from the logged-in user's home directory.
              this.workingDir = getHomeDirectory(UserGroupInformation.login(conf));
            } catch (LoginException e) {
              throw (IOException)new IOException("Could not set working dir").initCause(e);
              // this.workingDir = getHomeDirectory(); // or fall back to the default?
            }
          }
          
          // In FileSystem:
          Path getHomeDirectory(UserGroupInformation ugi) {
            return new Path("/user/" + ugi.getUserName()).makeQualified(this);
          }
          

          A failed login could also set the default as it is now, i.e. using System.getProperty. I'm not sure of the best option in that case.

          The best reason for this: agents can just use the credentials/conf from the user to resolve relative paths as in: setWorkingDirectory(getHomeDirectory(ticket)).

          This would probably only apply to DistributedFileSystem. The attached patch retains FileSystem::getHomeDirectory(), but a null ugi and/or overrides that ignore it would probably be at least as clean.

          [ edit - formatting ]
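
          A hypothetical usage sketch of the variant above (getHomeDirectory(UserGroupInformation) is only proposed here, not part of the committed patch): an agent holding a user's credentials resolves relative paths against that user's home directory.

          // Hypothetical agent-side usage; omits LoginException handling.
          UserGroupInformation ugi = UserGroupInformation.login(conf);
          fs.setWorkingDirectory(fs.getHomeDirectory(ugi));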

          Doug Cutting added a comment -

          Fix another place that assumed the working directory wasn't fully qualified.

          Doug Cutting added a comment -

          Fix a test case that assumed getWorkingDir() was not fully qualified.

          Note that, because working dirs are now fully qualified by this change, it should probably be listed in the "incompatible" section.
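
          For illustration (hypothetical paths and namenode address), code that compared the working directory against a bare absolute path may now need to qualify that path first:

          // Before this change getWorkingDirectory() could return /user/doug;
          // now it returns something like hdfs://namenode:9000/user/doug.
          Path wd = fs.getWorkingDirectory();
          boolean matches = wd.equals(new Path("/user/doug").makeQualified(fs));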

          Doug Cutting added a comment -

          Patch that implements this. Also makes both home and working dirs fully qualified.


            People

            • Assignee: Doug Cutting
            • Reporter: Doug Cutting
            • Votes: 0
            • Watchers: 0
