Hadoop Common / HADOOP-7659

fs -getmerge isn't guaranteed to work well over non-HDFS filesystems

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 3.0.0
    • Component/s: fs
    • Labels:
      None
    • Release Note:
      Documented that the "fs -getmerge" shell command may not work properly over non HDFS-filesystem implementations due to platform-varying file list ordering.

      Description

      When you use fs -getmerge with HDFS, you are guaranteed a sorted file listing (part-00000, part-00001, and so on). With the other FSes we bundle, the order of the listing is not guaranteed at all. This is because of java.io.File#list() (http://download.oracle.com/javase/6/docs/api/java/io/File.html#list()), which we use internally for native file listing and which makes no guarantee about the order of the names it returns.

      This should either be documented as a known issue on the -getmerge help/man pages, or a consistent ordering (similar to HDFS) must be applied on top of the listing. I suspect the latter would only help the filesystems we bundle, while other FSes out there would still have to deal with this issue themselves. Perhaps we need a recommendation note added to our API docs?
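
The second option above, applying a consistent order on top of the listing, can be sketched with plain java.io.File. This is a minimal illustration and not the Hadoop implementation: the class name SortedListing and the method sortedNames are hypothetical, and it assumes that lexicographic order of file names is the order wanted.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Hadoop.
public class SortedListing {

  /**
   * Returns the names of the entries in dir, sorted lexicographically.
   * File#list() makes no ordering guarantee, so the sort is explicit.
   */
  public static String[] sortedNames(File dir) throws IOException {
    String[] names = dir.list();
    if (names == null) {
      // File#list() returns null when dir is not a directory
      // or when an I/O error occurs.
      throw new IOException("Cannot list directory: " + dir);
    }
    Arrays.sort(names);
    return names;
  }

  public static void main(String[] args) throws IOException {
    // Lists the directory given as the first argument, or the current one.
    File dir = new File(args.length > 0 ? args[0] : ".");
    for (String name : sortedNames(dir)) {
      System.out.println(name);
    }
  }
}
```

Lexicographic order matches numeric order only when the names are zero-padded to the same width, as in part-00000 and part-00001; names such as part-2 and part-10 would need a different comparator.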

        Activity

        Harsh J created issue -
        Harsh J made changes -
        Field Original Value New Value
        Assignee Harsh J [ qwertymaniac ]
        Harsh J added a comment -

        Going the doc way.

        Harsh J made changes -
        Attachment HADOOP-7659.patch [ 12506909 ]
        Harsh J made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506909/HADOOP-7659.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated 5 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/465//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/465//console

        This message is automatically generated.

        Harsh J added a comment -

        Given that this is a trivial change that merely adds a helpful javadoc note, if no one has any comments on this, I am going to commit it by Monday.

        Harsh J made changes -
        Fix Version/s 0.24.0 [ 12317652 ]
        Target Version/s 3.0.0 [ 12320357 ]
        Harsh J added a comment -

        In hindsight, this isn't a bug but rather an improvement (i.e. we can document what to expect).

        Harsh J made changes -
        Issue Type Bug [ 1 ] Improvement [ 4 ]
        Harsh J added a comment -

        -1 javadoc. The javadoc tool appears to have generated 5 warning messages.

        Was probably something else in trunk at the time. See command log below for mvn javadoc:javadoc, which I made sure to do again now before committing:

        
        ➜  trunk  svn diff
        Index: hadoop-common-project/hadoop-common/CHANGES.txt
        ===================================================================
        --- hadoop-common-project/hadoop-common/CHANGES.txt	(revision 1342586)
        +++ hadoop-common-project/hadoop-common/CHANGES.txt	(working copy)
        @@ -76,6 +76,9 @@
             HADOOP-8415. Add getDouble() and setDouble() in
             org.apache.hadoop.conf.Configuration (Jan van der Lugt via harsh)
         
        +    HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS
        +    filesystems (harsh)
        +
           BUG FIXES
         
             HADOOP-8177. MBeans shouldn't try to register when it fails to create MBeanName.
        Index: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        ===================================================================
        --- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java	(revision 1342586)
        +++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java	(working copy)
        @@ -307,6 +307,12 @@
             return FileUtil.fullyDelete(f);
           }
          
        +  /**
        +   * {@inheritDoc}
        +   *
        +   * (<b>Note</b>: Returned list is not sorted in any given order,
        +   * due to reliance on Java's {@link File#list()} API.)
        +   */
           public FileStatus[] listStatus(Path f) throws IOException {
             File localf = pathToFile(f);
             FileStatus[] results;
        ➜  trunk  cd hadoop-common-project/hadoop-common 
        ➜  hadoop-common  mvn javadoc:javadoc
        [INFO] Scanning for projects...
        [INFO]                                                                         
        [INFO] ------------------------------------------------------------------------
        [INFO] Building Apache Hadoop Common 3.0.0-SNAPSHOT
        [INFO] ------------------------------------------------------------------------
        [INFO] 
        [INFO] >>> maven-javadoc-plugin:2.8.1:javadoc (default-cli) @ hadoop-common >>>
        [INFO] 
        [INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-common ---
        [INFO] Executing tasks
        
        main:
        [INFO] Executed tasks
        [INFO] 
        [INFO] --- build-helper-maven-plugin:1.5:add-source (add-source) @ hadoop-common ---
        [INFO] Source directory: /Users/harshchouraria/Work/code/apache/root-hadoop/trunk/hadoop-common-project/hadoop-common/target/generated-sources/java added.
        [INFO] 
        [INFO] --- build-helper-maven-plugin:1.5:add-test-source (add-test-source) @ hadoop-common ---
        [INFO] Test Source directory: /Users/harshchouraria/Work/code/apache/root-hadoop/trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java added.
        [INFO] 
        [INFO] --- maven-antrun-plugin:1.6:run (compile-proto) @ hadoop-common ---
        [INFO] Executing tasks
        
        main:
        [INFO] Executed tasks
        [INFO] 
        [INFO] --- maven-antrun-plugin:1.6:run (save-version) @ hadoop-common ---
        [INFO] Executing tasks
        
        main:
        [INFO] Executed tasks
        [INFO] 
        [INFO] --- maven-dependency-plugin:2.1:build-classpath (build-classpath) @ hadoop-common ---
        [INFO] Skipped writing classpath file '/Users/harshchouraria/Work/code/apache/root-hadoop/trunk/hadoop-common-project/hadoop-common/target/classes/mrapp-generated-classpath'.  No changes found.
        [INFO] 
        [INFO] <<< maven-javadoc-plugin:2.8.1:javadoc (default-cli) @ hadoop-common <<<
        [INFO] 
        [INFO] --- maven-javadoc-plugin:2.8.1:javadoc (default-cli) @ hadoop-common ---
        [INFO] 
        ExcludePrivateAnnotationsStandardDoclet
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 23.042s
        [INFO] Finished at: Fri May 25 18:14:53 GMT+05:30 2012
        [INFO] Final Memory: 11M/81M
        [INFO] ------------------------------------------------------------------------
        
        Harsh J added a comment -

        Committed revision 1342600 to trunk.

        Harsh J made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Release Note Documented that the "fs -getmerge" shell command may not work properly over non HDFS-filesystem implementations due to platform-varying file list ordering.
        Target Version/s 3.0.0 [ 12320357 ]
        Fix Version/s 3.0.0 [ 12320357 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1090 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1090/)
        HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS filesystems (harsh) (Revision 1342600)

        Result = ABORTED
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342600
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2361 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2361/)
        HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS filesystems (harsh) (Revision 1342600)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342600
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2288 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2288/)
        HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS filesystems (harsh) (Revision 1342600)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342600
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2307 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2307/)
        HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS filesystems (harsh) (Revision 1342600)

        Result = FAILURE
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342600
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1057 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1057/)
        HADOOP-7659. fs -getmerge isn't guaranteed to work well over non-HDFS filesystems (harsh) (Revision 1342600)

        Result = SUCCESS
        harsh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342600
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
        Suresh Srinivas added a comment -

        Harsh, given that this is a trivial change, can you please merge this to branch-2 and branch-2.1.0-beta? That way the delta between trunk and 2.1.0-beta stays small.

        Konstantin Boudnik added a comment -

        Harsh, do you need help with this? I can backport the patch into branch-2 unless you're planning to do it.

        Harsh J added a comment -

        Sorry guys, missed this note. Please do go ahead Konstantin, thanks for helping on this!


          People

          • Assignee:
            Harsh J
            Reporter:
            Harsh J
          • Votes:
            0
            Watchers:
            6
