Hadoop Map/Reduce / MAPREDUCE-1806

CombineFileInputFormat does not work with paths not on default FS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.1
    • Fix Version/s: 1.2.0, 2.0.3-alpha, 0.23.5
    • Component/s: harchive
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When CombineFileInputFormat generates splits, it strips the scheme and authority from the input paths. This breaks file access during split generation: without the har:/ prefix, the file is no longer accessed through the HarFileSystem.
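To illustrate the reported behaviour, here is a minimal sketch that uses plain java.net.URI rather than Hadoop's Path API, so it runs without Hadoop on the classpath. The helper stripSchemeAndAuthority and the example paths are hypothetical. The helper only mimics what the description says happens: the path component is kept and the scheme and authority are discarded. After that, the remaining bare path can only be resolved against the default FS.

```java
import java.net.URI;

public class SchemeStripDemo {
    // Hypothetical helper mimicking the reported bug: only the path component
    // of the input URI survives, so the scheme (har, hdfs, ...) and the
    // authority (host:port) are lost.
    static String stripSchemeAndAuthority(String path) {
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        // Illustrative inputs: a file inside a HAR archive, and a file on a
        // remote HDFS cluster that is not the default FS.
        String[] inputs = {
            "har://hdfs-nn1:8020/user/alice/logs.har/part-0",
            "hdfs://remote-nn:8020/data/events/part-1"
        };
        for (String in : inputs) {
            URI original = URI.create(in);
            String stripped = stripSchemeAndAuthority(in);
            // The stripped path no longer names its filesystem, so it would be
            // resolved against the default FS instead.
            System.out.println(original.getScheme() + " -> " + stripped);
        }
    }
}
```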

      1. MAPREDUCE-1806.rev4.patch
        4 kB
        Sandy Ryza
      2. MAPREDUCE-1806-branch-1.rev1.patch
        3 kB
        Sandy Ryza
      3. MAPREDUCE-1806-branch-1.patch
        3 kB
        Sandy Ryza
      4. MAPREDUCE-1806.rev3.patch
        3 kB
        Gera Shegalov
      5. MAPREDUCE-1806.rev2.patch
        3 kB
        Gera Shegalov
      6. MAPREDUCE-1806.patch
        1.0 kB
        Gera Shegalov

          Activity

          Todd Lipcon added a comment -

          Updated summary to reflect that this isn't specific to HAR. CombineFileInputFormat won't work on any FS except the default FS; e.g., you can't use it against a remote cluster either.

          Gera Shegalov added a comment -

          Please review this patch that preserves the original file system scheme.

          Tom White added a comment -

          Can you just do fs.makeQualified(paths[i]) rather than reconstructing the Path? Also, we should have a unit test for this.
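Tom's suggestion relies on FileSystem.makeQualified(Path), which fills in the scheme and authority of the filesystem a path belongs to. The sketch below mimics that idea with plain java.net.URI so it runs without Hadoop. QualifyDemo.qualify is a hypothetical, simplified helper, not Hadoop's implementation: it assumes absolute paths, whereas the real method also resolves relative paths against the working directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QualifyDemo {
    // Hypothetical, simplified stand-in for FileSystem.makeQualified(Path):
    // a path that already carries a scheme is returned unchanged; otherwise
    // the scheme and authority are taken from the owning filesystem's URI.
    // Relative paths are rejected (URI forbids a relative path with an authority).
    static URI qualify(URI path, URI fsUri) {
        if (path.getScheme() != null) {
            return path; // already qualified: keep its own filesystem
        }
        try {
            // A null authority (e.g. file:///) yields a URI such as file:/tmp/a.
            return new URI(fsUri.getScheme(), fsUri.getAuthority(), path.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot qualify " + path, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative filesystem URI for a HAR archive.
        URI harFs = URI.create("har://hdfs-nn1:8020/user/alice/logs.har");
        System.out.println(qualify(URI.create("/user/alice/logs.har/part-0"), harFs));
        System.out.println(qualify(URI.create("hdfs://remote-nn:8020/data/x"), harFs));
    }
}
```

Returning already-qualified paths unchanged is the property the fix needs: a har:// or remote hdfs:// input keeps pointing at its own filesystem instead of falling back to the default FS.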

          Gera Shegalov added a comment -

          Thanks for the suggestions, Tom! The revised patch with the unit test is attached.

          Gera Shegalov added a comment -

          Added one more JUnit assert to make sure there is a split.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546603/MAPREDUCE-1806.rev3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2880//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2880//console

          This message is automatically generated.

          Ivan Mitic added a comment -

          Gera, are you working on this patch? It seems that one of the existing unit tests should also be updated.

          BTW, would it be ok to backport this to branch-1 as well?

          Gera Shegalov added a comment -

          Hi Ivan, thanks for looking at the patch. I was not working on it in the meantime. Backport should be straightforward and I'll check out the failing test.

          Sandy Ryza added a comment -

          It looks like this is already in branch-1 via HADOOP-7539?

          Sandy Ryza added a comment -

          Never mind, I lied.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550366/MAPREDUCE-1806-branch-1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2959//console

          This message is automatically generated.

          Sandy Ryza added a comment -

          Re-uploading the trunk version to force test-patch to pick it up.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12551257/MAPREDUCE-1806.rev4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestClusterMRNotification

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2971//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2971//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12551273/MAPREDUCE-1806.rev4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestClusterMRNotification

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2972//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2972//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          +1. The test failure seems unrelated.

          Alejandro Abdelnur added a comment -

          Thanks Gera and Sandy. Committed to trunk, branch-1 and branch-2.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #2941 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2941/)
          MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS. (Gera Shegalov via tucu) (Revision 1403614)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403614
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #21 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/21/)
          MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS. (Gera Shegalov via tucu) (Revision 1403614)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403614
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1211 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1211/)
          MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS. (Gera Shegalov via tucu) (Revision 1403614)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403614
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1241 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1241/)
          MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS. (Gera Shegalov via tucu) (Revision 1403614)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403614
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
          Thomas Graves added a comment -

          I pulled this into branch-0.23

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #421 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/421/)
          MAPREDUCE-1806. CombineFileInputFormat does not work with paths not on default FS (Gera Shegalov via tgraves) (Revision 1403761)

          Result = SUCCESS
          tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403761
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java

            People

            • Assignee:
              Gera Shegalov
              Reporter:
              Paul Yang
            • Votes:
              0
              Watchers:
              15
