Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-3952

In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.24.0, 0.23.3
    • Fix Version/s: 2.0.0-alpha, 0.23.7
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Hive get unexpected result when using MR2(When using MR1, always get expected result).

      In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split.

      The calling code in Hive, in Hadoop23Shims.java:

      InputSplit[] splits = super.getSplits(job, numSplits);

      this get splits.length == 0.

      In MR1, everything goes fine, the calling code in Hive, in Hadoop20Shims.java:

      CombineFileSplit[] splits = (CombineFileSplit[]) super.getSplits(job, numSplits);

      this get splits.length == 1.

      1. MAPREDUCE-3952-1.patch
        3 kB
        Bhallamudi Venkata Siva Kamesh
      2. MAPREDUCE-3952.patch
        3 kB
        Bhallamudi Venkata Siva Kamesh

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #513 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1,CombinefileInputFormat.getSplits() returns 0 split. (Bhallamudi Venkata Siva Kamesh via tgraves) (Revision 1441492)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441492
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #513 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1,CombinefileInputFormat.getSplits() returns 0 split. (Bhallamudi Venkata Siva Kamesh via tgraves) (Revision 1441492) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441492 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1011 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1011/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1011 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1011/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Build #217 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/217/)
        Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Build #217 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/217/ ) Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #976 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/976/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #976 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/976/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #189 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/189/)
        Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #189 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/189/ ) Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1838 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1838/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1838 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1838/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1845 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1845/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1845 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1845/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #643 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/643/)
        Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Commit #643 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/643/ ) Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Commit #632 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/632/)
        Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Commit #632 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/632/ ) Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1912 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1912/)
        MAPREDUCE-3952. In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1912 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1912/ ) MAPREDUCE-3952 . In MR2, when Total input paths to process == 1, CombinefileInputFormat.getSplits() returns 0 split. (zhenxiao via tucu) (Revision 1297293) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297293 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Alejandro Abdelnur added a comment -

        by mistake I've assigned to the reporter, assigning it to Bhallamudi

        Show
        Alejandro Abdelnur added a comment - by mistake I've assigned to the reporter, assigning it to Bhallamudi
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-0.23-Commit #642 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/642/)
        Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-0.23-Commit #642 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/642/ ) Merge -r 1297292:1297293 from trunk to branch. FIXES: MAPREDUCE-3952 (Revision 1297299) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1297299 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileInputFormat.java
        Hide
        Alejandro Abdelnur added a comment -

        Thanks Zhenxiao Luo. Committed to trunk and branch-0.23

        Show
        Alejandro Abdelnur added a comment - Thanks Zhenxiao Luo. Committed to trunk and branch-0.23
        Hide
        Alejandro Abdelnur added a comment -

        +1

        Show
        Alejandro Abdelnur added a comment - +1
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12517052/MAPREDUCE-3952-1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1997//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1997//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517052/MAPREDUCE-3952-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1997//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1997//console This message is automatically generated.
        Hide
        Bhallamudi Venkata Siva Kamesh added a comment -

        Thanks Ahmed for reviewing the patch. Attaching a patch with proper formatting.

        Show
        Bhallamudi Venkata Siva Kamesh added a comment - Thanks Ahmed for reviewing the patch. Attaching a patch with proper formatting.
        Hide
        Ahmed Radwan added a comment -

        +1 Thanks Bhallamudi!
        Only a minor comment about two trailing whitespaces in the added testcase.

        Show
        Ahmed Radwan added a comment - +1 Thanks Bhallamudi! Only a minor comment about two trailing whitespaces in the added testcase.
        Hide
        Bhallamudi Venkata Siva Kamesh added a comment -

        Please review the patch

        Show
        Bhallamudi Venkata Siva Kamesh added a comment - Please review the patch
        Hide
        Zhenxiao Luo added a comment -

        @Bhallamudi

        Yes. This is what I observed when running Hive testcases on both MR1 and MR2. This is kind of incompatible changes. Can we fix it so that MR2 has the same behavior as MR1?

        @Ahmed

        Yes. This is exactly this JIRA ticket for. Thanks so much.

        Show
        Zhenxiao Luo added a comment - @Bhallamudi Yes. This is what I observed when running Hive testcases on both MR1 and MR2. This is kind of incompatible changes. Can we fix it so that MR2 has the same behavior as MR1? @Ahmed Yes. This is exactly this JIRA ticket for. Thanks so much.
        Hide
        Ahmed Radwan added a comment -

        Yes, But I think this shouldn't be the case since it is not backward compatibile and will cause applications (like Hive) expecting the old behavior to break.

        Show
        Ahmed Radwan added a comment - Yes, But I think this shouldn't be the case since it is not backward compatibile and will cause applications (like Hive) expecting the old behavior to break.
        Hide
        Bhallamudi Venkata Siva Kamesh added a comment -

        @Zhenxiao

        Is it the case that, in MR1, an empty file get a split size of 1, and then the record reader will emit no records. And in MR2, an empty file get a split size of 0?

        Yes.
        In MR1, in CombineFileInputFormat#OneFileInfo, OneBlockInfo object has been created even if the file's size is 0 and this leads to the creation of a file split of lenght 0. Where as in MR2, we are skipping the creation of this object and hence no file split(s).

        Show
        Bhallamudi Venkata Siva Kamesh added a comment - @Zhenxiao Is it the case that, in MR1, an empty file get a split size of 1, and then the record reader will emit no records. And in MR2, an empty file get a split size of 0? Yes. In MR1, in CombineFileInputFormat#OneFileInfo, OneBlockInfo object has been created even if the file's size is 0 and this leads to the creation of a file split of lenght 0. Where as in MR2, we are skipping the creation of this object and hence no file split(s).
        Hide
        Zhenxiao Luo added a comment -

        @Ahmed

        I think so.

        Is it the case that, in MR1, an empty file get a split size of 1, and then the record reader will emit no records.

        And in MR2, an empty file get a split size of 0?

        Show
        Zhenxiao Luo added a comment - @Ahmed I think so. Is it the case that, in MR1, an empty file get a split size of 1, and then the record reader will emit no records. And in MR2, an empty file get a split size of 0?
        Hide
        Ahmed Radwan added a comment -

        Is it the case that hive is expecting for an empty file a split size of 1, and then the record reader will emit no records?

        Show
        Ahmed Radwan added a comment - Is it the case that hive is expecting for an empty file a split size of 1, and then the record reader will emit no records?
        Hide
        Zhenxiao Luo added a comment -

        @Bhallamudi

        Yes. Seems the input file is an empty file from execution log:

        2012-02-28 15:56:37,219 INFO exec.ExecDriver (ExecDriver.java:addInputPath(829)) - Changed input file to file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1
        2012-02-28 15:56:37,226 INFO util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(50)) - Loaded the native-hadoop library
        2012-02-28 15:56:37,610 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
        2012-02-28 15:56:37,626 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: file:/tmp/cloudera/hive_2012-02-28_15-56-26_431_554636048819260524/-mr-10003
        2012-02-28 15:56:37,657 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
        2012-02-28 15:56:37,684 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(139)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        2012-02-28 15:56:37,960 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(36)) - Snappy native library is available
        2012-02-28 15:56:37,961 INFO snappy.LoadSnappy (LoadSnappy.java:<clinit>(44)) - Snappy native library loaded
        2012-02-28 15:56:37,969 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1; using filter path file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1
        2012-02-28 15:56:37,970 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
        2012-02-28 15:56:37,970 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
        2012-02-28 15:56:37,971 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
        2012-02-28 15:56:37,971 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
        2012-02-28 15:56:37,977 INFO input.FileInputFormat (FileInputFormat.java:listStatus(245)) - Total input paths to process : 1
        2012-02-28 15:56:37,982 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(388)) - Arrays.asList iss
        2012-02-28 15:56:37,982 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(410)) - iss size: 0
        2012-02-28 15:56:37,983 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(417)) - number of splits 0

        And, in MR1, the log looks like:

        2012-02-28 14:09:54,554 INFO exec.ExecDriver (ExecDriver.java:addInputPath(829)) - Changed input file to file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1
        2012-02-28 14:09:54,855 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
        2012-02-28 14:09:54,871 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: file:/tmp/cloudera/hive_2012-02-28_14-09-44_700_3241431154033268523/-mr-10003
        2012-02-28 14:09:54,881 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        2012-02-28 14:09:55,037 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1; using filter path file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1
        2012-02-28 14:09:55,042 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1
        2012-02-28 14:09:55,056 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(406)) - iss size: 1
        2012-02-28 14:09:55,057 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(409)) - adding inputSplitShim into result: Paths:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1/emptyFile:0+0 Locations:/default-rack:; InputFormatClass: org.apache.hadoop.mapred.TextInputFormat

        2012-02-28 14:09:55,057 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(413)) - number of splits 1

        So, in MR1, submitting a job having empty file get split length == 1, while in MR2, submitting a job having empty file get split length == 0.

        The case happens in Hive(https://issues.apache.org/jira/browse/HIVE-2783), when trying to run the following query in Hive:

        select * from
        (
        select key, value, ds from t1_new
        union all
        select key, value, t1_old.ds from t1_old join t1_mapping
        on t1_old.keymap = t1_mapping.keymap and
        t1_old.ds = t1_mapping.ds
        ) subq
        where ds = '2011-10-13';

        And, the second MR job is trying to execute:

        select key, value, ds from t1_new

        which has an empty input file in the submitted job.

        My understanding might be wrong. Correct me if there is anything goes wrong.

        Thanks,
        Zhenxiao

        Show
        Zhenxiao Luo added a comment - @Bhallamudi Yes. Seems the input file is an empty file from execution log: 2012-02-28 15:56:37,219 INFO exec.ExecDriver (ExecDriver.java:addInputPath(829)) - Changed input file to file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1 2012-02-28 15:56:37,226 INFO util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(50)) - Loaded the native-hadoop library 2012-02-28 15:56:37,610 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId= 2012-02-28 15:56:37,626 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: file:/tmp/cloudera/hive_2012-02-28_15-56-26_431_554636048819260524/-mr-10003 2012-02-28 15:56:37,657 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2012-02-28 15:56:37,684 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(139)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2012-02-28 15:56:37,960 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(36)) - Snappy native library is available 2012-02-28 15:56:37,961 INFO snappy.LoadSnappy (LoadSnappy.java:<clinit>(44)) - Snappy native library loaded 2012-02-28 15:56:37,969 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1 ; using filter path file:/tmp/cloudera/hive_2012-02-28_15-56-37_188_1216173472421796708/-mr-10000/1 2012-02-28 15:56:37,970 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 2012-02-28 15:56:37,970 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 2012-02-28 15:56:37,971 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 2012-02-28 15:56:37,971 WARN conf.Configuration (Configuration.java:handleDeprecation(326)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 2012-02-28 15:56:37,977 INFO input.FileInputFormat (FileInputFormat.java:listStatus(245)) - Total input paths to process : 1 2012-02-28 15:56:37,982 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(388)) - Arrays.asList iss 2012-02-28 15:56:37,982 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(410)) - iss size: 0 2012-02-28 15:56:37,983 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(417)) - number of splits 0 And, in MR1, the log looks like: 2012-02-28 14:09:54,554 INFO exec.ExecDriver (ExecDriver.java:addInputPath(829)) - Changed input file to file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1 2012-02-28 14:09:54,855 INFO jvm.JvmMetrics (JvmMetrics.java:init(71)) - Initializing JVM Metrics with processName=JobTracker, sessionId= 2012-02-28 14:09:54,871 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(234)) - Making Temp Directory: file:/tmp/cloudera/hive_2012-02-28_14-09-44_700_3241431154033268523/-mr-10003 2012-02-28 14:09:54,881 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 2012-02-28 14:09:55,037 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1 ; using filter path file:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1 2012-02-28 14:09:55,042 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(192)) - Total input paths to process : 1 2012-02-28 14:09:55,056 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(406)) - iss size: 1 2012-02-28 14:09:55,057 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(409)) - adding inputSplitShim into result: Paths:/tmp/cloudera/hive_2012-02-28_14-09-54_515_1377575814725676804/-mr-10000/1/emptyFile:0+0 Locations:/default-rack:; InputFormatClass: org.apache.hadoop.mapred.TextInputFormat 2012-02-28 14:09:55,057 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(413)) - number of splits 1 So, in MR1, submitting a job having empty file get split length == 1, while in MR2, submitting a job having empty file get split length == 0. The case happens in Hive( https://issues.apache.org/jira/browse/HIVE-2783 ), when trying to run the following query in Hive: select * from ( select key, value, ds from t1_new union all select key, value, t1_old.ds from t1_old join t1_mapping on t1_old.keymap = t1_mapping.keymap and t1_old.ds = t1_mapping.ds ) subq where ds = '2011-10-13'; And, the second MR job is trying to execute: select key, value, ds from t1_new which has an empty input file in the submitted job. My understanding might be wrong. Correct me if there is anything goes wrong. Thanks, Zhenxiao
        Hide
        Bhallamudi Venkata Siva Kamesh added a comment -

        I think, input file might be an empty file.

        If so, CombineFileInputFormat#getSplits just returns empty splits list. Where as FileInputFormat#getSplits creats a file split of length 0 for each empty file and adds it to splits list. So the number of file splits is atleast 1.

        BTW Why do we need to submit a job having an empty input file(s) to framework?

        Show
        Bhallamudi Venkata Siva Kamesh added a comment - I think , input file might be an empty file. If so, CombineFileInputFormat#getSplits just returns empty splits list. Where as FileInputFormat#getSplits creats a file split of length 0 for each empty file and adds it to splits list. So the number of file splits is atleast 1. BTW Why do we need to submit a job having an empty input file(s) to framework?

          People

          • Assignee:
            Bhallamudi Venkata Siva Kamesh
            Reporter:
            Zhenxiao Luo
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development