Hadoop Map/Reduce / MAPREDUCE-2779

JobSplitWriter.java can't handle large job.split file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: job submission
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We use Cascading's MultiInputFormat. It sometimes generates a large job.split file (used internally by Hadoop), and that file can exceed 2 GB.

      In JobSplitWriter.java, the functions that generate this file use a 32-bit signed integer to compute offsets into job.split.

      writeNewSplits
      ...
      int prevCount = out.size();
      ...
      int currCount = out.size();

      writeOldSplits
      ...
      long offset = out.size();
      ...
      int currLen = out.size();
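
The effect of reading the position through an int can be reproduced without Hadoop. The sketch below is only an illustration, not the code from the patch: SplitOffsetDemo, CountingSink and writeAndMeasure are hypothetical names. It relies on the documented behaviour of java.io.DataOutputStream.size(), whose int counter is pinned at Integer.MAX_VALUE once it overflows, and compares it with a byte count kept in a long.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SplitOffsetDemo {

    /** Hypothetical sink: discards all bytes but counts them in a long. */
    static final class CountingSink extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /**
     * Writes totalBytes zero bytes through a DataOutputStream and returns
     * { value reported by size(), byte count tracked as a long }.
     */
    static long[] writeAndMeasure(long totalBytes) throws IOException {
        CountingSink sink = new CountingSink();
        DataOutputStream out = new DataOutputStream(sink);
        byte[] chunk = new byte[1 << 20]; // 1 MiB buffer
        long remaining = totalBytes;
        while (remaining > 0) {
            int n = (int) Math.min(chunk.length, remaining);
            out.write(chunk, 0, n);
            remaining -= n;
        }
        // size() is an int; it is widened to long only for the return value.
        return new long[] { out.size(), sink.count };
    }

    public static void main(String[] args) throws IOException {
        long[] r = writeAndMeasure(3L << 30); // 3 GiB
        System.out.println("DataOutputStream.size(): " + r[0]);
        System.out.println("bytes actually written:  " + r[1]);
    }
}
```

After 3 GiB, size() reports 2147483647 while the long counter reports 3221225472, so any offset or length derived from size() past the 2 GB mark is wrong. A position accessor that returns a long avoids the problem.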

      1. MAPREDUCE-2779-0.22.patch (2 kB, Ming Ma)
      2. MAPREDUCE-2779-trunk.patch (2 kB, Konstantin Shvachko)
      3. MAPREDUCE-2779-trunk.patch (2 kB, Ming Ma)

          Activity

          Joep Rottinghuis added a comment -

          Patch looks good.
          Affects 0.20-security-* branches as well.

          FSDataOutputStream.getPos is not thread safe, but then again DataOutputStream.size does not seem to be thread safe either.
          Even though the DataOutputStream.write method is synchronized, FSDataOutputStream.write is not.
          This does not seem to be an issue in the current code path because createSplitFiles does not expose out.

          Arun C Murthy added a comment -

          Is this tested against 0.20.205 and trunk?

          Arun C Murthy added a comment -

          Sorry, hit the wrong button - assigning to Ming.

          Ming Ma added a comment -

          It is tested on 0.20-security-* branches. Testing on 0.22 will be conducted later.

          Arun C Murthy added a comment -

          Ming, I can put this into 0.20.205 only after commit to trunk... unless this issue doesn't exist in trunk. Help, pls?

          Ming Ma added a comment -

          Arun, the bug is still in the trunk. Thanks.

          Ming Ma added a comment -

          Here is the patch for 0.22. It passes all unit tests except for one known buggy test:

          [junit] Test org.apache.hadoop.raid.TestRaidNode FAILED

          Note: the previous patch for trunk no longer applies, because trunk has gone through a major restructuring since then.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497098/MAPREDUCE-2779-0.22.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/903//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          Adjusted the patch for the new trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12497108/MAPREDUCE-2779-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/904//console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          I just committed this to 0.22, 0.23, and trunk.
          Thank you Ming.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #993 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/993/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1071 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1071/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #32 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/32/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #846 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/846/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1013 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1013/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-22-branch #79 (See https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/79/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177787
          Files :

          • /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
          • /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #817 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/817/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177779
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #26 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/26/)
          MAPREDUCE-2779. JobSplitWriter.java can't handle large job.split file. Contributed by Ming Ma.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177783
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/split/JobSplitWriter.java

            People

            • Assignee: Ming Ma
            • Reporter: Ming Ma
            • Votes: 0
            • Watchers: 7
