Hadoop Map/Reduce
MAPREDUCE-2779

JobSplitWriter.java can't handle large job.split file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: job submission
    • Labels: None
    • Hadoop Flags: Reviewed

Description

We use Cascading's MultiInputFormat. MultiInputFormat sometimes generates a large job.split file, which Hadoop uses internally, and that file can grow beyond 2 GB.

In JobSplitWriter.java, the functions that generate this file use a 32-bit signed integer to compute the offset into job.split:

writeNewSplits
...
int prevCount = out.size();   // DataOutputStream.size() returns an int byte count
...
int currCount = out.size();   // wraps negative once the stream passes 2^31 - 1 bytes

writeOldSplits
...
long offset = out.size();     // widening to long does not help: size() has already returned a wrapped int
...
int currLen = out.size();
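
For context, a minimal, self-contained sketch of the failure mode (SplitOffsetOverflowDemo is a hypothetical demo class, not part of the attached patches): java.io.DataOutputStream.size() returns an int, so once more than Integer.MAX_VALUE bytes have been written the reported offset wraps negative, while FSDataOutputStream.getPos(), which returns a long, stays correct. Reading the offset via getPos() is the natural direction for a fix.

// SplitOffsetOverflowDemo.java -- hypothetical demo, not Hadoop code.
// Shows how tracking a >2 GB stream offset in an int wraps negative,
// which is the overflow JobSplitWriter hits via out.size().
public class SplitOffsetOverflowDemo {
    public static void main(String[] args) {
        long bytesWritten = 3L * 1024 * 1024 * 1024; // 3 GB written to job.split
        int intOffset = (int) bytesWritten;          // what an int-returning size() yields
        long longOffset = bytesWritten;              // what a long-returning getPos() yields
        System.out.println("int offset:  " + intOffset);   // prints -1073741824 (wrapped)
        System.out.println("long offset: " + longOffset);  // prints 3221225472 (correct)
    }
}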

Attachments

1. MAPREDUCE-2779-0.22.patch (2 kB, Ming Ma)
2. MAPREDUCE-2779-trunk.patch (2 kB, Konstantin Shvachko)
3. MAPREDUCE-2779-trunk.patch (2 kB, Ming Ma)


People

    • Assignee: Ming Ma (mingma)
    • Reporter: Ming Ma (mingma)
    • Votes: 0
    • Watchers: 7
