Hadoop Map/Reduce / MAPREDUCE-2779

JobSplitWriter.java can't handle large job.split file


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: job submission
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We use Cascading's MultiInputFormat. MultiInputFormat sometimes generates a large job.split file (used internally by Hadoop), and it can grow beyond 2GB.

      In JobSplitWriter.java, the functions that generate this file use a 32-bit signed integer to compute offsets into job.split, so the offsets become wrong once the file exceeds 2GB (Integer.MAX_VALUE bytes).

      writeNewSplits
      ...
      int prevCount = out.size();
      ...
      int currCount = out.size();

      writeOldSplits
      ...
      long offset = out.size();
      ...
      int currLen = out.size();
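
      For context, a minimal sketch of the direction of the fix, assuming the offset bookkeeping switches from DataOutputStream.size() (an int counter that stops counting accurately past Integer.MAX_VALUE) to FSDataOutputStream.getPos() (which returns a long). The class and method names below are hypothetical illustrations, not the committed patch:

      import java.io.IOException;

      import org.apache.hadoop.fs.FSDataOutputStream;

      // Hypothetical sketch, not the committed patch: tracks split offsets
      // with 64-bit file positions so job.split can safely exceed 2GB.
      public class SplitOffsetSketch {
        static long writeSplitAndGetLength(FSDataOutputStream out,
                                           byte[] serializedSplit)
            throws IOException {
          long prevCount = out.getPos();  // long-valued position, no overflow
          out.write(serializedSplit);     // append one serialized split
          long currCount = out.getPos();
          return currCount - prevCount;   // exact length of this split record
        }
      }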

      Attachments

        1. MAPREDUCE-2779-0.22.patch
          2 kB
          Ming Ma
        2. MAPREDUCE-2779-trunk.patch
          2 kB
          Konstantin Shvachko
        3. MAPREDUCE-2779-trunk.patch
          2 kB
          Ming Ma



          People

            Assignee: Ming Ma
            Reporter: Ming Ma
            Votes: 0
            Watchers: 7

