Hadoop Map/Reduce / MAPREDUCE-5862

Line records longer than 2x split size aren't handled correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Suppose this split (100-200) is in the middle of a record (90-240):

         0              100            200             300
         |---- split ----|---- curr ----|---- split ----|
                       <------- record ------->
                       90                     240
      

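      Currently, the first split (0-100) reads the entire record, up to offset 240, which is correct. The second split (100-200), however, has a bug: it produces a phantom record covering (200, 240), even though that range is the tail of a record the first split has already returned.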
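
The scenario can be reproduced with a small model of a split-aware line reader. This is a minimal sketch, not Hadoop's actual code: `read_line`, `split_records`, and the `cap_skip` flag are hypothetical names. It assumes the common convention that a split with a non-zero start skips its leading line and that each split keeps reading while the line start is at or before the split end. The `cap_skip=True` path illustrates one plausible way a phantom record could arise (the skip of the leading partial line being limited to the split length). The issue text does not state that mechanism, so treat it as an assumption.

```python
# Three newline-terminated records at byte ranges [0, 90), [90, 240), [240, 300).
DATA = b"a" * 89 + b"\n" + b"b" * 149 + b"\n" + b"c" * 59 + b"\n"


def read_line(data, pos, max_bytes=None):
    """Return the number of bytes consumed by the line starting at pos.

    The count includes the trailing newline. If max_bytes is given, at most
    that many bytes are consumed, even if no newline was found.
    """
    limit = len(data) if max_bytes is None else min(len(data), pos + max_bytes)
    newline = data.find(b"\n", pos, limit)
    return (newline + 1 - pos) if newline != -1 else (limit - pos)


def split_records(data, start, end, cap_skip=False):
    """Return (record_start, record_end) pairs emitted for split [start, end).

    cap_skip=True is the assumed buggy variant: the skip of the leading
    partial line is capped at the split length.
    """
    pos = start
    if start != 0:
        # The leading line belongs to an earlier split, so discard it.
        cap = (end - start) if cap_skip else None
        pos += read_line(data, pos, cap)
    records = []
    # Read while the next line starts at or before the split end.
    while pos <= end and pos < len(data):
        consumed = read_line(data, pos)
        records.append((pos, pos + consumed))
        pos += consumed
    return records


if __name__ == "__main__":
    for start, end in [(0, 100), (100, 200), (200, 300)]:
        print((start, end),
              "uncapped:", split_records(DATA, start, end),
              "capped:", split_records(DATA, start, end, cap_skip=True))
```

With the uncapped skip, split (100, 200) emits nothing, because the skip runs past offset 240 and beyond the split end. With the capped skip, the skip stops at offset 200, the read loop still runs, and the split emits (200, 240).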

            People

            • Assignee: bcwalrus bc Wong
            • Reporter: bcwalrus bc Wong
            • Votes: 0
            • Watchers: 9
