MAPREDUCE-5948 (Hadoop Map/Reduce)

org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.23.9, 2.2.0
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Environment:

      CDH3U2, Red Hat Linux 5.7

    • Hadoop Flags:
      Reviewed

      Description

      Defining a multibyte record delimiter in a custom input format sometimes causes records to be skipped from the input.

      This happens when an input split boundary falls just after a record delimiter. The next split then has a non-zero start, so skipFirstLine is true: the reader seeks to start - 1 and discards the text up to the first record delimiter, on the presumption that this record was already handled by the previous map task. Because the delimiter is multibyte, seeking to start - 1 brings only the last byte of the delimiter into scope, so it is not recognized as a complete delimiter. The reader therefore skips ahead to the next delimiter, dropping a full record.
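      To make the failure concrete, here is a small, self-contained Java sketch that models the reader logic described above over an in-memory byte array. It is not Hadoop's LineRecordReader. The class and method names (MultibyteSplitDemo, readSplit, readAll) and the lookBack parameter are hypothetical. The sketch assumes two things: each split reads records while its position is before the split end, and a non-first split discards everything up to the first complete delimiter found after seeking back. With lookBack = 1 it reproduces the seek to start - 1. With lookBack equal to the delimiter length it looks back far enough to see a delimiter that ends exactly at the split start.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MultibyteSplitDemo {

    // Index of the first occurrence of delim in data at or after 'from', or -1 if none.
    static int indexOf(byte[] data, byte[] delim, int from) {
        outer:
        for (int i = Math.max(0, from); i + delim.length <= data.length; i++) {
            for (int j = 0; j < delim.length; j++) {
                if (data[i + j] != delim[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Hypothetical model of reading the split [start, end).
    // lookBack = 1 mimics "seek to start - 1, then skip the first line";
    // lookBack = delim.length can also see a delimiter ending exactly at start.
    static List<String> readSplit(byte[] data, byte[] delim, int start, int end, int lookBack) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Discard everything up to and including the first complete delimiter
            // found after seeking back lookBack bytes.
            int d = indexOf(data, delim, start - lookBack);
            pos = (d < 0) ? data.length : d + delim.length;
        }
        // Emit every record that starts before the split end (it may run past 'end').
        while (pos < end && pos < data.length) {
            int d = indexOf(data, delim, pos);
            int recEnd = (d < 0) ? data.length : d;
            records.add(new String(data, pos, recEnd - pos, StandardCharsets.UTF_8));
            pos = (d < 0) ? data.length : d + delim.length;
        }
        return records;
    }

    // Cut the input into fixed-size splits and concatenate what each split reads.
    static List<String> readAll(byte[] data, byte[] delim, int splitSize, int lookBack) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, delim, start, end, lookBack));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] delim = "<EOR>".getBytes(StandardCharsets.UTF_8);
        // "alpha<EOR>bravo<EOR>" is exactly 20 bytes, so with a split size of 20
        // the second split starts just after the second delimiter.
        byte[] data = "alpha<EOR>bravo<EOR>charlie<EOR>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(data, delim, 20, 1));            // one-byte look-back
        System.out.println(readAll(data, delim, 20, delim.length)); // full-delimiter look-back
    }
}
```

      Running it prints [alpha, bravo] and then [alpha, bravo, charlie]. The second split seeks to byte 19, sees only the trailing ">" of the delimiter ending at byte 20, skips to the next delimiter, and so drops charlie. The wider look-back is only one illustrative remedy. It relies on the delimiter <EOR> not overlapping itself, and it is not necessarily what the committed patch does.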

        Attachments

        1. HADOOP-9867.patch
          19 kB
          Rushabh S Shah
        2. HADOOP-9867.patch
          14 kB
          Vinayakumar B
        3. HADOOP-9867.patch
          10 kB
          Vinayakumar B
        4. HADOOP-9867.patch
          9 kB
          Vinayakumar B
        5. MAPREDUCE-5948.002.patch
          20 kB
          Akira Ajisaka
        6. MAPREDUCE-5948.003.patch
          20 kB
          Akira Ajisaka


              People

              • Assignee:
                ajisakaa Akira Ajisaka
                Reporter:
                krisgeus Kris Geusebroek
              • Votes:
                1
                Watchers:
                12
