Hadoop Map/Reduce / MAPREDUCE-5948

org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.23.9, 2.2.0
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Environment:

      CDH3u2, Red Hat Linux 5.7

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Defining a record delimiter of multiple bytes in a custom InputFormat sometimes causes records to be skipped from the input.

      This happens when an input split boundary falls just after a record separator. The starting point for the next split is then non-zero and skipFirstLine is true. A seek is done to start - 1 and the text up to the first record delimiter is discarded (on the presumption that this record was already handled by the previous map task). Since the record delimiter is multibyte, the seek brings only the last byte of the delimiter into scope, so it is not recognized as a full delimiter and the text is skipped until the next delimiter, ignoring a full record!
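
      To make the failure mode concrete, here is a self-contained toy simulation (plain Java, not the actual LineRecordReader code) of the skip logic just described:

          /** Toy simulation (illustrative only, not LineRecordReader itself) of the
           *  pre-fix skip logic, showing how a record is dropped when a split
           *  starts right after a multibyte delimiter. */
          public class DroppedRecordDemo {
            public static void main(String[] args) {
              String data = "abcxxxdefxxxghixxx";
              String delim = "xxx";
              int splitStart = 6;          // boundary falls just after "abcxxx"

              // Buggy behavior: back up ONE byte and discard everything up to
              // the next recognizable delimiter.
              int pos = splitStart - 1;    // lands on the last 'x' of "xxx"
              int next = data.indexOf(delim, pos);  // the lone 'x' is not a full
              pos = next + delim.length();          // delimiter, so this matches
                                                    // the delimiter AFTER "def"
              String remainder = data.substring(pos);
              String firstRecord = remainder.substring(0, remainder.indexOf(delim));
              // Prints "ghi": the record "def" was silently dropped.
              System.out.println("first record of this split: " + firstRecord);
            }
          }

      Backing up a single byte exposes only the final byte of the delimiter, so the reader fails to match it and skips ahead to the next full delimiter, silently dropping the record in between.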

      Attachments

      1. HADOOP-9867.patch
        19 kB
        Rushabh S Shah
      2. HADOOP-9867.patch
        14 kB
        Vinayakumar B
      3. HADOOP-9867.patch
        10 kB
        Vinayakumar B
      4. HADOOP-9867.patch
        9 kB
        Vinayakumar B
      5. MAPREDUCE-5948.002.patch
        20 kB
        Akira Ajisaka
      6. MAPREDUCE-5948.003.patch
        20 kB
        Akira Ajisaka

          Activity

          Kris Geusebroek added a comment -

          I created a fix by adding the following code:

            } else {
              if (start != 0) {
                skipFirstLine = true;
            +   for (int i = 0; i < recordDelimiter.length; i++) {
                  --start;
            +   }
                fileIn.seek(start);
              }

          Currently I'm testing this with a custom subclass of LineRecordReader. If testing goes OK, I'm willing to create a patch file if needed.

          Jason Lowe added a comment -

          Ran across this JIRA while discussing the intricacies of HADOOP-9622. There's a relatively straightforward testcase that demonstrates the issue. With the following plaintext input

          customdeliminput.txt
          abcxxx
          defxxx
          ghixxx
          jklxxx
          mnoxxx
          pqrxxx
          stuxxx
          vw xxx
          xyzxxx
          

          run a wordcount job like this:

          hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar wordcount -Dmapreduce.input.fileinputformat.split.maxsize=33 -Dtextinputformat.record.delimiter=xxx customdeliminput.txt wcout
          

          and we can see that one of the records ("pqr") was dropped due to incorrect split processing:

          $ hadoop fs -cat wcout/part-r-00000               
          abc	1
          def	1
          ghi	1
          jkl	1
          mno	1
          stu	1
          vw	1
          xyz	1
          

          I don't think rewinding the seek position by the delimiter length is correct in all cases. I believe that will lead to duplicate records rather than dropped records (e.g.: split ends exactly when a delimiter ends, and both splits end up processing the record after that delimiter).

          Instead we can get correct behavior by treating any split in the middle of a multibyte custom delimiter as if the delimiter ended exactly at the end of the split, i.e.: the consumer of the prior split is responsible for processing the divided delimiter and the subsequent record. The consumer of the next split then tosses the first record up to the first full delimiter as usual (i.e.: including the partial delimiter at the beginning of the split) and proceeds to process any subsequent records. That way we don't get any dropped records or duplicate records.

          I think one way of accomplishing this is to have the LineReader for multibyte custom delimiters report the current position as the end of the record data without the delimiter bytes. Then any record that ends exactly at the end of the split or whose delimiter straddles the split boundary will cause the prior split to consume the extra record necessary.
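
          To illustrate the proposal, here is a small self-contained toy (plain Java, not Hadoop code; the helper and its rule are a simplification of the scheme described above) showing that when the reported position excludes delimiter bytes, every record lands in exactly one split:

            import java.util.ArrayList;
            import java.util.List;

            /** Toy model of the rule above: a reader's reported position is the
             *  end of a record's data, excluding delimiter bytes, and it keeps
             *  reading while that position is inside its split. A non-initial
             *  split always discards up to the first full delimiter it sees. */
            public class SplitOwnershipDemo {
              static List<String> readSplit(String data, String delim, int start, int end) {
                List<String> records = new ArrayList<>();
                int pos = start;
                if (start != 0) {
                  // Toss the (possibly partial) first record, delimiter included.
                  int d = data.indexOf(delim, start);
                  if (d < 0) return records;
                  pos = d + delim.length();
                }
                int reported = pos;
                while (reported < end && pos < data.length()) {
                  int d = data.indexOf(delim, pos);
                  int dataEnd = (d < 0) ? data.length() : d;
                  records.add(data.substring(pos, dataEnd));
                  reported = dataEnd;                 // excludes the delimiter
                  pos = (d < 0) ? data.length() : d + delim.length();
                }
                return records;
              }

              public static void main(String[] args) {
                String data = "abcxxxdefxxxghixxxjklxxx";
                // The boundary at 6 falls exactly after the first delimiter, so
                // the first split consumes the extra record "def" and the second
                // split's discard step tosses it: nothing dropped, no duplicates.
                System.out.println(readSplit(data, "xxx", 0, 6));   // [abc, def]
                System.out.println(readSplit(data, "xxx", 6, 24));  // [ghi, jkl]
              }
            }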

          Jason Lowe added a comment -

          Raising severity since this involves loss of data. Also I confirmed this is an issue on recent Hadoop versions as well.

          Vinayakumar B added a comment -

          Attaching a patch with the test mentioned by Jason.

          It reads one more record if the split ends between the delimiter bytes.

          Please review.

          Vinayakumar B added a comment -

          Updated the patch to fix a possible NPE.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12614864/HADOOP-9867.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

          org.apache.hadoop.mapred.TestJobCleanup

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3302//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3302//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for the patch, Vinay. I think this approach can work when the input is uncompressed, however I don't think it will work for block-compressed inputs. Block codecs often report the file position as being the start of the codec block and then it "teleports" to the byte position of the next block once the first byte of the next block is consumed. See HADOOP-9622 for a similar issue with the default delimiter and how it's being addressed. Also getFilePosition() for a compressed input is returning a compressed stream offset, so if we try to do math on that with an uncompressed delimiter length we're mixing different units.

          Since LineRecordReader::getFilePosition() can mean different things for different inputs, I think a better approach would be to change LineReader (not LineRecordReader) so the reported file position for multi-byte custom delimiters is the file position after the record but not including its delimiter. Either that or wait for HADOOP-9622 to be committed and update the SplitLineReader interface from the HADOOP-9622 patch so the uncompressed input reader would indicate an additional record needs to be read if the split ends mid-delimiter.

          Vinayakumar B added a comment -

          Thanks Jason, I prefer waiting for HADOOP-9622 to be committed.
          Meanwhile I will try to update SplitLineReader offline.

          Vinayakumar B added a comment -

          Attaching the updated patch based on HADOOP-9622 changes

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12617969/HADOOP-9867.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3350//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3350//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Thanks for updating the patch, Vinay. Comments:

          • I don't think LineReader is the best place to put split-specific code. Its sole purpose is to read lines from an input stream regardless of split boundaries. There are users of this class that are not necessarily processing splits. That's why I created SplitLineReader in MapReduce, and I believe this logic is better placed there.
          • I don't think we want to change Math.max(maxBytesToConsume(pos), maxLineLength)) to Math.min(maxBytesToConsume(pos), maxLineLength)). We need to be able to read a record past the end of the split when the record crosses the split boundary, but I think this change could allow a truncated record to be returned for an uncompressed input stream. e.g.: fillBuffer happens to return data only up to the end of the split, record is incomplete (no delimiter found), but maxBytesToConsume keeps us from filling the buffer with more data and a truncated record is returned.

          I think a more straightforward approach would be to have SplitLineReader be aware of the end of the split and track it in fillBuffer() much like CompressedSplitLineReader does. The fillBuffer callback already indicates whether we're mid-delimiter or not, so we can simply check whether fillBuffer is being called after the split has ended while we're mid-delimiter. In that case we need an additional record.
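
          A hedged sketch of how that could look (class, field, and method names here are illustrative, not necessarily the code as committed):

            import java.io.IOException;
            import java.io.InputStream;

            /** Illustrative sketch: an uncompressed split reader that clamps each
             *  buffer fill to the split boundary. A fill request that arrives while
             *  we are mid-delimiter and the split is already exhausted means the
             *  delimiter straddles the boundary, so this split owes its consumer
             *  one additional record. */
            public class UncompressedSplitReaderSketch {
              private final long splitLength;
              private long totalBytesRead = 0;
              private boolean needAdditionalRecord = false;

              public UncompressedSplitReaderSketch(long splitLength) {
                this.splitLength = splitLength;
              }

              protected int fillBuffer(InputStream in, byte[] buffer,
                  boolean inDelimiter) throws IOException {
                int maxBytesToRead = buffer.length;
                if (totalBytesRead < splitLength) {
                  // Stop filling exactly at the end of the split.
                  maxBytesToRead = (int) Math.min(maxBytesToRead,
                      splitLength - totalBytesRead);
                }
                int bytesRead = in.read(buffer, 0, maxBytesToRead);
                if (inDelimiter && totalBytesRead >= splitLength && bytesRead > 0) {
                  // The split ended in the middle of a delimiter: the record after
                  // the divided delimiter belongs to this split's reader.
                  needAdditionalRecord = true;
                }
                if (bytesRead > 0) {
                  totalBytesRead += bytesRead;
                }
                return bytesRead;
              }

              public boolean needAdditionalRecordAfterSplit() {
                return needAdditionalRecord;
              }
            }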

          Vinayakumar B added a comment -

          Hi Jason,
          I was trying to implement the solution you proposed, but I ran into some issues.
          If you know the exact changes, could you please provide the patch?
          Thanks

          Rushabh S Shah added a comment -

          I have been tracking this jira for a while.
          I read all the comments by Jason.
          I believe this patch will address Jason's comments.
          I used the test case provided in Vinayakumar's patch and modified it a little to test more exhaustively.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12652422/HADOOP-9867.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4168//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4168//console

          This message is automatically generated.

          Vinayakumar B added a comment -

          Thanks Rushabh S Shah for trying out the patch.

          I got a test failure when the input string in your test is as follows, with "xxx" as the separator and a split length of 46.

              String inputData = "abcxxxdefxxxghixxx"
                  + "jklxxxmnoxxxpqrxxxstuxxxvw xxxxyz";

          Can you check again?

          Rushabh S Shah added a comment -

          Hey Vinayakumar,
          Thanks for checking out the patch and providing valuable feedback.
          I did run into this test case while working on this jira.
          I am going to file another jira for this specific test case (and a couple more which I came across) since the test case you mentioned is not in the scope of this jira.

          Vinayakumar B added a comment -

          I feel this case is related to this jira as well.
          Refer to the example given by Jason in one of the comments above.

          Jason Lowe added a comment -

          Actually I agree with Rushabh that there are at least two somewhat different problems here. The original problem reported in the JIRA has to do with records being dropped with uncompressed inputs. We should fix that issue so we don't drop data when using an uncompressed input. I'm assuming Rushabh's patch solves that issue, but I haven't looked at it in detail just yet.

          There's another issue related to mistaken record delimiter recognition where the subsequent split reader can accidentally think it found a delimiter when in fact the real record delimiter is somewhere else. If the subsequent split reader sees 'xxxxyzxxx' at the beginning of its split then it will toss out the first record (i.e.: the first 'xxx') then read 'xyz' as the next record. However that may or may not be the correct behavior, because with that kind of delimiter and data the correct behavior depends upon the previous split's data. If the previous split ended with 'abc' then the behavior was correct and there are two records in the stream: 'abc' and 'xyz'. If the previous split ended with 'abcx' then that's the incorrect behavior. The records should be 'abc' and 'xxyz' but the second split reader will report an 'xyz' record that shouldn't exist.

          To solve that problem either a split reader would have to examine the prior split's data to distinguish this case, or the split reader would have to realize it's an ambiguous situation and leave the record processing to the previous split reader to handle. The former can be very expensive if the prior split is compressed, as it has to potentially unpack the entire split. Also this can get very tricky, and a reader may need to read more than one other split to resolve it. For example, if the data stream is 'axxxxxxxxxxxxx......xxxxxxbxxxxxx......xxxxxcxxxxxx' then a reader may have to scan far down into subsequent splits, since only it knows where the true record boundaries are. Simply tacking on an extra character at the beginning of that input changes where the record boundaries are and the record contents, even in the last split of the input. Solving this requires a different high-level algorithm to split processing than what we have today (i.e.: throw away the first record and go), so I believe that's something better left to a followup JIRA.

          It'd be nice to solve the dropped-record problem for scenarios where we don't have to worry about mistaken record delimiter recognition in the data, as that's an incremental improvement from where we are today. I'll try to get some time to review the latest patch and provide comments soon.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12652422/HADOOP-9867.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4695//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4695//console

          This message is automatically generated.

          Jason Lowe added a comment -

          Moved this to MAPREDUCE since that's where the change should go. Thanks for the patch, Rushabh. Some comments on the patch:

          • LineReader: I'm not sure changing LineReader's byteConsumed semantics will be OK. LineReader is used by things besides MapReduce, so changing the return result for some corner cases may be problematic for other downstream projects. Rather than rely on the bytes consumed result from LineReader it seems we could track this using the existing totalBytesRead value. We simply need to track how many bytes from the input stream we've read before we try to read the next line. If we've read past the end of the split (i.e.: bytes > split length) then we can set the finished flag.
          • UncompressedSplitLineReader: Given how complicated split processing can be and all the corner cases, it'd be nice to have some comments explaining what's going on similar to what's in CompressedSplitLineReader. Otherwise many will wonder why we're going out of our way to make sure we stop right at the split when we fill the buffer, etc.
          • bufferPosition is not really a position in the buffer but a position within the split, and that's similar to totalBytesRead. I'm not sure we can completely combine them, but it'd be nice if their names were more specific to what they're tracking (one is the total bytes read from the input stream and another is total bytes consumed by the line reader).
          • (inDelimiter && actualBytesRead > 0) can just be (inDelimiter) because we already know actualBytesRead > 0 at that point due to the previous condition check.
          • Test doesn't cleanup the temporary files it creates when it's done. (Actually just noticed the tests are creating files in the wrong directory, should be something like TestLineRecordReader.class.getName() and not TestTextInputFormat.)
          • Nit: whitespace between if and ( and lines formatted to 80 columns to follow the coding standard
          Vinayakumar B added a comment -

          Thanks Jason for the detail in the previous comment. Now I agree that it is a separate issue.
          Thanks Rushabh for taking this up. Let's go ahead with Rushabh's patch. Assigning the issue to Rushabh.

          Rivkin Andrey added a comment -

          Ran into this problem today. In bz2 files, a multibyte record delimiter leads to duplicate records; in uncompressed files it leads to dropped records.
          Hadoop version 2.5.0.
          Also, we can't set hex values, for example \x01, in the hadoop conf ("textinputformat.record.delimiter").
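
          As a possible workaround for the hex-value point (a sketch, assuming the standard textinputformat.record.delimiter key used above): a raw 0x01 byte is not a legal XML 1.0 character, so it cannot be written literally into an XML conf file, but it can be set from code:

            import org.apache.hadoop.conf.Configuration;

            public class DelimiterConfigExample {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Set the \u0001 delimiter programmatically instead of in XML.
                conf.set("textinputformat.record.delimiter", "\u0001");
              }
            }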

          Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12652422/HADOOP-9867.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / c8d7290
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5443/console

          This message was automatically generated.

          Jason Lowe added a comment -

          Sorry this fell off my radar. Canceling the patch since it no longer applies.

          Rushabh S Shah could you update the patch and address the recent review comments?

          Rushabh S Shah added a comment -

          Sorry this fell off my radar too.
          I don't have enough cycles to work on this right now.
          We can move this to next release.
          Or if someone is interested in working on this, I am more than happy to let them take it over.

          Akira Ajisaka added a comment -

          Moving the target version to 2.8.0 since this is not a regression in 2.7.0.

          Akira Ajisaka added a comment -

          Hi Vinayakumar B, are you willing to take over this issue? If you are not interested in the work, I'd like to rebase the patch and address the review comments.

          Vinayakumar B added a comment -

          Sure, go ahead.

          Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 18m 24s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 47s There were no new javac warning messages.
          +1 javadoc 9m 46s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 1m 52s There were no new checkstyle issues.
          +1 whitespace 0m 1s The patch has no lines that end in whitespace.
          +1 install 1m 39s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 21s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 24m 27s Tests passed in hadoop-common.
          +1 mapreduce tests 1m 44s Tests passed in hadoop-mapreduce-client-core.
              70m 2s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738701/MAPREDUCE-5948.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 3c2397c
          hadoop-common test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5787/console

          This message was automatically generated.

          Akira Ajisaka added a comment -

          v3 patch

          • Refactored the code.
          • Added comments for the code.
          Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 17m 35s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 32s There were no new javac warning messages.
          +1 javadoc 9m 37s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 1m 46s There were no new checkstyle issues.
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 33s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 15s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 common tests 23m 18s Tests passed in hadoop-common.
          +1 mapreduce tests 1m 43s Tests passed in hadoop-mapreduce-client-core.
              67m 24s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12738857/MAPREDUCE-5948.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / c7729ef
          hadoop-common test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-mapreduce-client-core test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/testReport/
          Java 1.7.0_55
          uname Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5789/console

          This message was automatically generated.

          Jason Lowe added a comment -

          Thanks for taking this up, Akira. Will try to review the patch shortly.

          Rivkin Andrey could you provide more details on the duplicate records with bz2? A similar problem was reported in MAPREDUCE-6299 but there are no details to work with. Note that the latest patch will not address any issues with bz2, as it only fixes the handling of dropped/duplicate records with uncompressed input.

          Jason Lowe added a comment -

          +1 for the latest patch. This should resolve the dropped/duplicate problems with uncompressed input. We can tackle the reported duplicate records for bz2 in MAPREDUCE-6299.

          Will commit this early next week if there are no objections.

          Jason Lowe added a comment -

          Thanks to Vinayakumar, Rushabh, and Akira for the contribution! I committed this to trunk and branch-2.

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8048 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8048/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #967 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/967/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #237 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/237/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2165 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2165/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #226 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/226/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #235 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/235/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2183 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2183/)
          MAPREDUCE-5948. org.apache.hadoop.mapred.LineRecordReader does not handle multibyte record delimiters well. Contributed by Vinayakumar B, Rushabh Shah, and Akira AJISAKA (jlowe: rev 077250d8d7b4b757543a39a6ce8bb6e3be356c6f)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java
          jlowe Jason Lowe added a comment -

          I committed this to branch-2.7 as well.

          jlowe Jason Lowe added a comment -

          I committed this to branch-2.6.


People

• Assignee: ajisakaa Akira Ajisaka
• Reporter: krisgeus Kris Geusebroek
• Votes: 1
• Watchers: 13
