Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2538

TestResettableFileInputStream fails on JDK 8

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0.1
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels:
      None

      Description

      TestResettableFileInputStream.testUtf8DecodeErrorHandlingReplace fails in JDK 8
      "testUtf8DecodeErrorHandlingReplace(org.apache.flume.serialization.TestResettableFileInputStream) Time elapsed: 6 sec <<< FAILURE!
      org.junit.ComparisonFailure: expected:<...(���)
      NonUnicode: (�[])
      > but was:<...(���)
      NonUnicode: (�[����]) "

      Charsetdecoder.decode has changed in its behavior, as to how it handles CodingErrorAction.Replace policy
      Will submit a patch today.

      1. FLUME-2538-1.patch
        1 kB
        Johny Rufus
      2. FLUME-2538.patch
        1 kB
        Johny Rufus

        Issue Links

          Activity

          Hide
          jrufus Johny Rufus added a comment - - edited

          One way to see the difference in the behavior is by running the below code in Jdk 7 and 8
          @Test
          public void testTest() {
          CharsetDecoder decoder = Charsets.UTF_8.newDecoder();

          decoder.onMalformedInput(CodingErrorAction.REPLACE);
          decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

          ByteBuffer buf = ByteBuffer.allocate(20);
          buf.put(new byte[]

          { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1, (byte)0xa1 }

          );
          buf.flip();

          CharBuffer cbuf = CharBuffer.allocate(1);
          CoderResult res = decoder.decode(buf, cbuf, false);
          System.out.println(decoder.getClass().getName());
          System.out.println("Pos — "buf.position() " cbuf pos --"+cbuf.position());
          }

          Jdk 7 output -->Pos — 5 cbuf pos --1
          Jdk 8 output -->Pos — 1 cbuf pos --1

          In Jdk7: If there is a invalid byte sequence and CodingErrorAction.Replace is specified, then the complete set of invalid bye sequence is treated as one malformed character and replaced by one replacement character in the output buffer [Hence the position is advanced by 5 as seen in the output as its a 5 byte invalid sequence]

          In Jdk8: Each invalid byte in the sequence is treated as a malformed character and hence we see the buffer being advanced by only one position. So for every malformed character, we see the replacement character included in the output buffer

          Attaching a patch that accommodates the above modified behavior

          Show
          jrufus Johny Rufus added a comment - - edited One way to see the difference in the behavior is by running the below code in Jdk 7 and 8 @Test public void testTest() { CharsetDecoder decoder = Charsets.UTF_8.newDecoder(); decoder.onMalformedInput(CodingErrorAction.REPLACE); decoder.onUnmappableCharacter(CodingErrorAction.REPLACE); ByteBuffer buf = ByteBuffer.allocate(20); buf.put(new byte[] { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1, (byte)0xa1 } ); buf.flip(); CharBuffer cbuf = CharBuffer.allocate(1); CoderResult res = decoder.decode(buf, cbuf, false); System.out.println(decoder.getClass().getName()); System.out.println("Pos — " buf.position() " cbuf pos --"+cbuf.position()); } Jdk 7 output -->Pos — 5 cbuf pos --1 Jdk 8 output -->Pos — 1 cbuf pos --1 In Jdk7: If there is a invalid byte sequence and CodingErrorAction.Replace is specified, then the complete set of invalid bye sequence is treated as one malformed character and replaced by one replacement character in the output buffer [Hence the position is advanced by 5 as seen in the output as its a 5 byte invalid sequence] In Jdk8: Each invalid byte in the sequence is treated as a malformed character and hence we see the buffer being advanced by only one position. So for every malformed character, we see the replacement character included in the output buffer Attaching a patch that accommodates the above modified behavior
          Hide
          hshreedharan Hari Shreedharan added a comment -

          Wondering why that was not documented - without documentation, this test is hacky at best (Java folks figure out the regression and fix it – breaks the test on only new versions of Java 8). Can you file a bug for Java, if this is not actually documented?

          Show
          hshreedharan Hari Shreedharan added a comment - Wondering why that was not documented - without documentation, this test is hacky at best (Java folks figure out the regression and fix it – breaks the test on only new versions of Java 8). Can you file a bug for Java, if this is not actually documented?
          Hide
          jrufus Johny Rufus added a comment -

          Atlast I was able to find the bug that caused this change
          https://bugs.openjdk.java.net/browse/JDK-7096080

          The relevant portion to this bug taken from - http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-September/007722.html [which is a link mentioned in this bug report]

          "Another corner case is how to deal with the old 5-6 bytes byte sequence,
          such as
          "fc 80 80 8f bf bf", we are now treating them as 1 malformed utf-8 byte
          sequence, so any
          of these 5-6 bytes "old" formed will be treated one malformed character
          and then replaced
          by one "\ufffd". But according to the new "best practice"
          recommendation, it probably should
          be replaced by 6 \ufffd."

          Show
          jrufus Johny Rufus added a comment - Atlast I was able to find the bug that caused this change https://bugs.openjdk.java.net/browse/JDK-7096080 The relevant portion to this bug taken from - http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-September/007722.html [which is a link mentioned in this bug report] "Another corner case is how to deal with the old 5-6 bytes byte sequence, such as "fc 80 80 8f bf bf", we are now treating them as 1 malformed utf-8 byte sequence, so any of these 5-6 bytes "old" formed will be treated one malformed character and then replaced by one "\ufffd". But according to the new "best practice" recommendation, it probably should be replaced by 6 \ufffd."
          Hide
          hshreedharan Hari Shreedharan added a comment -

          I like the patch, but I'd recommend not accepting both. We should make sure that pre-JDK 8 the old string is valid and for JDK 8 it will be the new string. You can get the java version from the system property - java.version.

          Show
          hshreedharan Hari Shreedharan added a comment - I like the patch, but I'd recommend not accepting both. We should make sure that pre-JDK 8 the old string is valid and for JDK 8 it will be the new string. You can get the java version from the system property - java.version.
          Hide
          jrufus Johny Rufus added a comment -

          Thanks Hari, attaching the modified patch

          Show
          jrufus Johny Rufus added a comment - Thanks Hari, attaching the modified patch
          Hide
          hshreedharan Hari Shreedharan added a comment -

          This looks good. Can you link the OpenJDK bug in the jira? Also, can you see if there is a corresponding bug on bugs.java.com?

          I will run the tests and commit it (I am also renaming the expectedString param to JDK8expectedString)

          Show
          hshreedharan Hari Shreedharan added a comment - This looks good. Can you link the OpenJDK bug in the jira? Also, can you see if there is a corresponding bug on bugs.java.com? I will run the tests and commit it (I am also renaming the expectedString param to JDK8expectedString)
          Hide
          jrufus Johny Rufus added a comment -

          Thanks Hari. Linked both OpenJDK and bugs.java.com.

          Show
          jrufus Johny Rufus added a comment - Thanks Hari. Linked both OpenJDK and bugs.java.com.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 698f0ba2de9c697ea9fe52134e36a694abc28d88 in flume's branch refs/heads/trunk from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=698f0ba ]

          FLUME-2538. TestResettableFileInputStream fails on JDK 8.

          (Johny Rufus via Hari)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 698f0ba2de9c697ea9fe52134e36a694abc28d88 in flume's branch refs/heads/trunk from Hari Shreedharan [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=698f0ba ] FLUME-2538 . TestResettableFileInputStream fails on JDK 8. (Johny Rufus via Hari)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c53339c028b8d30cf42b208dc259cf61375bbfd5 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=c53339c ]

          FLUME-2538. TestResettableFileInputStream fails on JDK 8.

          (Johny Rufus via Hari)

          Show
          jira-bot ASF subversion and git services added a comment - Commit c53339c028b8d30cf42b208dc259cf61375bbfd5 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=c53339c ] FLUME-2538 . TestResettableFileInputStream fails on JDK 8. (Johny Rufus via Hari)
          Hide
          hshreedharan Hari Shreedharan added a comment -

          Committed! Thanks Johny!

          Show
          hshreedharan Hari Shreedharan added a comment - Committed! Thanks Johny!
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in flume-trunk #697 (See https://builds.apache.org/job/flume-trunk/697/)
          FLUME-2538. TestResettableFileInputStream fails on JDK 8. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=698f0ba2de9c697ea9fe52134e36a694abc28d88)

          • flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in flume-trunk #697 (See https://builds.apache.org/job/flume-trunk/697/ ) FLUME-2538 . TestResettableFileInputStream fails on JDK 8. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=698f0ba2de9c697ea9fe52134e36a694abc28d88 ) flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Flume-trunk-hbase-98 #54 (See https://builds.apache.org/job/Flume-trunk-hbase-98/54/)
          FLUME-2538. TestResettableFileInputStream fails on JDK 8. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=698f0ba2de9c697ea9fe52134e36a694abc28d88)

          • flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Flume-trunk-hbase-98 #54 (See https://builds.apache.org/job/Flume-trunk-hbase-98/54/ ) FLUME-2538 . TestResettableFileInputStream fails on JDK 8. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=698f0ba2de9c697ea9fe52134e36a694abc28d88 ) flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java

            People

            • Assignee:
              jrufus Johny Rufus
              Reporter:
              jrufus Johny Rufus
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development