Commons Codec
  1. Commons Codec
  2. CODEC-99

Base64.encodeBase64String() shouldn't chunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.5
    • Labels:
      None

      Description

      Base64.encodeBase64String() shouldn't chunk.

      Change this:

      public static String encodeBase64String(byte[] binaryData) {
          return StringUtils.newStringUtf8(encodeBase64(binaryData, true));
      }
      

      To this:

      public static String encodeBase64String(byte[] binaryData) {
          return StringUtils.newStringUtf8(encodeBase64(binaryData, false));
      }
      

      This will fix the following tests ggregory added a few minutes ago:

      //assertEquals("Zg==", Base64.encodeBase64String(StringUtils.getBytesUtf8("f")));
      //assertEquals("Zm8=", Base64.encodeBase64String(StringUtils.getBytesUtf8("fo")));
      //assertEquals("Zm9v", Base64.encodeBase64String(StringUtils.getBytesUtf8("foo")));
      //assertEquals("Zm9vYg==", Base64.encodeBase64String(StringUtils.getBytesUtf8("foob")));
      //assertEquals("Zm9vYmE=", Base64.encodeBase64String(StringUtils.getBytesUtf8("fooba")));
      //assertEquals("Zm9vYmFy", Base64.encodeBase64String(StringUtils.getBytesUtf8("foobar")));

      1. codec-99.patch
        0.6 kB
        Julius Davies
      2. codec-99-javadoc.patch
        1 kB
        Julius Davies
      3. codec-99-test-fixes.patch
        2 kB
        Julius Davies
      4. codec-99-tests.patch
        3 kB
        Julius Davies

        Activity

        Hide
        Julius Davies added a comment -

        Oops, I forgot to update the Javadoc. Thanks, Sebb, for pointing that out!

        Show
        Julius Davies added a comment - Oops, I forgot to update the Javadoc. Thanks, Sebb, for pointing that out!
        Hide
        ggregory@seagullsw.com added a comment - - edited

        WRT "Yes it's declared static but it CAN also be used as a normal non-static function. In that case it's not behaving correctly."

        Are we talking about the same thing?

        A static method will never use values from an instance.

        Might you be confusing run-time behavior and compiler behavior when you say "normal non-static function"?

        In your example:

        Base64 encoder = new Base64(0);
        encoder.encodeBase64String(binaryData);

        Is equivalent to (and should really be written in this style:

        Base64 encoder = new Base64(0);
        Base64.encodeBase64String(binaryData);

        Show
        ggregory@seagullsw.com added a comment - - edited WRT "Yes it's declared static but it CAN also be used as a normal non-static function. In that case it's not behaving correctly." Are we talking about the same thing? A static method will never use values from an instance. Might you be confusing run-time behavior and compiler behavior when you say "normal non-static function"? In your example: Base64 encoder = new Base64(0); encoder.encodeBase64String(binaryData); Is equivalent to (and should really be written in this style: Base64 encoder = new Base64(0); Base64.encodeBase64String(binaryData);
        Hide
        Marc Ende added a comment - - edited

        Yes it's declared static but it CAN also be used as a normal non-static function. In that case it's not behaving correctly.

        By the way it's using class-level fields when following the calls (buffer).

        Why not using the static - constructor. There you'll can set the standard settings which leads to the results of the static-calls
        When used as a normal function it can be modified using the standard constructors.

        Show
        Marc Ende added a comment - - edited Yes it's declared static but it CAN also be used as a normal non-static function. In that case it's not behaving correctly. By the way it's using class-level fields when following the calls (buffer). Why not using the static - constructor. There you'll can set the standard settings which leads to the results of the static-calls When used as a normal function it can be modified using the standard constructors.
        Hide
        ggregory@seagullsw.com added a comment -

        The example in https://issues.apache.org/jira/browse/CODEC-99?focusedCommentId=12924892&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12924892 does not make sense:

        Base64 encoder = new Base64(0);
        encoder.encodeBase64String(binaryData);

        The method encodeBase64String is a static method so instance variables will NEVER apply!

        Show
        ggregory@seagullsw.com added a comment - The example in https://issues.apache.org/jira/browse/CODEC-99?focusedCommentId=12924892&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12924892 does not make sense: Base64 encoder = new Base64(0); encoder.encodeBase64String(binaryData); The method encodeBase64String is a static method so instance variables will NEVER apply!
        Hide
        Julius Davies added a comment -

        I applied "codec-99.patch" and "codec-99-tests.patch". Quick summary of consequences:

        1. Another binary incompatibility with codec-1.4.jar is introduced with this.

        2. But now the String static encode in 1.4 is more consistent with the pre-existing byte[] static encode methods from 1.3, 1.2, 1.1.

        Show
        Julius Davies added a comment - I applied "codec-99.patch" and "codec-99-tests.patch". Quick summary of consequences: 1. Another binary incompatibility with codec-1.4.jar is introduced with this. 2. But now the String static encode in 1.4 is more consistent with the pre-existing byte[] static encode methods from 1.3, 1.2, 1.1.
        Hide
        Marc Ende added a comment -

        If I'd like to encode a small string, I'll agree:

        new Base64().encodeToString(...)

        But it doesn't matter in this point if this is chunked or not.

        Other people which like to encode larger strings won't use new Base64(0) or new Base64(-1) when they wanted chunked
        files. Otherwise they've read the docs which mentioned the way to use unchunked. I think it wouldn't break upwards compatibility because there is no really use case where you'll instantiate an Base64 encoder with 0 or less than 0 when it should be chunked. In these case even the "convenience methods" should behaviour correctly. Or otherwise (to keep upwards compatibility) it should be explicitly mentioned in the apidocs.

        Currently it looks like buying a car, choosing red as the color and only if you're opening the right door the car became green.

        Show
        Marc Ende added a comment - If I'd like to encode a small string, I'll agree: new Base64().encodeToString(...) But it doesn't matter in this point if this is chunked or not. Other people which like to encode larger strings won't use new Base64(0) or new Base64(-1) when they wanted chunked files. Otherwise they've read the docs which mentioned the way to use unchunked. I think it wouldn't break upwards compatibility because there is no really use case where you'll instantiate an Base64 encoder with 0 or less than 0 when it should be chunked. In these case even the "convenience methods" should behaviour correctly. Or otherwise (to keep upwards compatibility) it should be explicitly mentioned in the apidocs. Currently it looks like buying a car, choosing red as the color and only if you're opening the right door the car became green.
        Hide
        Jochen Wiedmann added a comment -

        You possibly wouldn't, because you are encoding a small string, like a password. Others would, because they are encoding a very large string. Whatever the default, if you need to be sure, don't use the convenience methods. Period!

        The only reason for changing the default would be upwards compatibility. If we change it for other reasons, we loose upwards compatibility.

        Show
        Jochen Wiedmann added a comment - You possibly wouldn't, because you are encoding a small string, like a password. Others would, because they are encoding a very large string. Whatever the default, if you need to be sure, don't use the convenience methods. Period! The only reason for changing the default would be upwards compatibility. If we change it for other reasons, we loose upwards compatibility.
        Hide
        Marc Ende added a comment - - edited

        When I do:

        Base64 encoder = new Base64(0);
        encoder.encodeBase64String(binaryData);

        I would expect that the encoded string isn't chunked, I've set the appropriate settings in the
        constructor before. But the current encodeBase64String() "overrides" the settings I've done.

        The methods encodeBase64String(..) and so on should respect the settings made in the constructor in my opinion.

        Show
        Marc Ende added a comment - - edited When I do: Base64 encoder = new Base64(0); encoder.encodeBase64String(binaryData); I would expect that the encoded string isn't chunked, I've set the appropriate settings in the constructor before. But the current encodeBase64String() "overrides" the settings I've done. The methods encodeBase64String(..) and so on should respect the settings made in the constructor in my opinion.
        Hide
        Julius Davies added a comment - - edited

        To respond to Gary, here is behaviour of Base64.encodeBase64Chunked(b) through all versions of commons-codec:

        package playground;
        import org.apache.commons.codec.binary.Base64;
        public class Base64ChunkTest {
        
            public static void main(String[] args) throws Exception {
                byte[] b = args[0].getBytes("UTF-8");
        
                b = Base64.encodeBase64Chunked(b);
        
                String s = new String(b, "UTF-8");
                s = s.replace("\n", "\\n").replace("\r", "\\r");
                System.out.println(s);
            }
        }
        
        $ java -cp build/classes:lib/commons-codec-1.1.jar playground.Base64ChunkTest Hello
        SGVsbG8=\n
        
        $ java -cp build/classes:lib/commons-codec-1.2.jar playground.Base64ChunkTest Hello
        SGVsbG8=\n
        
        $ java -cp build/classes:lib/commons-codec-1.3.jar playground.Base64ChunkTest Hello
        SGVsbG8=\r\n
        
        $ java -cp build/classes:lib/commons-codec-1.4.jar playground.Base64ChunkTest Hello
        SGVsbG8=\r\n
        

        Just to be clear, you want to change this now after 7 years (the files in the commons-codec-1.1.jar have '2003-04-29' timestamps)?

        ps. I don't think '=' should be part of the equation, because valid data can encode and not have any '=' characters, for example the string '123':

        $ java -cp build/classes:lib/commons-codec-1.4.jar playground.Base64ChunkTest 123
        MTIz\r\n
        
        Show
        Julius Davies added a comment - - edited To respond to Gary, here is behaviour of Base64.encodeBase64Chunked(b) through all versions of commons-codec: package playground; import org.apache.commons.codec.binary.Base64; public class Base64ChunkTest { public static void main( String [] args) throws Exception { byte [] b = args[0].getBytes( "UTF-8" ); b = Base64.encodeBase64Chunked(b); String s = new String (b, "UTF-8" ); s = s.replace( "\n" , "\\n" ).replace( "\r" , "\\r" ); System .out.println(s); } } $ java -cp build/classes:lib/commons-codec-1.1.jar playground.Base64ChunkTest Hello SGVsbG8=\n $ java -cp build/classes:lib/commons-codec-1.2.jar playground.Base64ChunkTest Hello SGVsbG8=\n $ java -cp build/classes:lib/commons-codec-1.3.jar playground.Base64ChunkTest Hello SGVsbG8=\r\n $ java -cp build/classes:lib/commons-codec-1.4.jar playground.Base64ChunkTest Hello SGVsbG8=\r\n Just to be clear, you want to change this now after 7 years (the files in the commons-codec-1.1.jar have '2003-04-29' timestamps)? ps. I don't think '=' should be part of the equation, because valid data can encode and not have any '=' characters, for example the string '123': $ java -cp build/classes:lib/commons-codec-1.4.jar playground.Base64ChunkTest 123 MTIz\r\n
        Hide
        ggregory@seagullsw.com added a comment -

        IMO, the issue is not chunk vs. not chunk but that the output should never encode and end in a CR LR, especially when there is no = at the end of the output

        Show
        ggregory@seagullsw.com added a comment - IMO, the issue is not chunk vs. not chunk but that the output should never encode and end in a CR LR, especially when there is no = at the end of the output
        Hide
        ggregory@seagullsw.com added a comment -
        Show
        ggregory@seagullsw.com added a comment - Applied patch https://issues.apache.org/jira/secure/attachment/12439949/codec-99-test-fixes.patch Thank you Julius.
        Hide
        Julius Davies added a comment -

        Some of the JUnit tests have a problem:

        "Zm9v" + Base64.CHUNK_SEPARATOR

        This results in:

        Zm9v[B@6bbc4459

        As opposed to:

        Zm9v\r\n

        codec-99-test-fixes.patch attached to fix this.

        Show
        Julius Davies added a comment - Some of the JUnit tests have a problem: "Zm9v" + Base64.CHUNK_SEPARATOR This results in: Zm9v[B@6bbc4459 As opposed to: Zm9v\r\n codec-99-test-fixes.patch attached to fix this.
        Hide
        Julius Davies added a comment -

        Changes to unit-tests also attached. Notice that two old tests assume the old, erroneous behavior, and so those need to be adjusted, too:

        -        assertEquals("byteToString static Hello World", "SGVsbG8gV29ybGQ=\r\n", Base64.encodeBase64String(b1));
        +        assertEquals("byteToString static Hello World", "SGVsbG8gV29ybGQ=", Base64.encodeBase64String(b1));
        
        
        -        assertEquals("byteToString static UUID", "K/fMJwH+Q5e0nr7tWsxwkA==\r\n", Base64.encodeBase64String(b4));
        +        assertEquals("byteToString static UUID", "K/fMJwH+Q5e0nr7tWsxwkA==", Base64.encodeBase64String(b4));
        
        Show
        Julius Davies added a comment - Changes to unit-tests also attached. Notice that two old tests assume the old, erroneous behavior, and so those need to be adjusted, too: - assertEquals( "byteToString static Hello World" , "SGVsbG8gV29ybGQ=\r\n" , Base64.encodeBase64String(b1)); + assertEquals( "byteToString static Hello World" , "SGVsbG8gV29ybGQ=" , Base64.encodeBase64String(b1)); - assertEquals( "byteToString static UUID" , "K/fMJwH+Q5e0nr7tWsxwkA==\r\n" , Base64.encodeBase64String(b4)); + assertEquals( "byteToString static UUID" , "K/fMJwH+Q5e0nr7tWsxwkA==" , Base64.encodeBase64String(b4));
        Hide
        Julius Davies added a comment -

        One-line patch to fix this attached.

        Show
        Julius Davies added a comment - One-line patch to fix this attached.

          People

          • Assignee:
            Unassigned
            Reporter:
            Julius Davies
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development