Commons Codec / CODEC-140

isBase64 returns true for any UTF8 string

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.6
    • Fix Version/s: None
    • Labels: None
    • Environment: windows/linux

      Description

      I just called Base64.isBase64("Hello"), for instance, and it returned true. I thought it would return true only for a valid Base64-encoded string.

        Activity

        Sebb added a comment -

        Adding further validation to check the length etc would change the behaviour of the method, which is currently clearly documented to only check for valid characters in the Base64 alphabet.

        Not sure it's worth creating 3 other methods to add this extra validation.

        btpka3 added a comment -

        Should Base64.isBase64(byte[]) be enhanced to check that (1) length % 4 == 0 after whitespace is removed and (2) '=' does not occur in the middle?
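
A minimal sketch of what the two proposed checks might look like, written in plain Java so it runs without Commons Codec. The class `StrictBase64Check` and the method `isStrictBase64` are hypothetical names used only for illustration; nothing like this exists in Commons Codec, and the sketch assumes the standard alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`) with at most two trailing `=`.

```java
class StrictBase64Check {

    // Hypothetical helper, NOT a Commons Codec method.
    static boolean isStrictBase64(String s) {
        // Drop whitespace first, since isBase64 is documented to treat it as valid.
        String t = s.replaceAll("\\s", "");

        // Check (1): the length must be a multiple of 4.
        if (t.length() % 4 != 0) {
            return false;
        }

        // Check (2): '=' may only appear as one or two trailing pad characters.
        int firstPad = t.indexOf('=');
        int dataLen = firstPad < 0 ? t.length() : firstPad;
        if (t.length() - dataLen > 2) {
            return false;
        }
        for (int i = dataLen; i < t.length(); i++) {
            if (t.charAt(i) != '=') {
                return false;
            }
        }

        // Everything before the padding must be in the standard alphabet (assumption).
        for (int i = 0; i < dataLen; i++) {
            char c = t.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStrictBase64("Hello"));       // false: length 5
        System.out.println(isStrictBase64("Hellow=="));    // true
        System.out.println(isStrictBase64("He=llow="));    // false: '=' in the middle
        System.out.println(isStrictBase64("SGVs\nbG8="));  // true: whitespace ignored
    }
}
```

Note that a check like this would also reject unpadded output such as "Hellow", which is one way the stricter rule would change behaviour for existing callers.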

        Julius Davies added a comment -

        I don't think it's a bug. The javadoc on isBase64() makes it clear that "Hello" should return true:

        "Tests a given String to see if it contains only valid characters within the Base64 alphabet. Currently the method treats whitespace as valid."

        http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html#isBase64%28java.lang.String%29
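
To illustrate the documented behaviour quoted above, here is a rough sketch of an alphabet-only check in plain Java. It is an approximation, not the Commons Codec implementation: the class `AlphabetOnlyCheck` and its methods are hypothetical, and the accepted set is assumed to be the RFC 2045 alphabet plus `=` and whitespace (the real method may accept additional characters, such as the URL-safe ones).

```java
class AlphabetOnlyCheck {

    // Assumed alphabet: RFC 2045 characters plus the '=' pad character.
    static boolean inAlphabet(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    // Hypothetical stand-in for the documented contract: every character must be
    // in the alphabet or be whitespace. Length and padding position are NOT checked.
    static boolean containsOnlyBase64Chars(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!inAlphabet(c) && !Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyBase64Chars("Hello"));       // true
        System.out.println(containsOnlyBase64Chars("Ol\u00e9!"));   // false: 'é' and '!'
        System.out.println(containsOnlyBase64Chars("Hello!"));      // false: '!'
    }
}
```

Under a contract like this, "Hello" passes simply because each of its five letters is in the alphabet; whether the string could actually be produced by an encoder is a separate question.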

        Jochen Wiedmann added a comment -

        If the bug report should be confirmed, then I'd take it as the main issue that this isn't covered (and thus prevented) by the unit tests.

        Julius Davies added a comment -

        Consider this:

            byte[] b = new byte[] { (byte) 29, (byte) -23, (byte) 101, (byte) -93 };
            System.out.println(Base64.encodeBase64URLSafeString(b));
        

        It prints "Hellow" on the console. So it makes sense for Base64.isBase64("Hellow") to return true.

        As for "Hello"... isBase64() only checks that the characters are in the Base64 alphabet. It turns out, though, that no binary sequence encodes to exactly 5 characters: 5 characters carry 30 bits, which is not a whole number of bytes (3 bytes encode to 4 characters, 4 bytes to 6). The final 6 bits (the 5th character) would always be discarded during decode().

        So maybe one could make a case that isBase64("Hello") should return false because of the 5th character being meaningless, but so far that hasn't been our approach.
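
The length argument above can be checked with a short sketch. It uses the JDK's `java.util.Base64` (Java 8+) in place of Commons Codec's `encodeBase64URLSafeString`, so it runs without extra dependencies; the class `EncodedLengthDemo` and its helper methods are hypothetical names for illustration only.

```java
import java.util.Base64;

class EncodedLengthDemo {

    // Number of Base64 characters needed for n bytes when padding is omitted:
    // each character carries 6 bits, so this is ceil(8n / 6).
    static int unpaddedLength(int byteCount) {
        return (byteCount * 8 + 5) / 6;
    }

    // URL-safe, unpadded encoding via the JDK encoder (a stand-in for
    // Commons Codec's encodeBase64URLSafeString).
    static String encodeUnpadded(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] b = new byte[] { (byte) 29, (byte) -23, (byte) 101, (byte) -93 };
        System.out.println(encodeUnpadded(b)); // Hellow

        // Unpadded lengths for 0..7 bytes: 0, 2, 3, 4, 6, 7, 8, 10.
        // Lengths of 1, 5, 9, ... (length % 4 == 1) never occur.
        for (int n = 0; n <= 7; n++) {
            System.out.println(n + " bytes -> " + unpaddedLength(n) + " chars");
        }
    }
}
```

Three bytes fill exactly four characters, so any leftover group holds one or two bytes and needs two or three characters. A lone trailing character, as in the five-character "Hello", can therefore never come out of an encoder.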

        Mohit Anchlia added a comment -

        I thought isBase64 would return true only for strings that were originally encoded to Base64. For example:

        String b = Base64.encodeBase64String(bytes);
        Base64.isBase64(b); // expected: true
        Base64.isBase64("Hello"); // expected: false, since it wasn't encoded to Base64

        So isBase64 doesn't work the way I thought it would?

        Julius Davies added a comment -

        ps. sorry to make you submit a new bug request just to have me close it right away! I hope it wasn't too much trouble!!!

        Julius Davies added a comment -

        The characters in the word "Hello" are all valid according to the Base64 alphabet we are using. Take a look at "Page 24" of this document:

        http://www.ietf.org/rfc/rfc2045.txt

        Julius Davies added a comment -

        "Hello" is a valid Base64 encoding. Not all UTF-8 strings return true. Consider the following:

            /*
              Prints the following:
        
              Hello: true
              Olé!   false
              Hello! false
            */
            System.out.println("Hello: " + Base64.isBase64("Hello"));
            System.out.println("Olé!   " + Base64.isBase64("Olé!"));
            System.out.println("Hello! " + Base64.isBase64("Hello!"));
        

          People

          • Assignee: Julius Davies
          • Reporter: Mohit Anchlia
          • Votes: 0
          • Watchers: 1
