  1. Commons Codec
  2. CODEC-280

Base32/64 to allow optional strict/lenient decoding


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Labels:
      None

      Description

      Base32 decodes blocks of 8 characters.

      Base64 decodes blocks of 4 characters.

      At the end of decoding, some extra characters may be left over. They are decoded using the available bits. Bits that do not sum to a whole byte (i.e. fewer than 8 bits) are discarded.
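The arithmetic behind the trailing characters can be sketched as follows. This is only an illustration: `TrailingBits` and its methods are hypothetical names, not part of Commons Codec. Each Base64 character carries 6 bits and each Base32 character carries 5 bits, so the number of whole bytes and discarded bits follows from integer division by 8.

```java
// Illustrative helper only; TrailingBits is a hypothetical name, not a Commons Codec class.
final class TrailingBits {

    /** Number of whole bytes that can be extracted from the trailing characters. */
    static int wholeBytes(int bitsPerChar, int trailingChars) {
        return (bitsPerChar * trailingChars) / 8;
    }

    /** Number of bits left after the whole bytes are extracted; these are the discarded bits. */
    static int leftoverBits(int bitsPerChar, int trailingChars) {
        return (bitsPerChar * trailingChars) % 8;
    }

    public static void main(String[] args) {
        // Base64: 6 bits per character, at most 3 trailing characters after the last full block of 4.
        for (int n = 1; n <= 3; n++) {
            System.out.printf("Base64, %d trailing: %d byte(s), %d leftover bit(s)%n",
                    n, wholeBytes(6, n), leftoverBits(6, n));
        }
        // Base32: 5 bits per character, at most 7 trailing characters after the last full block of 8.
        for (int n = 1; n <= 7; n++) {
            System.out.printf("Base32, %d trailing: %d byte(s), %d leftover bit(s)%n",
                    n, wholeBytes(5, n), leftoverBits(5, n));
        }
    }
}
```

For example, two trailing Base64 characters hold 12 bits: one byte plus 4 leftover bits. A single trailing Base64 character holds only 6 bits, so no byte can be extracted from it.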

      Currently, if more than 8 bits are left, the available bytes are extracted and the leftover bits are checked to confirm that they are zeros. If they are not, an exception is raised. This functionality was added to ensure that a byte array that is decoded will be re-encoded to the exact same byte array (ignoring input padding).

      There are two issues:

      1. If there are fewer than 8 leftover bits then no attempt can be made to obtain the last bytes. However, no exception is raised to indicate that the encoding was invalid (no leftover bits should be unaccounted for).
      2. Raising exceptions for leftover bits has led to reports from users that Codec no longer works as it used to. This is true, but only because the user has some badly encoded bytes they want to decode. Since other libraries allow this, it seems that two options for decoding are required.

      I suggest fixing the decoding so that it operates in two modes: strict and lenient.

      • Strict will throw an exception whenever there are unaccounted for bits.
      • Lenient will just discard the extra bits that cannot be used.

      Lenient is the default for backward compatibility, restoring the behaviour of the class to that of versions prior to 1.13.
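The difference between the two modes can be sketched on the final, unpadded Base64 characters alone. This is a minimal illustration under stated assumptions, not the library's implementation: `TrailingGroupDecoder` and `decodeTrailing` are hypothetical names, and the strict rule used here (reject a lone trailing character and any nonzero leftover bits) is one reading of the description above.

```java
// Hypothetical sketch; TrailingGroupDecoder is not a Commons Codec class.
final class TrailingGroupDecoder {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /**
     * Decodes the final 1 to 3 unpadded Base64 characters of an input.
     *
     * @param chars  the trailing characters (no '=' padding)
     * @param strict if true, throw when any bits are unaccounted for
     */
    static byte[] decodeTrailing(String chars, boolean strict) {
        if (chars.isEmpty() || chars.length() > 3) {
            throw new IllegalArgumentException("Expected 1 to 3 trailing characters");
        }
        // Accumulate 6 bits per character.
        int bits = 0;
        for (int i = 0; i < chars.length(); i++) {
            int value = ALPHABET.indexOf(chars.charAt(i));
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base64 character: " + chars.charAt(i));
            }
            bits = (bits << 6) | value;
        }
        int totalBits = 6 * chars.length();
        int leftover = totalBits % 8;
        int extra = bits & ((1 << leftover) - 1);

        // Strict: a lone character cannot form a byte, and leftover bits must all be zero.
        if (strict && (chars.length() == 1 || extra != 0)) {
            throw new IllegalArgumentException("Trailing bits are unaccounted for");
        }

        // Lenient (or valid strict input): drop the leftover bits and emit the whole bytes.
        bits >>>= leftover;
        byte[] out = new byte[totalBits / 8];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }
}
```

With this sketch, `decodeTrailing("QR", false)` yields the single byte 65 (`'A'`), the same as the canonical `"QQ"`, while `decodeTrailing("QR", true)` throws because the lowest leftover bit is set.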

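      Strict is enabled using a method:

      Base64 codec = new Base64();
      byte[] bytes = new byte[] { 'E' };
      // Lenient (default): the 6 bits of the single character cannot form a byte and are discarded
      Assertions.assertArrayEquals(new byte[0], codec.decode(bytes));
      codec.setStrictDecoding(true);
      // Strict: the unaccounted-for bits raise an exception
      Assertions.assertThrows(IllegalArgumentException.class, () -> codec.decode(bytes));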
      

      Using strict decoding should ensure that a round trip returns the same bytes:

      byte[] bytes = ...; // Some valid encoding with no padding characters
      Base64 codec = new Base64();
      codec.setStrictDecoding(true);
      Assertions.assertArrayEquals(bytes, codec.encode(codec.decode(bytes)));
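To see why lenient decoding cannot guarantee the round trip, the sketch below uses the JDK's `java.util.Base64` rather than Commons Codec, on the assumption that the JDK decoder accepts unpadded input and ignores nonzero trailing bits. `RoundTripDemo` is a hypothetical name used only for this illustration.

```java
import java.util.Base64;

// Hypothetical demo class; uses the JDK codec, not Commons Codec.
final class RoundTripDemo {

    /** Decodes an unpadded Base64 string and re-encodes it without padding. */
    static String roundTrip(String encoded) {
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Base64.getEncoder().withoutPadding().encodeToString(decoded);
    }

    public static void main(String[] args) {
        // Canonical encoding of the single byte 'A': the 4 leftover bits are zero.
        System.out.println(roundTrip("QQ"));
        // Non-canonical: 'R' sets a leftover bit that a lenient decoder discards,
        // so the re-encoded text differs from the input.
        System.out.println(roundTrip("QR"));
    }
}
```

Both inputs decode to the same byte, so only the canonical form survives the round trip. A strict decoder rejects the second input instead of silently normalising it.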
      


            People

            • Assignee: Alex Herbert (aherbert)
            • Reporter: Alex Herbert (aherbert)

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 20m
