Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None

      Description

      Regular Base64 uses + and / for code points 62 and 63. URL-safe Base64 uses - and _ instead. URL-safe Base64 also omits the trailing padding (= or ==) to save space.

      http://en.wikipedia.org/wiki/Base64#URL_applications
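
      A minimal sketch of the alphabet difference, encoding the same two bytes both ways. It uses the JDK's java.util.Base64 (Java 8+) as a stand-in, which is an assumption of this example and not the Commons Codec API this issue is about; the class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical demo class; java.util.Base64 stands in for Commons Codec here.
class Base64Alphabets {
    // Standard alphabet: values 62 and 63 map to '+' and '/', output is '=' padded.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL-safe alphabet: values 62 and 63 map to '-' and '_', padding is omitted.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    public static void main(String[] args) {
        // 0xfb 0xff splits into the 6-bit values 62, 63, 60.
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8
    }
}
```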

      Approach:

      decode() should be able to seamlessly handle either situation. This means treating + and - as equivalent, and / and _ as equivalent, at any point during the decode. decode() also needs to tolerate missing padding characters.
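
      One way to picture the lenient decode described here is to normalize the input before handing it to a strict decoder. This is only a sketch under assumptions: it is not the attached patch, it relies on the JDK's java.util.Base64 basic decoder (which accepts unpadded input), and LenientBase64 and its methods are hypothetical names.

```java
import java.util.Base64;

// Hypothetical helper; not the implementation from the attached patch.
class LenientBase64 {
    static byte[] decode(String text) {
        // Treat '-' as '+' and '_' as '/', wherever they appear in the input.
        String normalized = text.replace('-', '+').replace('_', '/');
        // Drop trailing '=' so missing or partial padding cannot cause a failure.
        int end = normalized.length();
        while (end > 0 && normalized.charAt(end - 1) == '=') {
            end--;
        }
        // The JDK basic decoder does not require padding.
        return Base64.getDecoder().decode(normalized.substring(0, end));
    }

    // Renders bytes as lowercase hex for easy comparison.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "/3+PwBzbRxqMi1qTBhg/6A",
            "/3+PwBzbRxqMi1qTBhg/6A=",
            "/3+PwBzbRxqMi1qTBhg/6A==",
            "_3-PwBzbRxqMi1qTBhg_6A"
        };
        for (String in : inputs) {
            // Each line prints ff7f8fc01cdb471a8c8b5a9306183fe8
            System.out.println(toHex(decode(in)));
        }
    }
}
```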

      encode() should either emit +/ or -_ depending on a mode set during Base64 construction.
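
      A rough sketch of what "a mode set during construction" could look like. The class name ModalBase64 and its boolean constructor argument are hypothetical and are not taken from the patch; java.util.Base64 is again used as a stand-in for the real encoder.

```java
import java.util.Base64;

// Hypothetical class illustrating an encoder whose alphabet is fixed at construction.
class ModalBase64 {
    private final Base64.Encoder encoder;

    // urlSafe == true selects the "-_" alphabet without padding;
    // urlSafe == false selects the regular "+/" alphabet with padding.
    ModalBase64(boolean urlSafe) {
        this.encoder = urlSafe
                ? Base64.getUrlEncoder().withoutPadding()
                : Base64.getEncoder();
    }

    String encode(byte[] data) {
        return encoder.encodeToString(data);
    }

    public static void main(String[] args) {
        // 0xff 0xe0 splits into the 6-bit values 63, 62, 0.
        byte[] data = {(byte) 0xff, (byte) 0xe0};
        System.out.println(new ModalBase64(false).encode(data)); // /+A=
        System.out.println(new ModalBase64(true).encode(data));  // _-A
    }
}
```

      Fixing the mode in the constructor keeps encode() free of per-call flags, at the cost of needing two instances if a caller wants both forms.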

      Since URL-SAFE is all about URLs (e.g. HTTP GET) and most browsers are limited to 1KB of query string, I do not think we need to bother making the URL-SAFE mode available in the stream-oriented classes. (Nonetheless, the streams should be able to decode URL-SAFE input, but I don't think they should produce it.)

      Some background information: I'm putting together a webapp that integrates with MS Active Directory. Users and Groups are keyed off objectGUID, which is a 128-bit int. It would be nice to get some support for this kind of thing:

      "Click to view the profile of <a href='?u=_3-PwBzbRxqMi1qTBhg_6A'>Julius Davies</a>."

      Right now I'm using Hex:

      "Click to view the profile of <a href='?u=ff7f8fc01cdb471a8c8b5a9306183fe8'>Julius Davies</a>."

      Current Base64 class would output this, and I suspect the + would screw things up:

      "Click to view the profile of <a href='?u=/3+PwBzbRxqMi1qTBhg/6A=='>Julius Davies</a>."
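
      To make the three links above concrete, here is a small sketch that turns the 128-bit value from the hex link into the URL-safe token. It assumes java.util.Base64 as a stand-in for the requested feature; GuidToken, fromHex and urlToken are hypothetical names.

```java
import java.util.Base64;

// Hypothetical helper for building a query-string token from a 128-bit GUID.
class GuidToken {
    // Parses a hex string (two characters per byte) into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // URL-safe Base64 without padding, so the token needs no extra escaping.
    static String urlToken(byte[] guid) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(guid);
    }

    public static void main(String[] args) {
        String hex = "ff7f8fc01cdb471a8c8b5a9306183fe8";
        String token = urlToken(fromHex(hex));
        System.out.println(token);                              // _3-PwBzbRxqMi1qTBhg_6A
        System.out.println(hex.length() + " vs " + token.length()); // 32 vs 22
    }
}
```

      The token is 22 characters against 32 for hex, and it contains no +, / or = characters.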

      1. codec75.patch
        20 kB
        Julius Davies

          Activity

          ggregory@seagullsw.com added a comment -

          Patch applied and other changes made.

          Julius Davies added a comment -

          Here's a comparison of commons-codec-1.3.jar vs. this patch on decoding some examples. The correct decoding is ff7f8fc01cdb471a8c8b5a9306183fe8, which codec-1.3 can only do when the proper final padding ("==") is present.

          codec-1.3 (base64 I used is on the right)
          --------------------
          ff7f8fc01cdb471a8c8b5a9306183f0000 "/3+PwBzbRxqMi1qTBhg/6A"
          ff7f8fc01cdb471a8c8b5a9306183f0000 "/3+PwBzbRxqMi1qTBhg/6A="
          ff7f8fc01cdb471a8c8b5a9306183fe8 "/3+PwBzbRxqMi1qTBhg/6A=="
          dcfc01cdb471a8c8b5a93061000000 "_3-PwBzbRxqMi1qTBhg_6A"

          codec-trunk + patch (base64 I used is on the right)
          --------------------
          ff7f8fc01cdb471a8c8b5a9306183fe8 "/3+PwBzbRxqMi1qTBhg/6A"
          ff7f8fc01cdb471a8c8b5a9306183fe8 "/3+PwBzbRxqMi1qTBhg/6A="
          ff7f8fc01cdb471a8c8b5a9306183fe8 "/3+PwBzbRxqMi1qTBhg/6A=="
          ff7f8fc01cdb471a8c8b5a9306183fe8 "_3-PwBzbRxqMi1qTBhg_6A"

          Note: that's a hex representation of the decoded value. The actual decoding is binary (byte[]).

          The patch includes this example in the new JUnit tests.

          Julius Davies added a comment (edited) -

          Attached patch implements CODEC-75: URL-SAFE encoding/decoding now possible.

          Julius Davies added a comment -

          I have a patch ready that passes all JUnit tests (and adds a few more). I just have to work on the Javadocs before I submit it. It should be attached within the next 24 hours.


            People

            • Assignee:
              Gary Gregory
              Reporter:
              Julius Davies
            • Votes:
              0
              Watchers:
              0
