Issue Details (XML | Word | Printable)

Key: CODEC-59
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Unassigned
Reporter: Pepijn Schmitz
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Commons Codec

Add methods to Base64 which work with String instead of byte[]

Created: 16/Nov/07 12:31 PM   Updated: 07/Aug/09 08:12 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 1.4

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works codec59-2009-07-28.patch 2009-07-28 07:24 PM Julius Davies 11 kB

Resolution Date: 01/Aug/09 04:16 AM


 Description  « Hide
It would be great if the Base64 class had String encode(byte[]) and byte[] decode(String) methods. The RFC is stated in terms of "characters" for the encoding, so there should be no problem with unwarranted assumptions with this. Currently everyone is having to convert to and from Strings themselves.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Joerg Schaible added a comment - 16/Nov/07 01:11 PM
As explained in CODEC-58 the term character refers 8-bit values in the RFC and 16-bit values in Java. The encoding can never receive a String, since it has no idea about the encoding of the String and the algorithm encodes arbitrary bytes. Same applies to the result of a decoding. The result of an encoding and the input of the decoding could be Strings in ASCII encoding.

Pepijn Schmitz added a comment - 16/Nov/07 02:35 PM
Right, that's exactly what I meant and wrote. The codec could encode bytes TO Strings (with a String encode(byte[]) method) and decode bytes FROM Strings (with a byte[] decode(String) method). These Strings would only contain characters from the base 64 character set (which is a subset of ASCII).

Gary Gregory added a comment - 16/Nov/07 05:59 PM
I've had a need for this in the past as well. This will happen much quicker if you submit a patch for code and unit tests with your request

Henri Yandell added a comment - 30/May/08 06:26 AM
No patches to date, so moving out of the 1.4 release plan.

Julius Davies added a comment - 17/Jul/09 11:37 PM - edited
codec59.patch attached. It introduces two new methods:

public String Base64.encodeToString( byte[] b )

public byte[] Base64.decodeFromString( String s )

The String the first method produces chunks the output into 76-character lines. If consumers want very-long single lines instead they can run .replaceAll("\r\n", "") against the output. (I chose chunking as default behaviour because starting with a very-long-line and breaking it into chunks is more of a pain).

Even if this patch is not accepted, I strongly recommend taking in these 5 lines, since they fix a very subtle and rare bug (newly introduced in CODEC-69 I suspect):

@@ -698,7 +715,11 @@
             len += 4 - mod;
         }
         if (isChunked) {
-            len += (1 + (len / CHUNK_SIZE)) * CHUNK_SEPARATOR.length;
+            boolean lenChunksPerfectly = len % CHUNK_SIZE == 0;
+            len += (len / CHUNK_SIZE) * CHUNK_SEPARATOR.length;
+            if (!lenChunksPerfectly) {
+                len += CHUNK_SEPARATOR.length;
+            }
         }

Gary Gregory added a comment - 19/Jul/09 03:16 PM
About the "5 lines" fix: can you provide a test that shows the bug. It would be good to see a test to address this fix specifically.

Julius Davies added a comment - 20/Jul/09 04:29 PM
The tests already included in the patch cover the bug. If you accept everything in the patch except Base64.java, I think you just need to add a test like this:

byte[] result Base64.encodeBase64(Base64TestData.DECODED, true);
byte[] expectedResult = Base64TestData.ENCODED_76_CHARS_PER_LINE.replaceAll("\n", "\r\n").getBytes("UTF-8");
assertTrue("lenChunksPerfectly", Arrays.equals(result, expectedResult));

Because Base64TestData.DECODED encodes into a multiple of 76 the original code creates an array that is slightly too big to hold it. The result array ends up having two 0 bytes at the very end.


Julius Davies added a comment - 27/Jul/09 04:50 AM
Attached file (codec59-but-only-after-codec81.patch) seems to work!

Julius Davies added a comment - 27/Jul/09 06:46 PM
Attached patch introduces following "String" methods to Base64.java:

Non-Static methods:
-------------------------------
1. public byte[] decode(String pArray)
2. public String encodeToString(byte[] pArray) {

Static methods:
-------------------------------
3. public static byte[] decodeBase64(String base64String)
4. public static String encodeBase64String(byte[] binaryData)
5. public static String encodeBase64URLSafeString(byte[] binaryData)

The patch can only be applied after both of the CODEC-81 patches have been applied.


Julius Davies added a comment - 28/Jul/09 07:24 PM
Attached codec59-2009-07-28.patch.

Introduces the String oriented methods previously mentioned in comments here.

Also improves StringUtils handling of null input, as well as Base64.decode() of null. The reset() calls added to Base64 are pretty important I think in case people try to re-use Base64 objects, e.g.:

Base64 b64 = new Base64();
b64.encode( someThing );
b64.encode( someOtherThing );

This will only work with the reset() calls I've added as part of this patch.


Gary Gregory added a comment - 01/Aug/09 04:16 AM
Patch applied with one added test code line to keep line code coverage at 100%. Branch coverage up to 92% from 91%.