Commons Codec / CODEC-301

BaseNCodec: Reduce byte[] allocations by reusing buffers

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • None

    Description

      BaseNCodec encodes or decodes the input bytes into a byte[] buffer stored in a Context. When the codecs are used via a BaseNCodecInputStream, these buffers are constantly reallocated.

      The Context buffer is set to null to signal that no more bytes are available, which forces a reallocation for the next chunk of input from the stream. The underlying stream is also read into a single-use byte[] allocated inside the read loop:

              while (readLen == 0) {
                  if (!baseNCodec.hasData(context)) {
                      // *****
                      // This should be allocated once!
                      // *****
                      final byte[] buf = new byte[doEncode ? 4096 : 8192];
                      final int c = in.read(buf);
                      if (doEncode) {
                          baseNCodec.encode(buf, 0, c, context);
                      } else {
                          baseNCodec.decode(buf, 0, c, context);
                      }
                  }
                  readLen = baseNCodec.readResults(array, offset, len, context);
              }
      

      BaseNCodecInputStream can instead hold a single class-level buffer for reading the underlying input stream. BaseNCodec can also be changed so that it no longer sets the Context buffer to null as a signal, which lets the BaseNCodecInputStream reuse that buffer. This requires updating the check for available bytes to use the position markers (pos and readPos) in BaseNCodec, for example (old code commented out for reference):

          int available(final Context context) {  // package protected for access from I/O streams
              return hasData(context) ? context.pos - context.readPos : 0;
              //return context.buffer != null ? context.pos - context.readPos : 0;
          }
      
          boolean hasData(final Context context) {  // package protected for access from I/O streams
              return context.pos > context.readPos;
              //return context.buffer != null;
          }
      
          int readResults(final byte[] b, final int bPos, final int bAvail, final Context context) {
              if (hasData(context)) {
              //if (context.buffer != null) {
                  final int len = Math.min(available(context), bAvail);
                  System.arraycopy(context.buffer, context.readPos, b, bPos, len);
                  context.readPos += len;
                  if (context.readPos >= context.pos) {
                      // All data read.
                      // Reset markers so hasData() will return false, and this method can return -1,
                      // but do not set buffer to null to allow reuse.
                      context.pos = context.readPos = 0;
                  //    context.buffer = null; // so hasData() will return false, and this method can return -1
                  }
                  return len;
              }
              return context.eof ? EOF : 0;
          }
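
      The two changes can be sketched together in a self-contained toy. This is only an illustration under stated assumptions, not the Commons Codec implementation: BufferReuseSketch, ReusingInputStream, encodeAll and the hex "codec" are hypothetical stand-ins, and the buffer sizes are deliberately tiny. The read buffer is a field of the stream, and the Context buffer is kept while the position markers signal whether data is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in classes; none of these names are Commons Codec API.
class BufferReuseSketch {
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /** Same fields the issue uses from BaseNCodec.Context. */
    static final class Context {
        byte[] buffer = new byte[16]; // sized for one encoded 8-byte chunk; never set to null
        int pos;                      // next write index
        int readPos;                  // next read index
        boolean eof;
    }

    /** Toy codec step: hex-encodes a chunk into the context buffer; a negative length marks EOF. */
    static void encode(final byte[] in, final int off, final int len, final Context context) {
        if (len < 0) {
            context.eof = true;
            return;
        }
        for (int i = 0; i < len; i++) {
            final int b = in[off + i] & 0xff;
            context.buffer[context.pos++] = HEX[b >>> 4];
            context.buffer[context.pos++] = HEX[b & 0x0f];
        }
    }

    static boolean hasData(final Context context) {
        return context.pos > context.readPos; // markers, not a null check
    }

    static int readResults(final byte[] b, final int bPos, final int bAvail, final Context context) {
        if (hasData(context)) {
            final int len = Math.min(context.pos - context.readPos, bAvail);
            System.arraycopy(context.buffer, context.readPos, b, bPos, len);
            context.readPos += len;
            if (context.readPos >= context.pos) {
                context.pos = context.readPos = 0; // reset markers, keep the array
            }
            return len;
        }
        return context.eof ? -1 : 0;
    }

    /** Stream wrapper whose read buffer is a field instead of a local inside the loop. */
    static final class ReusingInputStream extends FilterInputStream {
        private final byte[] buf = new byte[8]; // allocated once per stream
        final Context context = new Context();

        ReusingInputStream(final InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            final byte[] one = new byte[1];
            return read(one, 0, 1) > 0 ? one[0] & 0xff : -1;
        }

        @Override
        public int read(final byte[] array, final int offset, final int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int readLen = 0;
            while (readLen == 0) {
                if (!hasData(context)) {
                    final int c = in.read(buf); // refills the same array on every pass
                    encode(buf, 0, c, context);
                }
                readLen = readResults(array, offset, len, context);
            }
            return readLen;
        }
    }

    /** Drains the wrapper over the given bytes and returns the hex text. */
    static String encodeAll(final byte[] data) throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ReusingInputStream in = new ReusingInputStream(new ByteArrayInputStream(data))) {
            final byte[] chunk = new byte[5];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(encodeAll(new byte[] {0, (byte) 0xff, 0x10})); // prints "00ff10"
    }
}
```

      In this sketch the toy encode step is only called once the previous chunk has been fully drained, so the 16-byte Context buffer is always large enough and the same two arrays serve the whole stream.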
      

      This change was suggested by Alexander Pinske.

      Reusing the byte buffers reduces byte[] allocations from 280 MB to under 4 MB when reading a 133 MB base64 stream.

      Measured with JFR, see https://github.com/apinske/playground-io.


            People

              Assignee: Unassigned
              Reporter: Alex Herbert (aherbert)