The EventIdFirstSchemaRecordWriter used by the provenance repository has a writeRecord(ProvenanceEventRecord) method. This method serializes the given record into a byte array by writing it to a ByteArrayOutputStream (BAOS) that is wrapped in a DataOutputStream. Once this is done, it calls toByteArray() on that BAOS so that it can write the bytes directly to another OutputStream.
This can create a rather large amount of garbage to be collected. We can improve this significantly:
- Instead of creating a new ByteArrayOutputStream each time, create a pool of them. This avoids constantly having to garbage collect them.
- If a BAOS grows beyond a certain size, we should not return it to the pool, because we don't want an oversized buffer to stay on the heap indefinitely.
- Instead of wrapping the BAOS in a new DataOutputStream each time, the DataOutputStream should be pooled/recycled as well. Since it must allocate an internal byte array for the writeUTF method, this can save a significant amount of garbage.
- Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating a new array, copying the data into it, and the resulting GC overhead.
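
The points above could be combined roughly as follows. This is only a minimal sketch and not the actual NiFi code: RecordBufferPool, SizedBuffer, PooledStreams, and the size limits are hypothetical names and values chosen for illustration, and a String payload stands in for the ProvenanceEventRecord serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper for illustration; not part of NiFi.
final class RecordBufferPool {

    /** ByteArrayOutputStream that exposes the length of its backing array. */
    static final class SizedBuffer extends ByteArrayOutputStream {
        SizedBuffer(final int initialCapacity) {
            super(initialCapacity);
        }

        int capacity() {
            return buf.length; // 'buf' is a protected field of ByteArrayOutputStream
        }
    }

    /** A buffer plus the DataOutputStream wrapping it, so that both are reused together. */
    static final class PooledStreams {
        final SizedBuffer buffer;
        final DataOutputStream data;

        PooledStreams(final int initialCapacity) {
            buffer = new SizedBuffer(initialCapacity);
            data = new DataOutputStream(buffer);
        }
    }

    private final BlockingQueue<PooledStreams> pool;
    private final int initialCapacity;
    private final int maxRetainedCapacity;

    RecordBufferPool(final int poolSize, final int initialCapacity, final int maxRetainedCapacity) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        this.initialCapacity = initialCapacity;
        this.maxRetainedCapacity = maxRetainedCapacity;
    }

    /** Takes a pooled pair if one is available, otherwise allocates a new one. */
    PooledStreams borrow() {
        final PooledStreams pooled = pool.poll();
        return pooled != null ? pooled : new PooledStreams(initialCapacity);
    }

    /** Returns true if the pair was kept for reuse, false if it was dropped. */
    boolean release(final PooledStreams streams) {
        if (streams.buffer.capacity() > maxRetainedCapacity) {
            return false; // oversized buffer: leave it to the garbage collector
        }
        streams.buffer.reset();     // discard content, keep the backing array
        return pool.offer(streams); // false (and dropped) if the pool is already full
    }

    /** Serializes the payload into a pooled buffer and copies it out without toByteArray(). */
    void write(final String payload, final OutputStream destination) throws IOException {
        final PooledStreams streams = borrow();
        try {
            streams.data.writeUTF(payload);      // stand-in for the real record serialization
            streams.buffer.writeTo(destination); // writes the internal array directly
        } finally {
            release(streams);
        }
    }
}
```

A bounded queue is used here so the pool itself cannot grow without limit, and the DataOutputStream is kept paired with its BAOS so that its internal writeUTF array is reused as well. Whether a queue, a ThreadLocal, or something else fits best in the writer is a separate decision.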