TBinaryProtocol keeps a separate member buffer for reading and for writing each of the integer types. A single buffer is enough: 8 bytes, the size of the largest integer type. This significantly reduces allocations and GC in cases that create a TBinaryProtocol to serialize a single message instead of reusing it.
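To illustrate the idea, here is a minimal sketch of one 8-byte scratch buffer shared by all integer writes. It is not the actual TBinaryProtocol code: the class name `SharedScratchWriter`, the field `scratch`, and the use of `ByteArrayOutputStream` in place of a Thrift transport are all assumptions made for the example. It also assumes the instance is used by one thread at a time.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a protocol writer; not the real Thrift class.
final class SharedScratchWriter {
    // One scratch buffer sized for the widest integer (i64 = 8 bytes),
    // instead of one buffer per integer width.
    private final byte[] scratch = new byte[8];

    // Stand-in for the transport the protocol would write to.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void writeI16(short v) {
        scratch[0] = (byte) (v >> 8);
        scratch[1] = (byte) v;
        out.write(scratch, 0, 2); // only the first 2 bytes are used
    }

    void writeI32(int v) {
        scratch[0] = (byte) (v >> 24);
        scratch[1] = (byte) (v >> 16);
        scratch[2] = (byte) (v >> 8);
        scratch[3] = (byte) v;
        out.write(scratch, 0, 4); // only the first 4 bytes are used
    }

    void writeI64(long v) {
        // Big-endian: most significant byte first.
        for (int i = 0; i < 8; i++) {
            scratch[i] = (byte) (v >> (56 - 8 * i));
        }
        out.write(scratch, 0, 8);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

Each write fills only the prefix of the buffer it needs and hands that prefix to the output, so constructing the writer allocates one small array rather than one per integer width.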
This passes "ant test". I ran the existing SerializationBenchmark and there is no statistically significant difference (the average of 5 runs is smaller on my machine, but I don't think we should read much into that).
I also have a JMH benchmark for this. When a new TBinaryProtocol is allocated for each message, this change performs better; when the TBinaryProtocol is reused in a loop, the results are more or less the same.