I'm working on the HCatalog project. HCatalog uses a (slightly dated) version of Hive that in turn depends on libthrift-0.5.0. The HCatalog-server is a continuously running process that serves (meta)data over thrift. (The bug I describe is related to HCATALOG-183.)
We observed that on running the HCatalog-server with continuous client-requests, the memory footprint of the server grows steadily, until we see an OutOfMemoryError exception. I took a memory snapshot of the running process, to check for leaks. I noticed that the majority of the memory (over 1.3GB) was being consumed by the org.apache.thrift.transport.TSaslServerTransport$Factory::transportMap. There were over 52000 instances of WeakHashMap$Entry, consuming 3MB of shallow-heap, and 1.3GB of retained heap.
I suspect that entries in the WeakHashMap (transportMap) are not being collected during GC, as is expected in code. That would only be so if there are outstanding hard-references to the key in the map (TTransport).
From the code in TSaslTransport and TSaslServerTransport, it appears that there is an inadvertent cyclic reference that the runtime is unable to detect:
1. TSaslTransport has a (hard) back-reference to the "underlyingTransport", i.e. TTransport.
2. TSaslServerTransport::Factory::transportMap is a WeakHashMap< TTransport, TSaslServerTransport >. Here, the "underlyingTransport" is mapped back to the decorating TSaslServerTransport.
From #2, an entry can only be GCed if there's no outstanding hard-reference to the TTransport. But from #1, the hard-reference comes from the value-part of the hashmap entry. The runtime can't deduce that there's a cycle, presumably because it's not explicit.
(I'll be attaching a sample program to better illustrate the WeakHashMap behaviour, in case I've botched the explanation above.)
The simple solution would be to change the back-reference in #1 into a WeakReference. I'll attach a patch here that might be suitable.