This started as an internal hotfix after we found that the Thrift server had spawned 15000 threads. To bound the thread pool size, I added a custom thread pool server implementation, HBaseThreadPoolServer, to the HBase codebase and made the following parameters configurable both from the command line and as configuration settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. Under increasing load, the server creates a new thread for every connection until the pool size reaches minWorkerThreads. After that, it puts new connections into the queue and creates a new thread only when the queue is full. If an attempt to create a new thread fails, the server drops the connection. The default TThreadPoolServer would crash in that case, but in practice that never happened because its thread pool was unbounded. Instead, the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side.
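The growth policy described above matches the standard semantics of `java.util.concurrent.ThreadPoolExecutor` with a bounded queue. Below is a minimal sketch under that assumption, not the actual HBaseThreadPoolServer code. The class `BoundedWorkerPool` and its methods are hypothetical. In the mapping, minWorkerThreads is the core pool size, maxWorkerThreads is the maximum pool size, and maxQueuedRequests is the capacity of an `ArrayBlockingQueue`. A failed thread creation is modeled as the executor rejecting work once both limits are reached. The rejection handler only counts the dropped connection; a real server would close the client socket there.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a bounded worker pool; not HBase code. */
public class BoundedWorkerPool {

    /**
     * Builds a pool with the growth policy described above. The rejection
     * handler stands in for "drop the connection" (assumption: a real server
     * would close the client socket here instead of just counting).
     */
    static ThreadPoolExecutor newWorkerPool(int minWorkerThreads, int maxWorkerThreads,
                                            int maxQueuedRequests, AtomicInteger dropped) {
        RejectedExecutionHandler dropConnection = (task, executor) -> dropped.incrementAndGet();
        return new ThreadPoolExecutor(
                minWorkerThreads,                            // one new thread per connection until this size
                maxWorkerThreads,                            // hard upper bound on worker threads
                60L, TimeUnit.SECONDS,                       // idle threads above the core size time out
                new ArrayBlockingQueue<>(maxQueuedRequests), // bounded queue of pending connections
                dropConnection);                             // invoked when pool and queue are both full
    }

    /** Submits blocking "connections" and returns {largest pool size, dropped, served}. */
    static int[] simulate(int minWorkers, int maxWorkers, int maxQueued, int connections)
            throws InterruptedException {
        AtomicInteger dropped = new AtomicInteger();
        AtomicInteger served = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = newWorkerPool(minWorkers, maxWorkers, maxQueued, dropped);
        for (int i = 0; i < connections; i++) {
            pool.execute(() -> {
                try {
                    release.await();          // hold the worker, like a long-lived client connection
                    served.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown();                  // let every accepted connection finish
        pool.shutdown();                      // queued tasks still run after shutdown()
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {largest, dropped.get(), served.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = simulate(2, 4, 3, 10);
        System.out.println("threads=" + r[0] + " dropped=" + r[1] + " served=" + r[2]);
    }
}
```

Running `main` prints `threads=4 dropped=3 served=7`. Two core threads take the first two connections and the next three wait in the queue. Once the queue is full, two more threads are created. The last three connections are then dropped. With the executor's default `AbortPolicy`, `execute` would throw `RejectedExecutionException` at that point. The custom handler is what turns overload into a dropped connection rather than an error.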
Another part of this fix refactors and unit-tests the command-line handling of the Thrift server. That logic is complicated enough to need its own tests, and the existing tests for the ThriftServer class do not cover it at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread.
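The test structure described above can be sketched in plain Java. This illustrates the general pattern only and is not TestThriftServerCmdLine itself. A trivial one-line echo server stands in for the Thrift server, and a raw socket stands in for the Thrift client API. The flags `-a` and `-b` and the class `CmdLineComboSketch` are hypothetical placeholders. Binding to port 0 lets the operating system pick a free port; the real test may choose its random port differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an option-combination test; not HBase code. */
public class CmdLineComboSketch {

    /** Every subset of the given flags, e.g. [], [-a], [-b], [-a, -b]. */
    static List<List<String>> combinations(List<String> flags) {
        List<List<String>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << flags.size()); mask++) {
            List<String> combo = new ArrayList<>();
            for (int i = 0; i < flags.size(); i++) {
                if ((mask & (1 << i)) != 0) combo.add(flags.get(i));
            }
            result.add(combo);
        }
        return result;
    }

    /** Starts a one-shot stand-in server in a background thread, sends one request, returns the reply. */
    static String roundTrip(List<String> options) throws Exception {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    // Echo the request along with the options this "server" was started with.
                    out.println("args=" + options + " echo=" + in.readLine());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            // The client talks to the server from this (different) thread.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println("ping");
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (List<String> combo : combinations(Arrays.asList("-a", "-b"))) {
            System.out.println(roundTrip(combo));
        }
    }
}
```

Running `main` prints one reply per option combination, starting with `args=[] echo=ping`. Enumerating every subset grows as 2^n in the number of flags, so with many options a test would typically pick a representative set of combinations instead.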