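The JDK ThreadPoolExecutor (TPE) starts a new thread for every submitted task until the core pool size is reached, even when existing threads are idle. As a consequence, we end up with 256 threads (the default) even if we only need a few.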
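To illustrate the JDK behavior, here is a minimal sketch. It assumes a pool whose core size equals its maximum size and an unbounded queue; the class and method names are made up for the example. Tasks are submitted strictly one at a time, so a single thread would be enough, yet the pool still grows to its core size.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class, not part of any patch. */
class EagerThreadsDemo {

    /** Submits tasks one at a time, waiting for each, and returns the resulting pool size. */
    static int poolSizeAfterSequentialTasks(int coreSize, int tasks) {
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
            coreSize, coreSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            for (int i = 0; i < tasks; i++) {
                // The previous task has finished, so an idle thread already exists,
                // but the executor starts a new one while it is below the core size.
                tpe.submit(() -> { }).get();
            }
            return tpe.getPoolSize();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            tpe.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // 256 mirrors the default mentioned above.
        System.out.println(poolSizeAfterSequentialTasks(256, 256)); // prints 256
    }
}
```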
The attached TPE creates threads only when there is something in the queue.
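For discussion, here is one possible way to get on-demand thread creation on top of the JDK executor. It is not the attached implementation. It uses a different, commonly seen technique: the core size is 0 and the work queue refuses a task whenever no worker appears to be idle, which makes the executor start a thread instead of queueing. The names `LazyGrowingPool`, `GrowFirstQueue` and `pending` are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a pool that starts empty and grows only when no worker is idle. */
class LazyGrowingPool extends ThreadPoolExecutor {

    /** Unbounded queue that refuses a task while the pool can grow and has no idle worker. */
    static final class GrowFirstQueue extends LinkedBlockingQueue<Runnable> {
        private transient volatile LazyGrowingPool pool;

        @Override
        public boolean offer(Runnable task) {
            LazyGrowingPool p = pool;
            if (p == null || p.getPoolSize() >= p.getMaximumPoolSize()) {
                return super.offer(task);      // pool is full: just queue the task
            }
            if (p.pending.get() <= p.getPoolSize()) {
                return super.offer(task);      // an idle worker should pick it up
            }
            return false;                      // refusal makes the executor start a thread
        }

        boolean forceOffer(Runnable task) {
            return super.offer(task);
        }
    }

    // Tasks submitted but not yet finished (queued plus running).
    private final AtomicInteger pending = new AtomicInteger();

    LazyGrowingPool(int maxThreads, long keepAliveSeconds) {
        super(0, maxThreads, keepAliveSeconds, TimeUnit.SECONDS, new GrowFirstQueue());
        GrowFirstQueue queue = (GrowFirstQueue) getQueue();
        queue.pool = this;
        // Reached only if the pool filled up between offer() and thread creation, or on shutdown.
        setRejectedExecutionHandler((task, executor) -> {
            if (executor.isShutdown() || !queue.forceOffer(task)) {
                throw new RejectedExecutionException("pool is shut down");
            }
        });
    }

    @Override
    public void execute(Runnable task) {
        pending.incrementAndGet();
        try {
            super.execute(task);
        } catch (RuntimeException e) {
            pending.decrementAndGet();         // the task was never accepted
            throw e;
        }
    }

    @Override
    protected void afterExecute(Runnable task, Throwable failure) {
        pending.decrementAndGet();
    }

    /** Runs tasks one at a time and reports how many threads the pool ended up with. */
    static int poolSizeAfterSequentialTasks(int maxThreads, int tasks) {
        LazyGrowingPool pool = new LazyGrowingPool(maxThreads, 60);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> { }).get();
                while (pool.pending.get() > 0) {
                    Thread.yield();            // let afterExecute() finish its bookkeeping
                }
            }
            return pool.getPoolSize();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(poolSizeAfterSequentialTasks(256, 256)); // prints 1
    }
}
```

The idle check is only approximate. A caller can see its future complete before `afterExecute()` has decremented `pending`, in which case the next submission may start a thread that was not strictly needed (the demo waits for the counter to avoid this). Tasks dropped through `remove()` or `purge()` are not subtracted from the counter either. Extra threads go away once they have been idle for the keep-alive time.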
On a PE test with replicas on, it improved the 99th-percentile latency by 5%.
Warning: there are likely some race conditions, but I'm posting it here because there may be an existing implementation somewhere that we can use, or a good reason not to do this. Feedback welcome, as usual.