I have a cluster of 3 nodes. When the nodes are under heavy load, API requests sometimes complete in tens to hundreds of milliseconds but at other times take several seconds (8+ seconds in some cases).
After some investigation, the cause appears to be that the default thread pool size of 10 is no longer sufficient. In the 0.x baseline this was fine, because each click of "Refresh" in the UI produced a single request. In the 1.x baseline, the same click fires off 4 separate requests simultaneously because of the multi-tenancy features that were added. Each of these 4 requests has to be replicated to all 3 nodes, for a total of 12 web requests. With only 10 threads in the pool, even a simple Refresh in the UI cannot be done fully in parallel: at least 2 of the replicated requests have to wait for a thread to free up.
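To make the arithmetic concrete, here is a minimal sketch (not the actual replication code) that submits 4 x 3 = 12 blocking tasks to a fixed-size pool and reports how many were able to run at the same time. The class and method names are hypothetical, and the latch is only a stand-in for a slow web request.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; none of these names come from the real code base.
public class RefreshFanOutSketch {

    /** Returns how many requests were running at once while all of them were blocked. */
    public static int peakConcurrency(int poolSize, int requests) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // At most min(poolSize, requests) tasks can be running before any is released.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, requests));
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                started.countDown();
                try {
                    release.await(); // stand-in for a slow replicated web request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        started.await();            // every available thread is now busy
        int observed = peak.get();  // tasks beyond the pool size are still queued
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        int requests = 4 * 3; // 4 UI requests, each replicated to 3 nodes
        System.out.println(peakConcurrency(10, requests)); // prints 10: 2 requests wait in the queue
        System.out.println(peakConcurrency(30, requests)); // prints 12: all requests run in parallel
    }
}
```

With a pool of 10, the last 2 requests only start once earlier ones finish, which would be consistent with the occasional slow responses under load.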
Changing the pool size from 10 to 30 resulted in far more consistent response times. Unfortunately, scaling the thread pool up to a large number of threads has its own drawbacks. So I will create a "cached" thread pool and expose properties for the "core pool size" and the "max pool size".
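A rough sketch of what such a pool could look like, assuming the standard java.util.concurrent.ThreadPoolExecutor is used. The class name, method name, keep-alive value, and rejection policy below are my own assumptions for illustration, not the final implementation; the values 10 and 30 are just the numbers from the experiment above.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory; the real property names and defaults are still to be decided.
public class ReplicationPoolSketch {

    /**
     * Builds a "cached"-style pool: up to corePoolSize threads are kept alive,
     * extra threads are created on demand up to maxPoolSize, and idle extra
     * threads are discarded after keepAliveSeconds.
     */
    public static ThreadPoolExecutor newPool(int corePoolSize, int maxPoolSize, long keepAliveSeconds) {
        return new ThreadPoolExecutor(
                corePoolSize,
                maxPoolSize,
                keepAliveSeconds, TimeUnit.SECONDS,
                // Direct hand-off queue (the same kind Executors.newCachedThreadPool() uses):
                // a task is never queued, it either gets a thread or is rejected.
                new SynchronousQueue<>(),
                // Assumption: when all maxPoolSize threads are busy, run the task on the
                // submitting thread instead of throwing RejectedExecutionException.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        // Illustrative values only: 10 was the old fixed size, 30 the size that helped.
        ThreadPoolExecutor pool = newPool(10, 30, 60);
        System.out.println("core=" + pool.getCorePoolSize() + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

The idea is that the pool stays small when the cluster is quiet but can grow to cover a burst like the 12 simultaneous requests from a Refresh, and then shrinks back once the extra threads have been idle for the keep-alive period.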