We (AWS SDK for Java) have been investigating reports of poor performance in the SDK and have narrowed it down to thread contention in PoolingHttpClientConnectionManager. Up to a certain TPS, performance is great and there is no issue. Beyond that point (approximately 8,000 TPS in our load tests), performance drops sharply and most threads end up stuck waiting on a lock in AbstractConnPool (in either lease or releaseConnection).
This quickly locks up the application as it tries to keep up with the incoming TPS. We have been able to work around this and achieve much higher throughput by creating multiple SDK clients and selecting among them round-robin when handing them off to threads. This allowed us to easily scale up to 16,000 TPS. We wanted to open a dialog with the maintainers of the Apache HTTP client to see whether this is a known issue or limitation and what options we have for getting around it. We aren’t opposed to re-implementing the connection manager to be more performant, but since it’s a pretty sizable chunk of work, we wanted to make sure that’s the best path forward before proceeding.
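
For illustration, the round-robin workaround described above could look roughly like the following. This is a minimal sketch, not the SDK's actual code: `RoundRobin` is a hypothetical helper name, and the type parameter stands in for whatever client type is being pooled. It assumes each client instance owns its own connection manager, so that spreading requests across clients also spreads them across separate pool locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out pre-built clients in round-robin order.
// T stands in for the SDK client type; nothing here is part of the
// AWS SDK or Apache HttpClient APIs.
final class RoundRobin<T> {
    private final List<T> clients;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<T> clients) {
        if (clients.isEmpty()) {
            throw new IllegalArgumentException("at least one client is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.clients = new ArrayList<>(clients);
    }

    // Safe to call from many threads: the only shared mutable state is the
    // atomic counter. floorMod keeps the index non-negative after the
    // counter overflows past Integer.MAX_VALUE.
    T next() {
        int index = Math.floorMod(counter.getAndIncrement(), clients.size());
        return clients.get(index);
    }
}
```

Each worker thread would call `next()` before issuing a request, so that no single connection pool sees the full request rate. The number of clients to create is a tuning decision and is not specified here.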