When load-balanced connections are used, there are a few properties that can be configured. The properties, with their default values, are:
If the max thread count is lower than (connections per node) * (number of nodes in the cluster), everything still works well as long as all of the load-balanced connections carry reasonably high data volumes. However, if one of the connections has a low data volume, the load-balanced connections can stop pushing data for a period of time, typically about some multiple of the "comms.timeout" property.
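The condition described above can be written as a small check. This is only a sketch of the arithmetic: the class, method, and parameter names are hypothetical, and the numbers in `main` are made-up illustration values, not the defaults of any property.

```java
public class ThreadBudgetSketch {

    /**
     * Returns true when there is at least one server thread for every
     * incoming load-balanced connection in the cluster.
     * All names here are hypothetical and chosen for illustration only.
     */
    static boolean threadsCoverAllConnections(int maxThreadCount,
                                              int connectionsPerNode,
                                              int nodesInCluster) {
        // Use long arithmetic so a large cluster cannot overflow the product.
        long connectionsToServe = (long) connectionsPerNode * nodesInCluster;
        return maxThreadCount >= connectionsToServe;
    }

    public static void main(String[] args) {
        // Illustrative values only: 4 connections per node * 3 nodes = 12,
        // which exceeds 8 threads, so some connections must share a thread.
        System.out.println(threadsCoverAllConnections(8, 4, 3));
        // 12 threads are enough for the same 12 connections.
        System.out.println(threadsCoverAllConnections(12, 4, 3));
    }
}
```

When the check returns false, some connections have to share a thread, which is the situation in which a single low-volume connection can hold up the others.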
This appears to be because the server uses socket I/O (blocking) rather than NIO. Once it has received data on a connection, it checks whether more data is available. If no more data arrives within some period of time, the read times out, and only then is the socket connection added back to the pool of connections to read from. In the meantime the thread is stuck waiting for more data, which blocks progress on any other connection that thread would otherwise service.
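The blocking behavior can be reproduced in isolation with plain `java.net` sockets. The sketch below is not the server's actual code. It only assumes that the server performs a blocking read with a socket timeout, and the class and method names are hypothetical. A peer sends one byte and then goes quiet, and the reading thread stays stuck in the second `read()` until the timeout fires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BlockingReadSketch {

    /**
     * Reads from a peer that sends one byte and then goes idle.
     * Returns how long (in milliseconds) the calling thread was blocked
     * waiting for more data before the read timed out.
     */
    static long millisBlockedOnIdlePeer(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            // Stand-in for the communications timeout (assumption).
            accepted.setSoTimeout(timeoutMillis);

            // The low-volume peer sends a single byte, then nothing more.
            OutputStream out = client.getOutputStream();
            out.write(1);
            out.flush();

            InputStream in = accepted.getInputStream();
            in.read(); // the first byte is available, so this returns promptly

            long start = System.nanoTime();
            try {
                in.read(); // "is more data available?" blocks the whole thread
            } catch (SocketTimeoutException expected) {
                // Only at this point could the thread move on to another connection.
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Thread was blocked for about "
                + millisBlockedOnIdlePeer(500) + " ms");
    }
}
```

With a non-blocking design (for example a `java.nio.channels.Selector`), the thread would be notified only when data is ready, so an idle connection would not hold it for the duration of the timeout.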