Internal Impala network connections between nodes for query execution are not multiplexed. This means that as the number of queries increases, the number of network connections between Impala executors increases as well. On clusters with more nodes, the combination of query bursts and executor count can lead to a large number of new connection attempts. For example, a query with 10+ joins on a 100-node cluster could require 1000+ simultaneous connections on the coordinator. When the spike is too high, or when there is not enough CPU available to handle the burst, connection failures result.
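The 1000+ figure in the example can be reproduced with a rough back-of-the-envelope calculation. The sketch below is only an illustration: it assumes that each join contributes roughly one plan fragment and that the coordinator needs one connection per fragment per node. The function name and this one-connection-per-pair model are hypothetical simplifications, not Impala's actual accounting.

```python
def estimate_coordinator_connections(num_fragments: int, num_nodes: int) -> int:
    """Toy estimate of simultaneous coordinator connections.

    Assumption (not taken from Impala internals): one connection is
    needed for every (fragment, node) pair because connections are
    not multiplexed.
    """
    if num_fragments < 0 or num_nodes < 0:
        raise ValueError("counts must be non-negative")
    return num_fragments * num_nodes


if __name__ == "__main__":
    # A query with about 10 joins (about 10 fragments, by assumption)
    # on a 100-node cluster.
    print(estimate_coordinator_connections(10, 100))
```

Under this simplified model the estimate grows linearly with both the number of fragments and the number of nodes, which is why larger clusters see sharper connection spikes.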
The total number of connections does not seem to be the issue. Rather, there is currently a practical limit on how many new TCP connection requests can be handled at the same time during a spike.
Impala caches backend connections and reuses them later. With the cache, a simultaneous spike of new connection requests consists only of those above the previously established maximum.
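The effect of the cache on connection spikes can be illustrated with a small model. This is a minimal sketch under stated assumptions: cached connections are never evicted, all of them are idle and reusable when the next burst arrives, and each burst is described by a single demand number. The function name is hypothetical and the model does not reflect Impala's real cache implementation.

```python
from typing import Iterable, List


def new_connections_per_burst(burst_demands: Iterable[int]) -> List[int]:
    """Return how many NEW connections each burst must open.

    Assumption: every connection opened so far stays cached and can be
    reused, so a burst only opens connections beyond the previous maximum.
    """
    cached = 0  # connections already established and kept in the cache
    opened: List[int] = []
    for demand in burst_demands:
        new = max(0, demand - cached)  # only the excess needs new connections
        opened.append(new)
        cached += new  # the cache grows to the new high-water mark
    return opened


if __name__ == "__main__":
    # Four query bursts needing 200, 500, 300, and 800 connections.
    print(new_connections_per_burst([200, 500, 300, 800]))  # [200, 300, 0, 300]
```

In this model the third burst opens no new connections at all, because its demand is below the high-water mark set by the second burst.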