I am trying to deploy a highly available Nimbus using Docker. At the moment I am deploying only two services (nimbus-1 and nimbus-2), so the Storm configuration file includes the following parameter: nimbus.seeds: [nimbus-1, nimbus-2]
The issue appears when the first of the services (nimbus-1) is down. For example, deploying a topology from nimbus-2 can take around 15 minutes. I have checked the code, and the cause is that it loops through all nimbus.seeds hosts to find out which one is the leader. On each iteration it tries to create a new NimbusClient (and therefore a new ThriftClient), but it always passes null as the timeout for the created socket. As a result, it keeps trying to connect to the unreachable host until a ConnectionTimeout is reached. Modifying the parameter storm.thrift.socket.timeout.ms does not change this socket timeout.
I think the ThriftClient should also use the Thrift socket timeout parameter (storm.thrift.socket.timeout.ms), just as the ThriftServer (or the transport plugin used for the communication) does, which was implemented in Story 2254.
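To illustrate why a per-attempt timeout matters when looping over the seeds, here is a minimal sketch using plain Java sockets. It is not the Storm code: SeedProbe and firstReachable are hypothetical names made up for this example, and it only checks that a TCP connection can be opened, not which Nimbus is the leader.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Storm.
final class SeedProbe {

    /**
     * Returns the first seed that accepts a TCP connection within timeoutMs,
     * or an empty Optional if none does. A timeoutMs of 0 means "no timeout",
     * so the attempt falls back to the OS-level connection timeout.
     */
    static Optional<InetSocketAddress> firstReachable(List<InetSocketAddress> seeds, int timeoutMs) {
        for (InetSocketAddress seed : seeds) {
            try (Socket socket = new Socket()) {
                // Bounded connect: gives up on this seed after timeoutMs.
                socket.connect(seed, timeoutMs);
                return Optional.of(seed);
            } catch (IOException e) {
                // Refused, unresolved, or timed out: try the next seed.
            }
        }
        return Optional.empty();
    }
}
```

With seeds such as new InetSocketAddress("nimbus-1", 6627) and new InetSocketAddress("nimbus-2", 6627) (the port is an assumption here) and a timeout of a few seconds, a dead nimbus-1 would delay the loop by at most that timeout before nimbus-2 is tried.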
(This is my first issue and pull request, so apologies if anything is wrong.)