Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
0.9.0.1, 0.10.2.0
-
None
Description
In our test env, QA hang one of the connecting broker of the producer, then the producer will be stuck in send method, and throw the exception: fail to update metadata after request timeout.
I found the reason as follow: when the producer chose one of the broker to send metadata, it connect to the broker, but the broker is hang, the tcp is connected and Network client marks this broker is connected, but the SSL channel is not ready yet so the channel is not ready.
Then the Network client chooses the connected node in the leastLoadedNode every time to send the metadata, but the node's channel is not ready yet.
So the producer stuck in getting metadata and will not try another node to request metadata. The client should not stuck only one node is hung
Attachments
Issue Links
- links to