Description
The discovery ducktape test [1] detected an unexpected failure of an additional node.
Scenario:
Two nodes are neighbors in the ring at positions N and N+1. Node N detects the failure of node N+1 and tries to connect to node N+2. Node N+2 then checks its backward connection to node N+1.
Problem:
Node N can fail as well, although only node N+1 is expected to fail.
Cause:
The timeout on node N for recovering the connection to node N+2 appears to be shorter than the timeout on node N+2 for checking the connection to node N+1.
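The race described here can be illustrated with a deliberately simplified model. It is only a sketch: the function and parameter names are hypothetical, and it assumes node N+2 answers node N only after its backward check of node N+1 has run for its full timeout.

```python
def node_n_outcome(recover_timeout_ms, backward_check_timeout_ms):
    """Return what node N observes while waiting for node N+2 to answer.

    Simplified, hypothetical model: node N+2 replies only after its
    backward check of node N+1 has used up backward_check_timeout_ms.
    """
    if backward_check_timeout_ms >= recover_timeout_ms:
        # Node N stops waiting before node N+2 finishes the backward check.
        return "N gives up on N+2"
    # Node N+2 finishes the check in time and can answer node N.
    return "N+2 confirms N+1 is down"


# Recovery timeout shorter than the backward check: the problematic ordering.
print(node_n_outcome(recover_timeout_ms=500, backward_check_timeout_ms=1000))
# Backward check shorter than the recovery timeout: the intended ordering.
print(node_n_outcome(recover_timeout_ms=1000, backward_check_timeout_ms=500))
```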
Fix:
Introduced a base timeout for checking/recovering a connection, derived from the current configuration. The timeouts mentioned above are now relative to it. The timeout of the backward connection check is now generally shorter than the timeout for recovering the connection.
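A minimal sketch of how relative timeouts could be derived from one base value. This is not the actual patch: `derive_timeouts`, `DiscoveryTimeouts`, the `failure_detection_timeout_ms` parameter, and the 0.5 fraction are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryTimeouts:
    base_ms: int            # base timeout taken from configuration
    recover_ms: int         # node N: time allowed to recover the connection to N+2
    backward_check_ms: int  # node N+2: time allowed to check the connection to N+1


def derive_timeouts(failure_detection_timeout_ms, backward_check_fraction=0.5):
    """Derive both timeouts from one base value (hypothetical helper).

    The backward check gets a fraction of the base, so it is strictly
    shorter than the recovery timeout.
    """
    if failure_detection_timeout_ms < 2:
        raise ValueError("base timeout must be at least 2 ms")
    if not 0 < backward_check_fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")

    base = failure_detection_timeout_ms
    recover = base
    # Clamp into [1, base - 1] so the ordering holds even after rounding.
    backward = min(base - 1, max(1, int(base * backward_check_fraction)))
    return DiscoveryTimeouts(base, recover, backward)


timeouts = derive_timeouts(10_000)
print(timeouts)
```

Deriving both values from the same base keeps their ordering stable when the configuration changes, instead of relying on two independently chosen constants.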