Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Invalid
Release Notes Required
Description
Proposal:
Do not check a failed node a second time. Checking the node twice prolongs failure detection and gives no additional benefit. The routine also contains a mess of hardcoded values.
Currently, node availability is checked twice. Suppose node 2 stops responding. Node 1 can no longer ping node 2, so it asks node 3 to establish a permanent connection in place of node 2. Node 3 may then check node 2 again, or it may not.
As a result, detecting the node failure may take up to ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout + 300ms.
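To make the bound concrete, here is a small sketch that simply evaluates the formula above. The inputs are assumptions for illustration only: ServerImpl.CON_CHECK_INTERVAL is taken as 500 ms and failureDetectionTimeout as 10 s (the IgniteConfiguration default). The class and method names are hypothetical and are not part of Ignite.

```java
/**
 * Hypothetical helper: evaluates the worst-case failure detection bound
 * quoted in the issue,
 *   CON_CHECK_INTERVAL + 2 * failureDetectionTimeout + 300 ms.
 * Inputs are illustrative assumptions, not values read from Ignite.
 */
class WorstCaseDetection {
    // Fixed extra delay that appears in the issue's formula, in milliseconds.
    static final long EXTRA_DELAY_MS = 300L;

    static long worstCaseDetectionMs(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        if (conCheckIntervalMs < 0 || failureDetectionTimeoutMs < 0)
            throw new IllegalArgumentException("Durations must be non-negative");

        // The failure detection timeout is counted twice, once per availability check.
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + EXTRA_DELAY_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500L;           // assumed value of ServerImpl.CON_CHECK_INTERVAL
        long failureDetectionTimeoutMs = 10_000L; // assumed: IgniteConfiguration default (10 s)

        System.out.println(worstCaseDetectionMs(conCheckIntervalMs, failureDetectionTimeoutMs)); // prints 20800
    }
}
```

With these inputs the bound is 20,800 ms, and the 2 * failureDetectionTimeout term accounts for 20,000 ms of it.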
See:
- 'NodeFailureResearch.patch' adds the test 'FailureDetectionResearch', which emulates slow responses from a failed node and measures failure detection delays.
- 'FailureDetectionResearch.txt' - results of the test.
- 'WostCaseStepByStep.txt' - a step-by-step description of how the worst case occurs.
Attachments
Issue Links
- causes: IGNITE-13111 Simplify backward checking of node connection. (Closed)
- depends upon: IGNITE-13016 Fix backward checking of failed node. (Resolved)
- is duplicated by: IGNITE-13018 Get rid of duplicated checking of failed node. (Resolved)
- links to:
  1. Fix backward checking of failed node. | Resolved | Vladimir Steshin
  2. Remove hardcoded delay from re-marking failed node as alive. | Resolved | Vladimir Steshin
  3. Get rid of duplicated checking of failed node. | Resolved | Vladimir Steshin