[IGNITE-13014] Remove double checking of node availability. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Ignite Flags: Release Notes Required

Description

Proposal:
Do not check a failed node a second time. Double checking prolongs node failure detection and gives no additional benefit. The routine is also messy and contains hardcoded values.

Currently, node availability is checked twice. Suppose node 2 stops responding. Node 1 becomes unable to ping node 2 and asks node 3 to establish a permanent connection in place of node 2. Node 3 may then check node 2 as well, or it may not.

As a result, node failure detection can take up to ServerImpl.CON_CHECK_INTERVAL + 2 * IgniteConfiguration.failureDetectionTimeout + 300ms.
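
To put a number on this bound, here is a minimal sketch of the arithmetic in Java. The class and method names are hypothetical and are not part of Ignite. The sample inputs assume a connection check interval of 500 ms and a failure detection timeout of 10 000 ms; substitute the values from your own build and configuration.

```java
// Hypothetical helper, not part of Ignite: evaluates the worst-case bound
// quoted in this issue for a given pair of settings.
final class WorstCaseDetection {
    /** Fixed extra delay taken from the formula in the description, in ms. */
    static final long EXTRA_DELAY_MS = 300;

    /**
     * Worst case: one connection check interval, two full failure detection
     * timeouts (the node is checked twice), plus the fixed extra delay.
     */
    static long worstCaseMillis(long conCheckIntervalMs, long failureDetectionTimeoutMs) {
        return conCheckIntervalMs + 2 * failureDetectionTimeoutMs + EXTRA_DELAY_MS;
    }

    public static void main(String[] args) {
        long conCheckIntervalMs = 500;           // assumed value of CON_CHECK_INTERVAL
        long failureDetectionTimeoutMs = 10_000; // assumed failureDetectionTimeout

        // 500 + 2 * 10000 + 300 = 20800
        System.out.println(worstCaseMillis(conCheckIntervalMs, failureDetectionTimeoutMs) + " ms");
    }
}
```

With these assumed inputs the bound is 20 800 ms, roughly twice the configured failure detection timeout.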

See:

'NodeFailureResearch.patch' - creates the test 'FailureDetectionResearch', which emulates slow responses from a failed node and measures failure detection delays.
'FailureDetectionResearch.txt' - results of the test.
'WostCaseStepByStep.txt' - description of how the worst case happens.