[IGNITE-9738] Client node can suddenly fail on start - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8
Component/s: None
Labels: None

Description

If a client joins a large topology, it can spend a long time waiting for TcpDiscoveryNodeAddFinishedMessage. During that time it cannot send TcpDiscoveryClientMetricsUpdateMessage, so the server can drop the client from the topology and log the following message:

Client node considered as unreachable and will be dropped from cluster, because no metrics update messages received in interval: TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network problems or long GC pause on client node, try to increase this parameter. [nodeId=a3493895-7c13-403c-bf0d-94eab0000011, clientFailureDetectionTimeout=30000];

We should send TcpDiscoveryClientMetricsUpdateMessage as soon as possible, without waiting for the join procedure to finish.
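The timing problem can be illustrated with a small standalone model. This is only a sketch, not Ignite code: the class `ClientJoinModel`, the method `isDropped`, and the 2000 ms metrics interval are hypothetical, and the model assumes the server starts its timeout clock when the client connects. The 30000 ms timeout is taken from the log message above.

```java
/**
 * Simplified model of the failure described in this issue.
 * Hypothetical names; this is NOT the Ignite implementation.
 */
final class ClientJoinModel {
    /**
     * Returns true if the modeled server would drop the client because the gap
     * between two consecutive metrics updates exceeded the timeout.
     *
     * @param joinDurationMs    time the client waits for the join to finish
     * @param metricsIntervalMs period between metrics updates (assumed value)
     * @param timeoutMs         client failure detection timeout on the server
     * @param sendDuringJoin    true models the proposed fix: metrics updates
     *                          are sent before the join procedure finishes
     */
    static boolean isDropped(long joinDurationMs, long metricsIntervalMs,
                             long timeoutMs, boolean sendDuringJoin) {
        if (metricsIntervalMs <= 0)
            throw new IllegalArgumentException("metricsIntervalMs must be positive");

        // Assumption: the server starts counting when the client connects (t = 0).
        long lastUpdate = 0;

        // Without the fix, the first update can only be sent after the join finishes.
        long next = sendDuringJoin
            ? metricsIntervalMs
            : Math.max(joinDurationMs, metricsIntervalMs);

        // Observe the client until one interval after the join has finished.
        long horizon = joinDurationMs + metricsIntervalMs;

        while (next <= horizon) {
            if (next - lastUpdate > timeoutMs)
                return true; // the server saw no update within the timeout

            lastUpdate = next;
            next += metricsIntervalMs;
        }

        return false;
    }

    public static void main(String[] args) {
        // A 45 s join against a 30 s timeout, with and without the fix.
        System.out.println("dropped without fix: " + isDropped(45_000, 2_000, 30_000, false));
        System.out.println("dropped with fix:    " + isDropped(45_000, 2_000, 30_000, true));
    }
}
```

In this model a join that takes longer than the timeout is dropped only when updates are held back until the join finishes. A short join is unaffected either way, which matches the report that the failure shows up on large topologies.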

Issue Links

links to

GitHub Pull Request #4968

MTCGA

People

Assignee: Vladislav Pyatkov

Reporter: Vladislav Pyatkov

Votes: 0

Watchers: 5

Dates

Created: 28/Sep/18 14:58

Updated: 22/Oct/18 19:20

Resolved: 22/Oct/18 15:20