[IGNITE-13663] Represent in the documentation affection of several node addresses on failure detection v2. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.6, 2.9, 2.8.1
Fix Version/s: 2.10
Component/s: documentation
Labels:
- iep-45

Release Note:
Merged to the master and 2.9 docs.

Description

We should document that TcpDiscoverySpi takes longer to detect a node failure if the node has several addresses.

By default, all available addresses are assigned to a node, and the node listens on any address (0.0.0.0), not on the first non-loopback address as the documentation says. A simple example from my ordinary Mac with WiFi, VPN and Docker (from the Ignite log): `Local node addresses: [192.168.1.42/0:0:0:0:0:0:0:1%lo0, /127.0.0.1, /192.168.1.42, /10.11.220.206]`.
It is clearly seen that `ServerImpl.TcpServer.srvrSock` binds to '0.0.0.0'.
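
To get a rough idea of how many addresses a host exposes, the addresses of all active network interfaces can be listed with the plain JDK. This is only a sketch: it approximates what a node bound to 0.0.0.0 may pick up, it is not Ignite's own address resolution logic, and the class name `LocalAddresses` is made up for illustration.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Ignite.
final class LocalAddresses {
    /** Returns the addresses of every network interface that is currently up. */
    static List<InetAddress> list() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
            if (ifaces == null)
                return result; // Some platforms report no interfaces at all.

            for (NetworkInterface iface : Collections.list(ifaces)) {
                if (!iface.isUp())
                    continue; // Skip interfaces that are down.

                result.addAll(Collections.list(iface.getInetAddresses()));
            }
        }
        catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetAddress addr : list())
            System.out.println(addr.getHostAddress() + (addr.isLoopbackAddress() ? " (loopback)" : ""));
    }
}
```

On a host with WiFi, VPN and Docker this typically prints several addresses, which is the situation described above.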

The actual delay for failure detection and connection recovery is `failureDetectionTimeout * addresses_number + connRecoveryTimeout`, which users usually do not expect. This peculiarity was uncovered in [1], [2] and additionally confirmed in the ducktape integration test [3].
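
The formula can be worked through with concrete numbers. The sketch below only does the arithmetic. The class and method names are hypothetical, and the timeout values of 10 000 ms are assumed for illustration.

```java
// Hypothetical helper that evaluates the formula from this description:
// failureDetectionTimeout * addresses_number + connRecoveryTimeout
final class FailureDetectionDelay {
    static long worstCaseMillis(long failureDetectionTimeout, int addressesNumber, long connRecoveryTimeout) {
        return failureDetectionTimeout * addressesNumber + connRecoveryTimeout;
    }

    public static void main(String[] args) {
        long timeout = 10_000L;  // Assumed failureDetectionTimeout, ms.
        long recovery = 10_000L; // Assumed connRecoveryTimeout, ms.

        // One address: 10 000 * 1 + 10 000 = 20 000 ms.
        System.out.println(worstCaseMillis(timeout, 1, recovery));

        // Four addresses, as in the log line above: 10 000 * 4 + 10 000 = 50 000 ms.
        System.out.println(worstCaseMillis(timeout, 4, recovery));
    }
}
```

With these assumed values, going from one address to four raises the delay from 20 seconds to 50 seconds.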

To avoid this, the user should set `IgniteConfiguration.localHost` or `TcpDiscoverySpi.localAddress`. Unfortunately, users frequently skip this setting and let the node use all available IPs.
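
A minimal sketch of such a configuration in Java is given here. It assumes the Ignite core library is on the classpath. The address `192.168.1.42` is taken from the log line above and stands in for whatever address belongs to the cluster network. The class name is made up for illustration.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class SingleAddressNode {
    public static void main(String[] args) {
        // Assumption: this is the address of the interface that the cluster uses.
        String clusterAddr = "192.168.1.42";

        IgniteConfiguration cfg = new IgniteConfiguration();

        // Option 1: set the local host for the whole node.
        cfg.setLocalHost(clusterAddr);

        // Option 2: set the local address for discovery only.
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setLocalAddress(clusterAddr);
        cfg.setDiscoverySpi(discoSpi);

        // Either option alone should be enough; both are set here only to show the two properties.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Local node addresses: " + ignite.cluster().localNode().addresses());
        }
    }
}
```

The printed address list is a quick way to check which addresses the node actually announces to the cluster.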

Middleware often runs in environments with several IP addresses (virtualization, containers, different networks). A node sends all the addresses it has obtained to the cluster along with its other node info. A connection to the node is established to the first of its addresses. If that connection is lost, the other addresses are tried sequentially to reconnect. If those addresses do not belong to the intended node network or do not represent an existing physical connection, processing them is just a waste of time.

[1] https://issues.apache.org/jira/browse/IGNITE-13012
[2] https://issues.apache.org/jira/browse/IGNITE-13134
[3] https://github.com/apache/ignite/blob/ignite-ducktape/modules/ducktests/tests/ignitetest/tests/discovery_test.py

Issue Links

is caused by

IGNITE-13206 Represent in the documentation affection of several node addresses on failure detection.

Closed

relates to

IGNITE-13205 Represent in logs, javadoc affection of several node addresses on failure detection.

Resolved

links to

GitHub Pull Request #8424

People

Assignee: Denis A. Magda

Reporter: Vladimir Steshin

Votes: 0

Watchers: 1

Dates

Created: 03/Nov/20 14:01

Updated: 19/Nov/20 00:15

Resolved: 19/Nov/20 00:15

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 20m