Description
Main description:
Cluster nodes are connected in a ring.
For example, suppose we have 6 nodes: A, B, C, D, E, F.
They can form a ring in any order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc.
If a node leaves the topology, its adjacent nodes must reconnect to each other.
If nodes A, B, C are in one physical place, nodes D, E, F are in another, and the two places lose their connection to each other, the number of reconnections depends on how the ring was ordered.
In the best case, if we had the ring A-B-CxD-E-FxA ('x' marks a lost connection), only one reconnection is needed: C connects to A, or F connects to D, depending on which part of the cluster stays alive.
In the worst case, if the ring alternated between the two places, AxFxBxExCxDxA, many reconnections are needed (A to B, B to C, C to A; in general n/2 reconnections, where n is the number of nodes).
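The two cases above can be checked with a small sketch. It is only an illustration, not Ignite code: the class RingSplitExample and the method reconnects are hypothetical names, and the ring is modelled as a plain list of node names in which each node connects to the next one (the last connects back to the first).

```java
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not part of Ignite. */
final class RingSplitExample {
    /**
     * Counts the surviving nodes whose next node in the ring is on the
     * unreachable side of the split. Each such node has to reconnect.
     */
    static int reconnects(List<String> ring, Set<String> alive) {
        int cnt = 0;

        for (int i = 0; i < ring.size(); i++) {
            String node = ring.get(i);
            String next = ring.get((i + 1) % ring.size()); // wrap around to close the ring

            if (alive.contains(node) && !alive.contains(next))
                cnt++;
        }

        return cnt;
    }
}
```

With A, B, C as the surviving part, the ring A-B-C-D-E-F gives 1 reconnection and the ring A-F-B-E-C-D gives 3, which matches n/2 for n = 6.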
Approach:
We need to develop an approach that inserts each node at the correct position, so that a correct ring topology is formed.
Solutions:
The main idea is to sort nodes according to latency.
- Group nodes into arcs by an ARC_ID. (manually?)
- Implement a NodeComparator (nodes on the same host : nodes on the same subnet : other nodes). We will use it when we connect a new node.
- dev list thread
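A minimal sketch of the NodeComparator idea from the list above. Everything in it is an assumption made for illustration: the class LocalityExample, its Node type with a single IPv4 address, and the rule that "same subnet" means "same first three octets" (a /24 network). A real implementation would have to handle multiple addresses per node, IPv6 and real subnet masks.

```java
import java.util.Comparator;

/** Hypothetical illustration; not part of Ignite. */
final class LocalityExample {
    /** Simplified node: a name and one IPv4 address (assumption). */
    static final class Node {
        final String name;
        final String ip;

        Node(String name, String ip) {
            this.name = name;
            this.ip = ip;
        }
    }

    /** Returns 0 for the same host, 1 for the same /24 subnet, 2 for any other node. */
    static int tier(Node joining, Node candidate) {
        if (joining.ip.equals(candidate.ip))
            return 0;

        // Assumption: the subnet is the address without its last octet.
        String a = joining.ip.substring(0, joining.ip.lastIndexOf('.'));
        String b = candidate.ip.substring(0, candidate.ip.lastIndexOf('.'));

        return a.equals(b) ? 1 : 2;
    }

    /** Orders candidates by closeness to the joining node; the name breaks ties. */
    static Comparator<Node> closestTo(Node joining) {
        return Comparator.<Node>comparingInt(n -> tier(joining, n))
            .thenComparing(n -> n.name);
    }
}
```

Sorting the existing nodes with closestTo(newNode) puts same-host nodes first, then same-subnet nodes, then the rest, so the first element is a natural neighbour for the new node.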
Update, Dec 29, Yakov Zhdanov:
- Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a public static final constant to TcpDiscoverySpi.
- Alter org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>) to order nodes based on the per-node attribute value.
- Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs are equal, then we should compare the nodes' IDs. This way we have a consistent order on all nodes in the topology.
- Also, nextNode() has to group nodes on the same host and in the same subnet. This can be postponed and implemented after the other points are done.
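The ordering described in the points above could look roughly like the sketch below. It does not reproduce TcpDiscoveryNodesRing: the class RegionRingExample, its Node type and this simplified nextNode are hypothetical, the region ID is assumed to be a non-null string, and the local node is assumed to be one of the instances in the topology collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Hypothetical illustration; not the actual TcpDiscoveryNodesRing. */
final class RegionRingExample {
    /** Simplified node: an ID and the value of the proposed CLUSTER_REGION_ID attribute. */
    static final class Node {
        final UUID id;
        final String regionId; // assumed non-null in this sketch

        Node(UUID id, String regionId) {
            this.id = id;
            this.regionId = regionId;
        }
    }

    /** Region first, node ID as the tie-breaker, so every node computes the same order. */
    static final Comparator<Node> ORDER =
        Comparator.<Node, String>comparing(n -> n.regionId).thenComparing(n -> n.id);

    /** Returns the node that follows {@code local} in the region-ordered ring. */
    static Node nextNode(Collection<Node> topology, Node local) {
        List<Node> sorted = new ArrayList<>(topology);

        sorted.sort(ORDER);

        int idx = sorted.indexOf(local); // identity lookup: local must be in topology

        return sorted.get((idx + 1) % sorted.size());
    }
}
```

Because the order depends only on the attribute value and the node ID, nodes of one region end up next to each other in the ring regardless of the order in which they joined.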
Issue Links
- contains IGNITE-4499 TcpDiscoverySpi is not reliable in some network split scenarios. (Resolved)