Details
Bug
Status: Resolved
Low
Resolution: Resolved
None
Low
Description
When adding a new node to an existing cluster, the netstats command, called while the node is joining, shows different statistics on the node receiving the data than on the nodes sending the data.
Receiving node:
Mode: JOINING
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.13.184
    /172.20.30.7
        Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
            [...]
    /172.20.40.128
    /172.20.16.45
        Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
            [...]
    /172.20.9.63
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name          Active   Pending   Completed   Dropped
Large messages        n/a         0           0         0
Small messages        n/a         0       11121         0
Gossip messages       n/a         0       32690         0
Sending node 1:
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.21.19
        Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
            [...]
Read Repair Statistics:
Attempted: 680832
Mismatch (Blocking): 716
Mismatch (Background): 279
Pool Name          Active   Pending   Completed   Dropped
Large messages        n/a         2      123307         4
Small messages        n/a         2   637010302       509
Gossip messages       n/a        23      798851     11535
Sending node 2:
Mode: NORMAL
Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
    /172.20.21.19
        Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
            [...]
Read Repair Statistics:
Attempted: 84967
Mismatch (Blocking): 17568
Mismatch (Background): 3078
Pool Name          Active   Pending   Completed   Dropped
Large messages        n/a         2       17818         2
Small messages        n/a         2   126082304       507
Gossip messages       n/a        34      202810     11725
In this case, the join process has been running for a while, and the sending nodes report that they have already sent everything. This output stays the same for a while, though (maybe ~15% of the total joining time).
However, once the sending nodes have sent everything, the receiving node's values stay as shown above until it switches to the NORMAL state (so there is no visible catching up from ~86 files to ~405 files, for example; it goes directly from the state shown above to NORMAL).
This makes tracking the progress of the join process more difficult than it needs to be, because we have to compare the receiving node's values with the sending nodes' values and deduce the actual state from both, and both are "not correct": the sending nodes say everything has been sent but stay in this state for a long time, while the receiving node says it still needs to download a lot of files/data before finishing.
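To illustrate the comparison described above, here is a minimal Python sketch that extracts the "Receiving"/"Sending" summary lines from captured netstats output and turns them into percentages. The function name `stream_progress` and the unit table are hypothetical helpers introduced for this example only; the line format is assumed to match the output pasted in this ticket, and only the GiB unit actually appears there.

```python
import re

# Matches stream summary lines as they appear in the output quoted in this ticket, e.g.
#   "Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total"
STREAM_LINE = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<total_files>\d+) files, "
    r"(?P<total_size>[\d.]+) (?P<total_unit>\w+) total\. "
    r"Already (?:received|sent) (?P<done_files>\d+) files, "
    r"(?P<done_size>[\d.]+) (?P<done_unit>\w+) total"
)

# Assumption: only "GiB" is seen in the ticket; the other unit names are guesses.
UNIT_BYTES = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def stream_progress(netstats_output):
    """Return one dict per stream summary line with file and byte percentages."""
    results = []
    for m in STREAM_LINE.finditer(netstats_output):
        total_files = int(m["total_files"])
        total_bytes = float(m["total_size"]) * UNIT_BYTES.get(m["total_unit"], 1)
        done_bytes = float(m["done_size"]) * UNIT_BYTES.get(m["done_unit"], 1)
        results.append({
            "direction": m["direction"],
            # Guard against empty streams to avoid dividing by zero.
            "files_pct": 100.0 * int(m["done_files"]) / total_files if total_files else 100.0,
            "bytes_pct": 100.0 * done_bytes / total_bytes if total_bytes else 100.0,
        })
    return results


if __name__ == "__main__":
    receiving = (
        "Receiving 433 files, 36.64 GiB total. "
        "Already received 88 files, 4.6 GiB total"
    )
    sending = (
        "Sending 433 files, 36.64 GiB total. "
        "Already sent 433 files, 36.64 GiB total"
    )
    for entry in stream_progress(receiving + "\n" + sending):
        print(f"{entry['direction']}: {entry['files_pct']:.1f}% of files, "
              f"{entry['bytes_pct']:.1f}% of bytes")
```

Run against the figures above, this makes the mismatch explicit: the sender reports the stream as complete while the receiver reports roughly a fifth of the files for the same stream.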