Cassandra / CASSANDRA-14192

netstats information mismatch between senders and receivers


Details

    • Priority: Low

    Description

      When a new node is added to an existing cluster, running the netstats command while the node is joining shows different statistics on the node receiving the data than on the nodes sending it.

      Receiving node:

      Mode: JOINING
      Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
          /172.20.13.184
          /172.20.30.7
              Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
                  [...]
          /172.20.40.128
          /172.20.16.45
              Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
                  [...]
          /172.20.9.63
      Read Repair Statistics:
      Attempted: 0
      Mismatch (Blocking): 0
      Mismatch (Background): 0
      Pool Name                    Active   Pending      Completed   Dropped
      Large messages                  n/a         0              0         0
      Small messages                  n/a         0          11121         0
      Gossip messages                 n/a         0          32690         0
      

      Sending node 1:

      Mode: NORMAL
      Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
          /172.20.21.19
              Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
                  [...]
      Read Repair Statistics:
      Attempted: 680832
      Mismatch (Blocking): 716
      Mismatch (Background): 279
      Pool Name                    Active   Pending      Completed   Dropped
      Large messages                  n/a         2         123307         4
      Small messages                  n/a         2      637010302       509
      Gossip messages                 n/a        23         798851     11535
      

      Sending node 2:

      Mode: NORMAL
      Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
          /172.20.21.19
              Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
                  [...]
      Read Repair Statistics:
      Attempted: 84967
      Mismatch (Blocking): 17568
      Mismatch (Background): 3078
      Pool Name                    Active   Pending      Completed   Dropped
      Large messages                  n/a         2          17818         2
      Small messages                  n/a         2      126082304       507
      Gossip messages                 n/a        34         202810     11725
      

      In this case, the join process has been running for a while, and the sending nodes report that they have already sent everything. However, their output stays unchanged for a long time afterwards (perhaps ~15% of the total join time).

      Meanwhile, the receiving node's values stay as shown above even after the sending nodes have sent everything, until the node switches to the NORMAL state. There is no visible catch-up (from ~86 files to ~405 files, for example); the node goes directly from the state shown above to NORMAL.

      This makes tracking the progress of the join process more difficult than necessary, because we have to compare the receiving node's values with the sending nodes' values and deduce the actual state, even though neither is "correct": the sending nodes say everything has been sent yet remain in that state for a long time, while the receiving node says it still needs to download many files and a lot of data before finishing.
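      As a rough illustration of the comparison described above, the following Python sketch parses the per-peer summary lines printed by netstats and computes file and byte completion ratios for each side. It assumes the lines follow exactly the format shown in this report (binary unit suffixes such as "GiB"); `parse_progress` and `StreamProgress` are hypothetical helpers written for this example, not part of Cassandra or nodetool.

```python
import re
from typing import NamedTuple, Optional

# Matches the per-peer summary line format shown in the netstats output above.
# Assumption: sizes are printed with a binary unit suffix (KiB/MiB/GiB/TiB).
_PROGRESS_RE = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<total_files>\d+) files, "
    r"(?P<total_size>[\d.]+) (?P<total_unit>[KMGT]iB) total\. "
    r"Already (?:received|sent) (?P<done_files>\d+) files, "
    r"(?P<done_size>[\d.]+) (?P<done_unit>[KMGT]iB) total"
)

# Bytes per binary unit.
_UNIT_BYTES = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


class StreamProgress(NamedTuple):
    """Hypothetical container for one side's view of a single stream."""
    direction: str       # "Receiving" or "Sending"
    total_files: int
    done_files: int
    total_bytes: float
    done_bytes: float

    @property
    def file_ratio(self) -> float:
        # Fraction of files this side reports as transferred.
        return self.done_files / self.total_files if self.total_files else 1.0

    @property
    def byte_ratio(self) -> float:
        # Fraction of bytes this side reports as transferred.
        return self.done_bytes / self.total_bytes if self.total_bytes else 1.0


def parse_progress(line: str) -> Optional[StreamProgress]:
    """Parse one netstats summary line; return None if it does not match."""
    m = _PROGRESS_RE.search(line)
    if m is None:
        return None
    return StreamProgress(
        direction=m["direction"],
        total_files=int(m["total_files"]),
        done_files=int(m["done_files"]),
        total_bytes=float(m["total_size"]) * _UNIT_BYTES[m["total_unit"]],
        done_bytes=float(m["done_size"]) * _UNIT_BYTES[m["done_unit"]],
    )


if __name__ == "__main__":
    receiver = parse_progress(
        "Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total"
    )
    sender = parse_progress(
        "Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total"
    )
    print(f"receiver: {receiver.file_ratio:.0%} of files")
    print(f"sender:   {sender.file_ratio:.0%} of files")
```

      Applied to the 405-file stream above, the receiver reports about 21% of files (86 of 405) while the sender reports 100%, which is the mismatch this issue describes.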

          People

            Vincent White (VincentWhite)
            Jonathan Ballet (multani)
            Vincent White
            Votes: 0
            Watchers: 6
