  1. Cassandra
  2. CASSANDRA-15440

Run "nodetool repair -pr" concurrently


Details

    • Improvement
    • Status: Resolved
    • Normal
    • Resolution: Feedback Received
    • None
    • Tool/nodetool
    • None
    • All
    • None

    Description

      Running "nodetool repair -pr" on each node one at a time is too slow.
      However, running the command on all nodes at the same time consumes more resources, because the token ranges of the nodes overlap and each node can end up running several repair job threads at once.
      It would be faster if "nodetool repair -pr" could be run concurrently on multiple nodes whose token ranges do not intersect (overlap).

      ************
      Say the RF is 3 and we have nodes A-Z. Currently, without this feature, we have to do the following:

      1. Get each primary node's token ranges from the logs of running "nodetool repair -pr" on each node:
      RangeCollection_A primary node: nodeA
      RangeCollection_B primary node: nodeB
      RangeCollection_C primary node: nodeC
      ...

      2. Get the output of running "./nodetool describering prod_keyspace >> nodetool_describering_prod_keyspace.log":
      ...
      TokenRange(start_token:-1589028858003231727, end_token:-1586606433049008069, endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], rpc_endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], endpoint_details:[EndpointDetails(host:10.81.74.134, datacenter:hk, rack:1c), EndpointDetails(host:10.81.74.132, datacenter:hk, rack:1a), EndpointDetails(host:10.81.74.133, datacenter:hk, rack:1b)])
      ...

      3. Calculate the overlap of the token ranges among all the nodes.
      For example,
      RangeCollection_N is stored on nodeN (primary node), nodeO, nodeP
      RangeCollection_O is stored on nodeO (primary node), nodeP, nodeQ
      RangeCollection_P is stored on nodeP (primary node), nodeQ, nodeR
      RangeCollection_Q is stored on nodeQ (primary node), nodeR, nodeS
      RangeCollection_R is stored on nodeR (primary node), nodeS, nodeT
      RangeCollection_S is stored on nodeS (primary node), nodeT, nodeU

      4. Then, based on the intersections found, work out a schedule that ensures only one job thread runs on each node at a time.
      For example, the command can be run in the following 3 rounds:
      1st round: run the command on nodeN and nodeQ at the same time.
      2nd round: run the command on nodeO and nodeR at the same time.
      3rd round: run the command on nodeP and nodeS at the same time.
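
The manual steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of nodetool: it assumes the first address in each "endpoints:[...]" list of the describering output is the primary node for that range (as in the RangeCollection example), and the function names replicas_by_primary and plan_rounds are hypothetical. Nodes are placed greedily into the first round in which none of their replicas is already busy.

```python
import re
from collections import defaultdict

# Matches the start of one TokenRange(...) entry in "nodetool describering" output.
TOKEN_RANGE_RE = re.compile(
    r"TokenRange\(start_token:(-?\d+), end_token:(-?\d+), endpoints:\[([^\]]*)\]"
)


def replicas_by_primary(describering_text):
    """Map each primary node to the set of nodes holding its primary ranges.

    Assumption: the first endpoint listed for a range is its primary node.
    """
    replicas = defaultdict(set)
    for match in TOKEN_RANGE_RE.finditer(describering_text):
        endpoints = [e.strip() for e in match.group(3).split(",") if e.strip()]
        if endpoints:
            replicas[endpoints[0]].update(endpoints)
    return dict(replicas)


def plan_rounds(replicas):
    """Greedily group primary nodes into rounds with disjoint replica sets.

    Two nodes share a round only if no node appears in both replica sets,
    so each node takes part in at most one repair per round.
    """
    rounds = []  # each entry: (primaries in this round, all nodes busy in it)
    for primary, nodes in replicas.items():
        for primaries, busy in rounds:
            if busy.isdisjoint(nodes):
                primaries.append(primary)
                busy.update(nodes)
                break
        else:
            # No existing round is free of conflicts; open a new one.
            rounds.append(([primary], set(nodes)))
    return [primaries for primaries, _ in rounds]


if __name__ == "__main__":
    # The RangeCollection example from step 3, keyed by primary node.
    example = {
        "nodeN": {"nodeN", "nodeO", "nodeP"},
        "nodeO": {"nodeO", "nodeP", "nodeQ"},
        "nodeP": {"nodeP", "nodeQ", "nodeR"},
        "nodeQ": {"nodeQ", "nodeR", "nodeS"},
        "nodeR": {"nodeR", "nodeS", "nodeT"},
        "nodeS": {"nodeS", "nodeT", "nodeU"},
    }
    for number, primaries in enumerate(plan_rounds(example), start=1):
        print(f"round {number}: {', '.join(primaries)}")
```

For the six-node example this produces the same three rounds as listed in step 4 (nodeN with nodeQ, nodeO with nodeR, nodeP with nodeS). A greedy assignment like this is not guaranteed to give the minimum number of rounds for every topology.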


          People

            Assignee: Unassigned
            Reporter: xiangwang