Cassandra / CASSANDRA-1037

Improve load balancing to take into account load in terms of operations


    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Currently in Cassandra, load balancing takes disk space into account. With an order-preserving partitioner, some token ranges can become hot spots in terms of operations. We propose improving load balancing so that it also takes the number of operations into account.

      There are two places where this can be handled:

      1. When the cluster decides which nodes need to be balanced.
      2. When deciding how to balance an individual node, that is, where to split.

      For number 1, the number of operations that a node has performed could be factored into how important it is to balance that node.
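
      As a rough illustration of number 1, the sketch below ranks nodes by a blend of their share of disk usage and their share of operations. Everything in it is hypothetical: `NodeLoad`, `balance_priority`, and `ops_weight` are names invented for this example and are not part of Cassandra, and a linear blend is only one possible way to combine the two metrics.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Hypothetical per-node metrics; not Cassandra's actual load structure."""
    name: str
    disk_bytes: int
    ops: int  # operations served over some recent window (assumed metric)


def balance_priority(nodes, ops_weight=0.5):
    """Return nodes ordered from most to least in need of balancing.

    Each node is scored by a linear blend of its share of total disk usage
    and its share of total operations. ops_weight=0.0 ranks by disk space
    only; ops_weight=1.0 ranks by operations only.
    """
    if not 0.0 <= ops_weight <= 1.0:
        raise ValueError("ops_weight must be between 0 and 1")

    # Guard against division by zero on an idle or empty cluster.
    total_disk = sum(n.disk_bytes for n in nodes) or 1
    total_ops = sum(n.ops for n in nodes) or 1

    def score(node):
        disk_share = node.disk_bytes / total_disk
        ops_share = node.ops / total_ops
        return (1.0 - ops_weight) * disk_share + ops_weight * ops_share

    return sorted(nodes, key=score, reverse=True)
```

      With `ops_weight=0.0` this reduces to ranking by disk space alone, so the weight controls how far the ranking departs from space-only balancing.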

      For number 2, we already use a midpoint in the node when load balancing with respect to space. We propose adding a weight to that midpoint so that the split also evens out the operational load, not just space.
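
      One way to picture the weighted midpoint in number 2 is sketched below. It assumes per-key size and operation counts are available, which the issue does not state, and `choose_split` and `ops_weight` are hypothetical names for this example only. The function walks the keys in token order and returns the token at which the blended share of space and operations first reaches one half.

```python
def choose_split(keys, ops_weight=0.5):
    """Pick a split token that balances both space and operations.

    keys: list of (token, size_bytes, ops) tuples sorted by token.
    Returns the token that ends the lower half, i.e. the first token at
    which the cumulative blended share of size and ops reaches 0.5.
    ops_weight=0.0 gives the plain space midpoint.
    """
    if not keys:
        raise ValueError("cannot split an empty range")
    if not 0.0 <= ops_weight <= 1.0:
        raise ValueError("ops_weight must be between 0 and 1")

    # Guard against division by zero when sizes or op counts are all zero.
    total_size = sum(size for _, size, _ in keys) or 1
    total_ops = sum(ops for _, _, ops in keys) or 1

    cumulative = 0.0
    for token, size, ops in keys:
        cumulative += ((1.0 - ops_weight) * size / total_size
                       + ops_weight * ops / total_ops)
        if cumulative >= 0.5:
            return token
    # Fallback for rounding error or all-zero metrics: split at the last token.
    return keys[-1][0]
```

      For example, with four equally sized keys where nearly all operations hit the first one, `ops_weight=0.0` splits after the second key, while a larger weight moves the split towards the hot key so that it ends up in a smaller range.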


        Jonathan Ellis made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.8 [ 12314820 ]
        Resolution Won't Fix [ 2 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.8 [ 12314820 ]
        Fix Version/s 0.7 [ 12314533 ]
        Jeremy Hanna made changes -
        Fix Version/s 0.7 [ 12314533 ]
        Jeremy Hanna made changes -
        Field Original Value New Value
        Comment [ This could also take into account any number of variables - like memory usage. The number of operations performed just seemed to be the most logical to start with. Another metric suggested by Eric was memory. The decision about which node needs to be balanced (1) as well as how to balance (2) could be an aggregate of a few metrics - space, ops, and memory, for example. ]
        Jeremy Hanna created issue -


          • Assignee:
            Jeremy Hanna
          • Votes:
            1
          • Watchers:
            4


            • Created: