  Kudu / KUDU-3390

Add new feature: auto leader rebalancer

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels: None

    Description

      The original JIRA issue is https://issues.apache.org/jira/browse/KUDU-3061. I created this new issue to record some additional information.

       

       

      Motivation

      The number of leader replicas per tablet server can become imbalanced over time, which leads to load skew on some nodes.

      Two causes of load skew:

      • The main cause: scans. Scan requests select replicas in one of two modes, LEADER_ONLY (the default) and CLOSEST_REPLICA. Users who want more accurate results choose LEADER_ONLY, so the scan load on a tserver is mostly positively correlated with the number of leaders it hosts.
      • The other cause: writes. Leaders receive write requests, while followers receive AppendEntries calls (UpdateConsensus in Kudu). The two processing paths differ slightly, which acts as a hidden variable that may cause imbalanced load. Leader rebalancing evens out leaders and followers across tservers, eliminates this hidden variable, and makes the service more stable.

      To deal with this today, users can write a script around the kudu CLI leader_step_down command to rebalance the leaders, and SREs have to run that script periodically.
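
      The script-based workaround could look roughly like the sketch below. It assumes the CLI form `kudu tablet leader_step_down <master_addresses> <tablet_id>`; the helper names, the `kudu_bin` parameter, and the dry-run default are hypothetical, and choosing which tablets to step down is left to the caller.

```python
import subprocess


def build_step_down_command(master_addresses, tablet_id, kudu_bin="kudu"):
    """Build the argv for one leader step-down (assumed CLI form, see lead-in)."""
    return [kudu_bin, "tablet", "leader_step_down", master_addresses, tablet_id]


def step_down_leaders(master_addresses, tablet_ids, dry_run=True):
    """Return the commands for the given tablets; run them only if dry_run is False."""
    commands = [build_step_down_command(master_addresses, t) for t in tablet_ids]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

      A periodic job (cron, for example) would have to call something like this for every cluster, which is the maintenance burden described here.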

       

      In our environment we have more than 1,500 Kudu clusters, and more are being deployed, so it is hard for SREs to maintain the rebalancing script tasks.

      Kudu has an automatic replica rebalancer but no automatic leader rebalancer.

      We can do better: the leader kudu-master can rebalance leaders automatically.

      Solution

      We can add an automatic leader rebalancing task to avoid leader replica skew: a periodic task in kudu-master that rebalances leaders.

      Leader rebalancing only transfers leadership; it does not copy replicas. The basic idea of the algorithm is that on every tserver, number of leaders : number of followers = 1 : (replication_factor - 1).

      If leader rebalancing is needed, it is better to also enable the replica rebalancer. With the leader rebalancer enabled but the auto rebalancer disabled, the algorithm still works, but the effect is not as good. The algorithm converges, and its target is that among each tserver's replicas, number of leaders : number of followers is 1 : (replication_factor - 1).
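
      To make the target ratio concrete, here is a minimal greedy sketch. It is not the implementation in kudu-master. The function name and input layout are hypothetical; the only thing taken from the description is the target of replicas / replication_factor leaders per tserver, reached through leadership transfers alone.

```python
from collections import Counter


def plan_leader_transfers(tablets, replication_factor):
    """Plan leadership transfers toward leaders : followers = 1 : (rf - 1).

    tablets maps tablet_id -> (leader_tserver, [replica_tservers]).
    Returns a list of (tablet_id, from_tserver, to_tserver); no replica moves.
    """
    replicas = Counter()
    leaders = Counter()
    for leader, hosts in tablets.values():
        replicas.update(hosts)
        leaders[leader] += 1

    def surplus(ts):
        # Leaders above the target replicas / rf, scaled by rf to stay in integers.
        return leaders[ts] * replication_factor - replicas[ts]

    moves = []
    for tablet_id in sorted(tablets):
        src, hosts = tablets[tablet_id]
        followers = [ts for ts in hosts if ts != src]
        if not followers:
            continue
        # Prefer the follower host that is furthest below its target.
        dst = min(followers, key=lambda ts: (surplus(ts), ts))
        # Transfer only if the destination stays no more loaded than the source.
        if surplus(dst) + 2 * replication_factor <= surplus(src):
            leaders[src] -= 1
            leaders[dst] += 1
            moves.append((tablet_id, src, dst))
    return moves
```

      For 40 tablets with replication factor 3 on three tservers and all leaders initially on one tserver, this plan contains 26 transfers and ends at a 14:13:13 leader split.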

      Leader Rebalance results

      I ran some experiments to measure the effect. The cluster has 3 machines, running 3 master instances and 3 tserver instances.

      I created a table with 40 tablets (partitions) and a replication factor of 3, and loaded a lot of data (40,000,000 records).

      With leader rebalancing disabled, I manually transferred the leaders of all tablets to one tserver and ran writes and scans.

      Then I enabled leader rebalancing and ran the scans again. The workload was as follows:

      The scan command: ./kudu_tools/kudu perf table_scan $master_list Student -columns=id,name,brief,age,score -num_threads=4 -nofill_cache -replica_selection="LEADER"

       

      In the table below, triples such as 40:0:0 and 47%, 18%, 19% list values in the order node1, node2, node3.

       

                                leader ratio   scan cost   CPU usage       memory       I/O
      before leader rebalance   40:0:0         811.586 s   47%, 18%, 19%   no changes   102 MB/s (ioutil 55%), 8 KB/s (ioutil 2%), 64 KB/s (ioutil 3%)
      after leader rebalance    13:14:13       611.012 s   39%, 45%, 35%   no changes   53 MB/s (ioutil 31%), 80 MB/s (ioutil 18%), 45 MB/s (ioutil 24%)
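
      The relative improvement can be derived from the numbers above. The short calculation below uses only the measured values from the table; the function names are hypothetical.

```python
def relative_reduction(before, after):
    """Fraction by which a value dropped, relative to the 'before' value."""
    return (before - after) / before


def spread(values):
    """Gap between the most and least loaded node, in percentage points."""
    return max(values) - min(values)


scan_before_s, scan_after_s = 811.586, 611.012
cpu_before = [47, 18, 19]  # node1, node2, node3
cpu_after = [39, 45, 35]

print(f"scan time reduction: {relative_reduction(scan_before_s, scan_after_s):.1%}")
print(f"CPU spread before: {spread(cpu_before)} points, after: {spread(cpu_after)} points")
```

      With these inputs the scan time drops by about 24.7%, and the CPU usage gap between the busiest and idlest node narrows from 29 to 10 percentage points.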

            People

              Assignee: shenxingwuying Yuqi Du
              Reporter: shenxingwuying Yuqi Du