HBASE-1017

Region balancing does not bring newly added node within acceptable range

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None

    Description

      In a 10-node cluster, only 9 nodes were online. With about 215 total regions, each of the 9 held around 24 regions (an average load of 24). Slop is 10%, so the acceptable range was 22 to 26.
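The acceptable range follows directly from the average and the slop. A minimal sketch of that arithmetic, assuming the range is the set of whole region counts within avg ± slop (the function name `acceptable_range` is hypothetical and not taken from the HBase source):

```python
import math

def acceptable_range(avg_load, slop):
    """Return the (lowest, highest) whole region counts within avg_load +/- slop."""
    low = math.ceil(avg_load * (1 - slop))    # smallest count not below the lower bound
    high = math.floor(avg_load * (1 + slop))  # largest count not above the upper bound
    return low, high

print(acceptable_range(24, 0.10))  # (22, 26)
```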

      When the 10th node started, the master log showed:

      2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: 72.34.249.210:60020
      2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
      2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
      2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^AH�;,1225411051632
      2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@�Ý,1225411056686
      2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region groups,,1222913580957
      2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
      2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
      2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region upgrade,,1226892014784
      2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@3^Z�,1225411056701
      2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@         ^L,1225411049042
      

      The new regionserver received only 6 regions. When the 10th node joined, the average load dropped to 22. Two servers with 25 regions each (acceptable when the average was 24, but not at 22) each reassigned 3 regions to come back down to the average. All other servers remained within the 10% slop (20 to 24), so they were not considered overloaded and gave up no regions. Even those 6 reassignments happened only by chance: had every server held exactly 24 regions, none would have been assigned to the new node.
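The behavior described above can be reproduced with a small simulation. This is only a sketch of the overload-only rule as the description states it, not the actual RegionManager code. The function name, the per-server split of the 215 regions, and the rounding-up of the average (chosen because it matches "avg: 22.0" in the log) are all assumptions.

```python
import math

def rebalance_on_join(loads, slop=0.1):
    """Simulate overload-only balancing when one empty server joins.

    loads: region counts of the servers already online.
    Returns (loads of the existing servers afterwards, regions freed for the new server).
    """
    total = sum(loads)
    # Assumption: the average is rounded up (215 regions / 10 servers -> 22).
    avg = math.ceil(total / (len(loads) + 1))
    threshold = avg * (1 + slop)
    freed = 0
    after = []
    for load in loads:
        if load > threshold:
            # Overloaded server: shed regions until it is back at the average.
            freed += load - avg
            after.append(avg)
        else:
            # Within slop: the server keeps everything.
            after.append(load)
    return after, freed

# Hypothetical split of 215 regions over 9 servers, two of them holding 25.
observed = [25, 25, 24, 24, 24, 24, 23, 23, 23]
print(rebalance_on_join(observed)[1])  # 6

# Perfectly even cluster: no server is overloaded, so the new node gets nothing.
print(rebalance_on_join([24] * 9)[1])  # 0
```

Only the two servers above the threshold of 24.2 shed regions, which is why the new node ends up with 6 regions instead of roughly 22.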

      This will behave worse on larger clusters, where adding a node has little effect on the average load per server.
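To illustrate the cluster-size effect, the sketch below computes the highest load a server can carry without being treated as overloaded once one empty node joins. It assumes every existing server holds 24 regions on average and that the average is rounded up, as in the log above. The function name `max_load_not_shed` is hypothetical.

```python
import math

def max_load_not_shed(servers_before, regions_per_server, slop=0.1):
    """Highest region count that is still within slop after one empty server joins."""
    total = servers_before * regions_per_server
    # Assumption: the average is rounded up, matching "avg: 22.0" in the log.
    avg = math.ceil(total / (servers_before + 1))
    return math.floor(avg * (1 + slop))

for n in (9, 99, 999):
    print(n, max_load_not_shed(n, 24))
# 9 24
# 99 26
# 999 26
```

With 9 servers, a server holding 25 regions sheds some. With 99 or more, even a server at 26 (the top of the old acceptable range) stays within slop, so under these assumptions the new node would receive nothing.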

      Attachments

        1. loadbalance2.0.patch
          23 kB
          Evgeny Ryabitskiy
        2. HBASE-1017_v9.patch
          37 kB
          Evgeny Ryabitskiy
        3. HBASE-1017_v8.patch
          35 kB
          Evgeny Ryabitskiy
        4. HBASE-1017_v7.patch
          35 kB
          Evgeny Ryabitskiy
        5. HBASE-1017_v6.patch
          35 kB
          Evgeny Ryabitskiy
        6. HBASE-1017_v5.patch
          37 kB
          Evgeny Ryabitskiy
        7. HBASE-1017_v4.patch
          34 kB
          Evgeny Ryabitskiy
        8. HBASE-1017_v2.patch
          26 kB
          Evgeny Ryabitskiy
        9. HBASE-1017_v12_FINAL.patch
          12 kB
          Evgeny Ryabitskiy
        10. HBASE-1017_v11_FINAL.patch
          10 kB
          Evgeny Ryabitskiy
        11. HBASE-1017_v10.patch
          37 kB
          Evgeny Ryabitskiy
        12. HBASE-1017_v1.patch
          4 kB
          Evgeny Ryabitskiy

            People

              Assignee: Evgeny Ryabitskiy (apparition)
              Reporter: Jonathan Gray (streamy)