[HBASE-17039] SimpleLoadBalancer schedules large amount of invalid region moves - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0, 1.1.7, 1.2.4, 2.0.0
Fix Version/s: 1.4.0, 1.2.5, 1.1.8, 2.0.0
Component/s: Balancer
Labels: None
Hadoop Flags: Reviewed

Description

After growing one of our clusters to 1600 nodes, we observed a large number of invalid region moves (more than 30k) fired by the balance chore. We simulated the problem and printed out the balance plan, and found that many servers holding two regions of a given table (we use the by-table strategy) sent both regions to two other servers that held zero regions.
In SimpleLoadBalancer's balanceCluster function, the code block that determines the underLoadedServers appears to have a problem:

      if (load >= min && load > 0) {
        continue; // look for other servers which haven't reached min
      }
      int regionsToPut = min - load;
      if (regionsToPut == 0)
      {
        regionsToPut = 1;
      }

If min is zero, a server whose load is zero (and therefore equal to min) is still marked as underloaded: it is not skipped by the first check, and regionsToPut is forced from 0 to 1. This causes the behavior described above.
Since we grew the cluster to 1600+ nodes, many tables that have only 1000 regions now hit this issue.
After fixing this check, the balance plan went back to normal.
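
For illustration, here is a minimal standalone sketch of the check quoted above, next to one possible guard. It is not the HBase source and not necessarily what the attached patch does. The class name `UnderloadedSketch`, both method names, the `SKIPPED` sentinel, and the assumption that min is the floor of regions per server are all hypothetical and introduced only for this example.

```java
public class UnderloadedSketch {
    // Hypothetical sentinel: the server is not treated as underloaded.
    public static final int SKIPPED = -1;

    // Mirrors the quoted check: returns how many regions the server
    // would be scheduled to receive, or SKIPPED if the loop continues.
    public static int regionsToPutAsQuoted(int load, int min) {
        if (load >= min && load > 0) {
            return SKIPPED; // look for other servers which haven't reached min
        }
        int regionsToPut = min - load;
        if (regionsToPut == 0) {
            regionsToPut = 1; // load == min == 0 ends up here
        }
        return regionsToPut;
    }

    // One possible guard (an assumption, not the attached patch):
    // only servers strictly below min count as underloaded.
    public static int regionsToPutGuarded(int load, int min) {
        if (load >= min) {
            return SKIPPED;
        }
        return min - load;
    }

    public static void main(String[] args) {
        int regions = 1000;
        int servers = 1600;
        // Assumption: min is the floor of the average regions per server.
        int min = regions / servers;
        System.out.println("min = " + min);
        System.out.println("as quoted, load 0: " + regionsToPutAsQuoted(0, min));
        System.out.println("guarded,   load 0: " + regionsToPutGuarded(0, min));
    }
}
```

With 1000 regions on 1600 servers, integer division gives a min of zero under that assumption, so the quoted check schedules one region for every empty server, while the guarded variant skips them.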

Attachments

HBASE-17039.patch (1.0 kB), 07/Nov/16 05:43, Charlie Qiangeng Xu

Issue Links

is related to: HBASE-17059 backport HBASE-17039 to 1.3.1 (Closed)

People

Assignee: Charlie Qiangeng Xu
Reporter: Charlie Qiangeng Xu
Votes: 0
Watchers: 12

Dates

Created: 07/Nov/16 04:11
Updated: 21/Mar/17 11:17
Resolved: 11/Nov/16 02:54